Update on technical difficulties

Dear Foldit users,

As most of you know, the game has had only partial functionality for a number of days. Here is some background on what happened. Two days before the Nature paper came out, the cooling system in the machine room hosting the Foldit servers malfunctioned. The temperature reached the threshold at which power to the entire room shuts off automatically. This has happened before, and in the past we have recovered quickly by starting all our servers back up. Unfortunately, this time the sudden loss of power completely trashed our RAID file server. Unlike our database server and the web servers, the file server had no shadow copy of its filesystem, so when it went down, everything went down. At the time, most of the key personnel were out of town (I was in Norway, Seth in Japan, Firas in Turkey, etc.), and helping remotely proved to be very challenging.

We then tried to copy all the information from the filesystem RAID disks. The second attempt at copying this massive set of files worked. At that point we decided to resurrect the copy of the server on our development machine. This copy had some functionality missing, but because of the big press wave, we decided that it was better to have a partly working server than no server at all. The next day the main servers were ready, at which point we switched back. As a result of all the copying, some parts of the system did not have the right permissions set, which left them non-functional. At the same time, the database server was at its limit for the number of queries it could serve, because the press wave brought an 80-fold increase in daily registrations. We've had press waves before, but we've never seen this magnitude of interest. The DB server query queue had to be restarted several times to keep it from bogging down completely. This, of course, created many in-game and web portal timeouts.

From what we can tell it seems that all the functionality is back.

We've learned several things from this perfect storm. In the future, we will use Facebook as an update mechanism in case our web servers are not functional. We'll also keep our development server ready to take over as the main server, with limited functionality, as a temporary solution. Longer term, as soon as we get funds to revamp our server structure, I have decided to move the entire server structure to the cloud (most likely Amazon services), which will make it completely robust to the failure of any individual machine. Furthermore, we will be able to easily scale up the entire infrastructure any time we need to increase our capacity.

Our apologies to all of you who were frustrated with the lack of full functionality in the past days. It is still possible that partial or full downtimes will happen, but we'll be keeping you up to date on the blog (and on Facebook if the whole server is down).

Feel free to respond to this post if you notice some functionality still missing.


( Posted by  zoran 100 3100  |  Tue, 08/10/2010 - 09:46  |  5 comments )
Joined: 12/14/2008

Please also use Twitter for important messages. Nothing gets a message out faster or more widely around the world (and Foldit IS basically worldwide) than Twitter.

ferzle's picture
Joined: 12/04/2007
Groups: None
Missing functionality

It appears that after not playing a lot in the past year, I have lost a lot of my functionality in playing the game. I have to struggle just to make it in the top 100 now. Anything you can do about that?

Congrats on the paper, by the way. The timing was perfect--I was able to add a reference to it in a conference paper that I finished this week.

zoran's picture
Joined: 11/10/2007
Groups: Window Group
a solution

Yes, the solution is simple, Ferzle: you have to play more to catch up.

By the way, if you're referring to the design of Foldit, you should use this reference, published in Foundations of Digital Games this year: http://www.cs.washington.edu/homes/zoran/foldit-fdg10.pdf

Joined: 08/15/2010
Groups: None
The "request new password" still not working

And it has been dysfunctional for months...

Had to register a new account to be able to write this.

saksoft2's picture
Joined: 08/06/2010
Groups: Contenders
query tuning

If your registration database is powered by MySQL, contact me. I might be able to help avoid more problems.

