I don't know how Foldit works internally, but I was thinking about, for example, global wiggle process.

Assumption 1: When running a global wiggle operation, the program has to calculate 'little-wiggle' for all the segments for each step of the wiggle, and it does this calculation with a sequence of segments, chosen sequentially, following a criterion, stadistic, etc.

Assumption 2: The time for a wiggle step = N*(time one little-wiggle), where N=segment_count()

Assumption 3: Perhaps there is a better solution for each wiggle step, but it should be necessary to calculate all of the combinations to know which is the best sequence of little-wiggles.

Assumption 4: I presume the algorithm to calculate little-wiggle of one segment is simple.

Approximation 1: Each little-wiggle influences subsequents, so in an only one step for N segments there is N!=N*(N-1)*...*1 possible combinations. But perhaps it is possible calculate the best little-wiggle. It would be N calculations, not N!

Approximation 2: with a limited combinations (=N) it is not a question of CPU power, but parallelization capability for little-wiggles.

Possible solution: A low power (cheap) FPGA can replicate the algorithm N times and so we have the best little-wiggle and then the (almost) best combination for each step of the global wiggle. In a FPGA it spends the same time for calculating one segment or N.

Condition: The little-wiggle algorithm should be simple to include it in a cheap FPGA.

For example:

Foldit does a wiggle, 1000 steps until it reaches an equilibrium point.

The criteria is, for example, sequential, so:

step 1 = little-wiggle(seg1) + little-wiggle(seg2) + ... + little-wiggle(segN)

step 2 = idem step1 but with state changed by step1

...

step 1000

With a FPGA:

step1 = little-wiggle(FPGA_lw())

where FPGA_lw() = best result of parallel execution of N little-wiggles.

step2 = we send the new changed state to the FPGA and it calculates N little-wiggles again.

It doesn't speed up the process, but find a better wiggle, so equilibrium point is reached sooner, and probably with higher score.

A pci board with a little fpga would be cheap...

Does it make sense to you?

to use GPU for parallel computing of that kind?

It's a good idea. A Nvidia Tesla has 448 CUDAs, what I don't know is the complexity of the algorithms in Foldit, and the floating point precision required.

I see

Well, I only can say that I offers my work to do it. I think is better with fpga, perhaps 100 or 200 times faster than CPU if the code is complex (and I'm hardware designer :D) but also I could test gpu if somebody shows a piece of code to test.

With FPGA it isn't 1000 steps.

Now, with step 1 = little-wiggle(seg1) + little-wiggle(seg2) + ... + little-wiggle(segN)

the time is 1000*N*time_little_wiggle

With FPGA

time= (1000*N < X > 1000)*time_little_wiggle

(presumably)