Let Lua recipes send sequences to AlphaFold

Case number: 845813-2012283
Topic: Game: Tools
Opened by: jeff101
Status: Open
Type: Suggestion
Opened on: Monday, November 1, 2021 - 20:12
Last modified: Wednesday, November 3, 2021 - 19:18

Would it be possible to add a Lua command like the one below
to run Foldit's implementation of DeepMind AlphaFold (DMAF)
from within a Foldit recipe?

conf,simi=runDMAF(StrName,CheckTime)

runDMAF takes Foldit's current structure and sends it to
DMAF as DMAF's original structure, as if the DMAF window's
'Upload Current' button was pressed. StrName is the name
to be used for this structure in the Name column of the DMAF
window. CheckTime is how often (in minutes) to check if DMAF
is done calculating the predicted structure for StrName.
Using 4.75 for CheckTime would check once every 4 min 45 sec
for StrName's result. runDMAF will run until DMAF is done
calculating the predicted structure for StrName. When runDMAF
ends, the predicted structure is copied into Foldit's current
structure, and this structure's Confidence and Similarity
scores are copied into the variables conf and simi for use
within the Foldit recipe.
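
To make the requested behavior concrete, here is a minimal sketch of the
blocking upload-then-poll loop described above. It is written in Python
only so it can run outside Foldit. Nothing here is a real Foldit or
AlphaFold API: run_dmaf, upload, and fetch_result are hypothetical
stand-ins for the proposed runDMAF command, the 'Upload Current' button,
and a check of the DMAF window for a finished result.

```python
import time
from typing import Callable, Optional, Tuple

# (confidence, similarity) as shown in the DMAF window; the layout is an assumption.
Result = Tuple[float, float]


def run_dmaf(
    str_name: str,
    check_time_min: float,
    upload: Callable[[str], None],
    fetch_result: Callable[[str], Optional[Result]],
    sleep: Callable[[float], None] = time.sleep,
) -> Result:
    """Hypothetical model of runDMAF(StrName, CheckTime).

    upload and fetch_result are placeholders supplied by the caller;
    fetch_result returns None while the prediction is still running.
    """
    if check_time_min <= 0:
        raise ValueError("check_time_min must be positive")

    upload(str_name)                    # like pressing 'Upload Current'
    interval_s = check_time_min * 60.0  # 4.75 min -> 285 s

    while True:
        sleep(interval_s)               # wait one CheckTime interval
        result = fetch_result(str_name)
        if result is not None:
            return result               # (conf, simi) for the recipe to use
```

A recipe-style call would then look like
conf, simi = run_dmaf("trial_01", 4.75, upload, fetch_result),
returning only once the prediction for "trial_01" is available.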

Having a command like runDMAF would help us automate using
DMAF. So far, I have been making lists of sequences to try.
Then, for each sequence in the list, first I use AA Edit to
copy the sequence onto an extended loop structure, then I
send this structure to DMAF using the DMAF window's 'Upload
Current' button. While DMAF is finding predicted structures,
I send DMAF more sequences I want it to try. Later, I examine
each predicted structure DMAF has found to see if it looks
interesting or like I thought it would. Then I take some of
these predicted structures and do other things within Foldit
to raise their Foldit scores.

Recipes that use runDMAF will likely be very slow, but while
they run, players can do other things on other Foldit clients.
Such recipes could list key info about each original and
predicted structure as they run, and I think they could create
*ir_solution files of the predicted structures as they are
found. Players could then copy key *ir_solution files to other
Foldit clients for inspection or further processing.

(Mon, 11/01/2021 - 20:12  |  4 comments)


Joined: 04/20/2012
Groups: Go Science

To put my request above in context, below is what I've been
doing lately in Foldit:

I've been playing expired Puzzle 2047 to explore what structures
DMAF gives for certain sequences. I like Puzzle 2047 because it
has 90 mutable residues. Also, even though it is a tetramer puzzle,
I keep its 4 monomers far apart so they all act like isolated
90-segment monomers. It would be better if I could change the number
of segments in each monomer. It would also be better if I could use
cysteine residues in this puzzle.

Since I'm curious how well DMAF works on known structures, I found
some pdb files with features I want to include in my Foldit designs.
One pdb file gives the coordinates for 77 segments of a 441-segment
protein. I've tried taking different 90-segment ranges from the
441-segment protein that include the 77 segments of interest. Some
of these segments are cysteines, which Puzzle 2047 does not allow,
so I've tried substituting alanine, serine, or valine for each
cysteine. I've also tried taking the 77 segments of interest and
padding each end with 6 or 7 valines, 6 or 7 isoleucines, etc. I've
also tried shifting the 77 segments of interest by using fewer valines
at one end and more valines at the other end, for example. So far,
DMAF has predicted many different structures depending on what I
choose for its 90 segments. Some of these are very different from
what the pdb file shows, and so far I haven't found a really good
match to the structure the pdb file shows.
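
The padding, shifting, and cysteine substitution described above can be
sketched as a small script for building the list of 90-segment sequences
to try. This is only an illustration in Python, run outside Foldit. The
function names are made up for this example, and the core sequence in the
usage line is a placeholder, not the sequence from the pdb file.

```python
from typing import List, Sequence


def substitute_cysteines(seq: str, replacement: str) -> str:
    """Replace every cysteine (C) with one substitute residue, e.g. A, S, or V."""
    return seq.replace("C", replacement)


def pad_to_length(core: str, total: int, left: int, pad: str = "V") -> str:
    """Put `left` copies of `pad` before `core` and fill the rest after it."""
    right = total - len(core) - left
    if left < 0 or right < 0:
        raise ValueError("core plus left padding does not fit in total length")
    return pad * left + core + pad * right


def make_variants(
    core: str,
    total: int = 90,
    pads: Sequence[str] = ("V", "I"),
    subs: Sequence[str] = ("A", "S", "V"),
) -> List[str]:
    """List every substitute / pad residue / shift combination."""
    spare = total - len(core)  # 13 spare positions for a 77-segment core
    variants = []
    for sub in subs:
        clean = substitute_cysteines(core, sub)
        for pad in pads:
            # left = 6 or 7 gives the near-even split; other values shift the core
            for left in range(spare + 1):
                variants.append(pad_to_length(clean, total, left, pad))
    return variants


# Placeholder 77-residue core, for illustration only.
example_core = "ACDEFGHIKLM" * 7
sequences = make_variants(example_core)
```

Each entry in sequences could then be copied onto an extended loop
structure with AA Edit and uploaded by hand.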

Joined: 04/20/2012
Groups: Go Science

In the pdb file I've been using, I don't think the
cysteines form disulfide bonds. What amino acid
would be the best substitute for these cysteines?
Serine has practically the same shape but wants
to form hydrogen bonds. Valine won't form hydrogen
bonds but is a little bigger. Alanine won't form
hydrogen bonds and is a little smaller, but it is
one of the MALEK residues that favor helices, and
the pdb structure I've been using seems to be just
sheets and loops. Being allowed to use cysteines
would sure help. Could you make puzzles that allow
cysteines but instead give large penalties for them?

bkoep
Joined: 11/15/2012
Groups: Foldit Staff

Thanks for the suggestion, jeff101! Unfortunately, I don't think we will introduce Lua functions for AlphaFold requests, mostly due to a shortage of GPU resources that we use to run AlphaFold.

Since the Foldit server is sharing GPUs with other researchers, we want to be judicious about the kinds of jobs that tie up those GPUs. Right now, that means we are requiring players to physically click the Upload button for each AlphaFold request, and we will not allow uploads to be automated by Lua scripts.

That also means that we would like to restrict AlphaFold requests to open puzzles only. The Foldit team does not retrieve solutions that are created in a puzzle after the puzzle expires, and those solutions are not included in downstream scientific analysis.

If you want to explore AlphaFold predictions for arbitrary sequences, I recommend using the AlphaFold Google Colab notebook. The Colab notebook will also allow you to use natural sequence alignments (this is disabled for the Foldit AlphaFold tool); this will likely give you better and more consistent predictions for natural protein sequences.

Joined: 04/20/2012
Groups: Go Science

Thanks for the info.

I hope you will post a new puzzle like 2047 soon.
Puzzle 2047 stopped letting me send sequences to
AlphaFold yesterday, and I hadn't yet run all the
90-segment sequences I wanted to test with it.

Please also consider letting the next puzzle like
2047 include cysteine residues. Even if there is
a large Foldit score penalty for using cysteines,
it would be interesting to see what AlphaFold
predicts for structures containing cysteine.

Thanks again,
Jeff
