## Protein Design Partition Tournament

We are announcing a friendly protein design tournament for Foldit players! This tournament is something of an experiment, and we are not sure exactly how it will unfold. Participation in the tournament is completely voluntary, and we will continue to post regular Foldit puzzles during the tournament, for those who do not wish to participate. There will be no prizes (except for bragging rights and a rare new Foldit Achievement), and we cannot guarantee that any scientific results will arise from the tournament. The main purpose of the tournament is to have fun folding proteins, and to inspire Foldit players to think differently about protein design; but we do also think the tournament could lead to higher quality protein designs.

Unlike our regular design puzzles, in which players compete to design proteins with the best absolute energy, this tournament is designed to reward proteins with the best energy landscape. For more discussion about energy landscapes, see parts one and two of this blog series.

The tournament will take place in two phases, over the next 6 weeks.

### Phase One: Defense

The first phase of the tournament will take the form of a Foldit design puzzle, similar to the regular Monomer Design puzzles that are posted every week. Players will have two weeks to craft their best protein design, which will have to defend itself in Phase Two. To enter the tournament, players must share their chosen design with scientists, using the Upload for Scientists button in the Save Solution menu. When you save your solution, give it the title ‘Tournament Submission’ in the Save Solution dialog box. Every player is allowed one tournament submission. If a player submits multiple solutions, only the most recent solution will be accepted. For the sake of competition logistics, team play will not be allowed in the tournament; only soloist solutions will be accepted.

The Phase One design puzzle will include two objectives:

- **Residue Count**: Designs may contain 70-100 residues, at a cost of 32 points per residue.
- **Secondary Structure**: Designs may be up to 10% α-helix.* Additional helices will be penalized at 10 points per residue.

*We will accept Phase One submissions regardless of their secondary structure content. However, we’d like to discourage players from submitting helical bundles and ferredoxin-like folds (a.k.a. "surfing hotdogs") that typically score well in regular design puzzles. Rather, this is a chance for players to showcase designs that aren’t normally competitive in regular design puzzles.
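As a rough illustration of how the helix objective could translate into a penalty, here is a small sketch. It is a hypothetical helper, not Foldit's scoring code. It assumes two things the puzzle description does not spell out: the 10% allowance is rounded down to whole residues, and every helix residue beyond that allowance costs a flat 10 points.

```python
# Hypothetical helper for the Phase One helix objective. ASSUMPTIONS: the 10%
# allowance is rounded down to whole residues, and each helix residue beyond it
# costs a flat 10 points. The real puzzle may count or round differently.
MAX_HELIX_PERCENT = 10
PENALTY_PER_EXTRA_HELIX_RESIDUE = 10


def helix_penalty(n_residues: int, n_helix_residues: int) -> int:
    """Score penalty (in points) for helix content above the allowance."""
    if not 70 <= n_residues <= 100:
        raise ValueError("Phase One designs must contain 70-100 residues")
    if not 0 <= n_helix_residues <= n_residues:
        raise ValueError("helix residue count out of range")
    # Integer arithmetic avoids floating-point rounding surprises.
    allowed = n_residues * MAX_HELIX_PERCENT // 100
    extra = max(0, n_helix_residues - allowed)
    return extra * PENALTY_PER_EXTRA_HELIX_RESIDUE
```

Under these assumptions, an 80-residue design with 12 helix residues would be allowed 8 of them and would lose 40 points.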

### Phase Two: Offense

Twenty Foldit player submissions will be selected to advance to Phase Two:

- **Five** will be the five top-scoring submissions from Phase One.
- **Five** will be hand-selected by the Foldit team, on merits of creativity and plausibility.
- **Ten** will be chosen at random from the remaining submissions.

For each of the 20 selections, we will create a special Partition Contest using the selected protein design. Each contest will be set up as a prediction puzzle, similar to the regular De-novo Freestyle puzzles, except that the starting structure will be the fully-folded design. All 20 Partition Contests will be open to the entire Foldit community and will remain online for four weeks, during which time the selected designs will be vulnerable to “challenge.” Any Foldit player can challenge a design by joining its Partition Contest and attempting to refold the design into another high-scoring decoy structure.

The Phase Two contests will include an **RMSD Objective**: All solutions must differ from the starting model with an RMSD of at least 2.5 Å.
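To make the 2.5 Å requirement concrete, the sketch below computes an RMSD between a solution and the starting model. The post does not say which atoms Foldit uses or how it superimposes structures. This sketch assumes one coordinate per residue (for example, C-alpha atoms) in matching order, and optimal rigid-body superposition with the Kabsch algorithm. The function names are illustrative.

```python
import numpy as np

# Illustrative sketch, not Foldit's implementation. ASSUMPTIONS: one point per
# residue (e.g. C-alpha atoms) in matching order, and RMSD measured after the
# optimal rigid-body superposition (Kabsch algorithm).
RMSD_THRESHOLD = 2.5  # angstroms, from the Phase Two RMSD Objective


def kabsch_rmsd(model: np.ndarray, reference: np.ndarray) -> float:
    """RMSD between two (N, 3) coordinate arrays after optimal superposition."""
    if model.shape != reference.shape or model.ndim != 2 or model.shape[1] != 3:
        raise ValueError("expected two arrays of identical shape (N, 3)")
    # Remove translation by centering both structures on their centroids.
    p = model - model.mean(axis=0)
    q = reference - reference.mean(axis=0)
    # The SVD of the 3x3 covariance matrix yields the optimal rotation.
    u, _, vt = np.linalg.svd(p.T @ q)
    # Flip the last axis if needed so we get a proper rotation, not a reflection.
    d = 1.0 if np.linalg.det(vt.T @ u.T) >= 0 else -1.0
    rotation = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    diff = p @ rotation.T - q
    return float(np.sqrt((diff ** 2).sum() / len(p)))


def passes_rmsd_objective(model: np.ndarray, start: np.ndarray,
                          threshold: float = RMSD_THRESHOLD) -> bool:
    """True if a Phase Two solution differs enough from the starting design."""
    return kabsch_rmsd(model, start) >= threshold
```

Because of the superposition step, simply translating or rotating the starting model does not satisfy the objective. Only a genuinely different fold does.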

### Scoring

Ultimately, each design in the tournament will be evaluated by its partition function (described in the previous blog post), based on the decoys found by challengers in the Phase Two contests.

By challenging a design and finding a high-scoring decoy, you show that your opponent's sequence does not have 100% probability of adopting the folded structure, and that its partition function must be shared with your decoy structure. You effectively stake a claim in the partition function of that design; the higher the score of your decoy, the larger your claim in the opponent's partition function.

A player may make multiple challenges against a single design; in some cases, it may be more effective to make many moderate-scoring challenges rather than a single high-scoring challenge. In order to calculate the partition function for a design, we will cluster all of the contest solutions to identify representative states. Then, we’ll use the partition function to determine the probability of each state.
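The post does not give the exact clustering or weighting procedure. Here is a minimal sketch of the general idea under stated assumptions:

- Solutions are clustered greedily using a precomputed pairwise RMSD matrix and a hypothetical 2.5 Å cutoff.
- Each cluster is represented by its best-scoring member.
- State probabilities follow a Boltzmann weighting of Foldit scores, exp(score / kT), where higher scores are better.
- kT is about 6.08 Foldit points. This value is borrowed from a back-of-envelope estimate in the comments and is hypothetical.

```python
import math
from typing import List, Sequence

# ASSUMPTIONS for illustration only: higher Foldit score = more stable state;
# probabilities follow a Boltzmann weighting exp(score / kT) with kT in Foldit
# points; clustering uses a precomputed RMSD matrix and a hypothetical cutoff.
KT_FOLDIT_POINTS = 6.08  # hypothetical value


def greedy_cluster(scores: Sequence[float], rmsd: Sequence[Sequence[float]],
                   cutoff: float = 2.5) -> List[int]:
    """Return indices of cluster representatives (best-scoring member of each)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    reps: List[int] = []
    for i in order:
        # A solution starts a new cluster only if it is at least `cutoff` away
        # from every existing representative; otherwise it is absorbed.
        if all(rmsd[i][r] >= cutoff for r in reps):
            reps.append(i)
    return reps


def state_probabilities(rep_scores: Sequence[float],
                        kt: float = KT_FOLDIT_POINTS) -> List[float]:
    """Boltzmann probabilities of representative states from their scores."""
    best = max(rep_scores)
    # Subtract the best score before exponentiating to avoid overflow.
    weights = [math.exp((s - best) / kt) for s in rep_scores]
    z = sum(weights)  # the partition function (up to a constant factor)
    return [w / z for w in weights]
```

This also shows why many moderate challenges can beat a single strong one. With these assumptions, ten distinct decoys each 14 points below the design leave it with a probability of about 0.50. One decoy only 5 points below leaves it with about 0.69.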

Unfortunately, we cannot calculate the partition function on the fly, so players can only estimate how well a design is resisting challenge by following the Contest leaderboards. However, we will post weekly updates throughout Phase Two, with updated partition functions for all 20 Contests.

The champion of the tournament will be the protein design with the highest probability, as determined by its partition function.

There will also be Achievements for the most effective challengers, who are able to stake the greatest claims in the partition functions of their opponents.

Finally, we’d like to point out that, while players may be tempted to aim for a high-ranking design in Phase One, what really counts is how well each design can withstand challenges in Phase Two. If you design a high-scoring protein in Phase One, but its sequence is also compatible with many high-scoring decoy structures, then in Phase Two challengers will easily find high-scoring decoys and stake large claims in your design’s partition function.

The Phase One design puzzle is online now! Happy folding!

(Posted by bkoep | Thu, 08/30/2018 - 20:54 | 26 comments)

We considered posting a completely separate puzzle for the tournament, and it probably would work just as well as the current dual-purpose puzzle. The main reason for this setup (i.e. allowing tournament submissions from a regular puzzle) is that we thought it would lead to more participation in the tournament.

If any players are on the fence about joining the tournament, then they can play the Phase One puzzle like a normal design puzzle, for now. If they change their mind at any point and decide to make a tournament submission, then they can easily draw from the work they've already invested in normal play—they won't have to start over in a completely new puzzle.

Alternatively, if we had made a secondary, optional puzzle specifically for the tournament, then any participating players would have to commit time and effort in an experimental puzzle worth zero points. Even then, there is still the possibility that their submission would not be selected for Phase Two of the tournament. We were afraid this would dissuade some players from participating in the tournament at all.

I don't think the current setup is too confusing for new players, and we will continue with the dual-purpose puzzle. If there is a particular point of confusion among players, please let us know! We can always clarify the rules, and amend the puzzle description if necessary!

It looks like Phase One penalizes structures with more than 10% a-helix content. Will Phase Two do this as well? Seems like in nature, the protein can always sample some helical conformations as it folds, so having the same 10% a-helix restriction in Phase Two would not be a good test. I think it would be better to let players use as much a-helix content as they want in Phase Two.

Good question! There will be no secondary structure restrictions in the Phase Two contests. The only purpose of helix restrictions in Phase One is to discourage players from designing the types of folds that we've already mastered.

Like you say, a protein may explore the entire energy landscape, so helix-rich conformations are perfectly valid decoys. It is up to the protein designer to pick a sequence that disfavors helical decoys!

If a player or group gets a solution in Phase One that they'd rather not share with everyone else, can they opt not to have their solution used in Phase Two? Will only solutions with ‘Tournament Submission’ as their title be allowed as starting points in the Phase Two Contests? Can evos in Phase One be saved with the title ‘Tournament Submission’ so they have a chance to be used in Phase Two?

Absolutely! Only scientist-shared solutions titled "Tournament Submission" will be considered for Phase Two of the tournament. Regular solutions, and scientist-shares with different titles, will not be shared with the rest of the community in Phase Two.

Evolved solutions will not be considered for Phase Two. We know this isn't ideal (Foldit players are great at working in teams), but we want to make sure all players have an equal shot at Phase Two selections, and we didn't have a good solution for handling mixed or ambiguous attribution in Evolver designs.

If a player's design from Phase One is chosen for Phase Two, can that player play the Phase Two contest featuring his/her own design from Phase One?

A player may challenge their own design in Phase Two, if they like. This could only ever "weaken" the partition function of that design, so it would not be a very good strategy for winning the tournament. But every challenge contributes to our predictions about the design's energy landscape, so there would still be some scientific value in challenging your own design!

A player could challenge their own design in Phase 1, to try to figure out how easy it is to challenge, before they pick one to submit. Can you easily rearrange part of it and get a similar score? If you can, others can as well.

I don't know which strategy is best for choosing a submission.

I understand that my criteria should be:

- one of my best-scoring designs
- one of my most original designs? (no surfing hotdogs, etc.)
- and still similar to something that exists in nature

But I've absolutely no idea which one has a chance to fold "uniquely" enough (to have a good partition).

My feeling is that sheet-based designs will be easily challenged by helix-based alternatives in phase 2.

Maybe pick solutions with few helix-favoring MALEK (methionine, alanine, leucine, glutamate, or lysine) residues. Maybe pick solutions with alternating hydrophilic & hydrophobic residues (like sheets often have) rather than ones with patterns that helices often have.

https://en.wikipedia.org/wiki/Protein_secondary_structure says proline and glycine tend to be helix-breakers, while tryptophan, tyrosine, phenylalanine, isoleucine, valine, and threonine prefer to adopt β-strand conformations.
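The heuristics in this comment can be turned into a quick screen over a design's one-letter sequence. This is a rough sketch. The residue groups come from the comment. The hydrophobic set is a hypothetical grouping chosen for illustration. None of these fractions predicts how a sequence will actually fold or what its partition function will be.

```python
# Rough composition screen for a design sequence (one-letter codes).
# Residue groups follow the comment above; HYDROPHOBIC is a hypothetical
# grouping, and these numbers are heuristics, not folding predictions.
HELIX_FAVORING = set("MALEK")    # Met, Ala, Leu, Glu, Lys
STRAND_FAVORING = set("WYFIVT")  # Trp, Tyr, Phe, Ile, Val, Thr
HELIX_BREAKERS = set("PG")       # Pro, Gly
HYDROPHOBIC = set("AVILMFWC")    # hypothetical grouping


def composition_screen(seq: str) -> dict:
    """Return residue-class fractions and a hydrophobic/hydrophilic alternation score."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    n = len(seq)

    def frac(group: set) -> float:
        return sum(aa in group for aa in seq) / n

    # Fraction of neighbouring pairs that switch between hydrophobic and not;
    # strict i, i+1 alternation (the sheet-like pattern described above) gives 1.0.
    flips = sum((a in HYDROPHOBIC) != (b in HYDROPHOBIC) for a, b in zip(seq, seq[1:]))
    alternation = flips / (n - 1) if n > 1 else 0.0
    return {
        "helix_favoring": frac(HELIX_FAVORING),
        "strand_favoring": frac(STRAND_FAVORING),
        "helix_breakers": frac(HELIX_BREAKERS),
        "alternation": alternation,
    }
```

A design with a low helix-favoring fraction and high alternation would fit the strand-friendly profile suggested here. Whether it resists helical decoys still has to be tested in the game.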

Often Design Puzzles don't let us use the Rebuild Tool, yet other puzzles do. Will Phase Two let us use the Rebuild Tool?

The Rebuild tool will be enabled in the Phase Two contests; however, it has been disabled in the Phase One design puzzle.

Is improving the partition function the same as negative design?

In protein engineering, "negative design" is any kind of design that is meant to destabilize a particular state (to *increase* the absolute energy of that state), rather than stabilize that state (*decrease* its energy). Usually this is used to bias the partition function towards another desired state. In our tournament, for example, you could try to bias the partition function toward your design by anticipating potential decoys, and changing your design sequence to decrease the score (*increase* the energy) of those decoys. In this case you would carry out "negative design" of the decoy states.
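The effect of negative design on the partition function can be shown with a toy calculation. This sketch assumes Boltzmann weighting of Foldit scores with a hypothetical kT of about 6.08 Foldit points. The scores are made up. In practice a sequence change also shifts the design's own score; here only one decoy is made worse, to isolate the effect.

```python
import math

# Toy illustration, ASSUMING Boltzmann weighting of Foldit scores with a
# hypothetical kT of about 6.08 Foldit points. Scores are made-up values
# relative to the design (higher is better).
KT_FOLDIT_POINTS = 6.08


def design_probability(design_score: float, decoy_scores: list,
                       kt: float = KT_FOLDIT_POINTS) -> float:
    """Fraction of the partition function held by the design state."""
    best = max([design_score, *decoy_scores])
    w_design = math.exp((design_score - best) / kt)
    z = w_design + sum(math.exp((s - best) / kt) for s in decoy_scores)
    return w_design / z


before = design_probability(0.0, [-5.0, -20.0])
# "Negative design": mutate so the -5 decoy scores worse (its energy increases).
after = design_probability(0.0, [-25.0, -20.0])
```

Lowering the score of the strongest decoy shifts probability back toward the design, even though the design's own score is unchanged.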

Negative design can also be useful for improving the specificity of protein interactions. If you design your protein to bind some target, but you discover that your protein also binds to other targets (and causes unwanted side effects, maybe), you could use negative design to eliminate off-target binding. For example, see Figures 2 and 6 of this paper.

On the other hand, there are applications where you might want to *reduce* the bias in a protein's partition function, so that your protein can exchange between two states easily. Suppose you want your protein to act like a switch, so that it changes state in response to subtle changes in the environment—or maybe you just want to study how a protein exchanges between two folds. See this paper for an extreme example of this kind of negative design.

Will you list which players' designs from Phase One are chosen for Phase Two? Will each Phase Two contest title include the name of the player who designed the contest's starting structure?

Thanks again,

Jeff

The title of each Phase Two contest will include the username of the player who designed the target.

Do you know when Phase Two will begin? Will you make a big announcement when it starts? Will Phase Two be a bunch of Science Puzzles or will it be a bunch of Contests?

Check back on Monday!

I'm eager to see how this works... I like the idea and enjoyed the first round; it was interesting to work with patterns that never scored well in monomer puzzles and to play around with "off-book" ideas.

It's quite exciting to see and challenge all these puzzles together :)

Looking at the two energy landscape graphs in Blog Part 2, it looks like in the left hand one the 2nd best solution (orange dot) was about 25 energy points (250 foldit points) away from the best (blue dot). If our decoy score is 250 foldit points off of the original design, we can expect so tiny a slice of the partition function that it won't matter. Am I reading that right?

In the right hand graph, the 2nd best dot (orange) is less than one energy point (10 foldit points) away from the best (blue), so if we can get within 10 foldit points we can expect to capture a big slice (in this example, 32%) of the partition function. The next best dot (green) is right about one energy point (10 foldit points) from the best (blue), for a still respectable 12% of the partition function. The 4th best dot (red) is about 3 energy points (30 foldit points) from the best (blue) and gives only a 1.3% slice of the partition function.

I realize every energy landscape is different, but does it make sense as a ballpark figure to say that we can only capture a decent slice of the partition function if we get within 30 or so foldit points of the original score?
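A quick way to check this ballpark is a two-state estimate: the design plus a single decoy. The sketch below assumes the rule of thumb worked out in a later comment in this thread, which is a factor of 10 in population for every 14 Foldit points of score gap. That rule was fitted to the blog's example and may not hold for every landscape.

```python
# Two-state estimate ASSUMING a factor of 10 in population per 14 Foldit
# points of score gap (a rule of thumb, not an official Foldit formula).
POINTS_PER_DECADE = 14.0


def relative_population(gap_points: float) -> float:
    """Decoy population divided by design population for a decoy `gap_points` lower."""
    return 10 ** (-gap_points / POINTS_PER_DECADE)


def decoy_slice(gap_points: float) -> float:
    """Decoy's share of a two-state partition function (design + one decoy)."""
    r = relative_population(gap_points)
    return r / (1.0 + r)


for gap in (0, 10, 30, 250):
    print(f"gap {gap:>3} pts -> decoy slice {decoy_slice(gap):.3g}")
```

Under these assumptions, a 30-point gap leaves the decoy with well under 1%. A 250-point gap gives a share on the order of 10^-18.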

On the blog, I would interpret it as saying that a visible (small) partition slice appears once we get within 300 Foldit points of the original pose.

From -300 up to 0 and beyond, the slice gets bigger (the decoy challenges the pose more and more).

```
https://fold.it/portal/node/2005638 discussed two different proteins:

One protein had a blue structure making up ~100% of the population and an
orange structure making up < 10^(-10) % of the population. If one defines
y as the difference in Foldit points between 2 structures, the ratio of
their populations (probability factor) is p, which equals 10^(y/14).
Thus, one has p = 10^(y/14), log10(p) = log10(10^(y/14)) = (y/14) log10(10)
= y/14, and y = 14 log10(p). If one sets the orange structure's Foldit score
and Rosetta Energy to 0 and finds the ratio of populations p using the
orange structure's population in the denominator, one gets the following
chart:

structure  percent of   fraction of  probability factor p  change in   change in Rosetta
           population   population   (population ratio)    Foldit pts  Energy (kcal/mol)
----------------------------------------------------------------------------------------
blue       ~100%        ~1 (10^0)    >10^12                >168        <-16.8
orange     <10^(-10)%   <10^(-12)    1                     0           0

The other protein had a blue state with 54% of the population, an orange
one with 32% of the population, a green one with 12% of the population,
and a red one with 1.3% of the population. These percentages total 99.3%
of the population. If one sets the red state's Foldit score and Rosetta
Energy to 0 and uses the red state's population as the denominator in
all population ratio calculations, one gets the following chart:

structure  percent of   fraction of  probability factor p  change in   change in Rosetta
           population   population   (population ratio)    Foldit pts  Energy (kcal/mol)
----------------------------------------------------------------------------------------
blue       54%          0.54         41.538462             22.658306   -2.2658306
orange     32%          0.32         24.615385             19.476893   -1.9476893
green      12%          0.12         9.230769              13.513331   -1.3513331
red        1.3%         0.013        1                     0           0
```

```
Using the probability factors p and Rosetta Energy changes U (in kcal/mol)
from the above calculations with the formula p=exp(-U/KT),
one can get KT in kcal/mol and the temperature T as below:
First, p=exp(-U/KT) gives ln(p) = -U/KT, KT = -U/ln(p), and T = -U/(K ln(p)).
With the above formulas,
p=10^12 for U = -16.8 kcal/mol gives KT = 0.608012 kcal/mol.
Furthermore, the pairs
p=41.538462 for U = -2.2658306 kcal/mol,
p=24.615385 for U = -1.9476893 kcal/mol, and
p= 9.230769 for U = -1.3513331 kcal/mol
all give KT = 0.608012 kcal/mol as well.
Next, using K = 1.9872041 x 10^(-3) kcal/(mol K) from
https://en.wikipedia.org/wiki/Boltzmann_constant#Value_in_different_units
gives T = 305.96 K, which using the formulas at
https://en.wikipedia.org/wiki/Conversion_of_units_of_temperature
converts to 32.81 deg C and 91.06 deg F, somewhere
between room temperature and human body temperature.
Finally, since -16.8 kcal/mol of Rosetta Energy gives 168 Foldit points,
one could also write KT as 6.08012 Foldit points.
```
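The arithmetic in the two charts above can be reproduced in a few lines. The sketch carries over the comment's assumptions: a factor of 10 in population per 14 Foldit points, and 10 Foldit points per kcal/mol of Rosetta Energy. U is derived from the same 14-point rule, so kT = 14 / (10 ln 10) ≈ 0.608 kcal/mol for every pair. The identical kT values are therefore expected by construction rather than being an independent check.

```python
import math

# Reproduces the comment's arithmetic. ASSUMPTIONS carried over from it:
# a factor of 10 in population per 14 Foldit points, and 10 Foldit points
# per kcal/mol of Rosetta Energy.
POINTS_PER_DECADE = 14.0
FOLDIT_POINTS_PER_KCAL = 10.0
K_BOLTZMANN = 1.9872041e-3  # kcal/(mol K)


def chart_row(population: float, reference_population: float):
    """Return (p, change in Foldit points, change in Rosetta Energy in kcal/mol)."""
    p = population / reference_population
    foldit_pts = POINTS_PER_DECADE * math.log10(p)
    energy_kcal = -foldit_pts / FOLDIT_POINTS_PER_KCAL
    return p, foldit_pts, energy_kcal


def kt_from(p: float, energy_kcal: float) -> float:
    """Solve p = exp(-U/kT) for kT."""
    return -energy_kcal / math.log(p)


populations = {"blue": 0.54, "orange": 0.32, "green": 0.12, "red": 0.013}
for name in ("blue", "orange", "green"):
    p, pts, u = chart_row(populations[name], populations["red"])
    kt = kt_from(p, u)
    temp_k = kt / K_BOLTZMANN
    temp_c = temp_k - 273.15
    temp_f = temp_c * 9 / 5 + 32
    print(f"{name:<6} p={p:.6f} pts={pts:.6f} U={u:.7f} kT={kt:.6f} "
          f"T={temp_k:.2f} K ({temp_c:.2f} C, {temp_f:.2f} F)")
```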

Well-explained blogs, though many vets will already understand the concept.

But why would the first puzzle (an experiment, as you say) be run on two bases? Either you want people to create solutions that may pass your partition test, or you don't. Please don't be lazy: create this puzzle purely for the partition tournament (and a separate one for those not interested in the experiment). It makes little sense as it stands and can only cause confusion for newbs... and right now, newbs (and retention here) are probably a strong argument for changing this. In short, this system makes no sense (with respect).

I and others believe this approach is flawed.