PDB export and project aims

Case number: 845799-990932
Topic: Biochem
Opened by: ipatrol
Status: Closed
Type: Suggestion
Opened on: Wednesday, October 26, 2011 - 01:26
Last modified: Wednesday, October 23, 2013 - 12:39

There are many freely available programs that can greatly expedite the process of molecular dynamics modeling. The failure to allow export to a free format, combined with the closed-source nature of this crowdsourced operation, seems inexcusable. Furthermore, the lack of clarity over the copyright status of our contributions raises serious concerns in regard to the academic philosophy that ostensibly guides this project. This much secrecy is not healthy or beneficial; indeed, users have historically proven to be the greatest source of intellectual capital, and more savvy users should be allowed to try out their own machinations. UW should not try to hoard accolades and credit in the hopes of seizing extra grants. We all enjoy the opportunity to contribute to science, but when pathways are blocked, the most valuable players are likely the first to balk at the restrictions on their own work.

(Wed, 10/26/2011 - 01:26  |  22 comments)


ipatrol's picture
Joined: 12/04/2010
Groups: None

Love how I get votes and no comments

Joined: 04/19/2009

ipatrol -

I didn't downrate you but was tempted, just on the basis of your aggressive language. "inexcusable" "ostensibly" "hoard accolades and credit in the hopes of seizing extra grants" - these were not necessary to get your point across.

Joined: 11/10/2007
Groups: Window Group

The Rosetta biochemistry package that Foldit uses, and a standalone version of the game, are both freely available for academic users. You can download them here:

http://depts.washington.edu/uwc4c/express-licenses/assets/rosetta/
http://depts.washington.edu/uwc4c/express-licenses/assets/foldit/

ipatrol's picture
Joined: 12/04/2010
Groups: None

You're missing the point. I don't want access to the client's internals; I want the ability to interface directly with the game using molecular dynamics software. There can be restrictions and limitations, but it ought to be open as an option.

infjamc's picture
Joined: 02/20/2009
Groups: Contenders

Note to developers: I've spoken to ipatrol on global chat, so here's further clarification on what's actually being requested.

1. Ability to export a structure from the game itself to PDB, MDL, etc. The intent is not for research use, but to further refine the structure from the game with a third-party molecular dynamics program.

2. Likewise, this would require the ability to import the same formats supported in #1.

3. Part of the difficulty, I think, is finding a way to keep the game fair while allowing exporting/importing solutions. After all, not everyone would have access to molecular dynamics software (or have time to learn and use it), which could theoretically give some players an unfair advantage. One possible solution I see would be "importing a PDB, etc. makes you an evolver," but even this isn't perfect since it's possible for people to use a third-party program to further refine their own structure.

4. Ultimately, ipatrol's wish for the long term is to make the project open-source, which has been suggested before ( http://fold.it/portal/node/986352 ). Obviously, such a drastic step would require server-side rescoring of structures to prevent cheating, and most likely the use of more builds than only beta and main (Firefox uses 4 groups, for example).

5. Regarding the issue of who's going to monitor the latest additions to the code, ipatrol's idea is that trusted editors could be selected (either via a community vote or by approval of the Foldit development team at UW) as project leaders of sorts for delegation purposes.

==> Again, the ideas are his and not mine. I think there's some promise in theory-- the hard part is implementing it.

ipatrol's picture
Joined: 12/04/2010
Groups: None
Topic: Biochem » General

OK, how about we start with the PDB export? These opaque OPDB files really ought to have some documentation. If they had that, then it wouldn't be hard to whip up a conversion script.
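Since the OPDB layout is undocumented, only the output half of such a conversion script can be sketched. The snippet below is a minimal illustration, assuming the atom data has somehow already been read into plain Python values; the function name pdb_atom_line and the example atom are hypothetical. It formats one fixed-column ATOM record as laid out in the public PDB file format, with occupancy and B-factor filled with placeholder values.

```python
def pdb_atom_line(serial, name, res_name, chain, res_seq, x, y, z, element,
                  occupancy=1.0, b_factor=0.0):
    """Format one fixed-column PDB ATOM record (78 characters wide)."""
    # Assumption: ordinary protein atoms (C, N, O, S, H), whose names
    # conventionally start in column 14; four-character names use column 13.
    padded_name = name if len(name) == 4 else " " + name
    return (
        f"ATOM  {serial:5d} {padded_name:<4s} {res_name:>3s} {chain:1s}"
        f"{res_seq:4d}    {x:8.3f}{y:8.3f}{z:8.3f}"
        f"{occupancy:6.2f}{b_factor:6.2f}          {element:>2s}"
    )

# Hypothetical atom: the alpha carbon of residue 1 (alanine) on chain A.
print(pdb_atom_line(1, "CA", "ALA", "A", 1, 11.104, 6.134, -6.504, "C"))
```

A real converter would emit one such line per atom and finish the file with an END record. Reading the OPDB side is the part that still needs the documentation asked for above.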

Joined: 10/11/2011
Groups: None

"but it ought to be open as an option."

why?

infjamc's picture
Joined: 02/20/2009
Groups: Contenders

My impression is that the heart of ipatrol's argument boils down to the following:

"There are other tools out there besides the Rosetta engine that could help with the general goal of protein structure prediction and design. So why not combine the strengths of the different programs by allowing for the possibility of passing the results from one program to another?"

ipatrol's picture
Joined: 12/04/2010
Groups: None
Topic: General » Biochem

I agree, let's focus on that aspect. At the very least, users should be allowed to make their own guides.

ipatrol's picture
Joined: 12/04/2010
Groups: None
Status: Open » Open

You know, on de novo puzzles, you can use a script combined with the script log to export the letter sequence, then replace the newlines with hyphens and the letters with their three-letter codes and import that into Avogadro, so export can always be arranged. I'm also sure that the IR_PUZZLE files can be spoofed and manipulated to point to a PDB file, so import is always possible. Since with a little time it can happen anyway, I suggest you stop being obstinate and just add the feature so we can work with you instead of against you.
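The text-substitution step described above can be sketched in a few lines. This is only an illustration, assuming the script log holds one one-letter residue code per line; the function name log_to_three_letter and the sample input are hypothetical, and the exact format Avogadro expects is not checked here.

```python
# One-letter -> three-letter codes for the 20 standard amino acids.
THREE_LETTER = {
    "A": "ALA", "R": "ARG", "N": "ASN", "D": "ASP", "C": "CYS",
    "Q": "GLN", "E": "GLU", "G": "GLY", "H": "HIS", "I": "ILE",
    "L": "LEU", "K": "LYS", "M": "MET", "F": "PHE", "P": "PRO",
    "S": "SER", "T": "THR", "W": "TRP", "Y": "TYR", "V": "VAL",
}

def log_to_three_letter(log_text):
    """Turn a log with one residue letter per line into 'MET-LYS-...'."""
    # Skip blank lines and accept either upper- or lower-case letters.
    letters = [ln.strip().upper() for ln in log_text.splitlines() if ln.strip()]
    unknown = [c for c in letters if c not in THREE_LETTER]
    if unknown:
        raise ValueError(f"unrecognised residue codes: {unknown}")
    return "-".join(THREE_LETTER[c] for c in letters)

# Made-up four-residue log, not taken from any real puzzle.
print(log_to_three_letter("m\nk\nv\nl\n"))  # prints MET-LYS-VAL-LEU
```

Note that this carries over only the sequence, not the folded coordinates.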

Joined: 10/11/2011
Groups: None

Perhaps if you weren't so disgustingly rude, they would be more inclined to take notice of what you have to say.
I read your posts, ipatrol, and you raise my ire, and I'm just a player.

"You get more bees with honey than you do with vinegar"... you may wish to take that on board.

B_2's picture
Joined: 11/29/2008
Groups: None

Is ipatrol just making noise? It doesn't seem from the rankings that he/she is very much involved with playing the GAME of FoldIt.

I don't think using outside tools is at all good for the GAME. It will take all competitiveness out of it if we have things like this http://foldit.wikia.com/wiki/Puzzle_482 going on, where there is perhaps one in a thousand players who could use this process, and it immediately puts other solo and group contributions down to also-ran status.

It will make the original goals of FoldIt meaningless, and turn it into just another competition between a few people who have the skills and tools that are not part of the GAME. Apparently there is no way to prevent the use of such tools, but hopefully players will take the honorable approach and not use them.

Use ONLY what is provided in the game, what information is provided in the puzzles by the developers, and whatever creativeness you can achieve using the recipe system as provided.

infjamc's picture
Joined: 02/20/2009
Groups: Contenders

Brick:

I totally saw your comment coming, so I'll respond to your concerns regarding Puzzle 482.

1. The concern regarding fairness is exactly why I made my team's approach to that puzzle public afterward-- so that everyone could try it. Now, in retrospect, I do admit that I could have mentioned the existence of the 2L3B NMR model as soon as I discovered it. But I chose not to do so because I didn't want to skew the aggregate results, which could occur if *everyone* tried to attempt a full copy without thinking.

2. I would argue that having the extra information actually HELPED the goal of advancing science in this case. The results of Puzzles 482 and 485 speak for themselves: no other team, not even wudoo's super-scripts (which have been very successful in most other puzzles), even came close. My interpretation of this result is that this is one of those proteins that the existing automated methods have trouble with. Since the NMR model is in a public database, and I manually incorporated the information from 2L3B into the game the hard way (as opposed to simply loading the PDB into the game), I don't see anything wrong with the way I approached this puzzle.

3. Now, you could legitimately argue that I should not get soloist credit for the puzzle because I was essentially evolving the 2L3B model. That would be fine with me, because my intention behind looking up the sequence on RCSB was *for the sake of science*, to get one step closer to the correct structure by conducting homology modeling manually. It just happened that there was an exact match this time, so I can understand why you feel this is unfair.

4. If you are concerned that looking up outside information would consistently work as a strategy for maximizing Foldit score (and thus put other players at a disadvantage), I can assure you from personal experience that this is not the case *as a rule*. I can say this because--at the risk of boasting--I know that I'm very good at reverse-engineering solutions. Based on my past experiments with a few puzzles for which the native is known, my conclusion is that manual copying doesn't pay, because you will almost always end up with a lower score than playing normally. Take Puzzle 482 for example: my initial solution only scored 10085, and it was extensive further rebuilding by myself and teammates that picked up the 600+ extra points. In other words, the decision to use outside information made a difference *this time*, but this is the exception rather than the rule.

5. Now, as for allowing PDB imports and exports (so that people can edit structures directly with a 3rd-party program rather than having to manually incorporate extra outside information into the game like I did in Puzzle 482):

Personally, I'm neutral on the issue because I can see both sides. On one hand, this might help advance science if the use of a 3rd-party program could achieve things that are difficult with only Foldit. At the same time, I also understand your concern that this could give "experts" an advantage. So, if I have to choose, my compromise solution would be "allow it with limitations: anything that's exported/imported puts you in evolver mode."

infjamc's picture
Joined: 02/20/2009
Groups: Contenders

And before I forget:

Look up the NMR structure yourself ( http://www.pdb.org/pdb/explore/explore.do?structureId=2L3B ) and compare that to my team's 10776 solution. You'll see that while the sheet sections are similar, the loop sections deviated significantly. So this is not a case where we scored high due to "cheating" (i.e. simply copying a known structure), but rather that an inaccurate NMR structure was further improved.

B_2's picture
Joined: 11/29/2008
Groups: None

Well, since we can't load it into foldit, and your team solution isn't shared for everyone to open, I guess we can't do that.

infjamc's picture
Joined: 02/20/2009
Groups: Contenders

For the record, I would have uploaded my earlier 10085 solution or even the later 10658 if there were a way to do so. After all, as much as I like to see myself and my team score high, I care about the science even more; if uploading means not being able to be top-ranked, so be it.

But the problem is that the program currently doesn't allow for uploading to everyone.

beta_helix's picture
Joined: 05/09/2008
Groups: None
Status: Open » Open

"I'm also sure that the IR_PUZZLE files can be spoofed and manipulated to point to a PDB file, so import is always possible"

If that becomes a problem, we will address it.
Until then, as Foldit is a game, there is no reason to let you import/export files.

If you "want the ability to interface directly with the game using molecular dynamics software" yourself (for scientific reasons or for fun) then you can do exactly that with the standalone version:
http://depts.washington.edu/uwc4c/express-licenses/assets/foldit/
You can load in any file you want, run Foldit and output your results. Then you can interface with external programs all you want.

If there is a particular external program that you feel would be beneficial to Foldit, then you can open a specific feedback about that. We could add it to the game, that way every Foldit player would be able to use it.
__________________

In regard to Brick & infjamc's comments about Puzzle 482... I'll try to keep this simple:

Not every Foldit puzzle is a CASP or Unsolved protein. Basically, those cases are rare (getting our hands on those puzzles is not easy, it's not like unsolved structures are being solved every day, that's why we need Foldit in the first place!) and if we only posted unsolved cases you would have very few puzzles.

Every new feature we come up with requires a lot of tweaking (as you all know, just look at Exploration puzzles!) and there is no way we could find out how well these new features are doing if we can't check your solutions against the native. This means that we would have to wait until the native comes out to get ANY feedback on your puzzles, which is not a feasible solution.
To make it fair, we could include the native guide on every non-unsolved puzzle, but that would be pointless except to make sure the tools are sufficient (which we sometimes do).

Every puzzle that has the word "unsolved" in the description (or is a new CASP puzzle) will not have a publicly available native. We could only give out points for those puzzles, and the rest could just be for fun (or essentially BETA puzzles to improve the tools in the game) but if that is a route players want to go down, it would require a developer chat as this would be quite a change!

Specifically speaking of Puzzle 482... I PMed infjamc about his top-scoring solution and got an even more detailed reply than what is posted in the wiki.
I showed his reply at last week's Foldit meeting and because of it 2 things were decided:

1) we are going to try to find even more unsolved cases (we have already contacted the PDB to be able to get their sequences 3 weeks before they release them, just heard back today that this could be possible)

2) Puzzle 482 was actually designed to test whether or not the current version of the Alignment Tool was sufficient to find the correct alignment. All of you and I have known for a while that the Alignment Tool was difficult to use, but the results from 482 finally proved to the entire Foldit Team that we need to add cutpoints to the Alignment Tool. infjamc has been saying this for over a year:
http://fold.it/portal/node/986943

Thanks to the results for Puzzle 482, the current top priority is to implement cutpoints into Foldit so that you can add them yourselves (it was always on the list, but is now at the top of the queue!). This should significantly help in Exploration Puzzles, Electron Density Puzzles, Symmetry, and obviously with the Alignment Tool.

I hope this makes your day, infjamc! :-)

I'll be happy to discuss this further, but a different feedback or Forum post would be more appropriate. Thanks!

infjamc's picture
Joined: 02/20/2009
Groups: Contenders

While it's nice to hear that you're finally starting to incorporate cut-points into Foldit, I have several clarification questions:

1. So exactly what's going on with the protein involved in 482/485? Given the lack of a crystal structure, is the NMR model considered an accurate "native"? (Again, I'm asking because I was surprised that my team's top solution scored much better than Model 1.)

2. In addition to cut-points, also important would be the ability to align the template like a guide when doing partial threading-- in other words, be able to copy the backbone configuration relative to the local selected region rather than the whole protein. This would make it MUCH easier to mix-and-match solutions when two templates differ significantly.

3. For the sake of fairness, would you recommend that I forfeit my first-place finish due to the use of 2L3B? And for future purposes, would you prefer that I make the information public if there is a good template from a public database available for a specific puzzle (as long as the sequence isn't an exact match)?

beta_helix's picture
Joined: 05/09/2008
Groups: None
Status: Open » Closed

To answer your first question, this NMR model is considered as accurate a native as anything solved so far for this protein, but if you look at the variation across all 20 NMR models:

you can see that there is no 1 correct native solution. So, while your Foldit prediction was different from Model 1 of 2l3b, it was actually closer to some of the other models in the NMR ensemble (http://www.pdb.org/pdb/explore/explore.do?structureId=2L3B)
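One common way to put a number on "closer to some of the other models" is the root-mean-square deviation (RMSD) between matching atoms. The sketch below is only an illustration of that idea, not the analysis actually run for this puzzle: it assumes the prediction and every model list the same atoms in the same order and have already been superimposed, and the names rmsd and rank_models and the toy coordinates are made up.

```python
import math

def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equal-length (x, y, z) lists."""
    if not coords_a or len(coords_a) != len(coords_b):
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = sum(
        (ax - bx) ** 2 + (ay - by) ** 2 + (az - bz) ** 2
        for (ax, ay, az), (bx, by, bz) in zip(coords_a, coords_b)
    )
    return math.sqrt(total / len(coords_a))

def rank_models(prediction, ensemble):
    """Return (model_number, rmsd) pairs sorted from closest to farthest."""
    scores = [(i + 1, rmsd(prediction, model)) for i, model in enumerate(ensemble)]
    return sorted(scores, key=lambda pair: pair[1])

# Toy three-atom coordinates, NOT real 2L3B data.
prediction = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
ensemble = [
    [(0.0, 3.0, 0.0), (1.0, 3.0, 0.0), (2.0, 3.0, 0.0)],  # "model 1"
    [(0.0, 1.0, 0.0), (1.0, 1.0, 0.0), (2.0, 1.0, 0.0)],  # "model 2"
]
for number, score in rank_models(prediction, ensemble):
    print(f"model {number}: {score:.2f}")
```

In this toy case the prediction sits closer to "model 2" than to "model 1", which is the kind of outcome described above for the real ensemble.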

As the MPMV-PR case showed quite clearly, sometimes NMR models aren't "correct", but when that is the only model available (until Foldit players solve the crystal structure ;-) it's considered to be the native.

To answer your third question, there will be no forfeiting of points from anyone.

If you want my honest opinion for what I would prefer for future purposes... ideally (other than CASP or structures where we ask you to help solve the unknown structure) nobody would try to look up any templates, homologs or natives.
Because if we are posting a puzzle to test out new tools then we want you to try to fold using the new tools (not looking up extra external information). Now, we realize that is a tough request to make, so we have never explicitly asked any players not to do that, but we noticed this issue back when we started writing the first Nature paper.
There was no way that we could publish any non-blind results when it was clear that players were able to look up natives.

Again, if we want to discuss this further, I will be happy to continue in a different feedback/forum. This feedback is asking a different question, which I believe I answered previously, so I am closing this thread.

B_2's picture
Joined: 11/29/2008
Groups: None

I guess if a puzzle is based on a known or previously predicted solution, we should just start with that solution to save everyone the work of trying to manually replicate it in the FoldIt client, and we can all be on a level playing field. We certainly don't need the exercise of assembling the long string starting positions if people can just go to the databases and get the solutions.

The ideal of "advancing science" is admirable, but I thought the intent of this game was to develop tools and processes WITHIN the game confines to improve the ability to predict structures.

infjamc's picture
Joined: 02/20/2009
Groups: Contenders

1. I'm actually in favor of your idea of pre-loading a known solution to level the playing field if the purpose of the puzzle is to, say, refine an NMR model (because a crystal structure is not available). Of course, this wouldn't work in puzzles that are meant to test the tools (e.g. 482), but it should work in other cases. Believe me, even though I'm good at reverse engineering solutions, I would prefer not to have to do that manually myself either.

2. Well, the game contributes to both improving automatic algorithms and predicting structures that current methods simply fail to solve. The two goals are not mutually exclusive.

Developed by: UW Center for Game Science, UW Institute for Protein Design, Northeastern University, Vanderbilt University Meiler Lab, UC Davis
Supported by: DARPA, NSF, NIH, HHMI, Amazon, Microsoft, Adobe, RosettaCommons