puxatudo's picture
Joined: 04/07/2014
Groups: Go Science

I have a suggestion to make:

What if we posted some unknown/unsolved proteins here, so that players could take on the challenge in Sandbox mode?
I couldn't find a place that lists proteins with a known primary structure but an unsolved secondary, tertiary (or even quaternary) structure.

Susume's picture
Joined: 10/02/2011
UniProt database

Here's some great information that player MrZanav shared during the last office hours: You can search the UniProt database of proteins with known sequences, using various filters. This is a search that finds reviewed proteins of length 100 to 150: https://www.uniprot.org/uniprot/?query=length%3A%5B100+TO+150%5D+AND+reviewed%3Ayes&sort=score

It brings up page 1 of a list containing ~61,000 proteins of that length. How do you know which ones are solved? Click on the Entry column (on the left) for a particular protein. It will take you to that protein's page. Hit Ctrl-F to find text on the page, type pdb and hit Enter. If the protein has a PDB ID, the protein is solved. If there is no PDB ID found on the page, the protein is not solved (or at least not yet published).
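If you'd rather not Ctrl-F every page, the same check can be sketched in a few lines of Python. This assumes you have saved an entry in UniProt's flat-text format, where cross-references to the PDB appear on lines beginning with `DR   PDB;`. The function name `pdb_ids` and the sample entry below are made up for illustration; they are not real UniProt data.

```python
def pdb_ids(entry_text: str) -> list[str]:
    """Return the PDB IDs cross-referenced in a UniProt flat-text entry."""
    ids = []
    for line in entry_text.splitlines():
        fields = line.split(None, 1)  # line code, then the rest of the line
        if len(fields) != 2 or fields[0] != "DR":
            continue
        # "PDB;" (with the semicolon) excludes other databases such as PDBsum.
        if fields[1].startswith("PDB;"):
            parts = [p.strip() for p in fields[1].split(";")]
            if len(parts) > 1 and parts[1]:
                ids.append(parts[1])
    return ids


# Hypothetical entry text, invented for this example.
SAMPLE_ENTRY = """ID   EXAMPLE_HUMAN   Reviewed;   120 AA.
DR   PDB; 1ABC; X-ray; 2.00 A; A=1-120.
DR   PDBsum; 1ABC; -.
DR   Pfam; PF00000; Example; 1.
"""

print(pdb_ids(SAMPLE_ENTRY))  # ['1ABC']
```

An empty list would correspond to "no PDB ID found on the page" in the manual check above.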

How do you get the sequence? In the left-hand column there are blue boxes with checkboxes. Click the one that says "sequence." It will display the sequence, and just above that display box there is a link to FASTA, which is a simpler format for copying the sequence.
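Once you have copied the FASTA text, you still need the bare sequence without the header line and line breaks. Here is a minimal sketch of that step; the function name `read_fasta_sequence` and the sample record are invented for illustration, and it only reads the first record if several are pasted together.

```python
def read_fasta_sequence(fasta_text: str) -> tuple[str, str]:
    """Return (header, sequence) for the first record in a FASTA string."""
    header = ""
    chunks = []
    seen_header = False
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if seen_header:
                break  # a second record starts here; stop after the first
            seen_header = True
            header = line[1:]
        else:
            chunks.append(line)
    return header, "".join(chunks)


# Hypothetical FASTA record, invented for this example.
SAMPLE_FASTA = """>example_protein made-up header
MKTAYIAKQR
QISFVKSHFS
"""

header, sequence = read_fasta_sequence(SAMPLE_FASTA)
print(sequence)       # MKTAYIAKQRQISFVKSHFS
print(len(sequence))  # 20
```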

What if you want shorter or longer proteins? Change the filter terms in the above link, or in the search bar at the top of the list page in UniProt.
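If you change the length range often, the link can be built rather than edited by hand. This sketch just reproduces the URL pattern from the link above with Python's standard `urllib.parse.urlencode`; the function name `uniprot_length_search` is made up, and the sketch does not check whether the UniProt site still accepts this URL form.

```python
from urllib.parse import urlencode


def uniprot_length_search(min_len: int, max_len: int, reviewed: bool = True) -> str:
    """Build a UniProt search URL in the same pattern as the link in this post."""
    query = f"length:[{min_len} TO {max_len}]"
    if reviewed:
        query += " AND reviewed:yes"
    # urlencode percent-escapes ':' '[' ']' and turns spaces into '+'.
    return "https://www.uniprot.org/uniprot/?" + urlencode(
        {"query": query, "sort": "score"}
    )


print(uniprot_length_search(100, 150))
print(uniprot_length_search(60, 90))
```

The first call gives back the same link as the one posted above; the second swaps in a shorter length range.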

puxatudo's picture

Great stuff!
Thanks 'Susume' and 'MrZanav'.

'Formula350' also re-posted something that 'beta_helix' shared in the Discord channel:

"here is a list of unsolved structures you can play with (I sorted them by length, so that you can pick your size of 100-150 residues):

https://predictioncenter.org/casp14/targetlist.cgi?order=DESC&field=residues&view=all&assis_type=all&view_targets=all "


Developed by: UW Center for Game Science, UW Institute for Protein Design, Northeastern University, Vanderbilt University Meiler Lab, UC Davis
Supported by: DARPA, NSF, NIH, HHMI, Amazon, Microsoft, Adobe, Boehringer Ingelheim, RosettaCommons