Susume's picture
Joined: 10/02/2011

Scientists at UC San Diego used facial recognition software to train a neural network to categorize small molecules based on NMR data: https://phys.org/news/2017-10-smart-facial-recognition-molecular.html

Joined: 09/24/2012
Groups: Go Science

They use 2D views and apply a "facial recognition" algorithm. I suppose they have to take several points of view, as we humans would if we needed to recognize a face without error.

(Current public facial recognition software is not that powerful, as we have all experienced: even a picture of a dog can get a candidate person "recognized" in it.)

I wonder why they don't use 3D, since they have the related databases. Or maybe 3D gives too much information, which is difficult to reduce to a set of key points?

I have the impression that our Foldit tool already does something like that when proposing remixes or rebuilds.

Once software is capable of pattern recognition, it could be the end of some great citizen science projects (Serengeti, whales, Foldit, etc.).

BUT I'm still convinced that humans will remain better at recognizing deeper meanings, starting with all the links we can make with art, culture, and utility.

Just an example from current facial recognition: we don't only recognize a face and attach a name. We understand the emotional connection, aesthetics, context, and humor, and we can read all of this across a full video track (which carries much more information than a static picture). A doctor will spot a skin disease, and we can infer other information such as income (an abstract concept), overall happiness in life, etc.

I suppose that, for proteins, categorizing folds can help in imagining families of functions, but researchers still have a lot of work to do to answer some fundamental questions.

rmoretti's picture
Joined: 01/15/2010
Groups: Foldit Staff
Recognition from experimental data, not from structure

The way I read it, the development here is that they're running the facial recognition algorithm on raw experimental data, rather than on a structure itself. (If you have the structure, there are a number of well-established methods for figuring out what other molecules it's similar to.)

Specifically, they're using Nuclear Magnetic Resonance (NMR) data. This is a routine chemical analysis technique, and (theoretically) should have all the information needed to reconstruct the structure of the molecule. The difficulty is translating the data from the form the NMR machine spits out to the structure. This is easy for simple structures, but gets harder as the molecules get larger and more complex.

So the concept is that someone finds an interesting molecule in soil bacteria, or marine sponges, or some other source. You can purify the compound, but you don't know what the structure is. But you can put that compound into the NMR machine, run a standard NMR experiment, and get out the NMR spectrum.

You can now feed that NMR spectrum to this new algorithm, and it will tell you what molecules with known structures are similar to your molecule with unknown structure. You don't get out the exact structure, but knowing what compounds are structurally similar makes working with the compound easier.
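To make the "which known molecules are similar" step concrete, here is a minimal sketch. It assumes the trained network has already turned each spectrum into a fixed-length vector of numbers; the vectors, the compound names, and the choice of cosine similarity below are all made up for illustration and are not taken from the article.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_known_compounds(query, library, top_k=3):
    """Return the top_k (name, score) pairs from library, most similar first."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in library.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical vectors for compounds with known structures.
library = {
    "compound_A": [0.9, 0.1, 0.0, 0.2],
    "compound_B": [0.1, 0.8, 0.3, 0.0],
    "compound_C": [0.85, 0.2, 0.05, 0.1],
}

# Hypothetical vector for the newly purified, unknown compound.
query = [0.88, 0.15, 0.0, 0.15]

for name, score in rank_known_compounds(query, library, top_k=2):
    print(f"{name}: {score:.3f}")
```

The output is a ranked list of known compounds rather than a structure, which matches the point above: you learn what your unknown resembles, not exactly what it is.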

Joined: 05/19/2017
Groups: None
This would be a godsend in the lab

Having worked with NMR before in organic chemistry classes I can safely say that this, for lack of a better term, is freakin' dope!

Susume's picture
Joined: 10/02/2011
Applicable to X-ray diffraction data?

I wonder if something similar could be applied to X-ray diffraction data to help solve the phase problem. If you have some plausible decoys, maybe a trained neural network could pick the best decoy for the data, even if it could not solve the crystal structure outright. Then you could concentrate your refinement efforts on the decoy that got picked, and again use the neural net to pick among the refinements.

The WeFold experiment during CASP 11 generated millions of decoys for the now-published CASP 11 targets; maybe those could be used as training sets for this sort of neural network.

Susume's picture
Joined: 10/02/2011

On second thought, if the diffraction data is the input to the neural net, I guess you would need diffraction data for the decoys if you were going to use them for training, and diffraction data for the decoys does not exist. Is it possible to simulate diffraction data from a decoy in PDB format?
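For what it's worth, the textbook relation suggests this is possible in principle: the structure factor for reflection (h, k, l) is the sum over atoms of f_j * exp(2*pi*i*(h*x_j + k*y_j + l*z_j)), and the measured intensity is proportional to |F|^2. Below is a rough sketch of that sum only. It assumes constant scattering factors, no B-factors, no symmetry, no solvent, and atoms already given in fractional unit-cell coordinates (a real PDB file stores Cartesian coordinates in angstroms, so a conversion using the unit cell would be needed first). The toy atoms are invented.

```python
import cmath
import math


def structure_factor(hkl, atoms):
    """Complex structure factor F(hkl).

    atoms is a list of (f, (x, y, z)): a constant scattering factor f
    (an assumption) and fractional coordinates in the unit cell.
    """
    h, k, l = hkl
    total = 0j
    for f, (x, y, z) in atoms:
        phase = 2.0 * math.pi * (h * x + k * y + l * z)
        total += f * cmath.exp(1j * phase)
    return total


def simulated_intensities(atoms, max_index=2):
    """Map each (h, k, l) with |index| <= max_index to |F|^2, skipping (0, 0, 0)."""
    intensities = {}
    indices = range(-max_index, max_index + 1)
    for h in indices:
        for k in indices:
            for l in indices:
                if (h, k, l) == (0, 0, 0):
                    continue
                amplitude = abs(structure_factor((h, k, l), atoms))
                intensities[(h, k, l)] = amplitude ** 2
    return intensities


# Invented three-atom "decoy": (scattering factor, fractional coordinates).
toy_decoy = [
    (6.0, (0.10, 0.20, 0.30)),
    (7.0, (0.40, 0.25, 0.15)),
    (8.0, (0.70, 0.60, 0.55)),
]

spots = simulated_intensities(toy_decoy, max_index=1)
print(len(spots), "simulated reflections")
```

Note that only the amplitudes survive in the simulated intensities, just as in a real experiment, so a network trained on them would face the same missing-phase situation as real data.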

jeff101's picture
Joined: 04/20/2012
Groups: Go Science
Recognizing 2D Patterns is useful:

Some NMR data is 2D. Protein Distance and Contact Maps are 2D as well
(see http://memorize.com/distance-and-contact-maps/jeff101 for examples).
I think being able to classify structures based on 2D data would be helpful.
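As a small illustration of why these maps are 2D images of a 3D structure, here is a sketch that builds a distance map and a contact map from a list of coordinates. The coordinates are invented, and the 8 angstrom cutoff is just a commonly used convention for C-alpha contacts, assumed here rather than taken from the linked page.

```python
import math


def distance_map(coords):
    """Square matrix of pairwise Euclidean distances between 3D points."""
    n = len(coords)
    return [[math.dist(coords[i], coords[j]) for j in range(n)] for i in range(n)]


def contact_map(dmap, cutoff=8.0):
    """Binary matrix: 1 where the distance is within cutoff, else 0."""
    return [[1 if d <= cutoff else 0 for d in row] for row in dmap]


# Invented C-alpha positions (angstroms) for a four-residue toy chain.
coords = [
    (0.0, 0.0, 0.0),
    (3.8, 0.0, 0.0),
    (7.6, 0.0, 0.0),
    (20.0, 0.0, 0.0),
]

dmap = distance_map(coords)
cmap = contact_map(dmap)
for row in cmap:
    print(row)
```

Both matrices are symmetric with a trivial diagonal, so each one can be treated as a grayscale or black-and-white picture, which is the kind of input an image classifier expects.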

