A new public database of protein structures predicted by artificial intelligence has the potential to revolutionize biology.


A version of this story appeared in Science, Vol. 373, Issue 6554.
Two groups recently released advanced modeling programs that can accurately predict the 3D atomic structures of proteins and some molecular complexes, the product of years of collaboration among computer scientists, biologists, and physicists. The biggest advance came when one of the teams used its artificial intelligence (AI) program to predict the structures of 350,000 proteins from humans and 20 model organisms, including organisms widely used in biological research such as the bacterium Escherichia coli, yeast, and fruit flies. The team plans to extend its predictions to all cataloged proteins, roughly 100 million molecules.
John Moult, a protein folding expert at the University of Maryland, was amazed. Accurate computer models that could complement experimental methods such as x-ray crystallography have been a dream of structural biologists for decades, he says, and he never thought that dream would become a reality.
The program behind this breakthrough, AlphaFold, was developed by researchers at DeepMind, a UK-based AI company owned by Alphabet, Google's parent company. AlphaFold dominated the CASP protein structure prediction competition in the fall of 2020, outscoring its competitors with a median accuracy score of 92.4 out of 100. But DeepMind did not disclose how the program predicts protein shapes or release AlphaFold's underlying computer code, which frustrated other teams that wanted to build on the advance. That changed when researchers at the University of Washington in Seattle developed their own highly accurate structure prediction program, RoseTTAFold, and made it publicly available. DeepMind researchers also published the details of AlphaFold in Nature.
Both AlphaFold and RoseTTAFold use AI to identify folding patterns in large databases of solved protein structures. They then predict the most likely structures of unknown proteins by also taking into account the physical and biological rules that govern how neighboring amino acids in a protein interact. The University of Washington researchers used RoseTTAFold to build a database of structures of G protein-coupled receptors, a family of proteins that are common drug targets.
In Nature, DeepMind researchers now report that AlphaFold has produced 350,000 predicted protein structures, more than double the number previously solved experimentally. AlphaFold produced structures for nearly 44% of all human proteins, covering nearly 60% of the amino acids encoded by the human genome. It also flagged many other human proteins as "disordered," meaning they do not adopt a single stable structure. Such proteins may take on a defined shape only when they bind to a protein partner, or they may naturally switch among multiple conformations.
A database of DeepMind's protein predictions, built in collaboration with the European Molecular Biology Laboratory (EMBL), is freely available to the public. Researchers have widely praised the open release, which is expected to accelerate research.
Because a protein's 3D structure plays a major role in its function, the DeepMind library should help researchers work out what thousands of poorly understood proteins do. Edith Heard, director-general of EMBL, believes the advance will transform our understanding of how life works.
DeepMind's collaborators have already used AlphaFold2 to develop new enzymes that break down plastics in the environment faster than earlier enzymes, and the predictions are opening new avenues for developing drugs against neglected diseases. Ewan Birney, director of EMBL's European Bioinformatics Institute, says the dataset will be one of the most important since the mapping of the human genome.
The predictions will also aid experimental structure determination. Experimentalists who solve structures often struggle to interpret data from x-ray crystallography and cryo-electron microscopy, and a computer model can help them overcome these difficulties. Minkyung Baek, a member of the University of Washington team, believes the models will boost experimental structure determination in the short term and may gradually replace experimental methods in the long run.
Even if computational methods do replace experimental ones, structural biologists will not be out of work. Both experimental and computational scientists are now turning to the harder problem of understanding how proteins interact with one another and how their structures change during those interactions. David Baker, who led the University of Washington team, believes this breakthrough will revolutionize the field, calling it an exciting time for structural biology.
