Explore Even More Computed Structure Models Alongside PDB Data
In 2022, RCSB.org enabled access to ~1 million Computed Structure Models (CSMs) from AlphaFoldDB and RoseTTAFold (from ModelArchive).
With this week's release, RCSB.org now offers access to updated AlphaFold models plus ~68,000 additional CSMs from ModelArchive that include:
- Freshwater sponge proteins (modeled with ColabFold, ModelArchive: ma-coffe-slac)
- African swine fever virus proteins (modeled with AlphaFold, ModelArchive: ma-asfv-asfvg)
- Structural models of the Sphagnum divinum proteome (modeled using AlphaFold2, ModelArchive: 10.5452/ma-ornl-sphdiv)
- Hetero-dimer set of proteins from cancer interactome (modeled using AlphaFold, ModelArchive: 10.5452/ma-t3vr3)
These CSMs join the pre-packaged collection of 999,255 AlphaFold models first released on 01-Jun-2022 (based on model organism proteomes, global health proteomes, Swiss-Prot sequences, and MANE (Matched Annotation from NCBI and EMBL-EBI)) and the 1,106 core eukaryotic protein complexes produced by RoseTTAFold and AlphaFold2 from ModelArchive.
The 999,255 AlphaFold models have been updated to the most recent AlphaFold release (version 4); 26,934 of these CSMs incorporate a fix for an issue that resulted in low-accuracy predictions (with correspondingly low pLDDT).
Exploring PDB Structures and CSMs at RCSB.org
Experimental structures and CSMs are clearly identified throughout the website: a dark-blue flask icon is used for PDB structures and a cyan computer icon for CSMs.
Only experimentally-determined PDB structures are included in search results by default. To include CSMs, move the "Include CSM" slider from gray to cyan.
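For programmatic searches, the same opt-in behavior can be sketched as a request body. The example below is a minimal sketch, assuming the RCSB Search API's `results_content_type` request option with the values `experimental` and `computational`. The helper name `build_search_request` is hypothetical, and the field names should be checked against the current Search API documentation before use.

```python
import json


def build_search_request(text, include_csm=False):
    """Build a full-text search request body (illustrative helper, not an official client)."""
    # Mirror the website default: experimental PDB structures only.
    content_types = ["experimental"]
    if include_csm:
        # Opt in to CSMs, analogous to moving the "Include CSM" slider to cyan.
        content_types.append("computational")
    return {
        "query": {
            "type": "terminal",
            "service": "full_text",
            "parameters": {"value": text},
        },
        "return_type": "entry",
        # Assumed option name; verify against the Search API documentation.
        "request_options": {"results_content_type": content_types},
    }


if __name__ == "__main__":
    # Print the request body that would be sent for a CSM-inclusive search.
    print(json.dumps(build_search_request("hemoglobin", include_csm=True), indent=2))
```

The sketch only builds the request body and sends nothing over the network, so it can be inspected or adapted before being submitted to a search endpoint.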
A confidence score called “predicted local distance difference test” (pLDDT) is computed for each amino acid residue to estimate how well the method has converged (i.e., how well the predicted structure agrees with multiple sequence alignment data and PDB structure information). This pLDDT score is used to color 2D and 3D views of CSMs and to sort search results.
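To make the per-residue and global views of pLDDT concrete, the following minimal sketch bins per-residue scores into the four confidence bands commonly used for AlphaFold models (above 90, 70 to 90, 50 to 70, below 50) and averages them into a single global value. The band thresholds are not stated in this post and are an assumption here, as are the function names `plddt_band` and `summarize_plddt` and the treatment of scores that fall exactly on a threshold.

```python
from statistics import mean


def plddt_band(score):
    """Map one per-residue pLDDT score (0-100) to a confidence band."""
    if not 0 <= score <= 100:
        raise ValueError(f"pLDDT must be between 0 and 100, got {score}")
    # Assumed thresholds; a score exactly on a boundary falls into the lower band.
    if score > 90:
        return "very high"
    if score > 70:
        return "confident"
    if score > 50:
        return "low"
    return "very low"


def summarize_plddt(scores):
    """Return the mean (global) pLDDT and a per-band residue count."""
    if not scores:
        raise ValueError("at least one per-residue score is required")
    bands = {}
    for score in scores:
        band = plddt_band(score)
        bands[band] = bands.get(band, 0) + 1
    return {"global": mean(scores), "bands": bands}
```

For example, `summarize_plddt([95.0, 80.0, 65.0, 40.0])` gives a global value of 70.0 with one residue in each band.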
RCSB.org users can query, organize, visualize, analyze, and compare experimental structures and CSMs side-by-side:
- Search: Queries can cover all PDB structures and CSMs or PDB structures only, and either PDB structures or CSMs can be excluded from the search results. CSMs are not included in the search results by default; turn the slider button to cyan to include them when using the Basic Search in the tab bar or the Advanced Search options.
- View and Organize Results: By default, search results are ordered using a query-based relevancy score. Results can be re-sorted using different criteria (e.g., listing experimental PDB structures first, or by per-residue confidence score (pLDDT)). The Refinements panel can be used to exclude PDB structures or CSMs from the results list.
- Explore Similar Proteins: "Group" summary pages and search results simplify exploration of PDB structures that share similar sequence identity or a UniProt ID, or that were deposited as part of the same study.
- Explore Individual Structures: Structure Summary Pages offer details of experimental PDB structures (e.g., 4HHB) and CSMs (e.g., AF_AFP44795F1).
- Assess Quality: Analogous to the validation slider for experimental structures, all CSMs report global and local confidence levels as pLDDT scores.
- Visualize in 3D: View experimental PDB structures and CSMs in Mol* from Structure Summary Pages (e.g., AF_AFP44795F1). Use the standalone Mol* 3D Viewer to upload single or multiple data files, align structures, and run Structure Motif Search.
- Download: From Structure Summary Pages, download the ModelCIF data file hosted by the corresponding external archive (AlphaFoldDB or ModelArchive).
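Because the ModelCIF file for a CSM is hosted by the external archive, it can be useful to map an RCSB.org CSM identifier back to the source archive's identifier. The sketch below assumes the convention suggested by the example AF_AFP44795F1: the prefix `AF_`, followed by the AlphaFoldDB entry ID (`AF-<UniProt accession>-F<fragment>`) with hyphens removed. The function name `csm_id_to_alphafold_id` is hypothetical, and ModelArchive-derived identifiers are not handled.

```python
import re

# Assumed pattern: "AF_AF" + UniProt accession (6 or 10 characters) + "F" + fragment number.
_AF_CSM_ID = re.compile(r"^AF_AF([A-Z0-9]{6}|[A-Z0-9]{10})F([0-9]+)$")


def csm_id_to_alphafold_id(csm_id):
    """Convert an RCSB.org AlphaFold CSM ID to an AlphaFoldDB-style entry ID."""
    match = _AF_CSM_ID.match(csm_id.strip().upper())
    if match is None:
        # Experimental PDB IDs (e.g., 4HHB) and other CSM sources end up here.
        raise ValueError(f"not an AlphaFold CSM identifier: {csm_id!r}")
    accession, fragment = match.groups()
    return f"AF-{accession}-F{fragment}"


if __name__ == "__main__":
    print(csm_id_to_alphafold_id("AF_AFP44795F1"))
```

Under these assumptions, `AF_AFP44795F1` maps to `AF-P44795-F1`, while an experimental ID such as `4HHB` raises a `ValueError`.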
RCSB PDB will continue to develop resources to support exploration of experimental PDB structures and CSMs; feedback sent using the Contact Us button above or by email is greatly appreciated.
Related Resources: Learn more about CSMs and Experimental PDB Structures at RCSB.org
- Videos from Crash Course: Exploring Computed Structure Models at RCSB.org.
- RCSB.org Feature Documentation for CSMs
- PDB-101: Computed Structure Models
- RCSB Protein Data Bank (RCSB.org): delivery of experimentally-determined PDB structures alongside one million computed structure models of proteins from artificial intelligence/machine learning (2023) Nucleic Acids Research 51: D488–D508 doi: 10.1093/nar/gkac1077
- AlphaFold: Highly accurate protein structure prediction with AlphaFold (2021) Nature 596: 583-589 doi: 10.1038/s41586-021-03819-2;
AlphaFold Protein Structure Database: massively expanding the structural coverage of protein-sequence space with high-accuracy models (2022) Nucleic Acids Res 50: D439-D444. doi: 10.1093/nar/gkab1061
- ColabFold: making protein folding accessible to all (2022) Nature Methods 19: 679–682 doi: 10.1038/s41592-022-01488-1
- RoseTTAFold: Accurate prediction of protein structures and interactions using a three-track neural network (2021) Science 373: 871-876 doi: 10.1126/science.abj8754;
Computed structures of core eukaryotic protein complexes (2021) Science 374: eabm4805 doi: 10.1126/science.abm4805
- pLDDT scoring: Highly accurate protein structure prediction for the human proteome (2021) Nature 596: 590–596 doi: 10.1038/s41586-021-03828-1
- Computed cancer interactome explains the effects of somatic mutations in cancers (2022) Protein Sci 31:e4479. doi: 10.1002/pro.4479
- Proteome-scale Deployment of Protein Structure Prediction Workflows on the Summit Supercomputer (2022) In 2022 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) Lyon, France, pp. 206-215 doi: 10.1109/IPDPSW55747.2022.00045
- Predicting Proteome-Scale Protein Structure with Artificial Intelligence (2021) N Engl J Med 385: 2191-2194 doi: 10.1056/NEJMcibr2113027
- Open-access data: A cornerstone for artificial intelligence approaches to protein structure prediction (2021) Structure 29: 515-520 doi: 10.1016/j.str.2021.04.010