Organization of 3D Structures in the Protein Data Bank
Biomolecules are hierarchical structures. For example, proteins are composed of linear chains of amino acids that (often) fold into compact subunits, which can then associate into higher-level assemblies with other proteins, small-molecule ligands, and water or other solvent molecules. Biomolecules in the Protein Data Bank (PDB) archive are organized and represented according to this hierarchy to simplify searching and exploration.
Four levels of hierarchy are commonly used: ENTRY, ENTITY, INSTANCE, and ASSEMBLY.
- An ENTRY is all data pertaining to a particular structure deposited in the PDB and is designated with a 4-character alphanumeric identifier called the PDB identifier or PDB ID (e.g., 2hbs).
- An ENTITY is a chemically unique molecule that may be polymeric, such as a protein chain or a DNA strand, or non-polymeric, such as a soluble ligand. Some entries may even have branched polymeric entities, such as oligosaccharides.
- An INSTANCE is a particular occurrence of an ENTITY. An ENTRY may contain multiple INSTANCES of an ENTITY, for example, several copies of the same protein chain in a homo-oligomeric protein.
- An ASSEMBLY is a biologically relevant group of one or more INSTANCES of one or more ENTITIES that are associated with each other to form a stable complex and/or perform a function.
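One way to see how the four levels nest is to write them down as data types. The following Python sketch is a simplified illustration that assumes nothing about how the PDB actually stores these records: the class names (EntityType, Entity, Instance, Assembly, Entry), their fields, the instances_of helper, and the placeholder PDB ID are all hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum


class EntityType(Enum):
    """Hypothetical labels for the three kinds of ENTITY described above."""
    POLYMER = "polymer"          # e.g., a protein chain or DNA strand
    NON_POLYMER = "non-polymer"  # e.g., a ligand such as heme
    BRANCHED = "branched"        # e.g., an oligosaccharide


@dataclass(frozen=True)
class Entity:
    """A chemically unique molecule, described once per entry."""
    entity_id: str
    name: str
    kind: EntityType


@dataclass(frozen=True)
class Instance:
    """One occurrence of an Entity; many instances may share one Entity."""
    label: str       # e.g., a chain ID such as "A"
    entity: Entity


@dataclass
class Assembly:
    """A biologically relevant group of instances."""
    assembly_id: str
    instances: list[Instance] = field(default_factory=list)


@dataclass
class Entry:
    """All data for one deposited structure, keyed by its PDB ID."""
    pdb_id: str
    entities: list[Entity] = field(default_factory=list)
    instances: list[Instance] = field(default_factory=list)
    assemblies: list[Assembly] = field(default_factory=list)

    def instances_of(self, entity: Entity) -> list[Instance]:
        """Return every instance of the given entity in this entry."""
        return [i for i in self.instances if i.entity == entity]


# Toy example: a homodimer, i.e., two instances of one protein entity.
chain = Entity("1", "example protein", EntityType.POLYMER)
a, b = Instance("A", chain), Instance("B", chain)
dimer = Entry("0abc",  # placeholder ID, not a real PDB entry
              entities=[chain], instances=[a, b],
              assemblies=[Assembly("1", [a, b])])
print(len(dimer.instances_of(chain)))  # 2
```

Because each Entity is defined once and shared, every Instance points back to the same chemical description, which mirrors the ENTITY/INSTANCE split described above.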
Relevance in Exploring the PDB
Understanding the hierarchy in these terms can help with exploring the PDB, searching for and identifying relevant structures, and visualizing and analyzing them meaningfully.
- Every ENTRY in the PDB contains at least one polymer ENTITY or one branched ENTITY (either a linear or branched oligosaccharide). The ENTRY is identified by a PDB ID.
- Since there can be multiple INSTANCES of a given ENTITY in the ENTRY, each INSTANCE of a polymer or branched ENTITY is given a unique chain identifier, or chain ID, of one or more alphanumeric characters (e.g., A, AA, ...). Chain IDs provide an easy way to refer to, select, and display each specific INSTANCE of each polymer and branched ENTITY. However, chain IDs are not assigned according to any fixed rule, so the chain IDs given to the same protein ENTITY in two different ENTRIES may differ.
- Each INSTANCE of a non-polymer ENTITY is identified by the chain ID of the closest neighboring INSTANCE of a polymer ENTITY and is additionally distinguished with unique numbering (e.g., two heme groups associated with the same protein chain with ID = A may be identified as A101 and A102).
- The various groupings of ENTITY INSTANCES forming ASSEMBLIES are assigned assembly IDs (e.g., 1, 2, ...).
In summary, a deposited ENTRY contains one or more INSTANCES of at least one polymer or branched ENTITY arranged in one or more ASSEMBLIES.
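The identifier conventions above can be expressed as a few small checks. This Python sketch is illustrative only: InstanceRef, instance_label, has_unique_chain_ids, is_valid_entry, and nearest_chain are hypothetical names, and nearest_chain uses a simple closest-atom distance as an assumed stand-in for however the archive actually decides which polymer chain is "closest". The coordinates in the toy data are invented.

```python
import math
from dataclasses import dataclass
from typing import Optional

POLYMER_LIKE = ("polymer", "branched")


@dataclass(frozen=True)
class InstanceRef:
    """Hypothetical record for one INSTANCE and the identifiers it carries."""
    entity_type: str                      # "polymer", "branched", or "non-polymer"
    chain_id: str                         # own chain ID, or nearest polymer chain's ID
    residue_number: Optional[int] = None  # only needed for non-polymers


def instance_label(ref: InstanceRef) -> str:
    """Polymer/branched: the chain ID. Non-polymer: chain ID plus a number."""
    if ref.entity_type in POLYMER_LIKE:
        return ref.chain_id
    if ref.residue_number is None:
        raise ValueError("a non-polymer INSTANCE needs a residue number")
    return f"{ref.chain_id}{ref.residue_number}"


def has_unique_chain_ids(refs: list[InstanceRef]) -> bool:
    """Each polymer or branched INSTANCE must have its own chain ID."""
    ids = [r.chain_id for r in refs if r.entity_type in POLYMER_LIKE]
    return len(ids) == len(set(ids))


def is_valid_entry(entity_types: list[str]) -> bool:
    """An ENTRY needs at least one polymer or branched ENTITY."""
    return any(t in POLYMER_LIKE for t in entity_types)


def nearest_chain(ligand_xyz, chain_atoms: dict) -> str:
    """Assumed criterion: the chain owning the atom closest to the ligand."""
    return min(
        chain_atoms,
        key=lambda cid: min(math.dist(ligand_xyz, atom) for atom in chain_atoms[cid]),
    )


# Toy data (invented coordinates): two hemes that sit next to chain A.
chains = {"A": [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)], "B": [(20.0, 0.0, 0.0)]}
owner = nearest_chain((2.0, 1.0, 0.0), chains)
hemes = [InstanceRef("non-polymer", owner, 101), InstanceRef("non-polymer", owner, 102)]
print([instance_label(h) for h in hemes])  # ['A101', 'A102']
```

Non-polymer INSTANCES are excluded from the uniqueness check because, as described above, several of them can share the chain ID of the same polymer chain and are told apart only by their numbers.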
- For example, PDB ENTRY 2hbs contains two complete sickle cell hemoglobin tetramers, which include heme cofactors and are surrounded by many water molecules.
- Each tetramer is a separate ASSEMBLY (with its own assembly ID) and is made up of two INSTANCES of each of two polymer ENTITIES: the alpha chain (shown in orange and yellow) and the beta chain (shown in shades of blue). Each tetramer also includes four INSTANCES of heme (one associated with each protein chain INSTANCE) and many INSTANCES of water. The two hemoglobin tetramer ASSEMBLIES are nearly identical. In red blood cells, this tetramer is the functional unit that binds and transports oxygen.
- In addition to the polymeric ENTITIES, the ENTRY includes two non-polymeric ENTITIES: heme (with residue name HEM, shown here in red) and water (residue name HOH, shown in green). There are eight INSTANCES of heme, each bound to either an alpha or beta chain, and several hundred INSTANCES of water in the ENTRY.
Figure: ENTRY 2hbs, which includes two polymeric ENTITIES, the alpha chain and the beta chain.
Figure: all four INSTANCES of the alpha chain ENTITY colored (waters not shown for clarity).
Figure: one INSTANCE of the alpha chain ENTITY colored (waters not shown for clarity).
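The 2hbs example can be tallied the same way. This Python sketch uses only the counts stated above; the instance labels (alpha-1, heme-1, ...) are hypothetical placeholders rather than the entry's real chain IDs, the assembly IDs "1" and "2" and the split of hemes between the two tetramers are assumed, and waters are left out because no exact count is given.

```python
from collections import Counter

# Placeholder instance labels (NOT the real chain IDs of 2hbs) mapped to the
# ENTITY each one belongs to. Waters are omitted.
ENTRY_INSTANCES = {
    "alpha-1": "alpha", "beta-1": "beta", "alpha-2": "alpha", "beta-2": "beta",
    "alpha-3": "alpha", "beta-3": "beta", "alpha-4": "alpha", "beta-4": "beta",
    **{f"heme-{n}": "HEM" for n in range(1, 9)},
}

# Assumed assembly IDs; each tetramer holds 2 alpha, 2 beta, and 4 heme instances.
ASSEMBLIES = {
    "1": ["alpha-1", "beta-1", "alpha-2", "beta-2",
          "heme-1", "heme-2", "heme-3", "heme-4"],
    "2": ["alpha-3", "beta-3", "alpha-4", "beta-4",
          "heme-5", "heme-6", "heme-7", "heme-8"],
}


def entity_counts(labels) -> Counter:
    """Count INSTANCES per ENTITY for a collection of instance labels."""
    return Counter(ENTRY_INSTANCES[label] for label in labels)


print(len(set(ENTRY_INSTANCES.values())))    # 3 (alpha, beta, HEM; water would make 4)
print(dict(entity_counts(ENTRY_INSTANCES)))  # {'alpha': 4, 'beta': 4, 'HEM': 8}
for assembly_id, members in ASSEMBLIES.items():
    # Each tetramer ASSEMBLY has the same composition.
    print(assembly_id, dict(entity_counts(members)))
```

Keeping a single instance-to-entity map for the whole ENTRY, and letting each ASSEMBLY list only instance labels, means the counts per ENTRY and per ASSEMBLY both come from one source of truth.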