Find membrane proteins in the PDB
To identify all transmembrane proteins in the PDB, we load the manually annotated transmembrane dataset from mpstruc (UC Irvine).
Mpstruc provides information about integral membrane proteins whose crystallographic, or sometimes NMR, structures have been determined to a resolution sufficient to identify the TM helices of helix-bundle membrane proteins (typically 4 to 4.5 Å).
The latest mpstruc data is downloaded from http://blanco.biomol.uci.edu/mpstruc/listAll/mpstrucTblXml on a weekly basis.
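As an illustration of the loading step, the sketch below collects PDB codes from an mpstruc-style XML document. It is a minimal example, not the production loader: the element name `pdbCode` is an assumption about the XML layout (it is exposed as a parameter so it can be changed), and the sample document is hand-written rather than real mpstruc output.

```python
import xml.etree.ElementTree as ET


def extract_pdb_codes(xml_text, code_tag="pdbCode"):
    """Return the sorted, upper-cased PDB codes found in an XML document.

    code_tag is an ASSUMED element name; adjust it to match the actual
    mpstruc XML if it differs.
    """
    root = ET.fromstring(xml_text)
    codes = set()
    # iter() walks the whole tree, so the nesting depth does not matter.
    for element in root.iter(code_tag):
        if element.text and element.text.strip():
            codes.add(element.text.strip().upper())
    return sorted(codes)


# Hand-written sample with hypothetical codes, not real mpstruc content.
sample = (
    "<mpstruc>"
    "<protein><pdbCode>1abc</pdbCode></protein>"
    "<protein><pdbCode>2XYZ</pdbCode></protein>"
    "<protein><pdbCode>1abc</pdbCode></protein>"
    "</mpstruc>"
)
pdb_codes = extract_pdb_codes(sample)
print(pdb_codes)
```

In a weekly job, the XML text would come from the URL above instead of the inline sample.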
These manual annotations are extended to related structures using our sequence clusters, according to the following procedure:
Single chain transmembrane proteins
Mpstruc annotates transmembrane proteins at the level of the PDB entry. If the reference mpstruc entry contains only a single protein entity, that entity must be the transmembrane protein. Therefore, any PDB chain sharing 90% sequence identity with this transmembrane protein is also assigned as a transmembrane protein and receives the same transmembrane annotation.
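The single-entity rule can be sketched as follows. This is an illustration under stated assumptions, not the actual pipeline: the function name, the dictionary layouts, and all identifiers in the example data are hypothetical, and the 90% sequence clusters are assumed to be precomputed.

```python
def propagate_single_entity(mpstruc_entries, cluster_of, annotation_of):
    """Spread annotations from single-entity mpstruc entries to their clusters.

    mpstruc_entries: PDB ID -> list of protein entity IDs in that entry
    cluster_of:      entity ID -> ID of its (precomputed) 90% sequence cluster
    annotation_of:   PDB ID -> mpstruc annotation for that entry
    Returns entity ID -> inherited annotation.
    """
    cluster_annotation = {}
    for pdb_id, entities in mpstruc_entries.items():
        # Only entries with exactly one protein entity are unambiguous.
        if len(entities) != 1:
            continue
        cluster_id = cluster_of.get(entities[0])
        if cluster_id is not None:
            # Keep the first annotation seen for a cluster.
            cluster_annotation.setdefault(cluster_id, annotation_of[pdb_id])
    # Every entity in an annotated cluster inherits that annotation.
    return {
        entity: cluster_annotation[cluster_id]
        for entity, cluster_id in cluster_of.items()
        if cluster_id in cluster_annotation
    }


# Hypothetical example data.
entries = {"1ABC": ["1ABC_1"], "2XYZ": ["2XYZ_1", "2XYZ_2"]}
clusters = {"1ABC_1": "c1", "3DEF_1": "c1", "2XYZ_1": "c2", "2XYZ_2": "c3"}
annotations = {"1ABC": "alpha-helical channel", "2XYZ": "transporter complex"}

inherited = propagate_single_entity(entries, clusters, annotations)
print(inherited)
```

Here `3DEF_1` inherits the annotation of `1ABC` because both fall in cluster `c1`, while the two entities of the multi-entity entry `2XYZ` are left for the procedure described next.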
Multi-chain transmembrane proteins
If the reference mpstruc entry contains multiple protein entities, it is necessary to identify which of the entities are presumed to be transmembrane chains. This is done in conjunction with UniProt annotations: a protein entity is identified as transmembrane if its corresponding UniProt sequence has annotations labeled as a transmembrane or intramembrane region. For each transmembrane entity, all members of its sequence cluster (90% sequence identity) are programmatically inferred to belong to the same class of transmembrane proteins, by applying the procedure described above for single-entity mpstruc entries.
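A minimal sketch of the multi-entity case is shown below. The function names, data layouts, feature-type strings, and identifiers are all assumptions made for illustration; in particular, the UniProt feature labels are represented here as plain strings rather than as records fetched from UniProt.

```python
# ASSUMED labels for the UniProt features that mark a membrane-embedded region.
TM_FEATURE_TYPES = {"transmembrane region", "intramembrane region"}


def transmembrane_entities(entry_entities, uniprot_of, features_of):
    """Return the entities whose UniProt sequence carries a TM-type feature.

    entry_entities: entity IDs of one multi-entity mpstruc entry
    uniprot_of:     entity ID -> UniProt accession
    features_of:    UniProt accession -> list of feature-type labels
    """
    selected = []
    for entity in entry_entities:
        features = features_of.get(uniprot_of.get(entity), [])
        if any(label.lower() in TM_FEATURE_TYPES for label in features):
            selected.append(entity)
    return selected


def expand_to_clusters(seed_entities, cluster_of):
    """Return every entity sharing a 90% sequence cluster with a seed entity."""
    seed_clusters = {cluster_of[e] for e in seed_entities if e in cluster_of}
    return sorted(e for e, c in cluster_of.items() if c in seed_clusters)


# Hypothetical example: a two-entity entry with one soluble partner.
entry = ["2XYZ_1", "2XYZ_2"]
uniprot = {"2XYZ_1": "P00001", "2XYZ_2": "P00002"}
features = {
    "P00001": ["Transmembrane region", "Topological domain"],
    "P00002": ["Domain"],
}
clusters = {"2XYZ_1": "c2", "2XYZ_2": "c3", "4GHI_1": "c2", "5JKL_1": "c3"}

seeds = transmembrane_entities(entry, uniprot, features)
members = expand_to_clusters(seeds, clusters)
print(seeds, members)
```

Only `2XYZ_1` is selected as a seed, so `4GHI_1` is inferred to be transmembrane through cluster `c2`, while `5JKL_1`, which clusters with the soluble partner, is not.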