
Protein Feature View: Overview

The Protein Feature View provides graphical summaries of full-length protein sequences from UniProtKB and how they relate to PDB entries. It loads annotations from external databases such as Pfam and Phosphosite, domain annotations from SCOP and SCOPe, and regions for which homology models are available from the SWISS-MODEL Repository. Additional tracks display predicted regions of protein disorder (computed with JRONN) and hydrophobic regions (computed with a sliding-window approach). Predicted disordered and hydrophobic regions are shown in red; ordered and hydrophilic regions are shown in blue.

For human proteins that can be mapped to the human genome, an additional track shows the projection of the protein structure onto the genome. The Protein Feature View is currently available for all Swiss-Prot entries (including those without PDB structures), as well as for the small subset of TrEMBL entries that can be linked to the PDB.

By default, representative PDB entries are displayed to provide an overview of the UniProtKB sequence regions covered by PDB entries. The view can be expanded by pressing the "+" icon or by selecting the "Extended" menu option to show all available PDB entries (which, for some proteins, can be a large number).

View an example Protein Feature View.


Protein Feature Header Section

Learn more about the protein

The header section of the Protein Feature View displays information from UniProtKB about the function, catalytic activity and subunit structure (if available) of the sequence. The header section also contains an option to select Protein Feature Views from related organisms with the same gene name.

Protein Feature View for Ribulose bisphosphate carboxylase large chain (P23755). Other organisms with the same gene name can be selected from the menu. The number of available PDB structures is shown in gray circles.

The Action button has an option to map sequence motifs in the Protein Feature View as shown below.

Active site sequence motif Gx[DN]FxKxDE (Ribulose bisphosphate carboxylase large chain active site) mapped onto the Protein Feature View (red box around the mapped region). Note: x matches any amino acid, and [DN] matches either D or N. See the Sequence Motif help page for details.
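To illustrate how a motif written in this notation can be located in a sequence, the following minimal sketch translates it into a regular expression and reports every match. It assumes only the syntax described in the caption above (x for any residue, bracketed alternatives, literal one-letter codes). The helper names `motif_to_regex` and `find_motif` are hypothetical; this is not the implementation behind the Action button.

```python
import re


def motif_to_regex(motif: str) -> str:
    """Translate a motif such as 'Gx[DN]FxKxDE' into a Python regex.

    Assumed syntax:
      - 'x' or 'X' matches any single amino acid
      - '[AB]' matches any one of the listed residues
      - any other letter matches that residue literally
    """
    parts = []
    i = 0
    while i < len(motif):
        ch = motif[i]
        if ch in "xX":
            parts.append(".")
            i += 1
        elif ch == "[":
            end = motif.index("]", i)  # raises ValueError if the bracket is unclosed
            parts.append(motif[i:end + 1].upper())
            i = end + 1
        else:
            parts.append(re.escape(ch.upper()))
            i += 1
    return "".join(parts)


def find_motif(sequence: str, motif: str) -> list[tuple[int, int]]:
    """Return 1-based, inclusive (start, end) positions of every match.

    The pattern is wrapped in a lookahead so overlapping matches are also found.
    """
    pattern = re.compile("(?=(" + motif_to_regex(motif) + "))")
    return [
        (m.start() + 1, m.start() + len(m.group(1)))
        for m in pattern.finditer(sequence.upper())
    ]


# Usage with a made-up toy sequence (not a real protein):
print(motif_to_regex("Gx[DN]FxKxDE"))
print(find_motif("AAGKDFAKSDEAA", "Gx[DN]FxKxDE"))
```

Reporting 1-based positions matches the residue numbering used for UniProtKB sequences; the lookahead matters mainly for short or degenerate motifs, where matches can overlap.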

Validation Track Information

Quality of Protein Structures

On Structure Summary pages, the Protein Feature View shows a track that provides a per-residue perspective of the wwPDB validation report.

The track uses color coding to indicate the number of bond angle outliers, bond length outliers, and clashes for a given residue.

  • Green: no outliers
  • Yellow: 1 outlier
  • Orange: 2 outliers
  • Red: 3 or more outliers
A red icon marks residues that are RSRZ outliers, indicating a poor fit to the electron density map.
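The color rules above amount to a small lookup on a per-residue outlier count. The following is a minimal sketch, not RCSB code: it assumes that the bond angle outliers, bond length outliers, and clashes for a residue are summed into one count, and the record type `ResidueValidation` and its field names are hypothetical. The RSRZ (real-space R-value Z-score) threshold of 2 is the value quoted for PDB ID 2W72 further down this page.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ResidueValidation:
    """Hypothetical per-residue summary of a validation report (illustrative fields)."""
    bond_angle_outliers: int = 0
    bond_length_outliers: int = 0
    clashes: int = 0
    rsrz: Optional[float] = None  # None when no density fit is available


def outlier_color(res: ResidueValidation) -> str:
    """Map the summed geometric outlier count to the track colors."""
    total = res.bond_angle_outliers + res.bond_length_outliers + res.clashes
    if total == 0:
        return "green"
    if total == 1:
        return "yellow"
    if total == 2:
        return "orange"
    return "red"  # 3 or more outliers


def is_rsrz_outlier(res: ResidueValidation, threshold: float = 2.0) -> bool:
    """Flag residues whose RSRZ exceeds the threshold (assumed to be 2)."""
    return res.rsrz is not None and res.rsrz > threshold


# Usage: one clash plus one bond angle outlier, with a poor density fit.
residue = ResidueValidation(bond_angle_outliers=1, clashes=1, rsrz=2.4)
print(outlier_color(residue), is_rsrz_outlier(residue))
```

Keeping the geometric color and the RSRZ flag as separate functions mirrors the track itself, where the color and the red RSRZ icon are shown independently.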

Shown as an example below is one of the chains of PDB ID 4HHB, a structure of hemoglobin originally released in 1984. As modern refinement and validation tools were not available in 1984, the validation track is mostly orange and red due to the presence of a large number of geometric outliers.

4HHB validation track

By comparison, the validation track for one of the chains of the hemoglobin structure PDB ID 2W72, released in 2009 (see below), shows far fewer geometric outliers, although several residues fit poorly into the electron density (RSRZ > 2).

2W72 validation track

For more details on wwPDB validation reports please see the wwPDB website or read the article that describes the recommendations of the X-ray Validation Task Force.


What do all the tracks on the Protein Feature View represent?
Data origin/color codes
The vertical color bar to the left indicates data provenance.
Data in green originate from UniProtKB.
Variation data (sourced from UniProt) show non-genetic variation from the ExPASy and dbSNP websites.
Data in yellow originate from Pfam, via the HMMER3 website.
Data in purple originate from Phosphosite.
Data in orange originate from SCOP (version 1.75) and SCOPe (version 2.04) classifications.
Data in grey have been calculated using BioJava. Protein disorder predictions are based on JRONN (Troshin, P. and Barton, G. J., unpublished), a Java implementation of RONN.
  • Red: potentially disordered region
  • Blue: likely ordered region.
Hydropathy has been calculated using a sliding window of 15 residues, summing the residue scores from standard hydrophobicity tables.
  • Red: hydrophobic
  • Blue: hydrophilic
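The sliding-window hydropathy calculation can be sketched as follows. This page does not say which hydrophobicity table BioJava uses, so the Kyte-Doolittle scale below is an assumption, as are the choice to assign each window sum to its central residue and the zero cut-off between red (hydrophobic) and blue (hydrophilic). The function names are hypothetical.

```python
# Kyte-Doolittle hydropathy values (assumed scale; the page only refers to
# "standard hydrophobicity tables").
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence: str, window: int = 15) -> dict[int, float]:
    """Sum residue scores over every full window of `window` residues.

    Each sum is keyed by the 0-based index of the window's central residue.
    Residues missing from the table (e.g. 'X') contribute 0, an assumption.
    """
    seq = sequence.upper()
    half = window // 2
    profile: dict[int, float] = {}
    for start in range(len(seq) - window + 1):
        window_seq = seq[start:start + window]
        profile[start + half] = sum(KYTE_DOOLITTLE.get(aa, 0.0) for aa in window_seq)
    return profile


def hydropathy_color(score: float) -> str:
    """Red for a positive (hydrophobic) sum, blue otherwise; cut-off assumed."""
    return "red" if score > 0 else "blue"


# Usage with a made-up sequence: a hydrophobic stretch followed by a charged one.
profile = hydropathy_profile("IVLIVLIVLIVLIVL" + "KDEKDEKDEKDEKDE")
colors = {pos: hydropathy_color(score) for pos, score in profile.items()}
```

With a 15-residue window, the first and last seven residues have no full window centred on them and receive no score in this sketch; a real implementation might pad or shrink the window at the termini instead.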
Data in lilac represent the genomic exon structure projected onto the UniProt sequence.
Data in blue originate from the PDB.
  • Secstruc: Secondary structure projected from representative PDB entries onto the UniProt sequence.
Sequence mismatches: It is also possible to view information about expression tags, cloning artifacts, and other details of sequence mismatches.
Icons represent a number of different sequence modifications that can be recorded in PDB files.
  • The 'T' icon represents expression tags that have been added to the sequence.
  • The 'E' icon represents an engineered mutation.
  • The '<>' icon represents microheterogeneity.
  • The 'CRO' icon represents chromophores.
Besides these, there are many other icons. For more information about the meaning and exact position of a sequence modification, move the cursor over the icon.
Data in red indicate combined ranges of homology models from the SWISS-MODEL Repository.