Attribute Search

Overview

What are Attributes?

Attributes are properties of a 3D structure with specific text or numerical values that can be used to identify a single structure or a group of structures for exploration and analysis. The attributes available for searching include:

  • information about the entry, e.g., who solved the structure, when, and by what method; where the structure was published, the names and types of molecules present in the structure, experimental details, and annotations
  • properties of small molecules, ligands, drugs, and polymer building blocks (or residues) such as amino acids and nucleotides
  • information about the experiment performed

Why use Attribute Search?

The Attribute Search on RCSB.org allows searching in specific attributes such as Structure Title, Release Date, Source Organism Taxonomy Name, etc. Limiting your search to specific attributes can yield more precise results. For instance, if you are looking for structures from a particular author, it is more efficient to limit your search to the Structure Author attribute. If your attribute search retrieves too many matches, you can construct complex queries to retrieve more manageable results. Complex queries can be constructed by using several attributes together, combining them with Boolean operators “AND”, “OR”, and “NOT”.

Documentation

Interface Description

The Advanced Search Query Builder provides two options for performing an Attribute Search (Figure 1):

Figure 1: Types of Attribute Search from the Advanced Search Query Builder
  • The Structure Attribute option allows searching through annotations that describe biological macromolecules, as defined in the PDB Exchange Dictionary (PDBx/mmCIF).
  • The Chemical Attribute option allows searching through chemical reference data that describe small molecules defined in the Chemical Component Dictionary (CCD) and peptide-like molecules provided by the Biologically Interesting Molecule Reference Dictionary (BIRD).

By default, the Advanced Search Query Builder opens the Structure Attribute option and shows the other options collapsed (Figure 2):

Figure 2: Default view of Attribute search options from the Advanced Search Query Builder.

You can select an attribute from the pull-down menu and fill in suitable keywords, values, ranges, etc. for that field. Learn more about Types of Attributes that are available in search menus.

See an example of the attribute search for the small molecule/ligand named "chlorophyll" (Figure 3). See also other Examples.

Figure 3: Shown here is the interface after the attribute “Chemical Name” is chosen from the pull-down menu and “chlorophyll” typed as a search value.
  1. X - this will clear the attribute box
  2. Down arrow - gives an abbreviated menu of attribute categories, and full lists of attributes are opened using the down arrow in the list
  3. Double down arrow - gives an expanded menu of attributes with no need to open categories
  4. Qualifier window - drop down menus give choices for qualifiers, such as “has exact phrase”
  5. Attribute text box - values for the attribute are input here. In some cases, a drop-down menu of choices is available
  6. + NOT - changes the attribute to a Boolean NOT
  7. Count - gives a preview of the number of 3D structures corresponding to that attribute
  8. X - this will remove the entire attribute
  9. Buttons at the bottom of the Attribute search allow construction of complex composite queries, with new attributes and subqueries related by AND or OR Boolean operators

A query can be made more specific by including additional attributes and combining them with Boolean Operators.

“Attributes” and “Subqueries” can be used to define complex logical expressions with AND/OR conditions. A query can include one or more sets of attributes (subqueries). Each set behaves like a search expression written in parentheses: it is evaluated first when determining the query outcome.
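
The grouping behaviour described above can be illustrated with a small sketch. It is a toy model, not the RCSB.org implementation: the helper names (`term`, `group`, `negate`, `matches`), the record keys, and the three sample records are all hypothetical and exist only to show how a parenthesized subquery is evaluated before it is combined with the rest of the query.

```python
# Toy model of attribute queries with nested groups; all names are hypothetical.

def term(attribute, predicate):
    """A single attribute condition, e.g. method == 'ELECTRON MICROSCOPY'."""
    return {"kind": "term", "attribute": attribute, "predicate": predicate}

def group(operator, *nodes):
    """A parenthesized subquery whose children are joined by AND or OR."""
    if operator not in ("AND", "OR"):
        raise ValueError("operator must be 'AND' or 'OR'")
    return {"kind": "group", "operator": operator, "nodes": list(nodes)}

def negate(node):
    """Boolean NOT applied to a term or a group."""
    return {"kind": "not", "node": node}

def matches(node, record):
    """Evaluate a query tree against one record (a plain dict)."""
    if node["kind"] == "term":
        value = record.get(node["attribute"])
        return value is not None and node["predicate"](value)
    if node["kind"] == "not":
        return not matches(node["node"], record)
    # Children (including nested groups) are evaluated first, then combined.
    results = [matches(child, record) for child in node["nodes"]]
    return all(results) if node["operator"] == "AND" else any(results)

# method is EM AND (organism is human OR organism is mouse)
query = group(
    "AND",
    term("method", lambda v: v == "ELECTRON MICROSCOPY"),
    group(
        "OR",
        term("organism", lambda v: v == "Homo sapiens"),
        term("organism", lambda v: v == "Mus musculus"),
    ),
)

# Made-up records for illustration only.
records = [
    {"id": "A", "method": "ELECTRON MICROSCOPY", "organism": "Mus musculus"},
    {"id": "B", "method": "X-RAY DIFFRACTION", "organism": "Homo sapiens"},
    {"id": "C", "method": "ELECTRON MICROSCOPY", "organism": "Escherichia coli"},
]
hits = [r["id"] for r in records if matches(query, r)]
print(hits)
```

Only record A satisfies both the method condition and the organism subquery. Wrapping the whole query in `negate(...)` would instead select the other two records.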

You can execute partial queries for each attribute by clicking on the "Count" button next to the search expression.

To run the search, click on the magnifying glass icon in the bottom right corner of the Advanced Search Query Builder (Figure 4).

Figure 4: Buttons to run query count, clear it, and run the full query.

Types of Structure Attributes

Attributes or properties available for searching the archive are grouped according to entry, entity, instance, assembly, experimental details, and annotations. These are briefly summarized here and described in detail below. See the Attribute Details section for all available attributes.

Identifiers and Keywords

These attributes include identifiers assigned to the PDB structures or CSMs (entry), experimental maps (EMDB), macromolecules included in the structure (entities such as proteins or nucleic acids), and related keywords.

Entry-related Attributes

These attributes focus on properties of the entry (experimental structure) and include

  • summary information about entry deposition (such as titles, authors, affiliations, and dates)
  • entry composition (types and numbers of protein, DNA, and RNA macromolecules, molecular weights, and number of non-polymer entities in the entry)
  • primary citations describing the entry, including citation information, abstract, and common identifiers, e.g., DOI
  • attributes related to all citations that reference the entry

The Structure Determination Methodology attribute allows users to query experimental structures only, CSMs only, or both.

Computed Structure Model Attributes

These attributes pertain to Computed Structure Models (CSMs) only and include

  • CSM entry identifier
  • Source Database for CSMs
  • The global quality score (pLDDT)

Entity-related Attributes

These attributes focus on properties of polymeric and non-polymeric molecules present in the entry. They include details about

  • macromolecules (proteins or nucleic acids) including names, types, and length of polymers, mutations and modifications, organism taxonomies and enzyme classifications, and information on membrane association
  • non-polymer small molecules including component ID for molecule of interest and binding properties
  • oligosaccharide (or branched polymer) details including structural features and components

Instance-related Attributes

These attributes focus on annotations of each instance of polymeric entities, e.g., SCOP and CATH classifications.

Assembly-related Attributes

These attributes are related to biological assemblies, including size and composition of the assembly, experimental support for assembly assignment, and assembly symmetry.

Note: If you are expecting to see all assemblies that match your query, remember to change the Return option to "Assemblies".

Sample, Experiment, and Method-related Attributes

These attributes can be used to design queries based on the structure determination and include details about

  • experimental method types, including overall resolution, and software employed
  • properties of crystals used for structure determination, including unit cell dimensions, space group, crystallization method(s)
  • experiment-specific details for:
    • X-ray crystallography, including attributes related to refinement, B-values, and R-values
    • NMR data collection and refinement
    • EM data collection and refinement

Types of Chemical Attributes

Small Molecule and BIRD Molecule Reference Data

These attributes enable queries based on the presence of specific chemical components and/or larger Biologically Interesting Molecules (or BIRD Molecules such as peptide-like inhibitors and antibiotic molecules) in the PDB. The attributes include chemical names and identifiers, atom counts, and molecular weights. These are useful for searching components that are parts of polymers and oligosaccharides (amino acids, nucleotides, saccharides etc.) as well as non-covalently bound ligands, inhibitors, and drugs.

Types of Operators and how to use them

Queries can be constructed by assigning values or ranges for selected attributes and combining them with suitable Boolean operators. Depending on the type of attribute, one can use different operators to create search conditions. Below are all the possible Operators for the different types.

Numerical and date attributes operators

Numerical values of these attributes can be used to identify 3D structures with values equal to (=), less than (<), or more than (>) a specified number. For some attributes, ranges of values can be assigned, or the query can be used to identify structures with any value assigned to that attribute. These operators can be selected from the following list:

  • =, >, >=, <, <= : standard mathematical operators
  • range (upper excl.): range of numerical values for an attribute (from a lower bound to an upper bound), excluding the upper bound
  • range (upper incl.): range of numerical values for an attribute (from a lower bound to an upper bound), including the upper bound
  • is not empty: match for all entries that have any numerical data for the corresponding attribute. Entries with no data for the attribute are not matched. Note: this operator does not take any input value
  • last 7 days: a relative date search covering the past week (counted from the date of the query). It allows you to create searches that can be run periodically without needing to alter the query. Note: this operator does not take any input value
  • last 30 days: a relative date search covering the past 30 days (counted from the date of the query). It allows you to create searches that can be run periodically without needing to alter the query. Note: this operator does not take any input value
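
The difference between the two range operators, and the way relative date operators follow the query date, can be sketched in a few lines. This is an illustration only. The function names are hypothetical, and two details are assumptions not stated above: that the lower bound is always included, and that the relative date window includes both of its end dates.

```python
from datetime import date, timedelta

def in_range(value, lower, upper, include_upper):
    """range (upper excl.) vs. range (upper incl.).

    Assumption: the lower bound is always included.
    """
    if value < lower:
        return False
    return value <= upper if include_upper else value < upper

def is_not_empty(value):
    """Matches when any value is recorded; takes no input value."""
    return value is not None

def within_last_days(released, days, today):
    """Relative date search, e.g. days=7 for 'last 7 days'.

    Assumption: the window includes both `today` and the day `days` ago.
    """
    return today - timedelta(days=days) <= released <= today

# A value equal to the upper bound matches only the inclusive range.
print(in_range(2.0, 1.5, 2.0, include_upper=False))
print(in_range(2.0, 1.5, 2.0, include_upper=True))

# The same stored query gives different answers as the query date moves on.
release = date(2024, 1, 10)
print(within_last_days(release, 7, today=date(2024, 1, 17)))
print(within_last_days(release, 7, today=date(2024, 1, 24)))
```

Because `today` is supplied at run time, a saved "last 7 days" query never needs editing, which is the point of the relative operators.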

Exact match text attributes operators

Text-based attributes may be of two types - ones that exactly match a given vocabulary list and those that are free form. Attributes of the former type can be included in queries as a specific word/phrase, a list of words/phrases, or simply based on whether or not the entry includes something in that attribute. The operators can be selected from the following list:

  • is: the attribute must match the given text value exactly.
  • is any of: the attribute must match any of the given values in a comma-separated list. The list can also be a single value.
  • is not empty: this operator does not take an argument. It matches entries that have any text data for the corresponding attribute.
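
A minimal sketch of how these three operators behave is given below. The function names are hypothetical, and the sketch assumes case-sensitive comparison and that whitespace around the commas in the list is ignored. Neither detail is specified above.

```python
def parse_value_list(raw):
    """Split a comma-separated list, dropping surrounding whitespace."""
    return [item.strip() for item in raw.split(",") if item.strip()]

def op_is(value, expected):
    """'is': the attribute must match the given text exactly."""
    return value == expected

def op_is_any_of(value, raw_list):
    """'is any of': the attribute must match one value from the list."""
    return value in parse_value_list(raw_list)

def op_is_not_empty(value):
    """'is not empty': any text at all counts; no argument is needed."""
    return value is not None and value != ""

organism = "Mus musculus"
print(op_is(organism, "Mus musculus"))
print(op_is_any_of(organism, "Homo sapiens, Mus musculus"))
print(op_is_any_of(organism, "Mus musculus"))  # a single-value list also works
print(op_is_not_empty(organism))
```

Splitting on commas means this sketch cannot handle vocabulary values that themselves contain a comma.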

Free text attributes operators

Some text-based attributes are free-format, i.e., they do not use a specific vocabulary but may include specific words or phrases. Queries may also be designed to identify structures with any content in these attributes:

  • has exact phrase: the attribute's text must contain the given phrase
  • has any of words: the attribute's text must contain at least one of the given words; the words may appear in any order
  • is not empty: this operator does not take an argument. It matches entries that have any text data for the corresponding attribute
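
The contrast between phrase matching and word matching can be sketched as follows. This is not the search engine's actual text analysis: the function names are hypothetical, matching is assumed to be case-insensitive on whole words, and "has any of words" is read here as matching when at least one of the given words is present.

```python
import re

def _words(text):
    """Lower-case the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def has_exact_phrase(text, phrase):
    """True when the words of `phrase` occur consecutively, in order."""
    words, target = _words(text), _words(phrase)
    n = len(target)
    return n > 0 and any(
        words[i:i + n] == target for i in range(len(words) - n + 1)
    )

def has_any_of_words(text, query):
    """True when at least one word of `query` occurs anywhere in `text`."""
    present = set(_words(text))
    return any(word in present for word in _words(query))

title = "Crystal structure of sperm whale myoglobin"
print(has_exact_phrase(title, "sperm whale"))    # words adjacent and in order
print(has_exact_phrase(title, "whale sperm"))    # same words, wrong order
print(has_any_of_words(title, "whale dolphin"))  # one of the two words occurs
print(has_any_of_words(title, "dolphin"))        # no word occurs
```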

Examples

With the attribute search, users can do simple searches on many aspects of entries in the archive. For example, properties of the molecules, as well as information about who, where, and how the structures were determined can be used for searches. Users can also create composite queries by combining several types of search attributes to refine or focus on entries that fit their current needs. Here are a few examples.

Some Simple Search Examples:

S1. Find all experimental structures that include a chlorophyll molecule

Figure 5: Advanced Search Query Builder options to search for structures with chlorophyll molecules.
  1. In the Chemical Attributes section of the Query builder, choose “Chemical Components: Chemical Name” and type "chlorophyll" in the box
  2. Click on the magnifying glass to get several hundred proteins that include types of chlorophyll molecules (See Figure 5).

Note: As of August 2022, CSMs available from RCSB.org do not include any ligands, so this search is run with the default option of excluding CSMs.

S2. Find myoglobin structures

Run query (experimental structures only). This uses the attribute “Polymer Molecular Features: Macromolecule Name” to search in experimental structures only.

Run query (experimental structures and CSMs). This uses the attribute “Polymer Molecular Features: Macromolecule Name” to search in experimental structures and CSMs.

S3. Find structures related to UniProt entry P02185 (sperm whale myoglobin)

Run query (experimental structures only). This uses the attribute “IDs and Keywords: Accession Code(s) - UniProt”

Run query (experimental structures and CSMs). This uses the attribute “IDs and Keywords: Accession Code(s) - UniProt”

S4. Find structures deposited by John Kendrew

Run query. This uses the attribute “Deposition: Structure Author”

S5. Find latest structures

PDB structures are updated each week on Wednesday at 00:00 UTC (Coordinated Universal Time). Run query. This query uses the attribute “Release Date” to find the latest structures.

S6. Find structures of the antibiotic inactivating enzyme FosA

PDB structures of FosA enzyme can be identified using the attribute search option (See Figure 6). In the search attribute options select CARD and specify FosA using the annotation ARO:3000149. Run query.

Figure 6: Search by Attribute using CARD annotations (ARO:3000149)

Some Composite Query examples:

C1. Find structures that include chlorophyll and were determined using electron microscopy

Figure 7: Advanced Search Query Builder options to search for electron microscopy structures with chlorophyll molecules.
  1. Choose from Structure Attribute: “Methods” -> “Experimental Methods”. Click on the small arrow next to the box and select “ELECTRON MICROSCOPY”.
  2. Choose from Chemical Attribute: “Chemical Components” -> “Chemical Name”, and type “chlorophyll” in the box
  3. Click on the magnifying glass to get several dozen structures of proteins determined using electron microscopy that include chlorophyll molecules

Note: Since one of the query parameters specifies the experimental method as electron microscopy, use the default option and exclude CSMs from the search.

C2. Find entries with four or more protein chains and at least one disulfide linkage

Run query. Uses attributes: “Assembly Features: Number of Protein Instances (Chains) per Assembly” AND “Deposited Entry Features: Disulfide Bond Count per Deposited Model”

C3. Find entries from 2020 that don’t include DNA

Run query. Uses attributes: “Deposition: Deposit Date” AND “Polymer Molecular Features: Polymer Entity Type”, with the NOT selected

C4. Find entries from SARS-CoV-2 that include glycosylation

Run query. Uses attributes: “Polymer Molecular Features: Source Organism Taxonomy Name” AND “Oligosaccharide/Branched Molecular Features: Oligosaccharide Component Count”

C5. Find entries of human or mouse hydrolases, that are in complexes with Nucleic Acids, were determined using Electron Microscopy, and have a resolution better than 5Å

Run query. Uses attributes Enzyme Classification Number, Source Organism Taxonomy Name, Polymer Types and more. The query can also be written as follows:

QUERY: Enzyme Classification Number = "3" AND Entry Polymer Types = "Protein/NA" AND Experimental Method = "ELECTRON MICROSCOPY" AND Reconstruction Resolution < 5 AND Source Organism Taxonomy Name IN (Homo sapiens, Mus musculus)
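
To make the structure of this composite query explicit, the sketch below assembles the same text from individual conditions. The helpers `condition` and `all_of` are hypothetical and simply reproduce the notation used above; they are not part of any RCSB.org tool. Enzyme Classification Number "3" is the EC class for hydrolases.

```python
def condition(attribute, operator, value):
    """Render one attribute condition in the notation used in this example."""
    if operator == "IN":
        rendered = "(" + ", ".join(value) + ")"   # list of alternatives
    elif isinstance(value, str):
        rendered = '"' + value + '"'              # text values are quoted
    else:
        rendered = str(value)                     # numbers are written bare
    return f"{attribute} {operator} {rendered}"

def all_of(*conditions):
    """Join conditions with the Boolean AND operator."""
    return " AND ".join(conditions)

query_text = "QUERY: " + all_of(
    condition("Enzyme Classification Number", "=", "3"),
    condition("Entry Polymer Types", "=", "Protein/NA"),
    condition("Experimental Method", "=", "ELECTRON MICROSCOPY"),
    condition("Reconstruction Resolution", "<", 5),
    condition("Source Organism Taxonomy Name", "IN",
              ["Homo sapiens", "Mus musculus"]),
)
print(query_text)
```

Each call to `condition` corresponds to one attribute row in the Query Builder. The `IN` condition plays the role of "human or mouse" in the query description.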

C6. Find NMR structures of homomeric or heteromeric assemblies

Run query for homomeric NMR structures. Uses attributes: QUERY: Experimental Method (Broader Categories) = "NMR" AND Number of Distinct Protein Entities = 1 AND Number of Protein Instances (Chains) per Assembly > 1
Run query for heteromeric NMR structures. Uses attributes: QUERY: Experimental Method (Broader Categories) = "NMR" AND Number of Distinct Protein Entities > 1
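
The logic behind the two C6 queries can be restated as a small decision function. This is a sketch for illustration: the function and parameter names are hypothetical, and the inputs stand for the three attribute values named in the queries.

```python
def classify_nmr_entry(method_category, distinct_protein_entities,
                       protein_chains_per_assembly):
    """Return 'homomeric', 'heteromeric', or None if neither query matches."""
    if method_category != "NMR":
        return None
    if distinct_protein_entities > 1:
        # More than one distinct protein entity: heteromeric query.
        return "heteromeric"
    if distinct_protein_entities == 1 and protein_chains_per_assembly > 1:
        # One distinct protein present in several copies: homomeric query.
        return "homomeric"
    return None  # e.g. a single-chain NMR structure matches neither query

print(classify_nmr_entry("NMR", 1, 2))
print(classify_nmr_entry("NMR", 2, 2))
print(classify_nmr_entry("NMR", 1, 1))
```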

For Advanced Users

Advanced users who wish to use web-services to run searches can learn more about the data structure (https://data.rcsb.org/#examples) and get search instructions and examples (https://search.rcsb.org/#examples). Further details on Web Services and APIs are available at Web Services Overview.
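
As a rough illustration of what a programmatic search might look like, the sketch below builds a JSON request body for "electron microscopy structures with a resolution better than 5 Å". Everything in it should be treated as an assumption to verify against the search documentation linked above: the field names (`type`, `service`, `parameters`, `logical_operator`, `nodes`, `return_type`), the operator names (`exact_match`, `less`), and the attribute paths (`exptl.method`, `rcsb_entry_info.resolution_combined`) reflect one reading of the Search API's JSON format and may not match the current version. The code only prints the body and does not send it.

```python
import json

def terminal(attribute, operator, value):
    """One attribute condition (assumed shape of a 'terminal' query node)."""
    return {
        "type": "terminal",
        "service": "text",
        "parameters": {
            "attribute": attribute,
            "operator": operator,
            "value": value,
        },
    }

def group(logical_operator, nodes):
    """Conditions combined with 'and' / 'or' (assumed 'group' node shape)."""
    return {
        "type": "group",
        "logical_operator": logical_operator,
        "nodes": nodes,
    }

# Attribute paths and operator names below are assumptions; check the docs.
request_body = {
    "query": group("and", [
        terminal("exptl.method", "exact_match", "ELECTRON MICROSCOPY"),
        terminal("rcsb_entry_info.resolution_combined", "less", 5),
    ]),
    "return_type": "entry",
}

print(json.dumps(request_body, indent=2))
```

The nested `group` node mirrors the subqueries of the Query Builder, so a composite query built in the interface maps naturally onto this kind of tree.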



Please report any encountered broken links to info@rcsb.org
Last updated: 1/17/2024