Batch Downloads with Shell Script

Multiple files can be downloaded on a Unix-like operating system (e.g., macOS or Linux) using the wget or curl tools.

To facilitate this, we provide a script that downloads multiple PDB archive files, given a file containing a comma-separated list of PDB IDs. The script requires that the curl tool is installed on your computer.
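As a rough illustration of the approach, a curl loop over a comma-separated list might look like the sketch below. This is not the contents of batch_download.sh: the base URL and the function names split_ids, build_url and download_all are assumptions made for this example.

```shell
#!/usr/bin/env bash
# Sketch only: BASE_URL and all function names are assumptions for illustration,
# not taken from batch_download.sh.
BASE_URL="https://files.rcsb.org/download"

# Print one PDB ID per line from a file holding a comma-separated list.
split_ids() {
  tr ',' '\n' < "$1" | tr -d ' \r\t' | grep -v '^$'
}

# Build the download URL for one ID and one file extension (e.g. cif.gz).
build_url() {
  printf '%s/%s.%s\n' "$BASE_URL" "$1" "$2"
}

# Download one file per ID into an existing directory.
# A failed download is reported on stderr and the loop continues.
download_all() {
  local list_file=$1 ext=$2 outdir=$3 id
  split_ids "$list_file" | while read -r id; do
    curl -s -f -o "$outdir/$id.$ext" "$(build_url "$id" "$ext")" \
      || echo "Failed to download $id.$ext" >&2
  done
}
```

After sourcing these functions, a call such as download_all list_file.txt cif.gz . would attempt one download per ID into the current directory.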

Such a list of PDB IDs can be obtained by selecting "PDB IDs" in the "Tabular Report" drop-down available on any search results page. The "Download IDs" button will download a file containing a comma-separated list of PDB IDs. Lists of PDB IDs matching certain criteria can also be obtained programmatically with our Search API.

Please also see the file download services page for full PDB archive downloads and periodic synchronization of data.

Obtain the batch-download script

Usage

The output directory must exist before the download begins.
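One way to make sure the directory is in place is sketched below. The directory name pdb_downloads is a placeholder chosen for this example; the option used to pass the directory to the script is not shown on this page, so check ./batch_download.sh -h for it.

```shell
# Hypothetical directory name; substitute your own.
outdir="pdb_downloads"

# -p creates missing parent directories and does not fail if the directory already exists.
mkdir -p "$outdir"

# Confirm that the directory exists and is writable before starting the download.
if [ -d "$outdir" ] && [ -w "$outdir" ]; then
  echo "Output directory ready: $outdir"
else
  echo "Cannot write to $outdir" >&2
fi
```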

Once downloaded, make sure the script has execution permission: chmod +x batch_download.sh

Obtain full help on the batch download shell script at the command line with: ./batch_download.sh -h

Structures Without Legacy PDB Format Files will not be included when the -p option is used.

Some examples

In the examples below, list_file.txt is assumed to be a plain text file containing a comma-separated list of PDB IDs.
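If your IDs are stored one per line rather than comma-separated, a list file in the expected form can be produced with standard tools. In this sketch, ids_per_line.txt is a hypothetical input file and the three IDs are placeholders.

```shell
# ids_per_line.txt is a hypothetical input file with one PDB ID per line.
printf '4HHB\n1CRN\n2HHB\n' > ids_per_line.txt

# paste -s joins all lines into a single line; -d, sets the comma as the separator.
paste -s -d, ids_per_line.txt > list_file.txt

cat list_file.txt   # prints: 4HHB,1CRN,2HHB
```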

  • Download pdb.gz files:
    ./batch_download.sh -f list_file.txt -p
    Note: Structures Without Legacy PDB Format Files will not be downloaded.
  • Download cif.gz files:
    ./batch_download.sh -f list_file.txt -c
  • Download pdb.gz, cif.gz and sf.cif.gz files:
    ./batch_download.sh -f list_file.txt -c -p -s
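Because some entries are skipped with -p, it can help to check afterwards which IDs did not produce a file. The helper below is a sketch, not part of batch_download.sh: report_missing is a hypothetical function, and it assumes that each downloaded file is named <ID>.<extension>, with the ID written exactly as it appears in the list.

```shell
# Print every ID from the list that has no matching file in the given directory.
# Assumption: files are named <ID>.<ext>, with the ID written as in the list file.
report_missing() {
  local list_file=$1 ext=$2 dir=$3 id
  tr ',' '\n' < "$list_file" | tr -d ' \r\t' | grep -v '^$' | while read -r id; do
    [ -f "$dir/$id.$ext" ] || echo "$id"
  done
}
```

For example, report_missing list_file.txt pdb.gz . would list the IDs for which no pdb.gz file is present in the current directory.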



Please report any broken links you encounter to info@rcsb.org
Last updated: 12/1/2022