182 words

Download and Sync with the Entire PDB Database

All the PDB files are available via PDB’s FTP server. Simply cd to a directory and use wget to download all of them.

mkdir pdb; cd pdb; mkdir zipped unzipped; cd zipped
wget ftp://ftp.wwpdb.org/pub/pdb/data/structures/all/pdb/*

Depending on the traffic, The first download will take 2~4 days. After the first download, to sync the local database with the FTP server, add the -N option to wget (essentially this means downloading “new” files only).

The *.ent.gz files downloaded are zipped. You can use gunzip to decompress them to ready-to-use *.ent files (which is written in PDB format), but that will remove the *.gz files, which is inconvenient if you want to keep your database in sync with the remote. Use gzcat (or equivalently gunzip -c) instead to preserve the *.gz files while decompressing:

find . -name "*.gz" | xargs -I{} -n1 bash -c '
src={}; dst=../unzipped/$(basename ${src%*.gz})
[ -f $dst ] && echo "$dst already exists, skipping..." || ( gzcat -cv $src > $dst )'

You can monitor the progress from the verbose output of gzcat:

./pdb1byf.ent.gz:      78.1%
./pdb1byg.ent.gz:      76.3%
./pdb1byh.ent.gz:      77.3%
./pdb1byi.ent.gz:      75.2%
...