The program searches the database publications in NAR (Nucleic Acids Research) to find databases that use PDB (Protein Data Bank) data. It looks for PDB keywords in the summary and abstract of each database published in NAR and rates how strongly each database relates to PDB in three tiers:
- Tier 1: the database uses PDB data
- Tier 2: the database likely uses PDB data
- Tier 3: the database may use PDB data
The results serve as a guide for the next step, a manual review.
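The tiering idea can be sketched as follows: match a database's text against three keyword lists and report the strongest tier that has any hit. This is a minimal sketch, not the scripts' actual logic. The keyword lists in `TIER_KEYWORDS` and the functions `find_keywords` and `pdb_tier` are hypothetical, since the real keywords are not documented here.

```python
import re

# Hypothetical keyword lists; the keywords used by the real scripts are not documented here.
TIER_KEYWORDS = {
    1: ["protein data bank", "rcsb pdb", "pdb id"],
    2: ["pdb", "crystal structure", "3d structure"],
    3: ["structure", "structural"],
}


def find_keywords(text):
    """Return {tier: sorted keywords found in text}, matching whole words case-insensitively."""
    lowered = text.lower()
    hits = {}
    for tier, keywords in TIER_KEYWORDS.items():
        hits[tier] = sorted(
            kw for kw in keywords
            if re.search(r"\b" + re.escape(kw) + r"\b", lowered)
        )
    return hits


def pdb_tier(text):
    """Return the strongest tier (1 is strongest) with at least one hit, or None."""
    hits = find_keywords(text)
    for tier in sorted(hits):
        if hits[tier]:
            return tier
    return None


if __name__ == "__main__":
    print(pdb_tier("Structures were retrieved from the Protein Data Bank."))  # 1
    print(pdb_tier("A gene expression atlas."))  # None
```

Reporting only the strongest tier keeps each database in a single review bucket. The per-tier hit lists from `find_keywords` correspond to the "tier 1/2/3 keywords" columns, which let a reviewer see why a database was flagged.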
The scripts run in three steps:
1. Run the first script to parse the databases listed in the NAR database summary, along with their category and summary page. Output: `p1_db_summary.tsv`, a table of database name, summary link, category, and subcategory.
2. Run the second script to search for PDB keywords. Output: `p2_db_summary_review.tsv`, a table of database name, categories, database URL, year, abstract URL, description length, and tier 1/2/3 keywords.
3. Run the third script to search for PDB keywords and test whether each database's URL is accessible. Output: `p3_db_abstract_review.tsv`, a table of database name, categories, database URL, year, abstract URL, description length, tier 1/2/3 keywords, and database URL accessibility.
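The accessibility check and the TSV output of step 3 can be sketched with the standard library alone. This is an illustrative sketch under assumptions, not the project's code. The functions `is_accessible` and `write_review` are hypothetical. The column names in `FIELDS` only mirror the fields listed for `p3_db_abstract_review.tsv`, and the real headers may differ. The check also assumes that an HTTP status below 400 counts as "accessible".

```python
import csv
import http.client
import io
import urllib.request

# Hypothetical column names mirroring the fields listed for p3_db_abstract_review.tsv.
FIELDS = [
    "db_name", "categories", "db_url", "year", "abstract_url",
    "description_length", "keywords_t1", "keywords_t2", "keywords_t3",
    "url_accessible",
]


def is_accessible(url, timeout=10.0):
    """Send a HEAD request; treat any response with status < 400 as accessible.

    Malformed URLs (ValueError), network failures and HTTP 4xx/5xx errors
    (OSError, which covers URLError and HTTPError), and protocol errors
    (HTTPException) all count as "not accessible".
    """
    try:
        req = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            code = resp.getcode()
            return code is None or code < 400
    except (OSError, ValueError, http.client.HTTPException):
        return False


def write_review(rows, out):
    """Write rows (dicts keyed by FIELDS) as a tab-separated table with a header."""
    writer = csv.DictWriter(out, fieldnames=FIELDS, delimiter="\t", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)


if __name__ == "__main__":
    buf = io.StringIO()
    write_review([{f: "" for f in FIELDS}], buf)
    print(buf.getvalue())
```

Some servers reject `HEAD` requests, so a GET fallback may reduce false "inaccessible" results. Any URL flagged here is still worth confirming during the manual review.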
Other Python scripts, such as those whose names start with "p0", are utility scripts.