python download_rna_id.py
Selects entries released after 9/30/2021 whose structures contain at least one RNA, and downloads their IDs.
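The selection rule above amounts to a simple filter. The sketch below applies it to entry metadata that has already been fetched. The record layout (keys id, release_date, rna_chains) and the function name select_entries are hypothetical, and download_rna_id.py may query the PDB in a different way.

```python
from datetime import date

# Cutoff from the text; "later than 9/30/2021" is read as strictly after that date.
CUTOFF = date(2021, 9, 30)


def select_entries(entries, cutoff=CUTOFF):
    """Return IDs of entries released after `cutoff` that contain at least one RNA.

    Hypothetical record layout: each entry is a dict with keys
    'id', 'release_date' (ISO 'YYYY-MM-DD' string) and 'rna_chains' (int).
    """
    selected = []
    for entry in entries:
        released = date.fromisoformat(entry["release_date"])
        if released > cutoff and entry["rna_chains"] >= 1:
            selected.append(entry["id"])
    return selected


if __name__ == "__main__":
    demo = [
        {"id": "AAAA", "release_date": "2021-09-30", "rna_chains": 2},  # not after the cutoff
        {"id": "BBBB", "release_date": "2021-10-01", "rna_chains": 1},  # kept
        {"id": "CCCC", "release_date": "2022-01-15", "rna_chains": 0},  # no RNA
    ]
    print(select_entries(demo))  # ['BBBB']
```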
python download_rna_info.py
Information about the RNAs is saved in the rna_info.json file.
python download_cif.py
CIF files are downloaded instead of PDB-format files because some RNA structures have no PDB-format file.
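A minimal sketch of a CIF download, assuming the files come from the RCSB PDB download server (https://files.rcsb.org/download/<ID>.cif). The helper names cif_url and download_cif and the default output directory are hypothetical and are not taken from download_cif.py.

```python
import os
import urllib.request

# Assumption: structures are fetched from the RCSB PDB download server.
RCSB_DOWNLOAD = "https://files.rcsb.org/download"


def cif_url(pdb_id):
    """Return the URL of the mmCIF file for a PDB ID."""
    return f"{RCSB_DOWNLOAD}/{pdb_id.upper()}.cif"


def download_cif(pdb_id, out_dir="cif"):
    """Download one CIF file into `out_dir` (hypothetical default) and return its path."""
    os.makedirs(out_dir, exist_ok=True)
    path = os.path.join(out_dir, f"{pdb_id.upper()}.cif")
    if not os.path.exists(path):  # skip files downloaded in an earlier run
        urllib.request.urlretrieve(cif_url(pdb_id), path)
    return path
```

Checking for an existing file first makes the step safe to rerun after an interrupted batch download.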
python get_rna_only.py
Information about the RNA-only structures is saved in the rna_only.csv file.
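One way to express the RNA-only test, assuming each entry's polymer chains have already been classified by type. The input mapping, the function names and the CSV columns (id, n_chains) are illustrative assumptions; the real rna_only.csv may hold different fields.

```python
import csv
import io


def is_rna_only(chain_types):
    """True if the entry has at least one chain and every chain is RNA.

    `chain_types` is a hypothetical mapping of chain ID -> polymer type,
    e.g. {"A": "RNA", "B": "protein"}.
    """
    return bool(chain_types) and all(t == "RNA" for t in chain_types.values())


def write_rna_only_csv(entries, handle):
    """Write the ID and chain count of each RNA-only entry as CSV (hypothetical columns)."""
    writer = csv.writer(handle)
    writer.writerow(["id", "n_chains"])
    for pdb_id, chain_types in entries.items():
        if is_rna_only(chain_types):
            writer.writerow([pdb_id, len(chain_types)])


if __name__ == "__main__":
    buffer = io.StringIO()
    write_rna_only_csv(
        {"AAAA": {"A": "RNA", "B": "protein"}, "BBBB": {"A": "RNA", "B": "RNA"}},
        buffer,
    )
    print(buffer.getvalue())
```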
python cif2pdb.py rna_only.csv rna_only
Only the first model is retained. Chains are renamed in order, A-Z and then a-z, and residues are renumbered starting from 1.
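The renaming and renumbering rules can be sketched as two mapping functions. This is an illustration, not cif2pdb.py itself: the function names are hypothetical, chain IDs are assumed unique and given in file order, and renumber_residues is meant to be called once per chain, which is an assumption because the text does not say whether numbering restarts for each chain.

```python
import string

# 52 single-character chain names: A-Z first, then a-z.
CHAIN_NAMES = string.ascii_uppercase + string.ascii_lowercase


def rename_chains(chain_ids):
    """Map original chain IDs (unique, in file order) to A-Z, then a-z."""
    if len(chain_ids) > len(CHAIN_NAMES):
        raise ValueError(
            f"{len(chain_ids)} chains exceed the {len(CHAIN_NAMES)} available names"
        )
    return {old: new for old, new in zip(chain_ids, CHAIN_NAMES)}


def renumber_residues(residue_ids):
    """Map original residue IDs (in file order) to 1, 2, 3, ...

    IDs can be any hashable label, e.g. (number, insertion_code) tuples.
    """
    return {old: new for new, old in enumerate(residue_ids, start=1)}
```

Single-character names match the one-character chain identifier column of the PDB format, which is presumably why the scheme stops at 52 chains; the sketch raises an error instead of silently reusing a name.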
python annotate.py
python mc2db.py
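The text does not describe annotate.py or mc2db.py. Judging from the .db files used in the next step, mc2db.py appears to write dot-bracket secondary structures, so the following sketch shows one common way to turn a list of base pairs into dot-bracket notation. The input format (1-based (i, j) pairs) and the greedy bracket assignment for pseudoknots are assumptions, not the script's documented behavior.

```python
BRACKETS = ["()", "[]", "{}", "<>"]


def crosses(p, q):
    """True if base pairs p and q are pseudoknotted (i < k < j < l after sorting)."""
    (i, j), (k, l) = sorted([p, q])
    return i < k < j < l


def pairs_to_dotbracket(length, pairs):
    """Convert 1-based base pairs (i, j) into a dot-bracket string of `length`.

    Each pair gets the first bracket type whose pairs it does not cross,
    so pseudoknotted pairs fall through to [], {} or <>.
    """
    structure = ["."] * length
    groups = [[] for _ in BRACKETS]
    for i, j in sorted((min(p), max(p)) for p in pairs):
        for level, group in enumerate(groups):
            if not any(crosses((i, j), q) for q in group):
                group.append((i, j))
                opening, closing = BRACKETS[level]
                structure[i - 1] = opening
                structure[j - 1] = closing
                break
        else:
            raise ValueError(f"pair {(i, j)} needs more than {len(BRACKETS)} bracket types")
    return "".join(structure)
```

For example, the nested pairs (1, 8) and (2, 7) give "((....))", while the crossing pairs (1, 5) and (3, 7) give "(.[.).].".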
First, concatenate the secondary structures of all entries whose IDs are listed in the file named in, each preceded by its ID, into a single file:
for i in $(cat in); do echo "$i"; cat "rna_only/${i}.db"; echo; done > rna_only.db
Next, compile the rna_only.db file by applying the following filters:
- Remove structures that consist of a single helix, such as (((((((())))))))
- Remove duplicates
- Remove structures without any base pairs, such as ..........
The compiled dataset for RNA-only prediction is saved in rna_only_compiled.db.
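The three filters can be sketched in Python as follows, assuming the concatenated file has already been parsed into (id, sequence, structure) tuples. Two details are assumptions: a "helix" is taken to mean a run of "(" followed directly by a run of ")" with no unpaired positions, and two records count as duplicates when both sequence and structure match. The function names are hypothetical.

```python
import re

# Assumption: a helix-only structure is "(" repeated, then ")" repeated,
# e.g. "(((((((())))))))", with no unpaired positions.
HELIX_ONLY = re.compile(r"\(+\)+")


def has_no_pairs(structure):
    """True for structures such as '..........' that contain no base pairs."""
    return all(c == "." for c in structure)


def compile_dataset(records):
    """Filter (id, sequence, structure) records.

    Drops helix-only and pair-free structures, then keeps only the first
    record for each (sequence, structure) combination (assumed duplicate rule).
    """
    seen = set()
    kept = []
    for rec_id, sequence, structure in records:
        if HELIX_ONLY.fullmatch(structure) or has_no_pairs(structure):
            continue
        key = (sequence, structure)
        if key in seen:
            continue
        seen.add(key)
        kept.append((rec_id, sequence, structure))
    return kept
```

Deduplicating on the (sequence, structure) pair keeps entries that share a sequence but fold differently; if duplicates are meant to be judged by sequence alone, the key would change to the sequence only.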