All the data used for downstream processing can be found here. Please navigate the
subdirectories for the associated data and the Python scripts (if any) used for processing.
Contains the data and Python script used for mapping between fragment IDs and
Ensembl gene and protein IDs.
Contains cluster information from the CD-HIT suite: cluster representatives, cluster members,
FASTA sequences, and the Python scripts used for processing the clusters.
CD-HIT suite: http://weizhong-lab.ucsd.edu/cdhit-web-server/cgi-bin/index.cgi?cmd=cd-hit
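
As an illustration of how cluster representatives and members can be separated, the following is a minimal sketch that parses text in the standard CD-HIT `.clstr` layout, where the representative of each cluster is marked with `*`. It is not the script stored in this directory; the function name `parse_clstr` and the fragment IDs in the sample are hypothetical.

```python
import re

# A member line looks like "0\t120aa, >frag_001... *" (representative)
# or "1\t118aa, >frag_002... at 95.00%" (ordinary member).
_MEMBER = re.compile(r">(?P<id>.+?)\.\.\.\s+(?P<tail>\*|at\s+.+)$")


def parse_clstr(lines):
    """Map each cluster name to its representative and member IDs."""
    clusters = {}
    current = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">Cluster"):
            current = line[1:]  # e.g. "Cluster 0"
            clusters[current] = {"representative": None, "members": []}
            continue
        match = _MEMBER.search(line)
        if match is None or current is None:
            continue  # skip anything that is not a recognizable member line
        seq_id = match.group("id")
        clusters[current]["members"].append(seq_id)
        if match.group("tail") == "*":
            clusters[current]["representative"] = seq_id
    return clusters


# Hypothetical sample; real input would be read from a .clstr file.
SAMPLE = [
    ">Cluster 0",
    "0\t120aa, >frag_001... *",
    "1\t118aa, >frag_002... at 95.00%",
    ">Cluster 1",
    "0\t87aa, >frag_003... *",
]

if __name__ == "__main__":
    for name, info in parse_clstr(SAMPLE).items():
        print(name, info["representative"], info["members"])
```
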
Contains the filtered PDB hits per fragment obtained from the BLAST+ program.
blast+: https://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/
Contains information on protein domains (Pfam, Gene3D) and other metadata, along
with the Python scripts used for processing the data and the InterProScan used.
Contains information on Class, Architecture, families, and the fraction of fragments in each class,
along with the files and the Python script used for processing.
Contains secondary structure information obtained from DSSP and the Python script used for
processing it.
**sub-dir: Mann–Whitney-U**
Contains the Jupyter notebook used for making the plots and computing the Mann–Whitney U statistics.
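
For readers unfamiliar with the statistic, here is a minimal pure-Python sketch of how the Mann–Whitney U values are obtained from rank sums, with tied values given their average rank. It is illustrative only and is not taken from the notebook; the function name `mann_whitney_u` and the sample values are hypothetical. In practice a library routine such as `scipy.stats.mannwhitneyu` would normally be used, since it also reports a p-value.

```python
def mann_whitney_u(x, y):
    """Return (U_x, U_y) for two samples, using midranks for ties."""
    # Pool both samples, remembering which group each value came from.
    pooled = sorted((v, g) for g, sample in enumerate((x, y)) for v in sample)
    n = len(pooled)
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the run of tied values starting at position i.
        j = i
        while j + 1 < n and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        midrank = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[k] = midrank
        i = j + 1
    # Rank sum of the first sample, then the standard U formulas.
    rank_sum_x = sum(r for r, (_, g) in zip(ranks, pooled) if g == 0)
    n_x, n_y = len(x), len(y)
    u_x = rank_sum_x - n_x * (n_x + 1) / 2
    u_y = n_x * n_y - u_x
    return u_x, u_y


if __name__ == "__main__":
    # Hypothetical values, e.g. some per-fragment score in two groups.
    group_a = [0.12, 0.35, 0.40, 0.51]
    group_b = [0.30, 0.62, 0.70, 0.88]
    print(mann_whitney_u(group_a, group_b))
```

The two values always sum to the product of the sample sizes, so either one determines the other.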
Contains a short summary of the data processing used and some supplementary plots.