SECRiFY PDB processing

All the data used for downstream processing can be found here. Please navigate the
subdirectories for the associated data and the Python scripts (if any) used for processing.

Subdirectories

Initial_ID_mapping

Contains the data and the Python script used for mapping between fragment IDs and
Ensembl gene and protein IDs.

cd-hit_fragments

Contains cluster information from the CD-HIT suite: cluster representatives, cluster members,
FASTA sequences, and the Python scripts used for processing the clusters.

CD-HIT suite: http://weizhong-lab.ucsd.edu/cdhit-web-server/cgi-bin/index.cgi?cmd=cd-hit
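
As an illustration of how the cluster representatives and members could be read back from CD-HIT output, here is a minimal sketch that parses text in the standard `.clstr` layout, where each cluster starts with a `>Cluster N` line and the representative is marked with a trailing `*`. The function name `parse_clstr` and the sample fragment IDs are hypothetical; the scripts in this directory may work differently.

```python
import re
from typing import Dict, List

# A member line looks like: "0\t120aa, >frag_A... *"
MEMBER_RE = re.compile(r">(.+?)\.\.\.")


def parse_clstr(text: str) -> Dict[str, List[str]]:
    """Map each cluster representative to the list of all cluster members."""
    clusters: Dict[str, List[str]] = {}
    current = []  # (member_id, is_representative) pairs of the open cluster
    # The extra sentinel header closes the last cluster in the file.
    for line in text.splitlines() + [">Cluster end"]:
        line = line.strip()
        if line.startswith(">Cluster"):
            if current:
                # Fall back to the first member if no line carries the "*" mark.
                rep = next((m for m, is_rep in current if is_rep), current[0][0])
                clusters[rep] = [m for m, _ in current]
            current = []
            continue
        match = MEMBER_RE.search(line)
        if match:
            current.append((match.group(1), line.endswith("*")))
    return clusters


# Hypothetical example input, not taken from the repository data.
SAMPLE_CLSTR = (
    ">Cluster 0\n"
    "0\t120aa, >frag_A... *\n"
    "1\t118aa, >frag_B... at 95.00%\n"
    ">Cluster 1\n"
    "0\t90aa, >frag_C... *\n"
)

clusters = parse_clstr(SAMPLE_CLSTR)
for representative, members in clusters.items():
    print(representative, members)
```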

PDB_best3_hits

Contains the filtered PDB hits per fragment, obtained with the BLAST+ program.

blast+: https://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/
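
The directory name suggests that the three best PDB hits are kept per fragment. A minimal sketch of such a filter is shown below. It assumes BLAST+ tabular output (`-outfmt 6`, with its 12 default columns) and ranks hits by bit score; both the input format and the ranking criterion are assumptions, and `best_hits` and the sample rows are hypothetical.

```python
import csv
import io
from collections import defaultdict

# Default column order of BLAST+ tabular output (-outfmt 6).
COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def best_hits(tabular_text, n=3):
    """Return the n highest-scoring hits per query (assumed criterion: bit score)."""
    hits = defaultdict(list)
    reader = csv.reader(io.StringIO(tabular_text), delimiter="\t")
    for row in reader:
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        record = dict(zip(COLUMNS, row))
        record["evalue"] = float(record["evalue"])
        record["bitscore"] = float(record["bitscore"])
        hits[record["qseqid"]].append(record)
    return {
        query: sorted(records, key=lambda r: -r["bitscore"])[:n]
        for query, records in hits.items()
    }


# Hypothetical rows for illustration only.
SAMPLE_ROWS = [
    ["frag_1", "4JKL_C", "60.0", "80", "32", "0", "5", "84", "1", "80", "1e-10", "80"],
    ["frag_1", "1ABC_A", "98.0", "100", "2", "0", "1", "100", "5", "104", "1e-50", "200"],
    ["frag_1", "3GHI_A", "85.0", "95", "14", "0", "3", "97", "1", "95", "1e-30", "140"],
    ["frag_1", "2DEF_B", "90.0", "100", "10", "0", "1", "100", "1", "100", "1e-40", "170"],
    ["frag_2", "5MNO_A", "99.0", "50", "0", "0", "1", "50", "1", "50", "1e-20", "100"],
]
SAMPLE_BLAST = "\n".join("\t".join(row) for row in SAMPLE_ROWS)

top = best_hits(SAMPLE_BLAST)
for query, records in top.items():
    print(query, [r["sseqid"] for r in records])
```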

Interproscan

Contains information on protein domains (Pfam, Gene3D) and other metadata,
along with the Python scripts used for processing the data and the InterProScan software that was used.

CATH_analysis

Contains information on CATH Class, Architecture and families, the fraction of fragments in each class,
the associated files, and the Python script used for processing.
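
To make "the fraction of fragments in each class" concrete, the sketch below takes the first field of a CATH identifier (for example `3.40.50.300`, optionally carrying the `G3DSA:` prefix used for Gene3D entries) as the Class and computes the per-class fraction. It assumes one CATH assignment per fragment, which may not match the actual processing; `class_fractions` and the sample mapping are hypothetical.

```python
from collections import Counter

# Names of the four main CATH classes; other class numbers are shown as "Class N".
CATH_CLASS_NAMES = {
    "1": "Mainly Alpha",
    "2": "Mainly Beta",
    "3": "Alpha Beta",
    "4": "Few Secondary Structures",
}


def class_fractions(fragment_to_cath):
    """Return the fraction of fragments assigned to each CATH class."""
    counts = Counter()
    for cath_id in fragment_to_cath.values():
        # Drop an optional "G3DSA:" prefix, then keep the Class field.
        class_number = cath_id.split(":")[-1].split(".")[0]
        name = CATH_CLASS_NAMES.get(class_number, "Class " + class_number)
        counts[name] += 1
    total = sum(counts.values())
    return {name: count / total for name, count in counts.items()}


# Hypothetical assignments for illustration only.
SAMPLE_ASSIGNMENTS = {
    "frag_1": "G3DSA:3.40.50.300",
    "frag_2": "3.30.70.330",
    "frag_3": "1.10.10.10",
    "frag_4": "G3DSA:2.60.40.10",
}

fractions = class_fractions(SAMPLE_ASSIGNMENTS)
print(fractions)
```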

secondary structure

Secondary structure information obtained from DSSP, and the Python script used for
processing it.
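
One common way to summarise DSSP output is to reduce its eight-state codes to three states (helix, strand, coil) and report the fraction of residues in each. The sketch below uses the widely used reduction H, G, I to helix and E, B to strand, with everything else counted as coil. Whether the script in this directory uses this reduction is an assumption, and `three_state_fractions` is a hypothetical name.

```python
from collections import Counter

# Common eight-to-three state reduction; any other code counts as coil ("C").
EIGHT_TO_THREE = {"H": "H", "G": "H", "I": "H", "E": "E", "B": "E"}


def three_state_fractions(dssp_string):
    """Return the fraction of helix (H), strand (E) and coil (C) residues."""
    if not dssp_string:
        return {"H": 0.0, "E": 0.0, "C": 0.0}
    counts = Counter(EIGHT_TO_THREE.get(code, "C") for code in dssp_string)
    n = len(dssp_string)
    return {state: counts[state] / n for state in "HEC"}


# Hypothetical per-residue DSSP string for one fragment.
sample_fractions = three_state_fractions("HHHHEEBT-S")
print(sample_fractions)
```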

**sub-dir: Mann–Whitney-U**
Contains the Jupyter notebook used for making the plots and computing the Mann-Whitney U statistics.
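
For readers unfamiliar with the test, here is a minimal pure-Python sketch of the Mann-Whitney U statistic for two samples, using average ranks for ties. It is not the notebook's code: the notebook may rely on a statistics library instead, and the sample values below are hypothetical. No p-value is computed here.

```python
def mann_whitney_u(x, y):
    """Return (U_x, U_y) for two samples, assigning average ranks to ties."""
    # Pool both samples, remembering which group each value came from.
    combined = sorted(
        (value, group) for group, sample in enumerate((x, y)) for value in sample
    )
    rank_sum_x = 0.0
    i = 0
    n = len(combined)
    while i < n:
        # Find the run of tied values starting at position i.
        j = i
        while j < n and combined[j][0] == combined[i][0]:
            j += 1
        average_rank = (i + 1 + j) / 2  # mean of the ranks i+1 .. j
        for k in range(i, j):
            if combined[k][1] == 0:
                rank_sum_x += average_rank
        i = j
    n1, n2 = len(x), len(y)
    u_x = rank_sum_x - n1 * (n1 + 1) / 2
    u_y = n1 * n2 - u_x
    return u_x, u_y


# Hypothetical values for two groups of fragments.
group_a = [1, 2, 2]
group_b = [2, 3]
u_a, u_b = mann_whitney_u(group_a, group_b)
print(u_a, u_b)
```

A small U for one group means its values tend to rank below those of the other group; the two statistics always sum to the product of the sample sizes.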

compiled_report_files

Contains a short summary of the data processing and some supplementary plots.

