Primary language: Jupyter Notebook. License: GNU General Public License v3.0 (GPL-3.0).

DOI:10.1101/2023.05.08.539815 DOI:10.1038/s41586-023-07004-5

Conformational ensembles of the human IDRome

This repository contains Python code, Jupyter Notebooks, and data for reproducing the results presented in the manuscript Conformational ensembles of the human intrinsically disordered proteome.

The CSV file IDRome_DB.csv lists the amino acid sequences, sequence features, and conformational properties of all 28,058 IDRs.
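As a quick sanity check after cloning, the following minimal sketch reads a CSV file and reports its column names and row count. It uses only the Python standard library and makes no assumption about the column names in IDRome_DB.csv; the helper name `summarize_csv` is hypothetical and not part of the repository.

```python
import csv


def summarize_csv(path):
    """Return (column names, number of data rows) for a CSV file.

    Hypothetical helper for illustration; it only inspects the header
    and counts records, so it works without knowing the column names.
    """
    with open(path, newline="") as handle:
        reader = csv.reader(handle)
        header = next(reader)             # first record holds the column names
        n_rows = sum(1 for _ in reader)   # remaining records are data rows
    return header, n_rows
```

Calling `summarize_csv("IDRome_DB.csv")` from the repository root returns the header and the number of records, which can be compared against the 28,058 IDRs stated above, assuming the file stores one row per IDR.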

Simulation trajectories and time series of conformational properties are available for all the IDRs at sid.erda.dk/sharelink/AVZAJvJnCO.

We also provide Notebooks on Google Colab to (i) generate conformational ensembles of user-supplied sequences using the CALVADOS model and (ii) predict scaling exponents and conformational entropies per residue using the SVR models.

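For readers unfamiliar with the scaling exponent mentioned above: it is commonly defined through the power law R(s) ≈ R0 · s^ν, which relates the root-mean-square distance R between two residues to their separation s along the sequence. The sketch below estimates ν from distance data with a log-log least-squares fit. This is a generic illustration, not the SVR model provided in this repository, and the fitting protocol used in the manuscript may differ (for example, in how R0 is treated or which separations are included). The function name and the synthetic data are assumptions made for the example.

```python
import math


def fit_scaling_exponent(separations, distances):
    """Fit ln R = ln R0 + nu * ln s by ordinary least squares.

    Returns (nu, R0). Hypothetical helper for illustration only.
    """
    xs = [math.log(s) for s in separations]
    ys = [math.log(r) for r in distances]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    nu = sxy / sxx                          # slope of the log-log line
    r0 = math.exp(mean_y - nu * mean_x)     # prefactor from the intercept
    return nu, r0


# Synthetic data generated from an ideal-chain-like power law (nu = 0.5).
seps = list(range(6, 101))
dists = [0.55 * s ** 0.5 for s in seps]
nu, r0 = fit_scaling_exponent(seps, dists)
print(f"nu = {nu:.3f}, R0 = {r0:.3f}")
```

On noise-free synthetic input the fit recovers the exponent and prefactor used to generate the data; with simulation data, the distances would instead come from averages over the trajectory.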

Layout

  • seq_conf_prop.ipynb reproduces Figs. 1 and 3 and Extended Data Figs. 2, 5, 6e-t, and 7
  • go_analysis.ipynb reproduces Fig. 2
  • conservation_analysis.ipynb reproduces Fig. 4
  • clinvar_fmug.ipynb reproduces Fig. 5 and Extended Data Fig. 9
  • uniprot_domains.ipynb reproduces Extended Data Fig. 1
  • svr_models.ipynb reproduces Extended Data Fig. 8
  • go_uniprot_calls.ipynb performs API calls to obtain gene ontology terms from UniProt
  • calc_seq_prop.ipynb and calc_seq_prop_SPOT.ipynb compute sequence descriptors and generate the IDRome_DB.csv and IDRome_DB_SPOT.csv files
  • CALVADOS_tests.ipynb reproduces Extended Data Fig. 3
  • AF2_PAEs.ipynb reproduces Extended Data Fig. 4
  • CD-CODE.ipynb reproduces Extended Data Fig. 6a-d
  • md_simulations/ contains code and data related to single-chain simulations performed using the CALVADOS model and HOOMD-blue v2.9.3 installed with mphowardlab/azplugins
  • idr_selection/ contains code and data to generate the pLDDT-based and SPOT-based sets of IDRs
  • idr_orthologs/ contains code and data to generate the set of orthologs of human IDRs
  • svr_models/ contains scikit-learn SVR models generated in svr_models.ipynb
  • zscores/ contains code and data to calculate NARDINI z-scores
  • go_analyses/ contains input and output data related to the Gene Ontology analyses in go_analysis.ipynb
  • QCDPred/ contains code and data related to QCD calculations
  • clinvar_fmug_cdcode/ contains code and data related to the analysis of the ClinVar, FMUG, and CD-CODE databases
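To give a sense of what a sequence descriptor is, the sketch below computes two charge-based quantities that are widely used for disordered proteins: the fraction of charged residues (FCR) and the net charge per residue (NCPR). This is an illustrative assumption and not the code in calc_seq_prop.ipynb; the set of descriptors computed there is not listed in this README. The example treats K and R as positive, D and E as negative, and histidine as neutral.

```python
POSITIVE = set("KR")   # assumed positively charged residues
NEGATIVE = set("DE")   # assumed negatively charged residues


def charge_descriptors(sequence):
    """Return (FCR, NCPR) for a one-letter amino acid sequence.

    Hypothetical helper; histidine and chain termini are treated as neutral.
    """
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    n_pos = sum(res in POSITIVE for res in seq)
    n_neg = sum(res in NEGATIVE for res in seq)
    fcr = (n_pos + n_neg) / len(seq)    # fraction of charged residues
    ncpr = (n_pos - n_neg) / len(seq)   # net charge per residue
    return fcr, ncpr


fcr, ncpr = charge_descriptors("KKKAA")
print(fcr, ncpr)
```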

Usage

To open the Notebooks, install Miniconda, then create the environment with all required packages and launch Jupyter by issuing the following terminal commands:

    conda env create -f environment.yml
    source activate idrome
    jupyter-notebook

To install HOOMD-blue v2.9.3 with mphowardlab/azplugins v0.11.0, run the following commands:

    curl -LO https://github.com/glotzerlab/hoomd-blue/releases/download/v2.9.3/hoomd-v2.9.3.tar.gz
    tar xvfz hoomd-v2.9.3.tar.gz
    git clone https://github.com/mphowardlab/azplugins.git
    cd azplugins
    git checkout tags/v0.11.0
    cd ..
    cd hoomd-v2.9.3
    mkdir build
    cd build
    cmake ../ -DCMAKE_INSTALL_PREFIX=<path to python> \
        -DENABLE_CUDA=ON -DENABLE_MPI=ON -DSINGLE_PRECISION=ON -DENABLE_TBB=OFF \
        -DCMAKE_CXX_COMPILER=<path to g++> -DCMAKE_C_COMPILER=<path to gcc>
    make -j4
    cd ../hoomd
    ln -s ../../azplugins/azplugins azplugins
    cd ../build && make install -j4
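Once the build has finished, a short Python check can confirm that both packages are importable from the interpreter targeted by CMAKE_INSTALL_PREFIX. The sketch below assumes that azplugins, being symlinked into the hoomd source directory, is installed as the submodule `hoomd.azplugins`; the helper name `check_install` is hypothetical.

```python
import importlib


def check_install(modules=("hoomd", "hoomd.azplugins")):
    """Try to import each module and report its version or the import error."""
    status = {}
    for name in modules:
        try:
            mod = importlib.import_module(name)
            # Fall back to a generic label if the module exposes no version.
            status[name] = getattr(mod, "__version__", "installed")
        except ImportError as err:
            status[name] = f"missing ({err})"
    return status


for name, result in check_install().items():
    print(f"{name}: {result}")
```

If either entry is reported as missing, the most likely causes are running a different Python interpreter than the one given to cmake, or a symlink created after the build directory was configured.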