/pol_ii_g

Architecture of Pol II(G) and molecular mechanism of transcription regulation by Gdown1

Primary LanguagePythonCreative Commons Attribution Share Alike 4.0 InternationalCC-BY-SA-4.0

DOI

Localizing gdown1 on POL II using integrative structure modeling

These scripts demonstrate the use of IMP and PMI in the modeling of the Pol II(G) complex using DSS chemical cross-links.

Input data (directory data):

40 DSS cross-links involving gdown1 were identified via mass spectrometry. 14 of these cross-links were intramolecular and 26 were intermolecular with Pol II. The Pol II structure used was obtained from the PDB (code 5FLM); it was determined primarily based on a cryo-EM density map at 3.4 Å resolution (EMDB code: 3218) Bernesky, 2016 Nature.

Representation of gdown1 relied on (i) secondary structure and disordered regions predicted by PSIPRED based on the gdown1 sequence (Buchan et al., 2013; Jones, 1999).

Running the simulations (directory production_scripts)

  • sample.py: PMI modeling scripts for running the production simulations: The search for good-scoring models relied on Gibbs sampling, based on the Metropolis Monte Carlo algorithm. We suggest producing at least 5 million models from 100 independent runs, each starting from a different initial conformation of gdown1 to have proper statistics.

  • submit.sub: SGE cluster based submission scripts to run automatically 100 independent runs.

  • The compressed 100 independent trajectories are accessible at: /salilab/park3/ganesans/gdown1_trajectories/gdown1.tar.gz

Analyzing the simulations (directory analysis_scripts)

Various scripts to analysis the simulations. We give more details scripts that allows us to test for sampling convergence.

  • select_good_scoring_models.py: Python script that reads and selects RMF files based on good scoring model criteria explained in the methods section.

  • random_subsets_best_score_convergence.py: we test if adding more models improves our sampling of top scores. The input to the script is a list of all the Total_Score from the simulations. For each subset, we perform 100 sub-samplings to compute error bars. We give the file twice to check how the error bars varies.

  • Kolmogorov_Smirnov_2Samples.py: we test if the distribution from two independent subsets are not unsimilar.

  • The good-scoring models that have been selected for precision-based clustering based on RMSD metric are located in: results/Models. The file name format is ${trajectory_number}_${frame_number}.rmf3

  • sample_precision.py: Given a RMSD matrix (results/Clustering/rmsd_matrix.tar.gz), we compute the 𝛘2-test for homogeneity of proportions{McDonald, 2014} and compute the best precision for which sampling has converged.

  • rmsf.py: Given a list of structures, compute the average RMSF, which indicates the precision of the structures in a cluster.

Plotting the results (directory plotting_scripts)

Python scripts for generating figures from the paper.

Information

Author(s): Sai J. Ganesan

Date: September 6th, 2018

License: CC BY-SA 4.0 This work is licensed under the Creative Commons Attribution-ShareAlike 4.0 International License.

Last known good IMP version: build info build info

Testable: Yes

Parallelizeable: Yes

Publications: Miki Jishage, Xiaodi Yu, Yi Shi, Sai J. Ganesan, Andrej Sali, Brian T. Chait, Francisco Asturias, and Robert G. Roeder Architecture of Pol II(G) and molecular mechanism of transcription regulation by Gdown1 Nat Struct Mol Biol. (2018) 25(9):859-867. doi: https://doi.org/10.1038/s41594-018-0118-5