/21_nickels

Computational Scripts for Transcriptional Pausing

Analysis code and summary data

Structural and mechanistic basis of σ-dependent transcriptional pausing

The computational pipeline used in this study is split into three steps. The first two steps run on the Elzar High Performance Computing Cluster at Cold Spring Harbor Laboratory (CSHL). The third and final step plots the sequence logos and count profiles presented in the manuscript main text and supplements. Detailed descriptions of the three steps listed below are provided in a separate README file in each folder.

Step 1: fastq to feature
  • The scripts for running this step are provided in the 01_step_fastq_to_feature directory.
  • The README file in that directory explains how to run this step.
  • The inputs of this step are the raw fastq files; the outputs are called feature files.

Step 2: feature to final dataframe
  • The scripts for running this step are provided in the 02_step_feature_to_final_dataframe directory.
  • A step-by-step guide for this step is provided in the README file in that directory.
  • The inputs of this step are the feature files from Step 1; the outputs are csv files in the pe_aloc_pairs and 2022_XACT_seq_data directories.

Step 3: final dataframe to logos
  • The scripts for running this step are provided in the 03_step_final_dataframe_to_logos directory.
  • A separate README file in that directory describes this step in detail.
  • The inputs of this step are the csv files from Step 2; the outputs are the sequence logo and rescaled count figures, some of which are presented in the manuscript.
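
Before running the final step, it can help to confirm that the csv outputs of the second step are in place. The snippet below is a minimal sketch for that check and is not part of the pipeline itself. The directory names pe_aloc_pairs and 2022_XACT_seq_data come from the description above; their location inside the repository, and the helper name list_step2_csvs, are assumptions made for illustration, so the directories are searched for by name.

```python
from pathlib import Path

# Directory names taken from the Step 2 description; where they sit inside
# the repository is an assumption, so they are located by name.
STEP2_OUTPUT_DIRS = ("pe_aloc_pairs", "2022_XACT_seq_data")


def list_step2_csvs(repo_root):
    """Map each Step 2 output directory name to its sorted csv file paths.

    Hypothetical helper, not part of the published scripts.
    """
    root = Path(repo_root)
    found = {}
    for name in STEP2_OUTPUT_DIRS:
        csvs = []
        # rglob finds the directory wherever it sits under repo_root.
        for directory in root.rglob(name):
            if directory.is_dir():
                csvs.extend(directory.glob("*.csv"))
        found[name] = sorted(csvs)
    return found


if __name__ == "__main__":
    # Report how many csv files each expected directory contains.
    for name, paths in list_step2_csvs(".").items():
        print(f"{name}: {len(paths)} csv file(s)")
```

Run from the repository root, this prints one line per expected directory. A count of zero suggests that the second step has not been run yet or that the directory is located elsewhere.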

Please address technical questions about this repository and its contents to Justin B. Kinney. More general scientific correspondence about this work should be sent to Bryce Nickels. The sequencing data (fastq files) are available under SRA BioProject number SRP355098.