cfChIP-seq: An R repository from lijiacd985

# cfChIP

These are the scripts to process cfChIP-seq results.

The scripts were run with R version 4.0.

List of packages needed: optparse, Biobase, MASS, Matrix, NMF, RColorBrewer, cba, cowplot, ctc, ggforce, ggplot2, preprocessCore, reshape2, rtracklayer, tools
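Before running anything, it can help to check which of these packages R cannot find. The following is a minimal sketch, not part of the repository: `missing_r_packages` is a hypothetical helper name, and it assumes `Rscript` is on your PATH.

```shell
# Packages listed in this README ("tools" ships with base R).
PKGS="optparse Biobase MASS Matrix NMF RColorBrewer cba cowplot ctc ggforce ggplot2 preprocessCore reshape2 rtracklayer tools"

# Hypothetical helper: print the listed packages that are not installed, one per line.
# $PKGS is left unquoted on purpose so each name becomes a separate argument.
missing_r_packages() {
  Rscript --vanilla -e 'pkgs <- commandArgs(trailingOnly = TRUE); cat(setdiff(pkgs, rownames(installed.packages())), sep = "\n")' $PKGS
}

# Usage:
#   missing_r_packages
```

Anything it prints still needs to be installed, with `install.packages()` for CRAN packages or `BiocManager::install()` for Bioconductor packages such as Biobase, preprocessCore and rtracklayer.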

Two directory structures are needed for the analysis to run. You can download them from the Zenodo repository (DOI: 10.5281/zenodo.3967254).

First, in the same directory as the scripts, there should be a directory called SetupFiles. For each modification there should be a directory:

/SetupFiles/

that contains the following files:

* Windows.rds - R file with a catalogue of genomic windows
* GeneDescription.csv - description of genes
* FilterGenes.csv - list of genes to ignore in the analysis
* HealthyRef.csv - reference of healthy samples (for normalizations and statistical tests)
* CommonGenes.rds - R file that contains the list of genes used for normalization

Optional:

* Meta-genes.bed - BED file of genes for meta gene plots
* Meta-enhancers.bed - BED file of enhancers for meta enhancer plots
* QC.bed - BED file of regions for QC

In addition, running the script SetupFiles.R in that directory will create two additional files:

* BackgroundModel.rds - R file with details of the windows used in estimating background levels
* Win2Gene.rds - R file with the mapping of genes to specific windows

The second directory structure is used for results. The base of the structure is called and can be given in the command line.

The structure is then a directory tree with the names as follows:

* /BED// - location of BED files (output of genomic alignment)
* /Samples// - location of the R data structure for each sample
* /Tracks// - location of bigwig tracks generated for each sample
* /Output// - location of analysis output

where is a modification name (as in the SetupFiles).
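The results tree can be created ahead of time. Here is a minimal shell sketch, assuming each entry above is the base directory, then the subdirectory name, then the modification name. `make_results_tree` and its two arguments are hypothetical names chosen for illustration, not something the scripts define.

```shell
# Hypothetical helper: create the four results subdirectories for one modification.
#   $1 - base directory of the results structure (assumed layout: base/<subdir>/<modification>)
#   $2 - modification name, matching a directory under SetupFiles
make_results_tree() {
  for sub in BED Samples Tracks Output; do
    mkdir -p "$1/$sub/$2" || return 1
  done
}

# Usage (both values are examples only):
#   make_results_tree ./Analysis H3K4me3
```

After this, the BED files from the alignment step go into the BED subdirectory for that modification.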

The main processing command is ProcessBEDFiles.R, which can be run as:

Rscript --vanilla /ProcessBEDFiles.R -r -m [options] [list of sample names]

The options include commands that control the analysis. For the full list, see:

Rscript --vanilla /ProcessBEDFiles.R --help
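To build the list of sample names from the BED files already on disk, something like the sketch below may be convenient. It assumes a sample name is the BED file name without its .bed or .bed.gz extension. `sample_names` is a hypothetical helper, and the variables in the usage comment (SRCDIR, ROOT, MOD) are placeholders for your own paths and modification name. The usage line also assumes that -r takes the results base and -m the modification name.

```shell
# Hypothetical helper: print one sample name per BED file found in directory $1,
# stripping the .bed or .bed.gz extension and removing duplicates.
sample_names() {
  for f in "$1"/*.bed "$1"/*.bed.gz; do
    [ -f "$f" ] || continue        # skip glob patterns that matched nothing
    base=${f##*/}                  # drop the directory part
    base=${base%.gz}               # drop a trailing .gz, if present
    printf '%s\n' "${base%.bed}"   # drop the .bed extension
  done | sort -u
}

# Usage (assumed argument meanings; names containing spaces would need extra care):
#   Rscript --vanilla "$SRCDIR/ProcessBEDFiles.R" -r "$ROOT" -m "$MOD" $(sample_names "$ROOT/BED/$MOD")
```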

In all cases, the procedure searches for the samples in this order:

1. /Samples//.rds (sample already processed by ProcessBEDFiles.R)
2. /BED/.bed (bed file)
3. /BED/.bed.gz (compressed bed file)
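To check which input would be picked up for a given sample, the lookup order can be mirrored in a few lines of shell. This is only an illustration of the order described here, not the code ProcessBEDFiles.R runs. `find_sample` is a hypothetical helper, and it takes the two directories as arguments rather than assuming how they are derived from the results base.

```shell
# Hypothetical helper: print the first existing input for a sample and return 0,
# or return 1 if none of the three candidates exists.
#   $1 - directory holding processed samples (.rds)
#   $2 - directory holding BED files
#   $3 - sample name
find_sample() {
  for candidate in "$1/$3.rds" "$2/$3.bed" "$2/$3.bed.gz"; do
    if [ -f "$candidate" ]; then
      printf '%s\n' "$candidate"
      return 0
    fi
  done
  return 1
}

# Usage (paths are examples only):
#   find_sample ./Analysis/Samples/H3K4me3 ./Analysis/BED/H3K4me3 MySample
```

Because the processed .rds file comes first in this order, an existing .rds file takes precedence over the BED files for that sample.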

The Analysis file in the above repository is already populated with the BED files we used in our publication. Running do.sh will perform all analyses on all files (this will take a while). In addition, doQC.sh will generate basic QC of the BED files.

Note: Before running, edit do.sh to set SRCDIR to match your local installation.
