/stress

maize cold/heat stress experiments

Primary language: R; License: MIT

This repository hosts code (primarily R and Python scripts) and data related to the cold/heat stress response project.

Directory structure:

  • README.md: (this file)
  • src/: data processing, statistical testing, visualization (R)
  • nf.degA/, nf.degB/, nf.dmodA/, nf.dmodB/, nfb1-4/: Nextflow pipeline configurations and results

Links to datasets:

Links to R / Python scripts:

  • cis_trans.R: classify cis/trans inheritance patterns from inbred and hybrid expression, in basic (steady-state) mode or differential (control vs. treatment) mode
      $ ./cis_trans.R -h
      usage: ./cis_trans.R [-h] [--mode MODE] [--min_rc MIN_RC] [--n_cpu N_CPU]
                           f_rc f_sf f_dsp fo
    
      Classify cis/trans inheritance pattern using inbred/hybrid RNA-Seq read counts
    
      positional arguments:
        f_rc             read count table
        f_sf             sample-wise size factor table
        f_dsp            gene-wise dispersion table
        fo               output file
    
      optional arguments:
        -h, --help       show this help message and exit
        --mode MODE      cis/trans test mode, "basic" for steady-state cis/trans
                         test and "diff" for control/treatment differential test
                         [default: basic]
        --min_rc MIN_RC  minimum read counts to filter low-expressed genes [default:
                         10]
        --n_cpu N_CPU    number of CPUs / threads to use for parallel processing
                         (to speed up runs with many genes) [default: 1]
    
  • kmer.py: kmer utilities; run kmer.py -h to find out more
      usage: kmer.py [-h] {locate,prepare_ml,getfasta} ...

      kmer utilities

      optional arguments:
        -h, --help            show this help message and exit

      available commands:
        {locate,prepare_ml,getfasta}
          locate              find given kmers in sequence database and report
                              locations
          prepare_ml          locate given kmers for given IDs in sequence db using
                              various filters and prepare output for ML
          getfasta            extract fasta for given IDs in sequence db using
                              various filters
  • fimo.py: FIMO utilities; run fimo.py -h to find out more
      usage: fimo.py [-h] {locate,filter,bed2wide,prepare_ml} ...

      fimo utilities

      optional arguments:
        -h, --help            show this help message and exit

      available commands:
        {locate,filter,bed2wide,prepare_ml}
          locate              run fimo to find given motifs in input sequences
          filter              filter BED file using window size / epigenetic marks
          bed2wide            convert BED file to machine learning tables
          prepare_ml          pipeline to find motifs and output in BED / ML input
                              table
  • streme.py: wrapper around STREME from the MEME suite; outputs a MEME-format file with the motifs found and a tabular file with the exact motif locations
      usage: streme.py [-h] {addscore,xml2tsv,pipe} ...

      STREME utilities

      optional arguments:
        -h, --help            show this help message and exit

      available commands:
        {addscore,xml2tsv,pipe}
          addscore            add score_thresh to STREME output
          xml2tsv             convert STREME xml output to tsv
          pipe                run STREME pipeline
  • merge.fimo.R: read multiple FIMO outputs and save them as an R tibble
  • merge.dreme.kmer.R: read the MEME-format motif files from multiple DREME/STREME runs and save them as an R tibble
  • merge.dreme.fimo.R: read the motif location outputs from multiple DREME/STREME runs and save them as an R tibble
  • merge.dreme.R: read multiple DREME outputs and save them as an R tibble
      usage: merge.dreme.R [-h] [-o output] [--meme meme] [--txt list] fi [fi ...]

      merge dreme outputs

      positional arguments:
        fi           dreme output file(s)

      optional arguments:
        -h, --help   show this help message and exit
        -o output    output file [default: out.rds]
        --meme meme  merged motifs in meme format [default: out.meme]
        --txt list   motif ID list [default: out.txt]
  • ml_classification.R: train a machine learning classifier (random forest, XGBoost or SVM) with a configurable holdout proportion, optional down-sampling, cross-validation, and a hyperparameter grid search run in parallel; run ml_classification.R -h for detailed usage
      usage: ml_classification.R [-h] [--perm PERM] [--alg ALG]
                                 [--holdout HOLDOUT] [--fold FOLD]
                                 [--fold_repeat FOLD_REPEAT] [--nlevel NLEVEL]
                                 [--downsample] [--seed SEED]
                                 [--response RESPONSE] [--cpu CPU]
                                 fi fo1

      Run machine learning classification on given dataset

      positional arguments:
        fi                    input dataset
        fo1                   output metrics file

      optional arguments:
        -h, --help            show this help message and exit
        --perm PERM           number of permutations [default: 1]
        --alg ALG             ML algorithm [default: rf]
        --holdout HOLDOUT     proportion of data to hold out for test [default: 0.8]
        --fold FOLD           cv fold [default: 5]
        --fold_repeat FOLD_REPEAT
                              repeat in each cv [default: 1]
        --nlevel NLEVEL       levels of hyperparameters to tune [default: 3]
        --downsample          downsample to get balanced [default: False]
        --seed SEED           random seed [default: 26]
        --response RESPONSE   response variable name [default: status]
        --cpu CPU             number of processors to use [default: 1]
  • ml_predict.R: load a trained model and predict outcomes for a new dataset; run ml_predict.R -h for detailed usage
      usage: ml_predict.R [-h] fm fi fo

      Make predictions using trained model on given dataset

      positional arguments:
        fm          (ML) model file
        fi          input dataset
        fo          output file to save predictions

      optional arguments:
        -h, --help  show this help message and exit
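cis_trans.R performs its own statistical test on raw read counts with size factors and dispersions, which is not reproduced here. For orientation only, a widely used way to partition a difference between two inbred parents is to compare the parental log2 ratio with the allelic log2 ratio inside the F1 hybrid: alleles in the hybrid share one trans environment, so their imbalance reflects cis effects, and the remainder is attributed to trans. Whether cis_trans.R uses exactly this formulation is an assumption; the function names, pseudocount and fold-change cutoff below are hypothetical.

```python
import math


def cis_trans_effects(p1, p2, h1, h2, pseudo=1.0):
    """Split a parental expression difference into cis and trans parts (log2 scale).

    p1, p2 -- normalized expression of the gene in inbred parent 1 and parent 2
    h1, h2 -- normalized expression of the parent-1 and parent-2 alleles in the F1 hybrid
    pseudo -- hypothetical pseudocount that keeps log2 finite for zero counts
    """
    parental = math.log2((p1 + pseudo) / (p2 + pseudo))  # total difference between inbreds
    cis = math.log2((h1 + pseudo) / (h2 + pseudo))       # allelic imbalance in a shared trans background
    trans = parental - cis                               # part of the difference not explained by cis
    return parental, cis, trans


def naive_label(cis, trans, cutoff=1.0):
    """Label a gene with a hypothetical |log2FC| cutoff (no statistical test involved)."""
    has_cis, has_trans = abs(cis) >= cutoff, abs(trans) >= cutoff
    if has_cis and has_trans:
        return "cis+trans"
    if has_cis:
        return "cis"
    if has_trans:
        return "trans"
    return "conserved"


parental, cis, trans = cis_trans_effects(799, 99, 399, 99)
print(parental, cis, trans, naive_label(cis, trans))  # prints: 3.0 2.0 1.0 cis+trans
```

A real classifier replaces the fixed cutoff with a count-based test, which is why cis_trans.R asks for size factors and dispersions.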
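cis_trans.R expects a precomputed sample-wise size factor table (f_sf), but this README does not say how it was produced. One common choice is the DESeq2-style median-of-ratios estimator, sketched here in plain Python under that assumption; the function name and the dict-of-lists input layout are hypothetical.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors, one per sample.

    counts -- dict mapping gene ID -> list of raw read counts (same sample order in every list)
    """
    n_samples = len(next(iter(counts.values())))
    log_ratios = [[] for _ in range(n_samples)]
    for row in counts.values():
        if min(row) == 0:
            continue  # a zero count makes the geometric mean zero, so skip this gene
        log_geo_mean = sum(math.log(c) for c in row) / n_samples
        for j, c in enumerate(row):
            log_ratios[j].append(math.log(c) - log_geo_mean)
    # each sample's factor is the median ratio of its counts to the per-gene geometric means
    return [math.exp(median(r)) for r in log_ratios]


counts = {"g1": [10, 20], "g2": [30, 60], "g3": [5, 10], "g4": [0, 7]}
print(size_factors(counts))  # roughly [0.707, 1.414]: sample 2 has about twice the depth
```

If every gene contains a zero count, `median` receives an empty list and raises `statistics.StatisticsError`; real pipelines usually handle that case explicitly.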
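kmer.py locate reports where given kmers occur in a sequence database. A minimal sketch of that idea, assuming an in-memory dict of sequences in place of a real database and exact matching on both strands; the function names are hypothetical and none of kmer.py's filters are reproduced:

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}


def reverse_complement(seq):
    return "".join(COMPLEMENT[b] for b in reversed(seq.upper()))


def locate_kmers(seqs, kmers):
    """Return (seq_id, kmer, start, strand) for every exact match on either strand.

    `start` is the 0-based leftmost position on the forward strand, also for '-' hits.
    """
    hits = []
    for sid, seq in seqs.items():
        seq = seq.upper()
        for kmer in kmers:
            fwd, rev = kmer.upper(), reverse_complement(kmer)
            # palindromic kmers read the same on both strands, so search them once
            patterns = [("+", fwd)] if fwd == rev else [("+", fwd), ("-", rev)]
            k = len(fwd)
            for strand, pat in patterns:
                for i in range(len(seq) - k + 1):
                    if seq[i:i + k] == pat:
                        hits.append((sid, kmer, i, strand))
    return hits


print(locate_kmers({"s1": "ACGTTTGCA"}, ["GCA", "ACGT"]))
# [('s1', 'GCA', 6, '+'), ('s1', 'GCA', 5, '-'), ('s1', 'ACGT', 0, '+')]
```

The sliding-window scan is O(sequence length × number of kmers); for many kmers a suffix array or Aho-Corasick automaton would scale better.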
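fimo.py bed2wide converts a BED file of motif hits into tables for machine learning. The exact layout it writes is not documented here; a minimal sketch, assuming (hypothetically) that BED column 1 holds the gene/sequence ID and column 4 the motif ID, pivots the long list of hits into a gene-by-motif table of hit counts or presence/absence:

```python
from collections import defaultdict


def bed_to_wide(bed_lines, binary=False):
    """Pivot BED motif hits (long format) into a gene x motif table (wide format).

    Returns (motif_ids, rows), where rows maps gene ID -> list of values per motif.
    """
    counts = defaultdict(lambda: defaultdict(int))
    motifs = set()
    for line in bed_lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue  # skip blank, comment and UCSC header lines
        fields = line.rstrip("\n").split("\t")
        gid, motif = fields[0], fields[3]
        counts[gid][motif] += 1
        motifs.add(motif)
    motif_ids = sorted(motifs)
    rows = {}
    for gid in sorted(counts):
        row = [counts[gid].get(m, 0) for m in motif_ids]  # 0 for motifs absent in this gene
        rows[gid] = [int(v > 0) for v in row] if binary else row
    return motif_ids, rows


bed = ["g1\t0\t6\tM1", "g1\t10\t16\tM1", "g1\t20\t26\tM2", "g2\t5\t11\tM2"]
print(bed_to_wide(bed))               # (['M1', 'M2'], {'g1': [2, 1], 'g2': [0, 1]})
print(bed_to_wide(bed, binary=True))  # (['M1', 'M2'], {'g1': [1, 1], 'g2': [0, 1]})
```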
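Both fimo.py and streme.py (addscore) work with motif score thresholds. To illustrate what such a threshold means, here is a generic log-odds position weight matrix scan. It is not FIMO's algorithm (FIMO converts scores to p-values) and not the scripts' code; the example matrix, uniform background, pseudocount and threshold are all hypothetical.

```python
import math

BASES = "ACGT"


def log_odds_matrix(freqs, background=None, pseudo=0.01):
    """Turn per-position base frequencies into log2-odds scores against a background model."""
    background = background or {b: 0.25 for b in BASES}
    return [{b: math.log2((col[b] + pseudo) / background[b]) for b in BASES} for col in freqs]


def scan(seq, pwm, threshold):
    """Return (start, score) for each forward-strand window whose summed log-odds >= threshold."""
    width, seq, hits = len(pwm), seq.upper(), []
    for i in range(len(seq) - width + 1):
        window = seq[i:i + width]
        if any(b not in BASES for b in window):
            continue  # skip windows containing N or other ambiguity codes
        score = sum(pwm[j][b] for j, b in enumerate(window))
        if score >= threshold:
            hits.append((i, score))
    return hits


# Hypothetical 3-bp motif that always reads "TGA"
freqs = [
    {"A": 0.0, "C": 0.0, "G": 0.0, "T": 1.0},
    {"A": 0.0, "C": 0.0, "G": 1.0, "T": 0.0},
    {"A": 1.0, "C": 0.0, "G": 0.0, "T": 0.0},
]
pwm = log_odds_matrix(freqs)
print(scan("CCTGACC", pwm, threshold=5.0))  # one hit at position 2, score about 6.04
```

Raising the threshold trades sensitivity for specificity; a single mismatch here costs about 6.7 log2 units, so only the exact TGA window passes.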
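The merge.*.R scripts combine many per-run outputs into a single R tibble. The same pattern in Python, as a sketch only: the input layout (tab-separated with a shared header, `#` comment lines) and the `sid` column name are assumptions, not the scripts' actual formats.

```python
import csv


def merge_tables(sources, sid_col="sid"):
    """Concatenate tab-separated tables that share a header.

    sources -- dict mapping a run/sample ID -> iterable of text lines (e.g. an open file)
    Lines starting with '#' are treated as comments and skipped (assumed convention).
    Each returned row is a dict with an extra `sid_col` key naming its source.
    """
    merged, header = [], None
    for sid, lines in sources.items():
        reader = csv.DictReader((l for l in lines if not l.startswith("#")), delimiter="\t")
        if header is None:
            header = reader.fieldnames  # first table defines the expected columns
        elif reader.fieldnames != header:
            raise ValueError(f"{sid}: header {reader.fieldnames} differs from {header}")
        for row in reader:
            row[sid_col] = sid
            merged.append(row)
    return merged


run1 = ["motif\tscore\n", "M1\t5.0\n", "M2\t3.1\n", "# trailing comment\n"]
run2 = ["motif\tscore\n", "M3\t7.2\n"]
for row in merge_tables({"run1": run1, "run2": run2}):
    print(row)
```

Checking that headers match before concatenating catches outputs from different tool versions early instead of producing misaligned columns.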
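ml_classification.R combines a holdout split, optional down-sampling and repeated k-fold cross-validation. The index bookkeeping behind those steps can be sketched in plain Python as below; the function names, the example labels and the explicit `train_frac` parameter are hypothetical, and how --holdout is interpreted by the script should be checked with ml_classification.R -h.

```python
import random
from collections import defaultdict


def downsample(labels, rng):
    """Indices that keep every class at the size of the smallest class (random draw)."""
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    n_min = min(len(idx) for idx in by_class.values())
    keep = []
    for idx in by_class.values():
        keep.extend(rng.sample(idx, n_min))
    return sorted(keep)


def holdout_split(indices, train_frac, rng):
    """Shuffle, then split into (train, test) with `train_frac` of the indices in train."""
    idx = list(indices)
    rng.shuffle(idx)
    n_train = round(len(idx) * train_frac)
    return sorted(idx[:n_train]), sorted(idx[n_train:])


def kfold(indices, k, repeats, rng):
    """Yield (train, validation) index lists for `repeats` rounds of shuffled k-fold CV."""
    for _ in range(repeats):
        idx = list(indices)
        rng.shuffle(idx)
        folds = [idx[f::k] for f in range(k)]  # round-robin assignment keeps folds near-equal
        for f in range(k):
            train = [i for g, fold in enumerate(folds) if g != f for i in fold]
            yield sorted(train), sorted(folds[f])


rng = random.Random(26)  # mirrors the --seed default; any integer works
labels = ["up"] * 8 + ["down"] * 4
balanced = downsample(labels, rng)
train, test = holdout_split(balanced, 0.75, rng)
for tr, va in kfold(train, k=3, repeats=1, rng=rng):
    print(len(tr), len(va))  # 4 2, three times
```

Splitting off the test set before cross-validation keeps it untouched by hyperparameter tuning, so the final metrics are not optimistically biased.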