AECT

A deep learning based pipeline for TSS coverage denoising and feature extraction from shallow cell free DNA sequencing

AECT (Autoencoder on Cell-free DNA TSS coverage profile) is an autoencoder-based method for denoising the TSS coverage profiles generated by shallow cfDNA sequencing. A set of pre-processing steps for cfDNA sequencing data, including GC bias adjustment and copy number variation normalization, is also integrated. AECT improves the robustness of TSS coverage quantification, as well as the sensitivity and specificity of classifiers based on TSS profiles.

Installation

Install from GitHub

Using setuptools:
git clone https://github.com/hanbw0120/AECT
cd AECT
python setup.py install

Alternatively, you can install the dependencies and run the script directly as "python AECT.py <parameters>":
conda install numpy pandas keras
conda install tensorflow=1.14.0

Quick Start

Usage

AECT.py -i <input_data> -f [file_type] -o [output_data]
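The command line above, together with the options listed in this README, can be pictured as the following argparse sketch. It is a hypothetical illustration, not the parser used in AECT.py: the destination names (`input_data`, `file_type`, `output_data`) and the long form `--epochs` are assumptions, and only the flags and defaults documented here are included.

```python
import argparse


def parse_hidden_size(text):
    """Turn a comma-separated string such as '128,64,32,64,128' into a list of ints."""
    return [int(part) for part in text.split(",") if part.strip()]


def build_parser():
    # Hypothetical parser mirroring the options documented in this README;
    # the real argument handling in AECT.py may differ.
    parser = argparse.ArgumentParser(prog="AECT.py")
    parser.add_argument("-i", dest="input_data", required=True)
    parser.add_argument("-f", dest="file_type", default=None)
    parser.add_argument("-o", dest="output_data", default=None)
    parser.add_argument("--hidden_size", type=parse_hidden_size,
                        default=[128, 64, 32, 64, 128])
    parser.add_argument("-b", "--batch_size", type=int, default=32)
    parser.add_argument("-e", "--epochs", type=int, default=500)  # long form assumed
    parser.add_argument("--reduce_lr", type=int, default=10)
    parser.add_argument("--early_stop", type=int, default=15)
    return parser


# Example: a smaller, symmetric network for a quick test run.
args = build_parser().parse_args(
    ["-i", "example/example_data.csv", "-o", "out.csv", "--hidden_size", "64,16,64"]
)
print(args.hidden_size, args.batch_size, args.epochs)
```

The hidden layer sizes are given as one comma-separated string, so a symmetric encoder/decoder shape such as 64,16,64 is passed as a single argument.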

Input

Other options

  • size of the hidden layers, separated with commas, default is 128,64,32,64,128: [--hidden_size]
  • batch size, default is 32: [-b] or [--batch_size]
  • epochs for training, default is 500: [-e] or [--epochs]
  • maximum number of iterations (adjust by watching the convergence of the loss), default is 30000: [-i] or [--max_iter]
  • reduces learning rate if validation loss does not improve in a given number of epochs, default is 10: [--reduce_lr]
  • stops training if validation loss does not improve in a given number of epochs, default is 15: [--early_stop]
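How the two patience options interact can be sketched in plain Python. This is a simplified simulation under stated assumptions, not AECT's training loop: the starting learning rate (`lr=1e-3`) and the reduction factor (`factor=0.1`, the default of Keras `ReduceLROnPlateau`) are not documented in this README and are chosen here only for illustration.

```python
def simulate_schedule(val_losses, lr=1e-3, reduce_lr=10, early_stop=15, factor=0.1):
    """Return (epochs_run, final_lr) for a sequence of validation losses.

    lr and factor are assumptions; the README only documents the two
    patience values (reduce_lr=10, early_stop=15).
    """
    best = float("inf")
    since_best = 0    # epochs since the last improvement (drives early stopping)
    since_reduce = 0  # epochs since the last improvement or learning-rate cut
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best = loss
            since_best = 0
            since_reduce = 0
        else:
            since_best += 1
            since_reduce += 1
        if since_best >= early_stop:
            return epoch, lr  # stop: no improvement for early_stop epochs
        if since_reduce >= reduce_lr:
            lr *= factor      # cut the learning rate and restart its counter
            since_reduce = 0
    return len(val_losses), lr


# Loss improves for two epochs, then plateaus.
epochs_run, final_lr = simulate_schedule([1.0, 0.9] + [0.95] * 30)
print(epochs_run, final_lr)
```

With the default values, a plateau first triggers one learning-rate reduction after 10 epochs without improvement, and training stops 5 epochs later if the validation loss still has not improved.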

Run using the Docker image

  • a Docker image archive was uploaded to: https://pan.baidu.com/s/1IhAQdhgun67PQAhN7-eIYg (extraction code: aect)
  • load the Docker image:
    gunzip -c AECT_v0.9.tar.gz | docker load
  • run AECT:
    docker run --ipc=host -v "/PATH/TO/DATA/":"/PATH/TO/DATA/" aect:v0.9 AECT.py -i /PATH/TO/DATA/example/example_data.csv -o out.csv

Other Scripts

GC adjustment

Usage: python run_gc_adj.py -b -d -g

CNA normalization

Usage: python cnv_norm.py

  • the input CNV file is generated by the DNAcopy package