Efficient Detection of Chromosomal Rearrangements Using a Scalable k-mer Database of Multiple Reference Genomes and Variations
ETCHING takes about 3 hours for WGS data with 30X normal and 50X tumor coverage, using 30 threads on a DELL 930 server. You can also find the code, k-mer sets, and DEMO files on our website.
http://big.hanyang.ac.kr/ETCHING/
The demo completes within 10 min on a desktop (AMD Ryzen 7 3700X 8-Core Processor).
Fixed: etching now stops if etching_caller predicts no SV or etching_sorter removes all SVs.
a. Fixed: etching now exits if etching_filter stops abnormally, and -f is set properly (etching, lines 793-4)
--target-filter and --miscall-kmer-cutoff options were added.
a. File names of final result modified
b. etching debugged (line 882). Indentation error fixed (Sorter/scorer_XGBoost). README updated.
c. A virtual environment is now used to resolve dependencies. A simple installation guide was added.
Bug fixed (etching line 1283)
Debugged etching and etching_filter. They did not run properly when the -o option was not used.
See CHANGE.md for older updates.
- 64-bit Linux with 32 GB RAM or more recommended (at least 16 GB).
- Tested on Fedora Workstation, CentOS, and Ubuntu
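To check a machine against these memory figures before installing, a small sketch like the following may help. It assumes a Linux system that exposes /proc/meminfo. The function name `classify_ram_kb` and the use of decimal GB are illustrative choices and not part of ETCHING.

```shell
# Classify a memory size given in kB (the unit used by /proc/meminfo)
# against the figures above: 32 GB recommended, 16 GB minimum.
# Decimal GB is used because MemTotal is usually slightly below the
# installed RAM size.
classify_ram_kb() {
    gb=$(( $1 / 1000000 ))
    if [ "$gb" -ge 32 ]; then
        echo "ok"
    elif [ "$gb" -ge 16 ]; then
        echo "minimum"
    else
        echo "insufficient"
    fi
}

# Read MemTotal where /proc/meminfo is available (Linux only).
if [ -r /proc/meminfo ]; then
    total_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
    echo "RAM check: $(classify_ram_kb "$total_kb")"
fi
```

A result of "minimum" means the machine meets the 16 GB floor but not the recommended 32 GB.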
- Required to compile
  - gcc, g++ (>=4.7.0), make, Python3 (3.6, 3.7, or 3.8), pyenv, wget
  - python3-venv (Ubuntu/Debian/Mint)
- Required to run
  - BWA, samtools
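As a quick way to see which of the listed programs are still missing, something like the sketch below could be used. The helper name `missing_tools` is hypothetical, and the check only looks for the commands on PATH; it does not verify versions such as g++ >=4.7.0 or Python 3.6 to 3.8.

```shell
# Print the name of every command in the argument list that is not on PATH.
missing_tools() {
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "$tool"
        fi
    done
}

# Command names taken from the requirement lists above.
missing=$(missing_tools gcc g++ make python3 wget bwa samtools)
if [ -n "$missing" ]; then
    # $missing is left unquoted on purpose so the names print on one line.
    echo "Missing tools:" $missing
else
    echo "All required tools found."
fi
```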
We prepared a simple guide for CentOS/Fedora or Ubuntu/Debian/Mint users. You can skip this step if all requirements are already installed.
Note: We tested this guide on Fedora32/33/34, CentOS7/8, Ubuntu16.04/18.04/20.04, Mint19/20, Debian11, and MX Linux.
# Required programs (CentOS/Fedora)
sudo yum install -y epel-release # CentOS
sudo yum install -y gcc gcc-c++ make bwa samtools wget
# Required programs (Ubuntu/Debian/Mint)
sudo apt install -y gcc g++ make bwa samtools wget
# You can skip this if you will use pyenv.
# Otherwise, python3-venv must be installed.
sudo apt install -y python3-venv
# dependencies of pyenv
# For Fedora/CentOS
sudo yum install make gcc zlib-devel bzip2 bzip2-devel readline-devel sqlite sqlite-devel openssl-devel tk-devel libffi-devel xz-devel
# For Ubuntu/Debian/Mint
sudo apt-get update
sudo apt install make build-essential libssl-dev zlib1g-dev libbz2-dev libreadline-dev libsqlite3-dev wget curl llvm libncursesw5-dev xz-utils tk-dev libxml2-dev libxmlsec1-dev libffi-dev liblzma-dev
# Install pyenv
curl https://pyenv.run | bash
echo 'export PYENV_ROOT="$HOME/.pyenv"' >> ~/.bashrc
echo 'export PATH="$PYENV_ROOT/bin:$PATH"' >> ~/.bashrc
echo 'eval "$(pyenv init -)"' >> ~/.bashrc
echo 'eval "$(pyenv init --path)"' >> ~/.bashrc
exec $SHELL
Once the requirements are satisfied, you can install ETCHING as follows.
# Download ETCHING
git clone --depth=1 https://github.com/ETCHING-team/ETCHING.git
# Move to /path/to/ETCHING
cd ETCHING
# Optional for pyenv users
pyenv install 3.7.12 # any version from 3.6.0 to 3.8.12
pyenv local 3.7.12
# Compile and install ETCHING
make
echo "export ETCHING_HOME=$PWD" >> ~/.bashrc
echo "export PATH=$PWD/bin:\$PATH" >> ~/.bashrc
exec $SHELL
As long as you keep /path/to/ETCHING/lib, the virtual environment automatically sets LD_LIBRARY_PATH while running ETCHING.
# Change directory
cd /wherever/you/want/
# Download and decompress DEMO
wget http://big.hanyang.ac.kr/ETCHING/DEMO.tar.gz
tar zxvf DEMO.tar.gz
cd DEMO
# Run demo
etching -1 tumor_1.fq -2 tumor_2.fq -1c normal_1.fq -2c normal_2.fq -g small_genome.fa -a small_genome.gtf -f demo_PGK -o example -t 8
If you have no matched normal data, our pan-genome k-mer set (PGK) can help select tumor-specific reads.
# Move to the directory where you want to keep the k-mer set
cd /somewhere/you/want/
# Download
wget http://big.hanyang.ac.kr/ETCHING/PGK.tar.gz
# Decompress
tar zxvf PGK.tar.gz
# Then, you will see PGK_20200103.kmc_pre and PGK_20200103.kmc_suf in PGK:
# Here, PGK_20200103 is the name of k-mer set to be used for ETCHING.
ls PGK
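Since the k-mer set consists of two files sharing one name, it can be worth confirming that both are present before starting a long run. The following is a minimal sketch; the helper name `pgk_ready` is hypothetical, and passing a path prefix to `-f` is assumed from the demo and docker commands in this README.

```shell
# Succeed only if both KMC database files exist for the given prefix.
pgk_ready() {
    [ -f "$1.kmc_pre" ] && [ -f "$1.kmc_suf" ]
}

# Prefix from the listing above; adjust to wherever PGK was unpacked.
prefix=PGK/PGK_20200103
if pgk_ready "$prefix"; then
    echo "k-mer set found: pass '$prefix' to etching -f"
else
    echo "k-mer set incomplete or missing: $prefix"
fi
```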
Alternatively, you can make your own k-mer set as follows:
make_pgk -i reference.list -o my_pgk -v dbSNP.vcf -g hg19.fa
deactivate
Here, reference.list is a text file listing the file names of the reference genomes in FASTA format.
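One way to build such a list from a directory of genomes is sketched below. It assumes one file name per line, which this README does not state explicitly, and it assumes the genomes end in .fa or .fasta. The helper name `make_reference_list` is hypothetical.

```shell
# Write the paths of all FASTA files in a directory to a list file.
# Assumption: make_pgk reads one file name per line.
make_reference_list() {
    ref_dir=$1
    list_file=$2
    : > "$list_file"                      # create or truncate the list
    for f in "$ref_dir"/*.fa "$ref_dir"/*.fasta; do
        # Skip unmatched glob patterns, which stay as literal strings.
        if [ -f "$f" ]; then
            echo "$f" >> "$list_file"
        fi
    done
}

# Example:
# make_reference_list /path/to/genomes reference.list
```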
docker
# Download ETCHING docker image
wget http://big.hanyang.ac.kr/ETCHING/ETCHING_v1.3.7a.docker.saved.tar
# Load the image
docker load -i ETCHING_v1.3.7a.docker.saved.tar
# Check the image
docker images
You should see output like this:
REPOSITORY | TAG | IMAGE ID | CREATED | SIZE |
---|---|---|---|---|
etching | v1.3.7a | 63ffc48504f0 | 40 hours ago | 3.26GB |
Download our DEMO
# Download and decompress DEMO
wget http://big.hanyang.ac.kr/ETCHING/DEMO.tar.gz
tar zxvf DEMO.tar.gz
Run ETCHING with docker
docker run -i -t --rm -v /path/to/DEMO/:/work/ etching:1.3.7a etching -1 tumor_1.fq -2 tumor_2.fq -1c normal_1.fq -2c normal_2.fq -g small_genome.fa -a small_genome.gtf -f /work/demo_PGK -o example_1 -t 8
Here, etching:1.3.7a is the REPOSITORY:TAG of the ETCHING docker image. If docker images shows a different tag (such as v1.3.7a in the table above), use that one.
Replace /path/to/DEMO with /your/data/path/.
Note: Keep /work/ in the above command line.
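Because docker expects an absolute host path for a bind mount, a small helper along these lines can build the `-v` argument from any data directory while keeping /work/ on the container side. The helper name `docker_mount_arg` is hypothetical.

```shell
# Build the -v argument for docker run from a host data directory.
# The host side is resolved to an absolute path; the container side
# stays /work/, as noted above.
docker_mount_arg() {
    abs_dir=$(cd "$1" && pwd) || return 1
    echo "$abs_dir/:/work/"
}

# Example, using the image name from the command above:
# docker run -i -t --rm -v "$(docker_mount_arg ./DEMO)" etching:1.3.7a /bin/bash
```

The function returns a non-zero status if the directory does not exist, so a typo in the path is caught before docker starts.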
Alternatively, you can run ETCHING inside the docker container:
docker run -i -t --rm -v /path/to/DEMO/:/work/ etching:1.3.7a /bin/bash
etching -1 tumor_1.fq -2 tumor_2.fq -1c normal_1.fq -2c normal_2.fq -g small_genome.fa -a small_genome.gtf -f /work/demo_PGK -o example_2 -t 8
Jang-il Sohn, Min-Hak Choi, Dohun Yi, A. Vipin Menon, and Jin-Wu Nam
Bioinformatic and Genomics Lab., Hanyang University, Seoul 04763, Korea
If you have any issues, please contact us
Jang-il Sohn (sohnjangil@gmail.com)
Jin-Wu Nam (jwnam@hanyang.ac.kr)