Xron (ˈkairɑn) is a methylation basecaller that could identify m6A methylation modification from ONT direct RNA sequencing.
Using a deep learning CNN+RNN+CTC structure to establish end-to-end basecalling for the nanopore sequencer.
The name is inherited from Chiron
Built with PyTorch and python 3.8+
If you found Xron useful, please consider to cite:
Teng, H., Stoiber, M., Bar-Joseph, Z. and Kingsford, C., 2024. Detecting m6A RNA modification from nanopore sequencing using a semi-supervised learning framework. bioRxiv, pp.2024-01.
If you encounter any issue during using Xron, please submit an issue in the repository.
m6A-aware RNA basecall one-liner:
xron call -i <input_fast5_folder> -o <output_folder> -m models/ENEYFT --boostnano
For either installation method, recommend to create a vritual environment first using conda or venv, take conda for example
conda create --name YOUR_VIRTUAL_ENVIRONMENT python=3.8
conda activate YOUR_VIRTUAL_ENVIRONMENT
Then you can install from our pypi repository or install the newest version from github repository.
pip install xron
Xron requires at least PyTorch 1.11.0 to be installed. If you have not yet installed PyTorch, install it via guide from official repository.
Before running basecall using Xron, you need to download the models from our AWS s3 bucket by running xron init
xron init
This will automatically download the models and put them into the models folder. We provided sample code in xron-samples folder to achieve m6A-aware basecall and identify m6A site. To run xron on raw fast5 files:
xron call -i ${INPUT_FAST5} -o ${OUTPUT} -m models/ENEYFT --fast5 --beam 50 --chunk_len 2000
Xron also include a non-homegeneous HMM (NHMM) for signal re-sqquigle. To use it: Firstly we need to extract the chunk and basecalled sequence using prepare module
xron prepare -i ${FAST5_FOLDER} -o ${CHUNK_FOLDER} --extract_seq --basecaller guppy --reference ${REFERENCE} --mode rna_meth --extract_kmer -k 5 --chunk_len 4000 --write_correction
Replace the FAST5_FOLDER, CHUNK_FOLDER and REFERENCE with your basecalled fast5 file folder, your output folder and the path to the reference genome fasta file.
Then run the NHMM to realign ("resquiggle") the signal.
xron relabel -i ${CHUNK_FOLDER} -m ${MODEL} --device $DEVICE
This will generate a paths.py file under CHUNK_FOLDER which gives the kmer segmentation of the chunks.
To train a new Xron model using your own dataset, you need to prepare your own training dataset, the dataset should includes a signal file (chunks.npy), labelled sequences (seqs.npy) and sequence length for each read (seq_lens.npy), and then run the xron supervised training module
xron train -i chunks.npy --seq seqs.npy --seq_len seq_lens.npy --model_folder OUTPUT_MODEL_FOLDER
Training Xron model from scratch is hard, I would recommend to fine-tune our model by specify --load flag, for example we can finetune the provided ENEYFT model (model trained using cross-linked ENE dataset and finetuned on Yeast dataset):
xron train -i chunks.npy --seq seqs.npy --seq_len seq_lens.npy --model_folder models/ENEYFT --load