Summary Simulator is a tool for generating synthetic data resembling features of Oxford Nanopore sequencing summary files.
It aims to provide a subset of data useful for downstream testing, namely of my other tool, summary_metrics, which quickly generates some useful statistics from sequencing summary text files.
- Random Data Generation: Generates random test data for these fields: read_id, passes_filtering, sequence_length_template, mean_qscore_template, and barcode_arrangement.
- Configurability: Accepts command-line arguments to specify the q-score threshold, the most common barcode, and the number of rows of data to generate.
- Output Format: Writes the generated test data to a file in tab-separated format for easy parsing and analysis.
- Flexibility: Allows adjustment of parameters such as skewness and shift to fine-tune the distribution of generated data.
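The features above can be illustrated with a small, standard-library-only sketch. It is hypothetical and not the tool's actual code: the real implementation uses the rand and rand_distr crates, whereas here a hand-rolled xorshift generator (`XorShift`) stands in for them, and "skewness and shift" are modelled with a skew-normal distribution, which is only one plausible choice. The names `XorShift` and `generate_row`, the read_id scheme, the TRUE/FALSE spelling of passes_filtering, the 60% share for the most common barcode and all distribution parameters are assumptions.

```rust
/// Minimal xorshift64* generator: a hypothetical stand-in for the `rand`
/// crate so this sketch builds with the standard library alone.
struct XorShift(u64); // state must be non-zero

impl XorShift {
    fn next_u64(&mut self) -> u64 {
        self.0 ^= self.0 >> 12;
        self.0 ^= self.0 << 25;
        self.0 ^= self.0 >> 27;
        self.0.wrapping_mul(0x2545_F491_4F6C_DD1D)
    }

    /// Uniform f64 in [0, 1) built from the top 53 bits.
    fn uniform(&mut self) -> f64 {
        (self.next_u64() >> 11) as f64 / (1u64 << 53) as f64
    }

    /// Standard normal sample via the Box-Muller transform.
    fn normal(&mut self) -> f64 {
        let u1 = 1.0 - self.uniform(); // in (0, 1], so ln() is finite
        let u2 = self.uniform();
        (-2.0 * u1.ln()).sqrt() * (2.0 * std::f64::consts::PI * u2).cos()
    }

    /// Skew-normal sample (Azzalini's construction): `alpha` sets the
    /// skewness (0 = symmetric), `shift` the location, `scale` the spread.
    fn skew_normal(&mut self, shift: f64, scale: f64, alpha: f64) -> f64 {
        let delta = alpha / (1.0 + alpha * alpha).sqrt();
        let (u0, v) = (self.normal(), self.normal());
        let u1 = delta * u0 + (1.0 - delta * delta).sqrt() * v;
        let z = if u0 >= 0.0 { u1 } else { -u1 };
        shift + scale * z
    }
}

/// Builds one tab-separated row. Every parameter value here is illustrative.
fn generate_row(rng: &mut XorShift, index: usize, q_threshold: f64, top_barcode: &str) -> String {
    // Hypothetical read_id scheme; real sequencing summaries use UUIDs.
    let read_id = format!("read_{index:08}");
    // Right-skewed read lengths (positive alpha), at least 1 base.
    let length = rng.skew_normal(2_000.0, 3_000.0, 4.0).max(1.0).round() as u64;
    // Left-skewed q-scores (negative alpha), clamped and rounded to 2 d.p.
    // before the threshold test so the printed value agrees with the flag.
    let raw_q = rng.skew_normal(12.0, 3.0, -2.0).clamp(2.0, 40.0);
    let qscore = (raw_q * 100.0).round() / 100.0;
    let passes = if qscore >= q_threshold { "TRUE" } else { "FALSE" };
    // Assumed: 60% of reads carry the most common barcode.
    let barcode = if rng.uniform() < 0.6 {
        top_barcode.to_string()
    } else {
        format!("barcode{:02}", 1 + rng.next_u64() % 24)
    };
    format!("{read_id}\t{passes}\t{length}\t{qscore:.2}\t{barcode}")
}
```

Deriving passes_filtering from the same rounded q-score that is written out keeps the two columns consistent, which matters if a downstream tool such as summary_metrics recomputes pass/fail from mean_qscore_template.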
- Clone the repository:
git clone https://github.com/your_username/summary_simulator.git
- Navigate to the project directory:
cd summary_simulator
- Build the project:
cargo build --release
You will find the binary at ./target/release/summary_simulator.
Run the tool from the command line, providing the necessary arguments:
./summary_simulator <q-score threshold> <most common barcode> <number of rows>
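The three positional arguments can be parsed with the standard library alone. The following is a minimal sketch, assuming a hypothetical `Config` struct and `parse_args` function; the tool's own argument handling may differ.

```rust
/// Hypothetical configuration mirroring the three positional arguments.
#[derive(Debug, PartialEq)]
struct Config {
    q_threshold: f64,
    top_barcode: String,
    rows: usize,
}

/// Parses `<q-score threshold> <most common barcode> <number of rows>`.
/// `args` excludes the program name.
fn parse_args(args: &[String]) -> Result<Config, String> {
    if args.len() != 3 {
        return Err(format!("expected 3 arguments, got {}", args.len()));
    }
    let q_threshold: f64 = args[0]
        .parse()
        .map_err(|e| format!("invalid q-score threshold {:?}: {e}", args[0]))?;
    let rows: usize = args[2]
        .parse()
        .map_err(|e| format!("invalid number of rows {:?}: {e}", args[2]))?;
    Ok(Config { q_threshold, top_barcode: args[1].clone(), rows })
}
```

In `main`, this would be called as `parse_args(&std::env::args().skip(1).collect::<Vec<_>>())`, printing the error message and exiting with a non-zero status on `Err` rather than panicking.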
For help, use the following command:
./summary_simulator -h
Generate 1,000,000 rows of test data with a q-score threshold of 9.0 and "barcode01" as the most common barcode:
./summary_simulator 9.0 barcode01 1000000
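Writing a million rows efficiently mostly comes down to buffering output. The sketch below is a hypothetical `write_tsv` helper, assuming the file starts with a header row naming the five columns (as real sequencing summary files do) and that a caller-supplied closure, `make_row`, produces each line; the output file name is not documented here, so the writer is left generic.

```rust
use std::io::{self, Write};

/// Writes a header plus `rows` tab-separated lines to any writer.
/// `make_row` is a hypothetical callback returning one line without a newline.
fn write_tsv<W: Write>(
    out: W,
    rows: usize,
    mut make_row: impl FnMut(usize) -> String,
) -> io::Result<()> {
    // Buffering avoids a separate write call per line, which matters at 10^6 rows.
    let mut out = io::BufWriter::new(out);
    writeln!(
        out,
        "read_id\tpasses_filtering\tsequence_length_template\tmean_qscore_template\tbarcode_arrangement"
    )?;
    for i in 0..rows {
        writeln!(out, "{}", make_row(i))?;
    }
    // Flush explicitly: errors raised when a BufWriter is dropped are ignored.
    out.flush()
}
```

With a real file, the first argument would be a `std::fs::File` from `File::create`; passing a `&mut Vec<u8>` instead makes the function easy to test in memory.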
- add proper help options
- generate more columns for sequencing summary (if useful)
- work on error handling
- further performance optimisations
Contributions are welcome! If you find any issues or have suggestions for improvements, please open an issue or submit a pull request.
This project is licensed under the MIT License. See the LICENSE file for details.
This tool utilizes the rand and rand_distr crates for random data generation.