A tool to quickly simulate some dummy Nanopore sequence summary data for testing purposes

summary_simulator

Summary Simulator is a tool for generating synthetic data that resembles Oxford Nanopore sequencing summary files. It produces a subset of the usual columns, enough for downstream testing. Its main use is to provide input for my other tool, summary_metrics, which quickly generates statistics from sequencing summary text files.

Features

  • Random Data Generation: Generates random test data for these fields: read_id, passes_filtering, sequence_length_template, mean_qscore_template, and barcode_arrangement.
  • Configurability: Accepts command-line arguments to specify the q-score threshold, the most common barcode, and the number of rows of data to generate.
  • Output Format: Writes the generated test data to a file in tab-separated format for easy parsing and analysis.
  • Flexibility: Allows adjustment of parameters such as skewness and shift to fine-tune the distribution of generated data.
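
The sketch below illustrates how one row with the five fields listed above could be built and written as a tab-separated line. It is not the tool's actual code. The tool uses the rand and rand_distr crates, while this example substitutes a small std-only xorshift generator so it runs without dependencies. The Row struct, the make_row and to_tsv helpers, the uniform q-score range, the 80% barcode weighting, the TRUE/FALSE spelling, and the rule that passes_filtering follows from the q-score threshold are all assumptions made for illustration.

```rust
// Hypothetical row type; field names follow the columns listed above.
struct Row {
    read_id: String,
    passes_filtering: bool,
    sequence_length_template: u32,
    mean_qscore_template: f64,
    barcode_arrangement: String,
}

// Tiny xorshift64 generator, a std-only stand-in for the rand crate.
// The seed must be non-zero.
struct XorShift64(u64);

impl XorShift64 {
    fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }

    // Uniform float in [0, 1), built from the top 53 bits.
    fn next_f64(&mut self) -> f64 {
        (self.next_u64() >> 11) as f64 / (1u64 << 53) as f64
    }
}

// Builds one synthetic row. All distributions here are assumptions.
fn make_row(rng: &mut XorShift64, index: u64, threshold: f64, common_barcode: &str) -> Row {
    // Assumption: q-scores are uniform between 5.0 and 15.0.
    let qscore = 5.0 + 10.0 * rng.next_f64();
    // Assumption: about 80% of reads carry the most common barcode.
    let barcode = if rng.next_f64() < 0.8 {
        common_barcode.to_string()
    } else {
        "unclassified".to_string()
    };
    Row {
        read_id: format!("read_{:08}", index),
        // Assumption: a read passes when its q-score meets the threshold.
        passes_filtering: qscore >= threshold,
        sequence_length_template: 200 + (rng.next_u64() % 20_000) as u32,
        mean_qscore_template: qscore,
        barcode_arrangement: barcode,
    }
}

// Formats a row as one tab-separated line with five columns.
fn to_tsv(row: &Row) -> String {
    format!(
        "{}\t{}\t{}\t{:.2}\t{}",
        row.read_id,
        if row.passes_filtering { "TRUE" } else { "FALSE" },
        row.sequence_length_template,
        row.mean_qscore_template,
        row.barcode_arrangement
    )
}

fn main() {
    let mut rng = XorShift64(0x2545_F491_4F6C_DD1D); // any non-zero seed works
    println!("read_id\tpasses_filtering\tsequence_length_template\tmean_qscore_template\tbarcode_arrangement");
    for i in 0..3 {
        let row = make_row(&mut rng, i, 9.0, "barcode01");
        println!("{}", to_tsv(&row));
    }
}
```

A fixed seed keeps the output reproducible, which can be convenient when the generated file feeds regression tests.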

Installation

  • Clone the repository:
git clone https://github.com/your_username/summary_simulator.git
  • Navigate to the project directory:
cd summary_simulator
  • Build the project:
cargo build --release

You will find the binary at ./target/release/summary_simulator.

Usage

Run the binary from the directory that contains it (target/release after building), providing the three positional arguments in this order:

./summary_simulator <q-score threshold> <most common barcode> <number of rows>

For help, use the following command:

./summary_simulator -h
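
As a rough illustration of how the three positional arguments might be read and validated, here is a minimal std-only sketch. The Config struct, the parse_args function, and the error messages are hypothetical and are not taken from the tool's source.

```rust
use std::env;

// Hypothetical container for the three positional arguments.
#[derive(Debug, PartialEq)]
struct Config {
    qscore_threshold: f64,
    common_barcode: String,
    rows: usize,
}

// Parses <q-score threshold> <most common barcode> <number of rows>.
// Expects the arguments without the program name.
fn parse_args(args: &[String]) -> Result<Config, String> {
    if args.len() != 3 {
        return Err(format!("expected 3 arguments, got {}", args.len()));
    }
    let qscore_threshold = args[0]
        .parse::<f64>()
        .map_err(|e| format!("invalid q-score threshold '{}': {}", args[0], e))?;
    let rows = args[2]
        .parse::<usize>()
        .map_err(|e| format!("invalid number of rows '{}': {}", args[2], e))?;
    Ok(Config {
        qscore_threshold,
        common_barcode: args[1].clone(),
        rows,
    })
}

fn main() {
    let mut args: Vec<String> = env::args().skip(1).collect();
    // For demonstration only: fall back to example values when run without arguments.
    if args.is_empty() {
        args = vec!["9.0".to_string(), "barcode01".to_string(), "1000000".to_string()];
    }
    match parse_args(&args) {
        Ok(cfg) => println!("{:?}", cfg),
        Err(msg) => {
            eprintln!("error: {}", msg);
            eprintln!("usage: summary_simulator <q-score threshold> <most common barcode> <number of rows>");
        }
    }
}
```

Returning a Result instead of panicking keeps the parsing logic testable and lets the caller decide how to report a bad argument.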

Example

Generate 1000000 rows of test data with a q-score threshold of 9.0 and the most common barcode "barcode01":

./summary_simulator 9.0 barcode01 1000000
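
To sanity-check generated output, one option is to count how many rows meet the q-score threshold. The sketch below does this on an in-memory sample. The count_passing function is hypothetical, the sample rows are made up for illustration, and the sketch assumes the output starts with a header line naming the columns, which this README does not confirm.

```rust
// Counts data rows and rows whose mean_qscore_template meets the threshold.
// Returns None if the header lacks the column or a value fails to parse.
fn count_passing(tsv: &str, threshold: f64) -> Option<(usize, usize)> {
    let mut lines = tsv.lines();
    let header = lines.next()?;
    // Locate the q-score column by name rather than by fixed position.
    let col = header
        .split('\t')
        .position(|name| name == "mean_qscore_template")?;
    let mut total = 0;
    let mut passing = 0;
    for line in lines.filter(|l| !l.is_empty()) {
        let q: f64 = line.split('\t').nth(col)?.parse().ok()?;
        total += 1;
        if q >= threshold {
            passing += 1;
        }
    }
    Some((total, passing))
}

fn main() {
    // Made-up sample rows, not real output from the tool.
    let sample = "read_id\tpasses_filtering\tsequence_length_template\tmean_qscore_template\tbarcode_arrangement\n\
                  read_a\tTRUE\t1520\t10.31\tbarcode01\n\
                  read_b\tFALSE\t830\t7.85\tbarcode01\n\
                  read_c\tTRUE\t4410\t12.04\tbarcode02\n";
    match count_passing(sample, 9.0) {
        Some((total, passing)) => println!("{} of {} rows at or above the threshold", passing, total),
        None => eprintln!("could not read the mean_qscore_template column"),
    }
}
```

For a real file, the same function can be applied to the contents returned by std::fs::read_to_string.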

To Do

  • add proper help options
  • generate more columns for sequencing summary (if useful)
  • work on error handling
  • further performance optimisations

Contributing

Contributions are welcome! If you find any issues or have suggestions for improvements, please open an issue or submit a pull request.

License

This project is licensed under the MIT License. See the LICENSE file for details.

Acknowledgements

This tool uses the rand and rand_distr crates for random data generation.