This repository contains a Python script that extracts the 1000 most frequent paired indexes from uncompressed paired-end FASTQ files. The script reads the indexes from the provided FASTQ files, combines them into pairs, counts how often each pair occurs, and outputs the top 1000 pairs along with their frequencies.
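The read, combine, count, and rank steps described above can be sketched as follows. This is only an illustration, not the code in extract_top_1000_indexes.py: it uses the standard library instead of Biopython so that it stays self-contained, and it assumes that each index is the first `index_length` bases of a read, which this README does not confirm. The names `read_sequences`, `top_paired_indexes`, `index_length`, and `top_n` are hypothetical.

```python
from collections import Counter
from itertools import islice


def read_sequences(lines):
    """Yield the sequence line (2nd of every 4 lines) of each FASTQ record."""
    return islice(lines, 1, None, 4)


def top_paired_indexes(r1_lines, r2_lines, index_length=10, top_n=1000):
    """Count (R1 index, R2 index) pairs and return the most frequent ones.

    Assumption: the index is the first `index_length` bases of each read.
    """
    counts = Counter()
    # zip pairs the n-th record of R1 with the n-th record of R2.
    for seq1, seq2 in zip(read_sequences(r1_lines), read_sequences(r2_lines)):
        pair = (seq1.strip()[:index_length], seq2.strip()[:index_length])
        counts[pair] += 1
    # most_common sorts by descending count and truncates to top_n entries.
    return counts.most_common(top_n)
```

Both arguments can be any iterable of lines, so open file handles for the R1 and R2 files would work as well as lists of strings.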
extract_top_1000_indexes.py: The Python script that extracts the top 1000 paired indexes.
- Python 3.x
- Biopython
To install the required Python library, run:
pip install biopython
Prepare your FASTQ files: Ensure you have your paired-end FASTQ files named Undetermined_S0_R1_001.fastq and Undetermined_S0_R2_001.fastq.
Place the FASTQ files: Place the FASTQ files in the same directory as the script or provide the correct path to them.
Run the script: Execute the script using Python:
python extract_top_1000_indexes.py
The script will print the top 1000 paired indexes along with their frequencies on the screen and write them to an output file named top_1000_paired_indexes.txt.
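Printing and writing the same lines could look roughly like the sketch below. It assumes the ranked pairs arrive as `((index1, index2), count)` tuples and that each line follows the `INDEX1 INDEX2: COUNT` layout of the example output; the function name `write_report` is hypothetical and not taken from the script.

```python
import io


def write_report(ranked_pairs, handle):
    """Print each 'INDEX1 INDEX2: COUNT' line and write it to `handle`."""
    for (index1, index2), count in ranked_pairs:
        line = f"{index1} {index2}: {count}"
        print(line)                 # echo to the screen
        handle.write(line + "\n")   # and persist to the output handle


# In-memory demonstration; a real run would pass an open file instead,
# for example open("top_1000_paired_indexes.txt", "w").
buffer = io.StringIO()
write_report([(("NAGTTCGGTA", "NCATGTGTAG"), 150)], buffer)
```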
Example Output
The output file top_1000_paired_indexes.txt will have the following format:
NAGTTCGGTA NCATGTGTAG: 150
NCTTAGTATA NCTTTCCCTA: 140
...
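For downstream analysis, a line in this format can be read back with a few lines of Python. This is a minimal sketch that assumes one `INDEX1 INDEX2: COUNT` entry per line, as in the example above; `parse_line` is a hypothetical helper, not part of the script.

```python
def parse_line(line):
    """Split 'INDEX1 INDEX2: COUNT' into (index1, index2, count)."""
    # Split on the last colon so the count is isolated from the index pair.
    pair, count = line.rsplit(":", 1)
    index1, index2 = pair.split()
    return index1, index2, int(count)


record = parse_line("NAGTTCGGTA NCATGTGTAG: 150")
```

Here `record` holds the two index sequences as strings and the frequency as an integer, which makes it easy to filter or sum the counts.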
If you use this script for your research, please consider citing it as follows: Sharma, V. (2024). extract_top_1000_indexes.py [Python script]. Retrieved from https://github.com/vsmicrogenomics/top_1000_indexes_from_fastq