HVRegLocator is a workflow that identifies which hypervariable region(s) of the 16S rRNA gene are spanned by amplicon sequence variants (ASVs) or by public SRA runs (SRR accessions). It aligns the query sequences to a full-length E. coli 16S rRNA reference gene and derives the spanned region from the alignment coordinates.
To install HVRegLocator, follow these steps:
- Create a new conda environment:
mamba create --prefix /global/apps/hvreglocator/0.2 -y -c bioconda python=3.9 sra-tools mafft fastp biopython numpy scipy vsearch
- Activate the environment, clone the repository, and install the package:
source activate /global/apps/hvreglocator/0.2 && \
cd /global/apps/hvreglocator/0.2 && \
git clone https://github.com/fbcorrea/hvrlocator.git && \
cd hvrlocator && \
pip install -e .
Note: If you are installing from a fork, replace the GitHub URL with the URL of your own repository.
HVRegLocator can process both SRA accession numbers and FASTA files containing ASV sequences.
To process an SRA run:
hvreglocator sra -r SRR1585194
You can specify the location of the E. coli reference file if it's not in the default location:
hvreglocator sra -r SRR1585194 --ecoli /path/to/ecoli.fa
To process a FASTA file containing ASV sequences:
hvreglocator fasta -f path/to/your/asv_sequences.fasta
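When several runs or FASTA files need processing, the two invocations above can be wrapped in a small driver. The following is a minimal sketch, not part of HVRegLocator: the helper names `build_command` and `run_batch` are hypothetical, the accession pattern is an assumption about what counts as an SRA run ID, and `--ecoli` is only attached to the `sra` subcommand because that is the only combination shown here.

```python
import re
import subprocess
from typing import List, Optional

# Assumption: SRA run accessions look like SRR/ERR/DRR followed by digits.
RUN_ACCESSION = re.compile(r"^[SED]RR\d+$")


def build_command(target: str, ecoli: Optional[str] = None) -> List[str]:
    """Build the hvreglocator argument list for one accession or FASTA path."""
    if RUN_ACCESSION.match(target):
        cmd = ["hvreglocator", "sra", "-r", target]
        if ecoli:
            # Point the tool at a non-default E. coli reference file.
            cmd += ["--ecoli", ecoli]
    else:
        # Anything that is not a run accession is treated as a FASTA path.
        cmd = ["hvreglocator", "fasta", "-f", target]
    return cmd


def run_batch(targets: List[str], ecoli: Optional[str] = None,
              dry_run: bool = True) -> List[List[str]]:
    """Build (and, unless dry_run is set, execute) one command per target."""
    commands = [build_command(t, ecoli) for t in targets]
    if not dry_run:
        for cmd in commands:
            # check=True stops the batch at the first failing command.
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    # Dry run: only print the commands that would be executed.
    for cmd in run_batch(["SRR1585194", "path/to/your/asv_sequences.fasta"]):
        print(" ".join(cmd))
```

Keeping `dry_run=True` as the default makes it easy to inspect the generated commands before anything is downloaded or aligned.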
The script will output the alignment start and end positions, as well as the identified hypervariable region span (e.g., V1-V3).
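To illustrate how alignment start and end positions can translate into a span label such as V1-V3, here is a minimal sketch. It is not HVRegLocator's actual implementation: the region boundaries are approximate E. coli 16S coordinates commonly cited in the literature and are an assumption here, as are the function names and the 50% overlap threshold.

```python
from typing import Dict, List, Tuple

# Assumption: approximate 1-based E. coli 16S coordinates for V1-V9.
# The boundaries used by HVRegLocator itself may differ.
HV_REGIONS: Dict[str, Tuple[int, int]] = {
    "V1": (69, 99),
    "V2": (137, 242),
    "V3": (433, 497),
    "V4": (576, 682),
    "V5": (822, 879),
    "V6": (986, 1043),
    "V7": (1117, 1173),
    "V8": (1243, 1294),
    "V9": (1435, 1465),
}


def spanned_regions(start: int, end: int, min_overlap: float = 0.5) -> List[str]:
    """List regions with at least `min_overlap` of their length inside [start, end]."""
    hits = []
    for name, (r_start, r_end) in HV_REGIONS.items():
        # Length of the intersection between the alignment and the region.
        overlap = min(end, r_end) - max(start, r_start) + 1
        if overlap > 0 and overlap / (r_end - r_start + 1) >= min_overlap:
            hits.append(name)
    return hits


def span_label(start: int, end: int, min_overlap: float = 0.5) -> str:
    """Collapse the covered regions into a label such as 'V4' or 'V1-V3'."""
    hits = spanned_regions(start, end, min_overlap)
    if not hits:
        return "none"
    if len(hits) == 1:
        return hits[0]
    return f"{hits[0]}-{hits[-1]}"


if __name__ == "__main__":
    print(span_label(8, 534))    # alignment covering the 5' end of the gene
    print(span_label(515, 806))  # alignment covering the middle of the gene
```

The overlap threshold is a design choice: requiring only partial coverage keeps a region in the label when a primer sits slightly inside its boundary, whereas a value of 1.0 would demand that the region be fully contained in the alignment.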
- hvreglocator.py: The main script that handles both SRA and FASTA processing.
- setup.py: Used for installing the package.
If you encounter any issues with finding the E. coli reference file, make sure it's in the same directory as the hvreglocator.py script, or use the --ecoli argument to specify its location.
Contributions to HVRegLocator are welcome. Please feel free to submit a Pull Request.
This project is licensed under the terms of the MIT license.