Robaina/Pynteny

No output from example API

Closed this issue · 2 comments

wchow commented

Describe the bug
Hi, I'm trying to run the example API (E. coli K-12 MG1655), but it only returns an empty synteny_matched.tsv file.

Note: it seems the Pynteny version installed via conda was v1.0.0. Does v1.1.0 fix this issue?
Update 2: I have tried the Docker image (https://github.com/Robaina/Pynteny/pkgs/container/pynteny), which contains v1.1.0, and it does not seem to fix the issue.

To Reproduce
Steps to reproduce the behavior:

  1. mamba create -n pynteny -c bioconda -c conda-forge python=3.10 pynteny
  2. conda activate pynteny
  3. pynteny download --outdir pgap/hmms --unpack
  4. mkdir example_api
  5. wget https://github.com/Robaina/Pynteny/blob/main/tests/test_data/MG1655.gb
  6. Create api_example.py using code below
from pathlib import Path
from pandas import DataFrame
from pynteny.filter import SyntenyHits
from pynteny import Search, Build, Download


Build(
    data="MG1655.gb",
    outfile="labelled_MG1655.fasta",
    logfile=None
).run()

# Initialize class
search = Search(
    data="labelled_MG1655.fasta",
    synteny_struc="<leuD 0 <leuC 1 <leuA",
    hmm_dir=None,
    hmm_meta=None,
    outdir="example_api/",
    prefix="",
    hmmsearch_args=None,
    gene_ids=False,
    logfile="example_api/pynteny.log",
    processes=20,
    unordered=False,
    )

# Parse gene IDs in synteny structure according to PGAP HMM database metadata
parsed_struc = search.parse_genes(synteny_struc="<leuD 0 <leuC 1 <leuA")

    
# Update parsed synteny structure and run Pynteny search
search.update("synteny_struc", parsed_struc)
synhits: SyntenyHits = search.run()

synhits_df: DataFrame = synhits.hits        

synhits_df.head()
  7. python api_example.py
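
One thing worth ruling out in step 5: a github.com/.../blob/... URL normally serves the HTML viewer page rather than the file itself, so wget may save HTML under the name MG1655.gb. The thread does not establish that this contributed to the empty output. The sketch below is only a heuristic sanity check (the helper name is hypothetical, not part of Pynteny): it tests whether the first non-empty line starts with LOCUS, as GenBank flat files do.

```python
from pathlib import Path


def looks_like_genbank(path):
    """Heuristic: a GenBank flat file starts with a LOCUS line; an HTML page does not."""
    with Path(path).open("r", errors="replace") as handle:
        for line in handle:
            if line.strip():
                # Only the first non-empty line is inspected
                return line.startswith("LOCUS")
    return False  # empty file


if __name__ == "__main__":
    target = Path("MG1655.gb")  # file name used in the steps above
    if target.exists():
        print(f"{target} looks like GenBank: {looks_like_genbank(target)}")
    else:
        print(f"{target} not found")
```

If the check fails, the raw file is usually served from the raw.githubusercontent.com form of the same path (assumed here to be https://raw.githubusercontent.com/Robaina/Pynteny/main/tests/test_data/MG1655.gb).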

Expected behavior
Results in synteny structure tsv file

Screenshots

2023-09-05 14:05:13,670 | INFO: Building annotated peptide database
2023-09-05 14:05:14,061 | INFO: Parsing GenBank data.
2023-09-05 14:05:14,475 | INFO: Database built successfully!
2023-09-05 14:05:14,498 | INFO: Translated 
 "<leuD 0 <leuC 1 <leuA" 
 to 
 "<(TIGR00171.1|TIGR02084.1) 0 <(TIGR00170.1|TIGR02083.1) 1 <(TIGR00973.1|NF002084.0|TIGR00970.1)" 
 according to provided HMM database metadata
2023-09-05 14:05:14,555 | INFO: Searching database by synteny structure
2023-09-05 14:05:14,555 | INFO: Running Hmmer
2023-09-05 14:05:14,863 | INFO: Filtering results by synteny structure
2023-09-05 14:05:14,880 | INFO: Writing matching sequences to FASTA files
2023-09-05 14:05:14,880 | INFO: Finished!

Desktop (please complete the following information):

  • OS: Ubuntu 22.04

Additional context
I see the hmmsearch results, but no synteny hits are written out. The output is just an empty file with the headers, i.e.:

contig  gene_id gene_number     locus   strand  full_label      hmm

I have also tried searching for other genes in the GenBank file, for example "thrL 0 thrA", which are the first two gene annotations, and that also returns nothing.
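
For anyone checking the same symptom, here is a minimal sketch that counts the data rows in the output table, so a header-only file is easy to tell apart from one with hits. The path assumes outdir="example_api/" and an empty prefix, as in the script above. The helper name is hypothetical and not part of Pynteny.

```python
import csv
from pathlib import Path


def count_hit_rows(tsv_path):
    """Return the number of non-empty data rows (header excluded) in a tab-separated file."""
    with Path(tsv_path).open(newline="") as handle:
        reader = csv.reader(handle, delimiter="\t")
        next(reader, None)  # skip the header line; harmless if the file is empty
        return sum(1 for row in reader if any(cell.strip() for cell in row))


if __name__ == "__main__":
    # Assumed output location, based on the Search() arguments in the report
    path = Path("example_api/synteny_matched.tsv")
    if path.exists():
        print(f"{path}: {count_hit_rows(path)} hit rows")
    else:
        print(f"{path} not found")
```

A count of 0 corresponds to the header-only file described above.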

Hi @wchow,

Thanks for reporting this bug. I can reproduce it with the latest available version as well (v1.1.1).

Will have a look in the following days.

The latest Pynteny version, v1.1.2, solves this issue (#92). It is not yet available on Bioconda, but it can be accessed from the latest Docker image: docker pull ghcr.io/robaina/pynteny:main

Please let me know if anything goes wrong. Closing the issue for now.

Thanks again for spotting this!
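
Since the conda build and the Docker image can carry different versions, it may help to confirm which one is actually installed before re-running the example. The sketch below assumes the installed distribution is named "pynteny" and that versions are plain X.Y.Z strings. The helper names are hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version

FIXED_IN = (1, 1, 2)  # version reported in this thread as containing the fix


def parse_version(text):
    """Turn a plain 'X.Y.Z' string into a tuple of ints (no pre-release handling)."""
    return tuple(int(part) for part in text.strip().lstrip("v").split(".")[:3])


def has_fix(version_text):
    """True if the given version is at least v1.1.2."""
    return parse_version(version_text) >= FIXED_IN


if __name__ == "__main__":
    try:
        installed = version("pynteny")  # assumed distribution name
    except PackageNotFoundError:
        print("pynteny is not installed in this environment")
    else:
        status = "includes" if has_fix(installed) else "predates"
        print(f"pynteny {installed} {status} the v1.1.2 fix")
```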