zktuong/dandelion

Parsing auxiliary data paths when running run_igblastn

chuckzzzz opened this issue · 2 comments

Description of the question

Hi! Thanks for developing this wonderful tool!

I am not sure if this is a bug, so I will pose it as a question. I noticed that when formatting the command to pass to igblastn, the optional file directory does not have a "/" in front. Is this intentional? This is relevant to lines 4503 and 4533 in _preprocessing.py.

Following the changeO igblast setup, the igblast folder has a structure like this:

├── igblast
│   ├── database
│   ├── fasta
│   ├── internal_data
│   └── optional_file

The command to run is formatted like this:

...
igdb + "/database/imgt_" + org + "_" + loci + "_j",
"-auxiliary_data",
igdb + "optional_file/" + org + "_gl.aux",
"-domain_system"
...

So if the structure holds, the optional file will not be found, because nothing separates the igblast path from the optional_file folder name. Say my path to the igblast folder is ~/test/igblast; then every optional file path will be formatted as ~/test/igblastoptional_file/human_gl.aux, while the other VDJ database paths will be something like ~/test/igblast/database/imgt_human_ig_j.
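To make the mismatch concrete, here is a minimal sketch that reproduces the two concatenations quoted above. The function names `j_db_path` and `aux_path` are hypothetical helpers written for this illustration; they are not taken from `_preprocessing.py`.

```python
# Hypothetical helpers that mirror the string concatenation quoted above.
def j_db_path(igdb: str, org: str, loci: str) -> str:
    # The database path starts with "/" after igdb.
    return igdb + "/database/imgt_" + org + "_" + loci + "_j"


def aux_path(igdb: str, org: str) -> str:
    # No "/" before "optional_file", exactly as in the quoted command.
    return igdb + "optional_file/" + org + "_gl.aux"


igdb = "~/test/igblast"  # igblast folder given without a trailing slash
print(j_db_path(igdb, "human", "ig"))  # ~/test/igblast/database/imgt_human_ig_j
print(aux_path(igdb, "human"))         # ~/test/igblastoptional_file/human_gl.aux
```

With this input, only the auxiliary path loses its separator, which is the behaviour described in the question.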

I might be missing something here, but let me know if this is supposed to be the way it is. Thanks!

Hi @chuckzzzz thanks for this!

You are correct, but it depends on how igdb (which is basically $IGDATA in my installation instructions) is exported/specified. Currently, both the singularity container and the instructions set it to IGDATA=/share/database/igblast/, so the trailing / is going to be there. It doesn't seem to mind the double // in e.g. igdb + "/database/imgt_" + org + "_" + loci + "_v", so everything still works.
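As a rough illustration of the trailing-slash case, the sketch below builds the same two strings with IGDATA set as in the instructions. It uses only the standard-library `posixpath.normpath` to show that the doubled slash collapses to the same location; POSIX path resolution generally treats repeated slashes inside a path as a single separator, which would explain why the command still works.

```python
import posixpath

igdb = "/share/database/igblast/"  # trailing slash, as in the instructions

# Same concatenation pattern as in the command, with example values filled in.
v_db = igdb + "/database/imgt_" + "human" + "_" + "ig" + "_v"
aux = igdb + "optional_file/" + "human" + "_gl.aux"

print(v_db)                      # /share/database/igblast//database/imgt_human_ig_v
print(posixpath.normpath(v_db))  # /share/database/igblast/database/imgt_human_ig_v
print(aux)                       # /share/database/igblast/optional_file/human_gl.aux
```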

Ideally I should have just used pathlib to handle this, so that there wouldn't be any potential issues, but I just need to find the time to implement it properly.

Something like:

import os
from pathlib import Path

env = os.environ.copy()
# Prefer $IGDATA; otherwise fall back to igblast_db (assumed to be supplied by the caller).
igdb = env["IGDATA"] if "IGDATA" in env else igblast_db
assert Path(igdb).exists()

org = "human"
loci = "ig"
dbpath = Path(igdb) / "database"
imgt_org_loci = "imgt_" + org + "_" + loci + "_"
vpath = str(dbpath / (imgt_org_loci + "v"))
dpath = str(dbpath / (imgt_org_loci + "d"))
jpath = str(dbpath / (imgt_org_loci + "j"))
auxpath = str(Path(igdb) / "optional_file" / (org + "_gl.aux"))


cmd = [
    "igblastn",
    "-germline_db_V",
    vpath,
    "-germline_db_D",
    dpath,
    "-germline_db_J",
    jpath,
    "-auxiliary_data",
    auxpath,
    ...,
]
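A quick way to check that the pathlib approach above removes the dependence on the trailing slash is to wrap the path construction in a function and compare both spellings of the root. This is only a sketch: `build_igblast_paths` is a hypothetical name, and `PurePosixPath` is used instead of `Path` so the example runs without the database being present on disk.

```python
from pathlib import PurePosixPath


def build_igblast_paths(igdb: str, org: str, loci: str) -> dict:
    """Hypothetical helper: build the igblastn database and auxiliary paths."""
    root = PurePosixPath(igdb)  # a trailing slash in igdb is dropped here
    prefix = "imgt_" + org + "_" + loci + "_"
    return {
        "v": str(root / "database" / (prefix + "v")),
        "d": str(root / "database" / (prefix + "d")),
        "j": str(root / "database" / (prefix + "j")),
        "aux": str(root / "optional_file" / (org + "_gl.aux")),
    }


with_slash = build_igblast_paths("/share/database/igblast/", "human", "ig")
without_slash = build_igblast_paths("/share/database/igblast", "human", "ig")

# Both spellings of the root give identical paths.
assert with_slash == without_slash
print(without_slash["aux"])  # /share/database/igblast/optional_file/human_gl.aux
```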

Gotcha, thanks, it makes sense!