Parsing auxiliary data paths when running run_igblastn
chuckzzzz opened this issue · 2 comments
Description of the question
Hi! Thanks for developing this wonderful tool!
I am not sure if this is a bug, so I will pose it as a question. I noticed that when formatting the command to pass to igblastn, the optional file directory does not have a "/" in front. Is this intentional? This is relevant to lines 4503 and 4533 in _preprocessing.py.
Following the changeO igblast setup, the igblast folder has a structure like this:
├── igblast
│   ├── database
│   ├── fasta
│   ├── internal_data
│   └── optional_file
The command to run is formatted like this:
...
igdb + "/database/imgt_" + org + "_" + loci + "_j",
"-auxiliary_data",
igdb + "optional_file/" + org + "_gl.aux",
"-domain_system"
...
So if the structure holds, the optional file will not be found, because "optional_file" gets appended to the folder name instead of being treated as a subfolder. Say my path to the igblast folder is ~/test/igblast; then every optional file will be formatted as ~/test/igblastoptional_file/human_gl.aux, while the other VDJ files will be something like ~/test/igblast/database/imgt_human_ig_j.
I might be missing something here, but let me know if this is supposed to be the way it is. Thanks!
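To make the difference concrete, here is a minimal Python sketch of the concatenation quoted above next to a pathlib-style join. The function names concat_aux and joined_aux are made up for illustration and are not part of the package; PurePosixPath is used only so the output is the same on every platform.

```python
from pathlib import PurePosixPath


def concat_aux(igdb: str, org: str) -> str:
    # Mirrors the quoted string concatenation: no "/" before "optional_file".
    return igdb + "optional_file/" + org + "_gl.aux"


def joined_aux(igdb: str, org: str) -> str:
    # Joining path components inserts exactly one separator between them.
    return str(PurePosixPath(igdb) / "optional_file" / (org + "_gl.aux"))


# Without a trailing slash the folder name and "optional_file" run together.
print(concat_aux("~/test/igblast", "human"))
# With a trailing slash the concatenation happens to work.
print(concat_aux("~/test/igblast/", "human"))
# The joined version gives the same result either way.
print(joined_aux("~/test/igblast", "human"))
print(joined_aux("~/test/igblast/", "human"))
```

So whether the concatenated path is valid depends entirely on whether the igblast folder path ends in "/".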
Hi @chuckzzzz, thanks for this!
You are correct, but it depends on how igdb (which is basically $IGDATA in my installation instructions) is exported/specified. Currently, both the singularity container and the instructions set it to IGDATA=/share/database/igblast/ so the trailing / is going to be there. It doesn't seem to mind the double // in e.g. igdb + "/database/imgt_" + org + "_" + loci + "_v", so everything still works.
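As a quick illustration of why the trailing slash makes both cases work, the sketch below builds the two paths by plain concatenation using the IGDATA value quoted above. It only manipulates strings and does not touch the filesystem; posixpath.normpath is used here just to show the path the doubled slash is equivalent to, since POSIX path resolution treats a run of slashes inside a path as a single separator.

```python
import posixpath

igdb = "/share/database/igblast/"  # IGDATA value with the trailing slash
org, loci = "human", "ig"

# Same concatenation style as the current command construction.
v_path = igdb + "/database/imgt_" + org + "_" + loci + "_v"
aux_path = igdb + "optional_file/" + org + "_gl.aux"

print(v_path)                      # contains a doubled slash after "igblast"
print(posixpath.normpath(v_path))  # the equivalent path with a single slash
print(aux_path)                    # relies on the trailing slash of igdb
```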
Ideally I should have just used pathlib to handle this, so that there wouldn't be any potential issues, but I just need to find the time to implement it properly.
Something like:
import os
from pathlib import Path

# Prefer $IGDATA from the environment; otherwise fall back to igblast_db,
# the path supplied by the caller.
env = os.environ.copy()
igdb = env["IGDATA"] if "IGDATA" in env else igblast_db
assert Path(igdb).exists()

org = "human"
loci = "ig"

# Path's "/" operator inserts the separators, so it no longer matters
# whether igdb ends with a trailing "/".
dbpath = Path(igdb) / "database"
imgt_org_loci = "imgt_" + org + "_" + loci + "_"
vpath = str(dbpath / (imgt_org_loci + "v"))
dpath = str(dbpath / (imgt_org_loci + "d"))
jpath = str(dbpath / (imgt_org_loci + "j"))
auxpath = str(Path(igdb) / "optional_file" / (org + "_gl.aux"))

cmd = [
    "igblastn",
    "-germline_db_V",
    vpath,
    "-germline_db_D",
    dpath,
    "-germline_db_J",
    jpath,
    "-auxiliary_data",
    auxpath,
    ...,
]
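One detail worth noting for paths like ~/test/igblast: pathlib does not expand "~" on its own, so an existence check on the raw string would fail even when the folder is there. Below is a hedged sketch of a lookup helper that handles this. The name resolve_igdb and its fallback parameter are hypothetical and only stand in for however the package actually receives the path.

```python
import os
from pathlib import Path
from typing import Optional


def resolve_igdb(fallback: Optional[str] = None) -> Path:
    """Return the IgBLAST folder from $IGDATA, or from `fallback` if unset.

    Hypothetical helper for illustration; not part of the package.
    """
    raw = os.environ.get("IGDATA") or fallback
    if raw is None:
        raise ValueError("Set IGDATA or pass a fallback path.")
    # expanduser() turns a leading "~" into the home directory;
    # Path itself leaves it untouched.
    igdb = Path(raw).expanduser()
    if not igdb.is_dir():
        raise FileNotFoundError(f"IgBLAST folder not found: {igdb}")
    return igdb
```

Raising an exception instead of using assert keeps the check active even when Python runs with -O, which strips assert statements.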
gotcha thanks it makes sense!