IEDB/TCRMatch

Unify the process_output.py and tcrmatch.cpp files


Per the conversation in #25, unify the two scripts into a single tool and expand AIRR format support.

I'm wondering whether the simplest approach is to rework process_output.py into a wrapper that calls the tcrmatch executable, collects the results, and outputs all the metadata in a single call. We can benchmark this, but are you aware of any performance issues with running the executable directly on the command line versus through subprocess.run or os.system within a Python script?
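A minimal sketch of the wrapper idea, assuming tcrmatch writes tab-separated results with a header row to stdout. The function name `run_tcrmatch` is hypothetical, and the command line is passed in whole so no tcrmatch flags are assumed. The timing is there so the wrapper can be benchmarked against a direct command-line run.

```python
import csv
import io
import subprocess
import time


def run_tcrmatch(cmd):
    """Run a matcher command once and parse its stdout as TSV.

    `cmd` is the full argument list (executable plus its flags).
    Assumption: the executable prints a header row followed by
    tab-separated result rows on stdout.
    """
    start = time.perf_counter()
    # check=True raises CalledProcessError on a non-zero exit status;
    # capture_output + text return stdout as a str instead of bytes.
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    elapsed = time.perf_counter() - start
    # Parse the captured text into one dict per result row.
    rows = list(csv.DictReader(io.StringIO(result.stdout), delimiter="\t"))
    return rows, elapsed
```

Usage would look like `rows, secs = run_tcrmatch(["./tcrmatch", ...])` with the real path and flags filled in. Passing an argument list to subprocess.run avoids going through a shell, unlike os.system. Process startup is a one-time cost per call, so it is likely small next to the matching work itself, but the measured `elapsed` is the thing to compare.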

Maybe. I'm not sure whether tcrmatch needs to be launched in a special way on some systems because it uses OpenMP; I'm thinking of how MPI programs often need to be run with mpirun. One idea is a simple shell script that runs tcrmatch and then runs the Python script.
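The same two steps can also be driven from Python. This is a sketch with hypothetical names (`run_pipeline`, `raw_path`). Unlike MPI programs, an OpenMP binary is started like any other executable; its thread count is commonly controlled through the standard `OMP_NUM_THREADS` environment variable, although a program can override that itself (for example from its own thread-count flag), so whether tcrmatch honors it is an assumption to check.

```python
import os
import subprocess


def run_pipeline(match_cmd, post_cmd, raw_path, threads=None):
    """Run the matcher, save its output, then run the post-processor.

    Step 1 writes the matcher's stdout to `raw_path`; step 2 is expected
    to read that file itself. If `threads` is given it is exported as
    OMP_NUM_THREADS for the matcher process only.
    """
    env = dict(os.environ)
    if threads is not None:
        env["OMP_NUM_THREADS"] = str(threads)
    # Step 1: no special launcher is used; the binary is run directly.
    with open(raw_path, "w") as raw:
        subprocess.run(match_cmd, stdout=raw, env=env, check=True)
    # Step 2: post-process the saved output and return what it prints.
    done = subprocess.run(post_cmd, capture_output=True, text=True, check=True)
    return done.stdout
```

Keeping the intermediate file on disk also means step 2 can be rerun on its own without repeating step 1.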

I'm personally fine with the two-step process; we run into this all the time in our analysis workflows, where output files need additional processing. One thing to avoid is re-running tcrmatch when the user wants two or more different output formats.
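One way to avoid re-running the matcher is to keep its raw output on disk and derive every requested format from that single result. This is a minimal sketch assuming tab-separated output with a header row; the function names and the `tsv`/`json` format choices are illustrative only, not the tool's actual output options.

```python
import csv
import json
import os
import subprocess


def load_or_run(cmd, raw_path):
    """Return parsed matcher output, running `cmd` only if `raw_path` is absent."""
    if not os.path.exists(raw_path):
        # First request: run the expensive step once and keep its output.
        with open(raw_path, "w") as raw:
            subprocess.run(cmd, stdout=raw, check=True)
    with open(raw_path, newline="") as raw:
        return list(csv.DictReader(raw, delimiter="\t"))


def write_formats(rows, stem, formats):
    """Write the same parsed rows once per requested format ('tsv' or 'json')."""
    paths = []
    for fmt in formats:
        path = f"{stem}.{fmt}"
        with open(path, "w", newline="") as out:
            if fmt == "tsv":
                fieldnames = list(rows[0]) if rows else []
                writer = csv.DictWriter(out, fieldnames=fieldnames, delimiter="\t")
                writer.writeheader()
                writer.writerows(rows)
            elif fmt == "json":
                json.dump(rows, out, indent=2)
            else:
                raise ValueError(f"unsupported format: {fmt}")
        paths.append(path)
    return paths
```

A second call to `load_or_run` with the same `raw_path` reads the saved file instead of launching the matcher again, so adding another output format costs only a conversion.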

At the same time, while Python makes text processing simple, C++ isn't that much harder...

@acrinklaw @schristley Yeah, if we could just produce the same output using C++, that would be best.

I'm not as skilled with C++, but we could get it working.

I can take a stab at it too if you can describe the outputs you want.

Basically the same outputs the Python script produces, just without needing the script.