Many to Many vs. One to Many
This is a feature idea, rather than an issue.
The current version is a great tool, and it nearly fits a need that I have. For my use case, rather than a many-to-many query/comparison/report, I would like to query one (or a few) sample(s) against a potentially large database of other samples.
Something like the following:
STR_db
Sample Name  marker1  marker2   marker4  Penta D  Penta E  AMEL
SampleA      12, 14   12        13,13    9,10     12,14    X
SampleB      12, 14   11.3, 12  13,15    9,10     12,14    X
...
SampleZZ     ...
STR_QUERY
Sample Name  marker1  marker2   marker4  Penta D  Penta E  AMEL
SAMPLE_42    12, 14   12        13,13    9,10     12,14    X
SAMPLE_101   12, 14   11.3, 12  13,15    9,10     12,14    X
strprofiler -sm "SampleMap_exp.csv" -scol "Sample Name" -db STR_db -o ./strprofiler_output STR_QUERY
The output would be just the two per-sample files for the query samples, plus the summary CSV and summary HTML:
SAMPLE_42.strprofiler.....csv
SAMPLE_101.strprofiler....csv
This feature would avoid the many-to-many comparisons among the database samples themselves, which are not needed here. I am looking for this because the database can potentially be quite large.
Is this a feature you would be interested in adding? I am planning to fork the repo and see about adding it, but if this would be quick and easy for you, I would defer to you.
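To make the request concrete, here is a rough sketch of what I mean by one-to-many. This is not strprofiler's actual code: the function names (`parse_alleles`, `tanabe_score`, `query_vs_database`) are hypothetical, and I am assuming a Tanabe-style shared-allele score purely for illustration. The point is that each query is scored only against the database entries, so the work is `len(queries) * len(database)` comparisons and database samples are never compared with each other.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical in-memory representation: marker name -> set of allele strings.
Profile = Dict[str, Set[str]]


def parse_alleles(cell: str) -> Set[str]:
    """Split a cell such as '11.3, 12' into a set of allele strings."""
    return {a.strip() for a in cell.split(",") if a.strip()}


def tanabe_score(query: Profile, reference: Profile) -> float:
    """Tanabe-style similarity: 2 * shared / (query alleles + reference alleles).

    Only markers present (and non-empty) in both profiles are counted.
    Assumed here for illustration; the tool's own scoring may differ.
    """
    shared = total_q = total_r = 0
    for marker in query.keys() & reference.keys():
        q, r = query[marker], reference[marker]
        if not q or not r:
            continue
        shared += len(q & r)
        total_q += len(q)
        total_r += len(r)
    denom = total_q + total_r
    return 2 * shared / denom if denom else 0.0


def query_vs_database(
    queries: Dict[str, Profile], database: Dict[str, Profile]
) -> Dict[str, List[Tuple[str, float]]]:
    """Score each query against every database sample, best match first."""
    results: Dict[str, List[Tuple[str, float]]] = {}
    for q_name, q_profile in queries.items():
        scores = [
            (db_name, tanabe_score(q_profile, db_profile))
            for db_name, db_profile in database.items()
        ]
        results[q_name] = sorted(scores, key=lambda item: item[1], reverse=True)
    return results


markers = ["marker1", "marker2", "marker4", "Penta D", "Penta E", "AMEL"]
row_a = ["12, 14", "12", "13,13", "9,10", "12,14", "X"]
row_b = ["12, 14", "11.3, 12", "13,15", "9,10", "12,14", "X"]


def to_profile(row: List[str]) -> Profile:
    """Pair each marker name with the parsed alleles from one table row."""
    return {m: parse_alleles(c) for m, c in zip(markers, row)}


# Same rows as the STR_db / STR_QUERY example tables in this issue.
database = {"SampleA": to_profile(row_a), "SampleB": to_profile(row_b)}
queries = {"SAMPLE_42": to_profile(row_a), "SAMPLE_101": to_profile(row_b)}

results = query_vs_database(queries, database)
for name, matches in results.items():
    print(name, matches)
```

With a database of N samples and q queries, this does q * N comparisons instead of the roughly N * (N - 1) / 2 needed for the full many-to-many run.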
An interesting idea. For our needs, I tend to just re-run the many-to-many comparison each time we get a new sample, since it's not computationally expensive and only takes a few seconds. Then again, we only have a few hundred samples, so if you were dealing with tens of thousands or more, I can see it being more of a hassle.
I'd welcome a PR, but I'm unlikely to add the feature myself honestly. My dev time is stretched thin as it is, and I have other pressing projects.
I will see what I can do, and push a PR. I will likely need guidance on what you want/need for unit tests, etc. We can cross that bridge in the PR.
Thanks again, this is now included in the v0.1.4 release on PyPI.