j-andrews7/STRprofiler

Many to Many vs. One to Many


This is a feature idea, rather than an issue.

The current version is a great tool, and nearly fits a need that I have. For my use case, rather than a many-to-many query, comparison, and report, I would like to query one (or a few) samples against a potentially large database of other samples.

Something as follows:

STR_db

Sample Name	marker1	marker2	marker4	Penta D	Penta E	AMEL
SampleA	12, 14	12	13,13	9,10	12,14	X
SampleB	12, 14	11.3, 12	13,15	9,10	12,14	X
...
SampleZZ ...

STR_QUERY

Sample Name	marker1	marker2	marker4	Penta D	Penta E	AMEL
SAMPLE_42	12, 14	12	13,13	9,10	12,14	X
SAMPLE_101	12, 14	11.3, 12	13,15	9,10	12,14	X

strprofiler -sm "SampleMap_exp.csv" -scol "Sample Name" -db STR_db -o ./strprofiler_output STR_QUERY

Output would be just the 2 files from the query + summary csv and summary html:

SAMPLE_42.strprofiler.....csv
SAMPLE_101.strprofiler....csv

This feature would avoid the many-to-many comparisons among the database samples themselves, which are not needed here. I am asking because the database could be quite large.
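To make the idea concrete, here is a minimal Python sketch of the one-to-many comparison, using the example tables above. It is only an illustration under stated assumptions: the function names (`parse_alleles`, `load_profiles`, `shared_allele_score`, `query_vs_db`) are hypothetical and not part of STRprofiler, and the score is a simple Jaccard-style shared-allele ratio used as a stand-in for whatever scoring the tool actually applies.

```python
import csv
import io


def parse_alleles(cell):
    """Split a cell such as '11.3, 12' into a set of allele strings."""
    return {a.strip() for a in cell.split(",") if a.strip()}


def load_profiles(handle, sample_col="Sample Name"):
    """Read a tab-separated STR table into {sample: {marker: allele set}}."""
    profiles = {}
    for row in csv.DictReader(handle, delimiter="\t"):
        name = row.pop(sample_col)
        profiles[name] = {m: parse_alleles(v or "") for m, v in row.items()}
    return profiles


def shared_allele_score(query, reference):
    """Stand-in score (assumption): shared alleles / distinct alleles,
    summed over markers that are typed in both profiles."""
    shared = total = 0
    for marker, q_alleles in query.items():
        r_alleles = reference.get(marker, set())
        if not q_alleles or not r_alleles:
            continue
        shared += len(q_alleles & r_alleles)
        total += len(q_alleles | r_alleles)
    return shared / total if total else 0.0


def query_vs_db(queries, database):
    """Score each query against every database sample.
    Database samples are never compared with each other."""
    results = {}
    for q_name, q_prof in queries.items():
        scores = [(db_name, shared_allele_score(q_prof, db_prof))
                  for db_name, db_prof in database.items()]
        results[q_name] = sorted(scores, key=lambda pair: pair[1], reverse=True)
    return results


HEADER = "Sample Name\tmarker1\tmarker2\tmarker4\tPenta D\tPenta E\tAMEL\n"
STR_DB = (HEADER
          + "SampleA\t12, 14\t12\t13,13\t9,10\t12,14\tX\n"
          + "SampleB\t12, 14\t11.3, 12\t13,15\t9,10\t12,14\tX\n")
STR_QUERY = (HEADER
             + "SAMPLE_42\t12, 14\t12\t13,13\t9,10\t12,14\tX\n"
             + "SAMPLE_101\t12, 14\t11.3, 12\t13,15\t9,10\t12,14\tX\n")

results = query_vs_db(load_profiles(io.StringIO(STR_QUERY)),
                      load_profiles(io.StringIO(STR_DB)))
for query_name, ranked in results.items():
    print(query_name, ranked)
```

With Q query samples and N database samples this does Q x N comparisons instead of N x N, which is the saving I am after when N is large.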

Is this a feature you would be interested in adding? I am planning to fork the repo and see about adding it, but if this would be quick and easy for you, I would defer to you.

An interesting idea. For our needs, I tend to just re-run the many-to-many comparison each time we get a new sample, since it's not computationally expensive and only takes a few seconds. Then again, we only have a few hundred samples, so if you were dealing with tens of thousands or so, I can see it being more of a hassle.

I'd welcome a PR, but I'm unlikely to add the feature myself honestly. My dev time is stretched thin as it is, and I have other pressing projects.

I will see what I can do, and push a PR. I will likely need guidance on what you want/need for unit tests etc. We can cross that bridge in the PR.

Thanks again, this is now included in the v0.1.4 release on PyPI.