proycon/analiticcl

reading variants file

HoekR opened this issue · 7 comments

HoekR commented

I am trying to match person names with analiticcl (the Python binding). Reading a lexicon into a VariantModel works fine with:

xmodel = VariantModel(os.path.join(bdir, "examples/simple.alphabet.tsv"), Weights(), debug=False)
xmodel.read_lexicon('<directory>/1714.tsv')

in which 1714.tsv contains a list of names like the ones below. Note that I collapsed all whitespace in the names, as analiticcl does not like names with spaces:

cau
pesters
vanaylva
llaersma
lestevonon
rengers
lemckervanbreda
rouse
siccama
vanheeckerenvanbrantsenburg
vanalderweereld
vanhaeften
vanlyndenvanblitterswyk

This works for name searching.
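The whitespace collapsing described above could be done with a small helper along these lines. This is only a sketch: `collapse_name` is a hypothetical name, the lower-casing is an assumption based on the sample list rather than something analiticcl is known to require, and the spaced spellings in the example are guesses made up for illustration.

```python
def collapse_name(name: str) -> str:
    """Remove all whitespace from a name and lower-case it."""
    # str.split() without arguments splits on any run of whitespace,
    # so joining the pieces drops spaces, tabs and newlines alike.
    return "".join(name.split()).lower()


# Hypothetical spaced spellings, for illustration only
names = ["van Aylva", "Lemcker van Breda", "van Heeckeren van Brantsenburg"]
collapsed = [collapse_name(n) for n in names]
print(collapsed)
```

Keeping a mapping from the collapsed form back to the original spelling would make it possible to report matches in their readable form.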

However, I also have a number of variants (mostly OCR variations), which I put into a tab-separated file following the instructions in the analiticcl documentation:

vanbroeckhuysen		vanbroeckbuysen	6.0	wanbroeckhuysen	3.0	vanbroeckhbuysen	2.0	vanbroeckhusen	2.0
vanessen		essenius	41.0	johanvanessen	40.0	hendrickvanessen	14.0	wanessen	5.0
vanalphen		vanaphen	8.0	wanalphen	2.0	vanalpen	1.0	vanalpben	1.0
graswinckel		grawinckel	1.0
sloet		sloe	1.0

but analiticcl refuses to read it and fails with this error:

PanicException: Variant scores must be a floating point value (line 1 of <lexicon file>.tsv): ParseFloatError { kind: Invalid }

What is wrong with the format?
Also: even though I can collapse names, this seems less than ideal, especially for shorter names.

Your format is OK, aside from the fact that the variant scores should be in the range 0.0 to 1.0, so you'll want to normalize them before feeding them to analiticcl. However, that's not the cause of the error. That was most likely due to a bug in analiticcl, which I fixed in e753941 (v0.3.2) last week. Is your version up to date? Try pip install -U analiticcl.
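One possible way to normalize the scores is sketched below in plain Python. It makes two assumptions that the thread does not spell out: each line is laid out as reference, then alternating variant and score fields separated by single tabs, and scores are scaled by dividing by the largest score on the same line. The helper name `normalize_variant_line` is hypothetical, and other schemes (such as dividing by the row sum) may suit the data better.

```python
from typing import List


def normalize_variant_line(line: str) -> str:
    """Rescale the scores on one variant-list line into the range 0.0-1.0.

    Assumed layout: reference<TAB>variant<TAB>score<TAB>variant<TAB>score...
    Assumed scheme: divide every score by the largest score on the line.
    """
    fields = line.rstrip("\n").split("\t")
    reference, rest = fields[0], fields[1:]
    if len(rest) % 2 != 0:
        raise ValueError(f"expected variant/score pairs after {reference!r}")
    variants = rest[0::2]
    scores = [float(s) for s in rest[1::2]]
    top = max(scores, default=0.0)
    out: List[str] = [reference]
    for variant, score in zip(variants, scores):
        # Guard against a row whose scores are all zero
        scaled = score / top if top > 0 else 0.0
        out += [variant, f"{scaled:.3f}"]
    return "\t".join(out)


line = "vanalphen\tvanaphen\t8.0\twanalphen\t2.0\tvanalpen\t1.0\tvanalpben\t1.0"
print(normalize_variant_line(line))
```

With this scheme the most frequent variant on each line gets 1.000 and the others are scaled relative to it.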

HoekR commented

My pip (on macOS) refuses to update and reports that analiticcl is up to date. A manual download of the

analiticcl-0.3.2-cp38-cp38-manylinux_2_5_x86_64.manylinux1_x86_64.whl

reports an invalid wheel for this platform

If it says up to date, then you must be on v0.3.2 already? (Try pip show analiticcl | grep Version.) If that's the case and it still doesn't work, then I need to do some further debugging; it could be that my fix was not sufficient. I'll check.
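The same version check can also be done from inside Python, which rules out the case where pip and the interpreter in use point at different environments. A minimal sketch using the standard library's `importlib.metadata` (available from Python 3.8); the helper name `installed_version` is hypothetical:

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Prints the version string, or None if analiticcl is not installed
# in the environment this interpreter is running in.
print(installed_version("analiticcl"))
```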

> analiticcl-0.3.2-cp38-cp38-manylinux_2_5_x86_64.manylinux1_x86_64.whl
>
> reports an invalid wheel for this platform

Yeah, that wheel is for Linux on x86_64 with glibc and Python 3.8; it won't work on macOS. I haven't published any macOS-specific wheels (to be honest, I'm not even sure how to do that), but it should just compile from source there (which requires rustc and cargo). After all, it seems it initially installed fine for you, right?

I think your variant list may have one column too many (one extra tab after the first column), unless that was a copy-paste artifact. If I copy your variant list as you pasted it here, I get:

$ analiticcl query --alphabet ~W/analiticcl/examples/simple.alphabet.tsv --variants variantlist2.tsv < test.txt
Initializing model...
Loading lexicons...
thread 'main' panicked at 'Variant scores must be a floating point value (line 1 of variantlist2.tsv, got vanbroeckbuysen), no frequency information: ParseFloatError { kind: Invalid }', src/lib.rs:557:62
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

Note that my error message is a bit more verbose than the one you posted, which suggests to me that you're still on 0.3.1 rather than 0.3.2. If I remove the extra column, it works as intended.
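A quick pre-check of the variant list can catch this kind of stray column before analiticcl sees the file. The sketch below is plain Python and does not call analiticcl; `check_variant_line` is a hypothetical helper, and it assumes the layout reference, then alternating variant and score fields, with scores between 0.0 and 1.0 inclusive.

```python
from typing import List


def check_variant_line(line: str, lineno: int) -> List[str]:
    """Return a list of problems found on one variant-list line."""
    problems: List[str] = []
    fields = line.rstrip("\n").split("\t")
    # An empty field means two tabs in a row, i.e. an extra column
    for col, field in enumerate(fields, start=1):
        if field == "":
            problems.append(f"line {lineno}: column {col} is empty (extra tab?)")
    if problems:
        return problems
    rest = fields[1:]
    if len(rest) % 2 != 0:
        problems.append(f"line {lineno}: variants and scores do not pair up")
    for score in rest[1::2]:
        try:
            value = float(score)
        except ValueError:
            problems.append(f"line {lineno}: score {score!r} is not a number")
            continue
        # Assumed valid range, endpoints included
        if not 0.0 <= value <= 1.0:
            problems.append(f"line {lineno}: score {value} is outside 0.0-1.0")
    return problems


print(check_variant_line("sloet\t\tsloe\t1.0", 1))  # extra tab after column 1
print(check_variant_line("sloet\tsloe\t1.0", 1))    # clean line
```

Running it over every line of the file and printing the non-empty results gives a list of lines to fix.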

HoekR commented

yes, pip just refuses to update to version 0.3.2 saying:

pip install analiticcl==0.3.2
ERROR: Could not find a version that satisfies the requirement analiticcl==0.3.2 (from versions: 0.3.1)
ERROR: No matching distribution found for analiticcl==0.3.2

the only difference I can detect is the availability of a
analiticcl-0.3.1.tar.gz source distribution at https://pypi.org/project/analiticcl/0.3.1/#files
while 0.3.2 only has Linux wheels

Ha! That is a very good clue indeed! It seems I forgot to publish the source tarball to PyPI and only uploaded the wheels! It should be fixed now.

HoekR commented

yup, it works now:

Successfully uninstalled analiticcl-0.3.1
Successfully installed analiticcl-0.3.2