reading variants file
HoekR opened this issue · 7 comments
I am trying to match person names with analiticcl (the Python binding). Reading a lexicon into a VariantModel works fine with:
import os
from analiticcl import VariantModel, Weights
xmodel = VariantModel(os.path.join(bdir, "examples/simple.alphabet.tsv"), Weights(), debug=False)
xmodel.read_lexicon('<directory>/1714.tsv')
where 1714.tsv contains a list of names like the one below. Note that I collapsed all whitespace in the names, as analiticcl does not like names with spaces:
cau
pesters
vanaylva
llaersma
lestevonon
rengers
lemckervanbreda
rouse
siccama
vanheeckerenvanbrantsenburg
vanalderweereld
vanhaeften
vanlyndenvanblitterswyk
This works for name searching.
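The whitespace collapsing mentioned above could be sketched roughly as follows. This is only an illustration: `collapse_name` is a hypothetical helper, the input spellings are made up to match entries in the list, and the lowercasing is an assumption based on how the sample lexicon looks.

```python
def collapse_name(name: str) -> str:
    """Lowercase a name and remove all whitespace (hypothetical helper)."""
    # str.split() without arguments splits on any run of whitespace,
    # so joining the pieces drops spaces, tabs and newlines alike.
    return "".join(name.lower().split())


# Illustrative spellings only; the original source forms are not given.
names = ["van Aylva", "Lemcker van Breda", "Cau"]
collapsed = [collapse_name(n) for n in names]
print(collapsed)
```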
However, I have a number of variants (mostly OCR variations) that I put into a tab-separated file, following the instructions in the analiticcl documentation:
vanbroeckhuysen vanbroeckbuysen 6.0 wanbroeckhuysen 3.0 vanbroeckhbuysen 2.0 vanbroeckhusen 2.0
vanessen essenius 41.0 johanvanessen 40.0 hendrickvanessen 14.0 wanessen 5.0
vanalphen vanaphen 8.0 wanalphen 2.0 vanalpen 1.0 vanalpben 1.0
graswinckel grawinckel 1.0
sloet sloe 1.0
but analiticcl refuses to read it with an error:
PanicException: Variant scores must be a floating point value (line 1 of <lexicon file>.tsv): ParseFloatError { kind: Invalid }
What is wrong with the format?
Also: even if I can collapse names, this seems less than ideal, especially for shorter names.
Your format is ok aside from the fact that the variant scores should be in the range (0.0 - 1.0), so you'll want to normalize them before feeding them to analiticcl. However, that's not the cause of the error. That was most likely due to a bug in analiticcl which I fixed in e753941 (v0.3.2) last week. Is your version up to date? Try pip install -U analiticcl.
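As a rough sketch of the normalization step, assuming each line is a reference form followed by tab-separated variant/score pairs (as in the list pasted above): the helper name `normalize_variant_line` is hypothetical, and dividing by the highest score on the line is just one possible scheme, not something analiticcl prescribes.

```python
def normalize_variant_line(line: str) -> str:
    """Rescale the scores on one variant-list line into the 0.0-1.0 range.

    Assumed layout: reference<TAB>variant<TAB>score<TAB>variant<TAB>score...
    Assumed scheme: divide every score by the largest score on the line.
    """
    fields = line.rstrip("\n").split("\t")
    reference, rest = fields[0], fields[1:]
    if len(rest) % 2 != 0:
        raise ValueError("expected variant/score pairs after the reference")
    if not rest:
        return reference  # nothing to normalize
    variants = rest[0::2]
    scores = [float(s) for s in rest[1::2]]
    top = max(scores)
    if top <= 0:
        raise ValueError("scores must be positive to normalize by the maximum")
    out = [reference]
    for variant, score in zip(variants, scores):
        out.extend([variant, f"{score / top:.3f}"])
    return "\t".join(out)


raw = "vanalphen\tvanaphen\t8.0\twanalphen\t2.0\tvanalpen\t1.0\tvanalpben\t1.0"
normalized = normalize_variant_line(raw)
print(normalized)
```

Whether raw counts should be scaled per line or across the whole file depends on how the scores are meant to be compared, so treat the per-line maximum as a placeholder choice.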
My pip (on macOS) refuses to update and reports that analiticcl is up to date. A manual download of analiticcl-0.3.2-cp38-cp38-manylinux_2_5_x86_64.manylinux1_x86_64.whl reports an invalid wheel for this platform.
If it says up to date, then you must be on v0.3.2 already? (Try pip show analiticcl | grep Version.) If that's the case and it still doesn't work, then I need to do some further debugging; it could be that my fix was not sufficient. I'll check.
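The same version check can also be done from inside Python, which avoids any doubt about which pip belongs to which interpreter. A minimal sketch using the standard library's `importlib.metadata` (Python 3.8+); `installed_version` is a hypothetical helper name.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package: str):
    """Return the installed version string of a package, or None if absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Prints the version seen by this interpreter, or None if not installed.
print(installed_version("analiticcl"))
```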
> analiticcl-0.3.2-cp38-cp38-manylinux_2_5_x86_64.manylinux1_x86_64.whl reports an invalid wheel for this platform
Yeah, that wheel is for Linux on x86_64 with glibc and Python 3.8; that won't work on macOS. I haven't published any macOS-specific wheels (not sure how to do that even, to be honest), but it should just compile from source there (requiring rustc and cargo). After all, it seems it initially installed fine for you, right?
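To see why pip rejects that file, it can help to read the tags straight from the wheel filename, which follows the pattern name-version-pythontag-abitag-platformtag.whl (with an optional build tag after the version). The sketch below only splits the filename; `wheel_tags` is a hypothetical helper and it does not reproduce pip's full compatibility logic.

```python
def wheel_tags(filename: str) -> dict:
    """Split a wheel filename into its python, ABI and platform tags."""
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel filename")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) < 5:
        raise ValueError("unexpected wheel filename layout")
    # The last three components are always python tag, ABI tag, platform tag;
    # several platform tags may be compressed together with dots.
    return {
        "name": parts[0],
        "version": parts[1],
        "python": parts[-3],
        "abi": parts[-2],
        "platforms": parts[-1].split("."),
    }


tags = wheel_tags(
    "analiticcl-0.3.2-cp38-cp38-manylinux_2_5_x86_64.manylinux1_x86_64.whl"
)
print(tags["platforms"])
# A macOS-compatible wheel would carry a platform tag starting with "macosx".
print(any(p.startswith("macosx") for p in tags["platforms"]))
```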
I think your variant list may have one column too many (one extra tab after the first column), unless that was a copy-paste artifact. If I copy your variant list as you pasted it here, I get:
$ analiticcl query --alphabet ~W/analiticcl/examples/simple.alphabet.tsv --variants variantlist2.tsv < test.txt
Initializing model...
Loading lexicons...
thread 'main' panicked at 'Variant scores must be a floating point value (line 1 of variantlist2.tsv, got vanbroeckbuysen), no frequency information: ParseFloatError { kind: Invalid }', src/lib.rs:557:62
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
Note that this error message is a bit more verbose than the one you reported, which to me suggests you're still on 0.3.1 rather than 0.3.2. If I remove the extra column, it works as intended.
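A stray tab like this is hard to spot by eye, so a small pre-check before loading the file may help. The sketch below assumes the layout reference<TAB>variant<TAB>score<TAB>variant<TAB>score...; `check_variant_line` is a hypothetical helper and is not part of analiticcl.

```python
def check_variant_line(line: str, lineno: int) -> list:
    """Return a list of human-readable problems found on one line."""
    problems = []
    fields = line.rstrip("\n").split("\t")
    # Two consecutive tabs show up as an empty field after splitting.
    if any(field == "" for field in fields):
        problems.append(f"line {lineno}: empty column (stray tab?)")
    rest = fields[1:]
    if len(rest) % 2 != 0:
        problems.append(
            f"line {lineno}: expected variant/score pairs, "
            f"got {len(rest)} columns after the reference"
        )
    # Every second column after the reference should parse as a float.
    for candidate in rest[1::2]:
        try:
            float(candidate)
        except ValueError:
            problems.append(f"line {lineno}: score {candidate!r} is not a float")
    return problems


good = check_variant_line("sloet\tsloe\t1.0", 1)
bad = check_variant_line("sloet\t\tsloe\t1.0", 2)  # extra tab after column 1
print(good)
print(bad)
```

With the extra tab, the variant itself lands in the score position, which matches the "got vanbroeckbuysen" part of the panic message above.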
Yes, pip just refuses to update to version 0.3.2, saying:
pip install analiticcl==0.3.2
ERROR: Could not find a version that satisfies the requirement analiticcl==0.3.2 (from versions: 0.3.1)
ERROR: No matching distribution found for analiticcl==0.3.2
The only difference I can detect is the availability of an analiticcl-0.3.1.tar.gz source distribution at https://pypi.org/project/analiticcl/0.3.1/#files, while 0.3.2 only has Linux wheels.
Ha! But that is a very good clue indeed! It seems I forgot to publish the source tarball to PyPI and only did the wheels! It should be fixed now.
yup, it works now:
Successfully uninstalled analiticcl-0.3.1
Successfully installed analiticcl-0.3.2