handle 'clash' in score file
Closed this issue · 1 comment
momeara commented
In the ro4 dataset at https://zenodo.org/record/7903161, several of the score files have 'clash' in the score column, which causes the amcp_preparation script to fail at the "Finding threshold" stage. Filtering those rows out allows it to proceed, but giving them a large positive score would probably also be reasonable. Row counts before and after filtering:
Read 15470302 substances from data/ro4/a2a.ro4.tsv
Filtered to 13233417 substances
Read 15473497 substances from data/ro4/ampc.ro4.tsv
Filtered to 15473497 substances
Read 14820549 substances from data/ro4/cd73.ro4.tsv
Filtered to 14820549 substances
Read 15473491 substances from data/ro4/d2.ro4.tsv
Filtered to 14842441 substances
Read 15473411 substances from data/ro4/keap1.ro4.tsv
Filtered to 15473411 substances
Read 15489870 substances from data/ro4/mpro.ro4.tsv
Filtered to 14889954 substances
Read 15473496 substances from data/ro4/ogg1.ro4.tsv
Filtered to 14865258 substances
Read 15473497 substances from data/ro4/sort1.ro4.tsv
Filtered to 15473497 substances
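For reference, the filtering workaround described above could look roughly like the sketch below. This is not the code from the preparation script: the function name `read_and_filter`, the column names `zincid` and `score`, and the tab-separated layout with a header row are all assumptions made for illustration.

```python
import csv
import io


def read_and_filter(handle, score_column="score", sentinel="clash"):
    """Read a TSV score file and drop rows whose score is the sentinel string.

    Returns (number of rows read, list of kept rows with float scores).
    The column name and sentinel are assumptions, not the script's real API.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    n_read = 0
    kept = []
    for row in reader:
        n_read += 1
        if row[score_column].strip() == sentinel:
            continue  # skip rows that have no numeric score
        row[score_column] = float(row[score_column])
        kept.append(row)
    return n_read, kept


# Tiny in-memory stand-in for a file such as data/ro4/a2a.ro4.tsv
# (the identifiers and scores here are made up for the demo).
demo = io.StringIO("zincid\tscore\nZ1\t-32.5\nZ2\tclash\nZ3\t-18.0\n")
n_read, kept = read_and_filter(demo)
print(f"Read {n_read} substances")
print(f"Filtered to {len(kept)} substances")
```

With files of roughly 15 million rows, a streaming or chunked reader would likely be preferable to collecting every kept row in a list; the list is used here only to keep the sketch short.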
AJluttens commented
I have updated the datasets. The "clash" entries should indeed have been an arbitrarily large score (10000 kcal/mol), as we described in the pre-print. A new version of the Zenodo record is now available at https://zenodo.org/record/7953917. This should resolve the issue at hand. Thanks for your interest!
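For anyone still working from the older record, a minimal sketch of the same substitution on the reading side is shown below. The helper `parse_score` and the constant name `CLASH_SCORE` are hypothetical; only the value 10000 kcal/mol comes from the reply above.

```python
CLASH_SCORE = 10000.0  # kcal/mol, the value quoted in the reply above


def parse_score(raw, sentinel="clash", clash_score=CLASH_SCORE):
    """Convert a raw score field to float, mapping the sentinel to a large score.

    Hypothetical helper for illustration; not part of the preparation script.
    """
    text = raw.strip()
    if text == sentinel:
        return clash_score
    return float(text)


# Made-up score fields, including one 'clash' entry.
scores = [parse_score(s) for s in ["-32.5", "clash", "-18.0"]]
ranked = sorted(scores)  # lower (more negative) scores sort first
```

Because the substituted value is large and positive, clashing substances sort to the end of any ranking by score while every row is kept, so the row counts stay unchanged.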