Carlssonlab/conformalpredictor

handle 'clash' in score file

Closed issue · 1 comment

In the ro4 dataset at https://zenodo.org/record/7903161, several of the score files contain 'clash' in the score column, which causes the amcp_preparation script to fail at the "Finding threshold" stage. Filtering those rows out allows it to proceed, but assigning them a large positive score would probably also be reasonable.

Read 15470302 substances from data/ro4/a2a.ro4.tsv
Filtered to 13233417 substances

Read 15473497 substances from data/ro4/ampc.ro4.tsv
Filtered to 15473497 substances

Read 14820549 substances from data/ro4/cd73.ro4.tsv
Filtered to 14820549 substances

Read 15473491 substances from data/ro4/d2.ro4.tsv
Filtered to 14842441 substances

Read 15473411 substances from data/ro4/keap1.ro4.tsv
Filtered to 15473411 substances

Read 15489870 substances from data/ro4/mpro.ro4.tsv
Filtered to 14889954 substances

Read 15473496 substances from data/ro4/ogg1.ro4.tsv
Filtered to 14865258 substances

Read 15473497 substances from data/ro4/sort1.ro4.tsv
Filtered to 15473497 substances

I have updated the datasets. The "clash" entries should indeed have been an arbitrarily large score (10000 kcal/mol), as we described in the pre-print. A new version of the Zenodo record is now available at https://zenodo.org/record/7953917, which should resolve this issue. Thanks for your interest!
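
For anyone still working with the older files, the two workarounds discussed in this thread (dropping 'clash' rows, or giving them a large positive score) could be sketched roughly as below. This is not code from the repository: the `clean_scores` helper, the `id` and `score` column names, and the tab-delimited layout with a header row are all assumptions for illustration. Only the 10000 kcal/mol value comes from the reply above.

```python
import csv
import io

# Large positive score (kcal/mol) that the maintainers say 'clash' should map to.
CLASH_SCORE = 10000.0


def clean_scores(rows, score_column="score", mode="replace"):
    """Yield rows with a numeric score, handling the literal 'clash'.

    Hypothetical helper: mode="replace" substitutes CLASH_SCORE,
    mode="filter" drops the row entirely.
    """
    if mode not in ("replace", "filter"):
        raise ValueError(f"unknown mode: {mode!r}")
    for row in rows:
        raw = row[score_column].strip()
        if raw.lower() == "clash":
            if mode == "filter":
                continue  # skip clashing substances
            score = CLASH_SCORE
        else:
            score = float(raw)
        # Copy the row so the caller's dict is not mutated.
        yield {**row, score_column: score}


# Tiny in-memory example; the column names here are assumptions.
sample = "id\tscore\nZ1\t-35.2\nZ2\tclash\nZ3\t-12.0\n"

replaced = list(
    clean_scores(csv.DictReader(io.StringIO(sample), delimiter="\t"), mode="replace")
)
filtered = list(
    clean_scores(csv.DictReader(io.StringIO(sample), delimiter="\t"), mode="filter")
)
```

Replacing keeps the substance count unchanged, whereas filtering reduces it, as in the "Filtered to ..." lines reported above. With the updated Zenodo record neither step should be needed.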