openkinome/kinodata

Discrepancies in summary data

corey-taylor opened this issue · 1 comments

The summary data in the last release (https://github.com/openkinome/kinodata/releases/tag/v0.2) differs from what is reported by the notebooks and from what is in the data files they output. The release lists:

| Dataset | Non-curated | Curated |
| --- | --- | --- |
| ChEMBL 27 | 182 223 | 148 836 |
| ChEMBL 28 | 199 238 | 159 978 |

versus the number of unique records in the output .csv files, which is also what the notebooks report:

| Dataset | Non-curated | Curated |
| --- | --- | --- |
| ChEMBL 27 | 217 612 | 174 238 |
| ChEMBL 28 | 237 336 | 186 972 |

The notebooks appear to run fine, so I have added the numbers from the notebooks and output files themselves to the latest release (https://github.com/openkinome/kinodata/releases/tag/v0.3). However, having looked at the data directly, I can't establish where the numbers in the first table come from.
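For anyone who wants to re-check the unique-record counts against an output file, here is a minimal sketch. It is not taken from the notebooks: the function name `count_unique_records` is hypothetical, and it assumes that "unique" means distinct full rows, which may differ from how the notebooks deduplicate.

```python
import csv


def count_unique_records(csv_path):
    """Count distinct data rows in a CSV file, skipping the header row.

    Assumption: two records are duplicates only if every column matches.
    """
    with open(csv_path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle)
        next(reader, None)  # skip the header row, if the file has one
        # Blank lines come back as empty lists; ignore them.
        return len({tuple(row) for row in reader if row})
```

It would be called as `count_unique_records("path/to/output.csv")`, where the path is a placeholder for one of the .csv files written by the notebooks.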

@mbackenkoehler and @ijpulidos I'm tagging you both here. Since you are working on the ChEMBL update anyway, could you check whether this is solved?