Discrepancies in summary data
corey-taylor opened this issue · 1 comment
corey-taylor commented
The summary data in the last release (https://github.com/openkinome/kinodata/releases/tag/v0.2) differs from what is reported by the notebooks and in the data files they output:
Dataset | Non-curated | Curated |
---|---|---|
ChEMBL 27 | 182 223 | 148 836 |
ChEMBL 28 | 199 238 | 159 978 |
versus the number of unique records in the output `.csv` files, as reported in the notebooks:
Dataset | Non-curated | Curated |
---|---|---|
ChEMBL 27 | 217 612 | 174 238 |
ChEMBL 28 | 237 336 | 186 972 |
The notebooks appear to run fine, so I have added the counts from the notebooks and their output files to the latest release (https://github.com/openkinome/kinodata/releases/tag/v0.3). However, after looking at the data directly, I cannot establish where the numbers in the first table come from.
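One way to check where the second table's counts come from is to recount unique records directly from the output files and compare them with the v0.2 release figures. The snippet below is a minimal standard-library sketch, not the notebooks' actual code. It assumes a record is identified either by its full row or by a caller-chosen set of key columns, because this issue does not say which key the notebooks deduplicate on. The `activity_id` column and the file name in the comments are hypothetical.

```python
import csv
import io
from typing import Dict, Optional, Sequence, TextIO, Tuple

Key = Tuple[str, str]

# Counts quoted in this issue: v0.2 release summary vs. notebooks/output files.
RELEASE_V02: Dict[Key, int] = {
    ("ChEMBL 27", "non-curated"): 182_223,
    ("ChEMBL 27", "curated"): 148_836,
    ("ChEMBL 28", "non-curated"): 199_238,
    ("ChEMBL 28", "curated"): 159_978,
}
NOTEBOOKS: Dict[Key, int] = {
    ("ChEMBL 27", "non-curated"): 217_612,
    ("ChEMBL 27", "curated"): 174_238,
    ("ChEMBL 28", "non-curated"): 237_336,
    ("ChEMBL 28", "curated"): 186_972,
}


def count_unique_records(handle: TextIO, key_columns: Optional[Sequence[str]] = None) -> int:
    """Count distinct records in a CSV file.

    Assumption: with key_columns=None a record's identity is the full tuple of
    its values; otherwise only the listed columns (hypothetical names) are used.
    """
    reader = csv.DictReader(handle)
    seen = set()
    for row in reader:
        columns = key_columns if key_columns is not None else reader.fieldnames
        seen.add(tuple(row[c] for c in columns))
    return len(seen)


def discrepancies(reported: Dict[Key, int], recounted: Dict[Key, int]) -> Dict[Key, int]:
    """Return recounted minus reported for every dataset/curation pair."""
    return {k: recounted[k] - reported[k] for k in reported}


if __name__ == "__main__":
    # Real use (hypothetical file name):
    # with open("chembl27_curated.csv", newline="") as fh:
    #     print(count_unique_records(fh))
    sample = "activity_id,value\n1,5.0\n1,5.0\n1,5.5\n2,6.1\n"
    print(count_unique_records(io.StringIO(sample)))  # distinct full rows
    print(count_unique_records(io.StringIO(sample), ["activity_id"]))  # distinct IDs
    for key, diff in discrepancies(RELEASE_V02, NOTEBOOKS).items():
        print(key, diff)
```

Comparing the full-row count with a key-column count can show whether the gap comes from deduplicating on different keys. In the sample, the two rows sharing an ID but differing in value count as separate records in the first mode and as one record in the second.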
AndreaVolkamer commented
@mbackenkoehler and @ijpulidos, I'm tagging you both here. Since you are working on the ChEMBL update anyway, could you check whether this has been resolved?