Errors in Chen_MSB2009 benchmark
Looking at the Chen_MSB2009 benchmark model, I believe I have identified some errors in the measurements table (https://github.com/Benchmarking-Initiative/Benchmark-Models-PEtab/blob/master/Benchmark-Models/Chen_MSB2009/measurementData_Chen_MSB2009.tsv).
The original data is available from the supplement of https://doi.org/10.1038/msb.2008.74 (MSB data) and was reused in https://doi.org/10.1371/journal.pcbi.1005331 (PLoSCB data). The issue with the MSB data is that the standard deviations for the measurements are often 0 (see `_dataset/Chen et al - Experimental Data/A431_experiment.out` in the supplement to https://doi.org/10.1038/msb.2008.74), which makes the data unsuitable for fitting. This is likely the reason why I added 0.1 to the standard deviations in the PLoSCB data (it's been a while ...; see `code/project/data/getData.m`, lines 756-758, in the supplement to https://doi.org/10.1371/journal.pcbi.1005331).
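To illustrate why this workaround was needed, here is a minimal sketch of the problem and the 0.1 adjustment (the values below are hypothetical stand-ins, not taken from the actual supplement files):

```python
# Hypothetical reported standard deviations, as in the MSB supplement,
# where many entries are exactly 0. A zero sigma makes the weighted
# residual (y - yhat) / sigma, and hence a Gaussian likelihood, undefined.
reported_sd = [0.05, 0.0, 0.12, 0.0]

# The PLoSCB preprocessing (getData.m, lines 756-758) reportedly added 0.1
# to every reported standard deviation, so that all sigmas are positive:
adjusted_sd = [sd + 0.1 for sd in reported_sd]

print(adjusted_sd)  # all entries now strictly positive
```

Whether adding a constant to every standard deviation is the right statistical treatment is a separate question (see the discussion below on how to handle zero-noise data).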
However, I ran into the following discrepancies:
`ERK_PP`: the data for the `model1_data3` condition in the benchmark matches neither the MSB data (Low (1e-11 M) EGF condition) nor the PLoSCB data (`D(3)`, lines 687-698). This looks like a copy & paste error in the benchmark data, since the data for `model1_data2` and `model1_data3` are identical. The MSB and PLoSCB data match each other.
`AKT_PP`: the data for the `model1_data4` condition in the benchmark does match the MSB data (Low (1e-10 M) HRG condition), but not the PLoSCB data (`D(4)`, lines 704-715). This looks like a copy & paste error in the PLoSCB data, since the data for `model1_data3` and `model1_data4` are identical there. This sucks, but shouldn't affect any of the conclusions in the paper.
This of course raises the question of the origin of the benchmark data. Since the standard deviations in the benchmark also contain 0.1 values (as in the PLoSCB data) instead of 0.0 values (as in the MSB data), I believe the measurements file in the benchmark was likely derived from the PLoSCB data (likely fixing the issue with `model1_data4`, but introducing the issue with `model1_data3` 😢).
I will refrain from making any remarks regarding how much I loathe data that is not available in easily machine-readable formats and data processing pipelines that involve manual steps ...
Ah, it looks like the benchmark was exported from the Hass (MATLAB) suite, where the same mismatch is present: https://github.com/Benchmarking-Initiative/Benchmark-Models/blob/master/Benchmark-Models/Chen_MSB2009/Data/model1_data4.xlsx
Thanks for raising this issue, and for the thorough feedback! I am currently the only maintainer of this repo -- unfortunately, I haven't worked with this model yet.
What I got from this is:
- a note should be added stating that the data used in the PLoS CB paper differs from what we provide
- condition `model1_data3`, observable `ERK_PP`, needs to be changed to match the MSB data
- although PLoS CB got fitting to work by specifying a standard deviation of 0.1 for some data points, we need to reassess how to treat data with 0 noise
  - since the objective function in your screenshot looks like least squares, I propose normal noise with standard deviation 1
- data normalization needs to be handled
  - I propose estimating scaling factor(s)
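To make the last two points concrete, here is a sketch of estimating a scaling factor under a Gaussian noise model with standard deviation 1 (this is an illustration, not the benchmark's actual implementation; the data below are synthetic). For a fixed sigma, the least-squares-optimal scaling factor even has a closed form:

```python
import numpy as np

def optimal_scaling(measured, simulated):
    """Closed-form scaling factor s minimizing sum((measured - s*simulated)**2).

    Setting the derivative w.r.t. s to zero gives
    s = sum(measured * simulated) / sum(simulated**2).
    """
    measured = np.asarray(measured, dtype=float)
    simulated = np.asarray(simulated, dtype=float)
    return float(np.dot(measured, simulated) / np.dot(simulated, simulated))

# Synthetic example: relative measurements are a scaled version of the
# (unknown-scale) model output, plus noise with standard deviation 1.
rng = np.random.default_rng(0)
simulated = np.array([1.0, 2.0, 4.0, 8.0])
measured = 2.5 * simulated + rng.normal(0.0, 1.0, size=4)

s = optimal_scaling(measured, simulated)
residuals = measured - s * simulated  # with sigma = 1, these are the weighted residuals
```

Computing the scaling factor analytically per observable (rather than estimating it as an ordinary parameter) is the "hierarchical optimization" approach; either way works with PEtab's `observableParameters` mechanism.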
> I will refrain from making any remarks regarding how much I loathe data that is not available in easily machine-readable formats and data processing pipelines that involve manual steps ...

😭
Thanks for the work already done on the current implementation!