Is pre-training allowed?
markNZed opened this issue
I think we can agree that all the detectors have parameters/constants that are set prior to running in the benchmark; these might be simple things like sampling rates, input filtering, etc. If we start using pre-trained ANNs to guide the detectors then this becomes more problematic - there seems to be a high chance that the ANN would learn aspects of the benchmark if it is trained on data from the benchmark. The designer of a detector does something similar during development/testing, but given human limitations it does not seem as problematic (it is, however, likely that some of the current detectors are over-fitting to NAB).
It seems reasonable that the pre-training should be part of the detector (not a set of parameters) so there can be something like an audit of what the pre-training is doing. If the pre-training only uses data from the current benchmark timeseries it is running on (and only previously seen data points) then this seems within the bounds of the NAB rules.
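To make this concrete, here is a minimal sketch (the detector and its names are hypothetical, not anything in NAB) of what I mean by pre-training that only ever uses previously seen points of the timeseries being scored:

```python
# Minimal sketch (hypothetical detector, not part of NAB) of pre-training that
# only ever uses previously seen points of the timeseries being scored.
import math


class RunningStatsDetector:
    """Scores each new point against statistics of the points seen before it."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # sum of squared deviations (Welford's algorithm)

    def score(self, value):
        # Anomaly score based only on data observed so far.
        if self.n < 2:
            score = 0.0
        else:
            std = math.sqrt(self.m2 / (self.n - 1))
            score = 0.0 if std == 0 else min(abs(value - self.mean) / (3 * std), 1.0)

        # "Pre-training" step: update the model with the point just seen.
        self.n += 1
        delta = value - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (value - self.mean)
        return score


detector = RunningStatsDetector()
scores = [detector.score(v) for v in [1.0, 1.1, 0.9, 1.0, 5.0]]
print(scores)  # the obvious outlier at the end gets the highest score
```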
But what if the pre-training uses data augmentation techniques (e.g. to train a DNN)? The data augmentation could be based on the current timeseries data that has already been seen by the detector. I guess that if the data augmentation uses a statistical approach then this falls inside the bounds of the NAB rules.
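Something like the following is the kind of statistical augmentation I have in mind (just a sketch under my own assumptions - resampling windows of already-seen points and adding noise scaled to the observed spread; nothing outside the current timeseries is touched):

```python
# Sketch of purely statistical data augmentation: synthetic training windows
# are resampled from points the detector has already seen, with Gaussian
# jitter scaled to the observed variability.
import random
import statistics


def augment_seen_data(seen_values, n_windows=10, window_size=32, jitter_scale=0.1):
    """Generate synthetic windows from previously seen points only."""
    if len(seen_values) < window_size:
        return []
    spread = statistics.pstdev(seen_values) or 1.0
    windows = []
    for _ in range(n_windows):
        start = random.randrange(len(seen_values) - window_size + 1)
        base = seen_values[start:start + window_size]
        windows.append([v + random.gauss(0, jitter_scale * spread) for v in base])
    return windows


seen = [float(i % 7) for i in range(200)]  # stand-in for previously seen points
synthetic = augment_seen_data(seen)
print(len(synthetic), len(synthetic[0]))   # e.g. 10 windows of 32 values
```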
It may be desirable to pre-train using data that is not from NAB (e.g. to avoid overfitting to the data in a given timeseries). This starts to get problematic, because that pre-training data could be selected so that the system overfits to NAB. If the training data is generated by a statistical algorithm, is that within the NAB rules? I guess yes, but parameterising the statistics is an obvious way to tune the detector to overfit to NAB.
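For example, a purely statistical generator could look like the sketch below (hypothetical code; the point is that `phi` and `noise_scale` are exactly the knobs someone could tune, intentionally or not, until the detector happens to do well on NAB):

```python
# Sketch of a parameterised statistical generator for non-NAB pre-training data.
# The parameters phi/noise_scale are where tuning-to-NAB could creep in.
import random


def ar1_series(length=1000, phi=0.9, noise_scale=1.0, seed=None):
    """Generate an AR(1) series: x[t] = phi * x[t-1] + noise."""
    rng = random.Random(seed)
    x, series = 0.0, []
    for _ in range(length):
        x = phi * x + rng.gauss(0, noise_scale)
        series.append(x)
    return series


pretraining_corpus = [ar1_series(seed=i) for i in range(20)]
print(len(pretraining_corpus), len(pretraining_corpus[0]))
```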
This raises more fundamental questions about the design of NAB. How can we have a benchmark that does not allow detectors to over-fit to the benchmark? In ML there is the idea of splitting the data into train/dev/test sets; the test set is not used during development, which allows checking that the system is not overfitting to the train+dev sets. In NAB we basically provide a single dev set and measure performance on that. This introduces the problem of how to have a hold-out test set in an open source project!
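Mechanically, a deterministic dev/hold-out split over the data files is trivial to write (sketch below, with illustrative file names) - the hard part is that in an open source repo the "hold-out" files are still visible to everyone:

```python
# Sketch of a deterministic dev/hold-out split over NAB-style data files
# (file names here are illustrative). Hashing gives a stable split, but it
# does not hide anything in an open source repository.
import hashlib


def split_holdout(file_names, holdout_fraction=0.3):
    dev, holdout = [], []
    for name in sorted(file_names):
        digest = int(hashlib.sha256(name.encode()).hexdigest(), 16)
        (holdout if (digest % 100) < holdout_fraction * 100 else dev).append(name)
    return dev, holdout


files = [f"realAWSCloudwatch/series_{i}.csv" for i in range(10)]
dev_set, holdout_set = split_holdout(files)
print(len(dev_set), len(holdout_set))
```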
Perhaps a benchmark needs to have at least some of the timeseries data generated by an algorithm. But then it requires an algorithm to establish the ground truth for anomalies, and the whole point of NAB is that we don't have great algorithms for doing that.
It seems that some sort of web service that hides the hold-out data and runs NAB is the best solution. I'm guessing that a lot of people would accept an audit of the system by someone independent. It still requires someone to set up and pay for that server, but I don't think that is a big budget nowadays. If we used cloud infrastructure, the service would just need to spin up and run NAB whenever anyone pushes to a new branch. Maybe the user could even pay a small fee to cover the compute cost.
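Very roughly, I imagine something like the sketch below (everything here is an assumption on my part: the endpoint, the payload, and the `run_nab_on_hidden_data` helper are placeholders for a service that clones the pushed branch, runs the detector against the private hold-out data, and returns the NAB scores):

```python
# Rough sketch of a scoring service that keeps the hold-out data private.
# The endpoint, payload format and run_nab_on_hidden_data are hypothetical.
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def run_nab_on_hidden_data(repo_url, branch):
    # Placeholder: a real service would clone the branch, run the detector
    # against the private hold-out timeseries, and compute the NAB scores.
    return {"repo": repo_url, "branch": branch, "standard_profile_score": None}


class PushHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        payload = json.loads(self.rfile.read(int(self.headers["Content-Length"])))
        result = run_nab_on_hidden_data(payload["repo_url"], payload["branch"])
        body = json.dumps(result).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)


if __name__ == "__main__":
    HTTPServer(("0.0.0.0", 8000), PushHandler).serve_forever()
```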