MS73 layer models might be overfitting
I can best explain this with a scenario:
I did a Pixel run with folio 023. I then produced models from that run and used them for a second Pixel run with folio 017; those models separated folio 017 fairly well. I then did a third Pixel run with folio 260. Those models separated folio 260 perfectly, but when I tried them on folio 017 they produced three almost completely empty layers.
This has been happening fairly often; models we make do a great job separating the folio they were just trained on, but can't separate any other folio, including ones they were trained on previously.
These are the models I'm talking about:
Background Model MTWS 023 round 3.hdf5.zip
Model 1 MTWS 023 round 3.hdf5.zip
Model 2 MTWS 023 round 3.hdf5.zip
Model 3 MTWS 023 round 3.hdf5.zip
@kyrieb-ekat did I explain that right?
Yes! What we're finding is that models trained on specific images will generalize well to new things we show them, and then at some point completely stop or freak out, usually in the vein of #1204: a layer stops being produced entirely, or layers get combined (usually text and music).
Does the freak out occur without retraining? What happens if you try the second time w/o retraining?
@fujinaga so far the models I've been creating have been producing consistent results; if I run the same folio twice with the same layer separation models, the results are exactly the same. So I believe the freak out happens after retraining. @kyrieb-ekat did mention something weird happening when using models trained in staging on a Paco Classifying job in production (Kyrie I'm sorry I can't remember the details!)
UPDATE: I've tried the following test:
I used folios 017, 023, 054, 274, and 288 to train two sets of models. The first set I made the usual way, by iterating: using the models produced by each iteration to separate the next folio (these are the Iteration Models). The second set I made by running each folio through Pixel individually, collecting all five ZIP files, and giving them to a single Paco Trainer job (these are the Combo Models).
First of all, I primarily did this test to determine whether a master model for MS73 would be possible, which is why I took folios from across the manuscript. Regardless of which method I used, the final results are pretty bad; the folios are just too different from each other for a master model to work. However, the two approaches did produce different results.
The main difference is that the Iteration Models consistently produce a completely empty layer 2 (staff lines). The staff lines just stay in the background, for some reason. This is super odd, because the staff lines are usually the easiest part. The Combo Models produce a reliably correct layer 2.
Another difference is that the Combo Models successfully keep most of the page edges and cover in the background layer, which the Iteration Models do not. I'm really intrigued by this, because I always used the same area selection for each folio and made a point of erasing page edges and cover bits from the layers every time.
Here's an example. The image on the left is layer 1 of folio 055 produced with the Iteration Models; the one on the right is layer 1 of folio 055 produced with the Combo Models:
Both are pretty spotty and overall not great, probably (hopefully) due to the folios in MS73 being too different from each other. But you can also see that the Combo Models generally just picked up more pixels, and that the Iteration Models included a big chunk of the book edges.
Finally, I noticed that the Iteration Models did best on folio 274, which was the last folio I used to train those models. The Combo Models were consistent across all the folios I tested.
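For what it's worth, here's the toy picture I have in my head of why the two recipes might behave differently. This is not the actual Paco Trainer code, just a Keras-style sketch where random arrays stand in for the corrected Pixel patches and where I'm guessing that each iterating run mostly sees the newest folio's corrections:

```python
# Toy sketch only -- not the real Paco Trainer job. Random arrays stand in for
# the corrected patches from the Pixel ZIPs, and the tiny network is just a
# placeholder for whatever classifier the trainer actually builds.
import numpy as np
from tensorflow import keras

def make_model():
    return keras.Sequential([
        keras.layers.Input(shape=(32, 32, 3)),
        keras.layers.Conv2D(8, 3, activation="relu"),
        keras.layers.GlobalAveragePooling2D(),
        keras.layers.Dense(4, activation="softmax"),  # background + 3 layers
    ])

# One (patches, labels) pair per corrected folio: 017, 023, 054, 274, 288.
folios = [(np.random.rand(64, 32, 32, 3).astype("float32"),
           np.random.randint(0, 4, 64)) for _ in range(5)]

# Iteration recipe: if each round's training run only really sees the newest
# folio's corrections, the final models have mostly been trained on folio 274.
iteration_models = None
for patches, labels in folios:
    iteration_models = make_model()
    iteration_models.compile(optimizer="adam",
                             loss="sparse_categorical_crossentropy")
    iteration_models.fit(patches, labels, epochs=3, verbose=0)

# Combo recipe: one training run that sees all five folios at once,
# so no single folio dominates.
combo_models = make_model()
combo_models.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
combo_models.fit(np.concatenate([p for p, _ in folios]),
                 np.concatenate([l for _, l in folios]),
                 epochs=3, verbose=0)
```

If that guess about what each iterating run sees is right, the newest folio dominating the Iteration Models would fit with them looking best on 274.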
Does any of this mean anything? Are these differences expected?
Question: for the Iteration Models, were the previous models used as inputs in the training run, or just the ZIP files? It makes sense that more labeled examples of various items would provide more information on how to handle them when they show up, so this is good Rodan Lore to know...
This dovetails nicely with something I've been testing in the background: how many samples Rodan does best with (like, is it better to keep ALL the samples added with each successive correction and never cycle them out, or to retain, say, three and cycle out the oldest ones?). So far, Rodan does better with the previous training models as inputs and at least two sample inputs. There may or may not be a memory ceiling (so to speak) with samples, but since you were able to manage five without anything breaking, we might be able to get away with more than we think.
Just the ZIP files! Wait... you can input models to the Paco Trainer job? Can't you just import layers?
Yes! You can add input ports for previous models. I'm finding in my testing that at least three samples plus the previous training models produce the best results.
I'm working on testing a single model on a variety of images to see if I can consistently replicate the weird "tiles" phenomenon. Apparently there's no memory impact from having many input samples, and since Rodan doesn't remember previous runs, we don't need to worry about reusing the same samples over and over; we can show it as many samples in a run as we like (hopefully!).
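In case it helps to picture what I mean by wiring previous models plus several samples into one run, here's a rough sketch. The file name and arrays are made up (toy data instead of the real Pixel layers), and I don't know the trainer's internals, so this is just the general idea of warm-starting from an earlier model and then fitting on three or more samples at once:

```python
# Sketch only: made-up file name and toy arrays, not the actual Rodan job code.
import os
import numpy as np
from tensorflow import keras

PREVIOUS_MODEL = "Model 1 MTWS 023 round 3.hdf5"  # output of an earlier trainer run

def fresh_model():
    # Placeholder for the classifier the trainer would build from scratch.
    return keras.Sequential([
        keras.layers.Input(shape=(32, 32, 3)),
        keras.layers.Conv2D(8, 3, activation="relu"),
        keras.layers.GlobalAveragePooling2D(),
        keras.layers.Dense(4, activation="softmax"),
    ])

# Warm-start from the previous model if we have one, otherwise start fresh.
if os.path.exists(PREVIOUS_MODEL):
    model = keras.models.load_model(PREVIOUS_MODEL)
else:
    model = fresh_model()
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

# At least three corrected samples per run seems to work best; toy arrays
# stand in for the layers unpacked from the Pixel ZIPs.
samples = [(np.random.rand(64, 32, 32, 3).astype("float32"),
            np.random.randint(0, 4, 64)) for _ in range(3)]
model.fit(np.concatenate([p for p, _ in samples]),
          np.concatenate([l for _, l in samples]),
          epochs=3, verbose=0)
```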
UPDATE: I have conducted yet another test and obtained yet another result.
This time I tried making models using patches of a single folio. I ran the folio in Pixel four times, using a different chunk of folio 077 each time (the four chunks added up to the complete folio). The first set of models I made like so:
- Corrected each patch in Pixel from scratch;
- Chucked all four ZIP files into Paco Trainer at the same time (no iteration).
The second set I made like so:
- Made models from the two first patches;
- Corrected patch 3 using those models;
- Made models from patch 3, as well as the first two patches;
- Corrected patch 4 using those models;
- Made models from patch 4, as well as patch 3 and 1+2 (yes iteration).
First of all, neither set of models works particularly well. Both still get layers 1 and 3 pretty mixed up. Am I just not using enough samples?
Second of all, this time the iteration models did slightly better. The main difference is again in the border: the iteration models include almost no border at all, whereas the combo models have a huge amount of border, usually in layer 1.
Demo: these are the layer 1's of folio 055, with the combo models on the left and the iteration models on the right.
Why is the border treated so differently when I painted over the exact same things? And why is this result the opposite of the previous one, where the Iteration Models were the ones that included the border? @fujinaga is this expected behaviour? I would be grateful for further guidance, because I'm a bit at loose ends at this point.