mozilla/fathom

Make fathom-list ignore `resources` directories

erikrose opened this issue · 1 comment

When fathom-list is run recursively (with the `-r` option), it can recurse into the `resources` directories emitted by fathom-extract and end up vectorizing HTML files embedded in samples (iframe contents, etc.). Such embedded files do occur occasionally, as in BG_305 from the new-password project. Have fathom-list ignore `resources` directories.

In general, do this in `samples_from_dir()`, and have fathom-train and fathom-test both pass `recursive=True` to it.
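A minimal sketch of the proposed change follows. It assumes `samples_from_dir()` takes a directory path and a `recursive` flag and yields paths of `.html` samples. The signature, the `.html` filter, and the use of `os.walk` are assumptions for illustration, not Fathom's actual code. The key step is pruning `dirnames` in place, which stops `os.walk` from ever descending into a `resources` directory:

```python
import os
from pathlib import Path
from typing import Iterator


def samples_from_dir(in_dir: str, recursive: bool = False) -> Iterator[Path]:
    """Yield paths of HTML samples under in_dir (hypothetical sketch).

    In recursive mode, any directory named "resources" (where
    fathom-extract stores subresources such as iframe contents) is skipped.
    """
    if not recursive:
        # Only look at files directly inside in_dir; subdirectories,
        # including resources dirs, are never entered.
        for entry in sorted(Path(in_dir).iterdir()):
            if entry.is_file() and entry.suffix == '.html':
                yield entry
        return

    for dirpath, dirnames, filenames in os.walk(in_dir):
        # Assigning to dirnames[:] (rather than rebinding the name) tells
        # os.walk not to recurse into the removed directories.
        dirnames[:] = sorted(d for d in dirnames if d != 'resources')
        for name in sorted(filenames):
            if name.endswith('.html'):
                yield Path(dirpath) / name
```

With this in place, fathom-train and fathom-test would each call something like `samples_from_dir(training_dir, recursive=True)`, where `training_dir` is a hypothetical name. Matching on the directory name alone is the simplest rule; it would also skip a user-created folder that happens to be called `resources`, which seems an acceptable trade-off given fathom-extract's naming convention.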