Make fathom-list ignore `resources` directories
erikrose opened this issue · 1 comment
erikrose commented
When fathom-list is run recursively (with the `-r` option), it could recurse into `resources` directories emitted by fathom-extract and end up vectorizing HTML files embedded in samples (iframe contents, etc.). These do occur occasionally, as in BG_305 from the new-password project. Have it ignore `resources` dirs.
erikrose commented
More generally, do this in `samples_from_dir()` and have fathom-train and fathom-test both pass `recursive=True` to it.