Add support for "directory" input instead of single file
philschmid opened this issue · 0 comments
philschmid commented
Add support for providing a directory containing multiple HTML files, instead of a single file, as the input for "clipping". The idea would be to read the directory, convert each file to a PDF, and then save the results as a single dataset.jsonl file.
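
A rough sketch of what the directory handling could look like, in Python. This is only an illustration: `iter_html_files`, `clip_directory`, and the `clip_file` callback are hypothetical names, and `clip_file` stands in for whatever the existing single-file path does (HTML to PDF to one record). It also assumes only the top level of the directory is scanned and that each file yields one JSON-serializable record.

```python
import json
from pathlib import Path
from typing import Callable


def iter_html_files(directory: Path) -> list[Path]:
    """Return the HTML files directly inside `directory`, sorted by name."""
    return sorted(
        p for p in directory.iterdir()
        if p.is_file() and p.suffix.lower() in {".html", ".htm"}
    )


def clip_directory(
    directory: str,
    output: str,
    clip_file: Callable[[Path], dict],
) -> int:
    """Clip every HTML file in `directory` and write one JSON line per file.

    `clip_file` is a placeholder (assumption) for the existing single-file
    clipping step; it must return a JSON-serializable dict.
    Returns the number of records written.
    """
    count = 0
    with Path(output).open("w", encoding="utf-8") as f:
        for path in iter_html_files(Path(directory)):
            record = clip_file(path)
            f.write(json.dumps(record, ensure_ascii=False) + "\n")
            count += 1
    return count
```

Usage would be something like `clip_directory("pages/", "dataset.jsonl", clip_file=my_existing_clip_fn)`, where `my_existing_clip_fn` is whatever currently handles a single file. Sorting the file list keeps the order of lines in the output deterministic across runs.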