Hackathon exploration of PDF scraping
-
cataloguing existing packages and looking at privacy considerations
-
evaluation of different tools and approaches
a. text based
b. image based
-
investigation of LLMs
a. look at APIs in R and Python
b. identify potential LLMs that could be used
-
investigation of validation pipelines to check extraction quality
-
investigation of scraping plots (if time allows)
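
Illustrative sketch for the validation pipeline item: one simple way to check extraction quality is to compare the extracted text against a hand-checked reference and score the similarity. The version below uses only the Python standard library. The function names (normalise, extraction_score, validate) and the 0.9 threshold are assumptions made for illustration, not choices taken from these notes, and a real pipeline would likely need field-level checks as well.

```python
from difflib import SequenceMatcher


def normalise(text: str) -> str:
    """Lowercase and collapse whitespace so layout differences are not penalised."""
    return " ".join(text.lower().split())


def extraction_score(extracted: str, reference: str) -> float:
    """Return a similarity ratio in [0, 1] between extracted and reference text."""
    matcher = SequenceMatcher(None, normalise(extracted), normalise(reference))
    return matcher.ratio()


def validate(extracted: str, reference: str, threshold: float = 0.9) -> bool:
    """Accept an extraction when its score meets the threshold (0.9 is an assumed default)."""
    return extraction_score(extracted, reference) >= threshold
```

Usage: call validate(extracted_text, reference_text) for each document in a small hand-labelled sample and report the share that pass. The threshold would need tuning per document type.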