Crawl paginated results from "Case search"
moreymat commented
On a case details page, we currently follow the links to policy areas. Each link leads to a page that shows only the first 50 results for that policy area.
Reaching the following pages requires clicking the "Next" button, which triggers a ColdFusion script.
Memorious does not do dynamic scraping, so we should complement it with a Selenium script that retrieves the additional links and either:
- stores the HTML pages for case details for ingestion in Aleph (but then we'd need to apply the same cleaning as in Memorious), or
- feeds them back to Memorious as seeds (how could we best do that?).
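
To make the discussion concrete, here is a rough sketch of what the Selenium side could look like. It is not tested against the real site, and several things are assumptions: the CSS selector `a.case-link`, the idea that "Next" is a plain link with that exact text, the fixed pause after each click, and the names `collect_case_links`, `run_with_firefox` and `case_urls.txt`. The locator strings are the values behind Selenium's `By.CSS_SELECTOR` and `By.LINK_TEXT`.

```python
import time
from urllib.parse import urljoin

# Plain-string locator strategies (the values of selenium's By.CSS_SELECTOR
# and By.LINK_TEXT), so the pagination loop itself needs no selenium import.
CSS_SELECTOR = "css selector"
LINK_TEXT = "link text"


def collect_case_links(driver, start_url, link_selector="a.case-link",
                       next_text="Next", max_pages=100, pause_seconds=1.0):
    """Walk a paginated result list and return unique case links, in order.

    `link_selector` and `next_text` are guesses (assumptions) about the
    markup of the result pages; adjust them after inspecting the real HTML.
    """
    driver.get(start_url)
    seen = set()
    links = []
    for _ in range(max_pages):
        new_on_page = 0
        for anchor in driver.find_elements(CSS_SELECTOR, link_selector):
            href = anchor.get_attribute("href")
            if not href:
                continue
            url = urljoin(start_url, href)
            if url not in seen:
                seen.add(url)
                links.append(url)
                new_on_page += 1
        # Stop if the page brought nothing new (guards against a "Next"
        # button that stays clickable on the last page).
        if new_on_page == 0:
            break
        next_buttons = driver.find_elements(LINK_TEXT, next_text)
        if not next_buttons:
            break
        next_buttons[0].click()
        # Crude wait for the script-driven page update (assumption: a fixed
        # pause is enough; an explicit wait would be more robust).
        time.sleep(pause_seconds)
    return links


def run_with_firefox(start_url, out_path="case_urls.txt"):
    """Hypothetical entry point: dump the collected URLs, one per line."""
    from selenium import webdriver  # imported lazily, only needed here

    driver = webdriver.Firefox()
    try:
        links = collect_case_links(driver, start_url)
    finally:
        driver.quit()
    with open(out_path, "w", encoding="utf-8") as handle:
        handle.write("\n".join(links))
    return links
```

The resulting list of URLs works for either option: the script could fetch and store each page itself, or the file could be used as the seed list for a second Memorious run, so that cleaning stays in one place.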