Crawl paginated results from "Case search"

Question


Opened this issue 4 years ago · 0 comments

On a case details page, we currently follow the links to policy areas; each link leads to a page listing the first 50 results for that policy area.
Reaching further pages requires clicking the "Next" button, which triggers a ColdFusion script.
Memorious does not support dynamic scraping, so we should complement it with a Selenium script that retrieves the additional links and either:

stores the HTML pages for case details for ingestion into Aleph (but then we would need to apply the same cleaning as in Memorious), or
feeds the links back to Memorious as seeds (how could we best do that?).
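A rough sketch of the Selenium side plus the "feed back as seeds" option, under several assumptions. The CSS selectors for result links and the "Next" button are placeholders that would have to be read off the real "Case search" markup. The stop conditions assume the "Next" button disappears, or stops producing new links, on the last page. The seed output assumes Memorious's `seed` method reads a `urls` list from `params`; the `fetch` stage name and all function names here are hypothetical.

```python
import json
import time

# Selenium's By.CSS_SELECTOR is the plain string "css selector"; using the
# string directly keeps this module importable without Selenium installed.
CSS = "css selector"


def merge_new_links(seen, hrefs):
    """Append hrefs not already in `seen` (order preserved); return how many were new."""
    added = 0
    for href in hrefs:
        if href and href not in seen:
            seen.append(href)
            added += 1
    return added


def collect_case_links(driver, link_css, next_css, max_pages=100, delay=2.0):
    """Collect case-detail links from a policy-area page by clicking "Next".

    `driver` is a Selenium WebDriver already pointed at the first results page.
    `link_css` and `next_css` are hypothetical selectors.
    """
    links = []
    for _ in range(max_pages):  # hard cap in case the stop conditions never trigger
        hrefs = [a.get_attribute("href") for a in driver.find_elements(CSS, link_css)]
        if merge_new_links(links, hrefs) == 0:
            break  # no new results: assume we are past the last page
        buttons = driver.find_elements(CSS, next_css)
        if not buttons:
            break  # no "Next" button: assume this was the last page
        buttons[0].click()
        time.sleep(delay)  # crude wait for the next page of results to render
    return links


def seed_stage(urls, next_stage="fetch"):
    """Build a Memorious-style `seed` stage listing the collected URLs.

    Assumes the `seed` method accepts `params.urls`; `next_stage` is hypothetical.
    """
    return {"method": "seed", "params": {"urls": list(urls)}, "handle": {"pass": next_stage}}


def write_seed_stage(urls, path):
    # JSON of this shape is also valid YAML, so the output can be pasted
    # into a crawler's pipeline configuration.
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(seed_stage(urls), fh, indent=2)
```

With a real browser this would be driven by something like `driver = webdriver.Firefox()` and `driver.get(policy_area_url)` before calling `collect_case_links`. The fixed `time.sleep` could be replaced by `WebDriverWait(driver, 10).until(expected_conditions.staleness_of(old_link))` to wait until the old results are gone.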