everypolitician/scraped

Add a class to represent requests and their responses

Closed this issue · 0 comments

Problem

We currently use the vague term "strategy" for the thing that makes requests and returns a response. This has become increasingly problematic, as "strategy" has come to mean (at least!) two different things:

  1. Strategy for obtaining a response (e.g. live request, get from archive, get from local disk cache)
  2. Strategy for storing responses (e.g. write to archive or write to local disk cache)

Proposed solution

Create a ScrapedPage::Request class that takes a url, the read strategies to try and the write strategies to store the result with, and that has a #response method for obtaining a ScrapedPage::Response instance.

ScrapedPage::Request.new(
  url: url, 
  read_strategy: [
    ExistingOpenUriCache.new(
      directory: '.cache', 
    ),
    Archive.new(
      repo: 'everypolitician-scrapers/kenesh-kg-old', 
      branch: 'scraped-pages-archive',
    ),
  ],
  write_strategy: Archive.new(
    repo: 'everypolitician-scrapers/kenesh-kg', 
    branch: 'scraped-pages-archive',
  ),
).response
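
To make the proposal more concrete, here is a minimal sketch of how such a class might behave. It is not an existing implementation. It assumes each read strategy has a `read(url)` method that returns a response or nil, and each write strategy has a `write(response)` method. Those method names, the `NoResponseError` class and the in-memory `MemoryStore` stand-in are all hypothetical, used only for illustration.

```ruby
module ScrapedPage
  # Hypothetical value object; the real Response may carry more fields.
  Response = Struct.new(:url, :body, :status, keyword_init: true)

  class Request
    # Hypothetical error raised when no read strategy produces a response.
    class NoResponseError < StandardError; end

    def initialize(url:, read_strategy:, write_strategy: [])
      @url = url
      # Accept either a single strategy or a list of them.
      @read_strategies = [read_strategy].flatten.compact
      @write_strategies = [write_strategy].flatten.compact
    end

    # Try each read strategy in order, then hand the first response
    # found to every write strategy. The result is memoized.
    def response
      @response ||= fetch.tap do |found|
        @write_strategies.each { |strategy| strategy.write(found) }
      end
    end

    private

    def fetch
      @read_strategies.each do |strategy|
        found = strategy.read(@url) # assumed interface: response or nil
        return found if found
      end
      raise NoResponseError, "no read strategy returned a response for #{@url}"
    end
  end
end

# Hypothetical in-memory strategy that can act as both reader and writer.
class MemoryStore
  def initialize(pages = {})
    @pages = pages
  end

  def read(url)
    body = @pages[url]
    body && ScrapedPage::Response.new(url: url, body: body, status: 200)
  end

  def write(response)
    @pages[response.url] = response.body
  end
end

empty   = MemoryStore.new
seeded  = MemoryStore.new('http://example.com/' => '<html>hi</html>')
archive = MemoryStore.new

response = ScrapedPage::Request.new(
  url: 'http://example.com/',
  read_strategy: [empty, seeded],
  write_strategy: archive
).response
```

In this sketch the response is written to every write strategy regardless of which read strategy supplied it. Whether a response that came from the archive should be written back to the same archive is a design question the sketch leaves open.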