harvard-lil/scoop

HTTP Proxy futureproofing

matteocargnelutti opened this issue

The goal is to replace transparent-proxy, an external dependency, with a custom implementation based on Node.js' network primitives to both simplify and solidify one of Scoop's core components.

@leppert is currently leading this effort.


Status

Last update: March 31, 2023

  • Working prototype!
  • Ongoing PR @ replace-transparent-proxy
    • Figure out what to do with WebSocket exchanges.
      • The current setup doesn't let them through, which may crash pages on websites that rely on WebSockets.
      • As a first iteration, let exchanges go through, but don't capture them?
    • To investigate / determine relevance: some responses appear to be force-downloaded by Chromium, transiently.
    • Handle http.Server's checkContinue event
    • Handle http.Server's checkExpectation event
    • Handle http.ClientRequest's connect event (for additional proxy hops)
    • Handle http.ClientRequest's continue event
    • Handle http.ClientRequest's information event
    • ScoopProxy: Collect SSL certificates and pass them to Scoop.provenanceInfo.certificates
      • To be moved to its own issue, TBD post-merge
    • Documentation / Comments
      • ScoopProxy
      • Portal (not strictly required for merge)
    • Complete move to https://github.com/harvard-lil/portal
    • Checks before PR merge:
      • Double check: Are we good with socket exceptions handling? 🍿
      • Double check: blocklist enforcement
      • Double check: time limit enforcement
      • Double check: size limit enforcement
      • Double check: Run against a set of X URLs and compare results with main.
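
For reference, the four Expect / interim-response items in the checklist above (http.Server's checkContinue and checkExpectation, http.ClientRequest's continue and information) could be wired roughly as follows. This is a minimal sketch, not the code from the PR: attachServerExpectHandlers and attachClientInterimHandlers are hypothetical helper names, and replying 100 Continue immediately (rather than waiting for the upstream server) is an assumption made to keep the example short.

```javascript
const http = require('node:http')

// Hypothetical helper (not part of Scoop or Portal): wires the two
// Expect-related events on the proxy's http.Server.
function attachServerExpectHandlers (server) {
  // Emitted instead of 'request' when the client sends `Expect: 100-continue`.
  server.on('checkContinue', (req, res) => {
    res.writeContinue() // Tell the client to go ahead and send the body
    server.emit('request', req, res) // Then hand over to the regular handler
  })

  // Emitted instead of 'request' for any other `Expect` value.
  // Mirrors what Node does when nobody listens: reply 417.
  server.on('checkExpectation', (req, res) => {
    res.statusCode = 417
    res.end()
  })

  return server
}

// Hypothetical helper: records interim (1xx) responses seen on the
// outgoing http.ClientRequest so a caller could log or capture them.
function attachClientInterimHandlers (clientRequest, log = []) {
  // Fired when the upstream server answers `100 Continue`.
  clientRequest.on('continue', () => log.push({ event: 'continue' }))

  // Fired for every 1xx response except 101 Upgrade (e.g. 103 Early Hints).
  clientRequest.on('information', (info) => {
    log.push({ event: 'information', statusCode: info.statusCode })
  })

  return log
}
```

A proxy could instead hold back writeContinue() until the upstream server itself answers 100 Continue, which is where the ClientRequest continue event would come in. The sketch above takes the simpler route and only records that event.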