unitedstates/congress

(votes, committee_meetings): senate.gov and clerk.house.gov not redirecting to https

ryparker opened this issue · 0 comments

Problem

When running the votes or committee_meetings tasks, the download requests to senate.gov and clerk.house.gov are not redirected to https, so they time out or fail.

Cause

Requests to http://senate.gov or http://clerk.house.gov respond with a 301 redirecting to the https:// version, and it looks like the request library (scrapelib) is not configured to follow redirects.
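How scrapelib is configured in these tasks is not shown in this issue, so the following is only a library-independent sketch of the suspected cause, using the Python standard library. It starts a throwaway local server that answers one path with a 301, then fetches it once with a client that does not follow redirects (`http.client`) and once with one that does (`urllib.request`). The handler, the `/old` and `/new` paths, and all helper names are hypothetical and are not part of this repo or of scrapelib.

```python
import http.client
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class RedirectingHandler(BaseHTTPRequestHandler):
    """Hypothetical handler: /old answers 301 -> /new, anything else answers 200."""

    def do_GET(self):
        if self.path == "/old":
            self.send_response(301)
            self.send_header("Location", "/new")
            self.send_header("Content-Length", "0")
            self.end_headers()
        else:
            body = b"ok"
            self.send_response(200)
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    def log_message(self, *args):
        pass  # silence per-request logging for the demo


def start_server():
    """Start the demo server on a free local port in a background thread."""
    server = HTTPServer(("127.0.0.1", 0), RedirectingHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def fetch_without_redirects(host, port, path):
    """Single GET; http.client never follows redirects on its own."""
    conn = http.client.HTTPConnection(host, port, timeout=5)
    try:
        conn.request("GET", path)
        resp = conn.getresponse()
        resp.read()
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()


def fetch_with_redirects(url):
    """GET via urllib, whose default opener follows 301 responses."""
    with urllib.request.urlopen(url, timeout=5) as resp:
        return resp.status, resp.geturl(), resp.read()


if __name__ == "__main__":
    server = start_server()
    host, port = server.server_address
    print(fetch_without_redirects(host, port, "/old"))
    print(fetch_with_redirects(f"http://{host}:{port}/old"))
    server.shutdown()
    server.server_close()
```

A client in the first mode sees only the 301 and an empty or placeholder body, which would be consistent with the failures described above if the scraper behaves the same way.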

PR here: #285

Reproduce

Run the votes or committee_meetings tasks, or verify the redirects directly:

$ curl -i http://senate.gov

HTTP/1.1 301 Moved Permanently
Server: AkamaiGHost
Content-Length: 0
Location: http://www.senate.gov/
Date: Thu, 19 May 2022 03:14:17 GMT
Connection: keep-alive
$ curl -i http://clerk.house.gov

HTTP/1.1 301 Moved Permanently
Content-Type: text/html; charset=UTF-8
Location: https://clerk.house.gov/
Vary: Accept-Encoding, Cookie
X-Xss-Protection: 1;mode=block
Strict-Transport-Security: max-age=0;
X-Frame-Options: DENY
X-Content-Type-Options: nosniff
X-Permitted-Cross-Domain-Policies: none
Referrer-Policy: no-referrer
Content-Security-Policy: …
Date: Thu, 19 May 2022 03:15:26 GMT
Content-Length: 147

<head><title>Document Moved</title></head>
<body><h1>Object Moved</h1>This document may be found <a HREF="https://clerk.house.gov/">here</a></body>
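To read the two responses above side by side, here is a small sketch that parses a raw header block and reports the status code, the Location target, and whether that target uses https. The function name `summarize_redirect` is hypothetical. The sample inputs are trimmed copies of the headers shown above. Note that in that output the senate.gov Location is http://www.senate.gov/, while clerk.house.gov points to https://clerk.house.gov/.

```python
from urllib.parse import urlsplit


def summarize_redirect(raw_headers):
    """Return (status, location, is_https) parsed from a raw header block."""
    lines = [line.strip() for line in raw_headers.strip().splitlines()]
    # The status line looks like "HTTP/1.1 301 Moved Permanently".
    status = int(lines[0].split()[1])
    location = None
    for line in lines[1:]:
        name, sep, value = line.partition(":")
        if sep and name.strip().lower() == "location":
            location = value.strip()
            break
    is_https = location is not None and urlsplit(location).scheme == "https"
    return status, location, is_https


# Trimmed copies of the curl output shown in this issue.
senate = """HTTP/1.1 301 Moved Permanently
Server: AkamaiGHost
Content-Length: 0
Location: http://www.senate.gov/
"""

clerk = """HTTP/1.1 301 Moved Permanently
Content-Type: text/html; charset=UTF-8
Location: https://clerk.house.gov/
"""

print(summarize_redirect(senate))
print(summarize_redirect(clerk))
```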