Scrape integration fails to verify SSL certificate
JPorter-02 opened this issue · 5 comments
The problem
When using the Scrape integration with resource https://archiveofourown.org and "Verify SSL certificate" set to true, I receive the following error:
Logger: homeassistant.components.rest.data
Source: components/rest/data.py:128
integration: rest (documentation, issues)
First occurred: 10:28:29 PM (6 occurrences)
Last logged: 10:55:37 PM
Error connecting to https://archiveofourown.org/works/51298045 failed with [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1000)
Setting "Verify SSL certificate" to false or off breaks the resource and associated sensors.
What version of Home Assistant Core has the issue?
core-2024.10.2
What was the last working version of Home Assistant Core?
core-2024.10.2
What type of installation are you running?
Home Assistant OS
Integration causing the issue
Scrape
Link to integration documentation on our website
https://www.home-assistant.io/integrations/scrape
Diagnostics information
home-assistant_scrape_2024-10-18T04-03-13.736Z.log
Example YAML snippet
No response
Anything in the logs that might be useful for us?
Logger: homeassistant.components.rest.data
Source: components/rest/data.py:128
integration: rest (documentation, issues)
First occurred: 10:28:29 PM (8 occurrences)
Last logged: 11:06:44 PM
Error connecting to https://archiveofourown.org/works/51298045 failed with [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1000)
Additional information
The debug file contains a lot of other messages from other integrations. I am happy to provide any other info that may be needed to help get this resolved. I believe the system root SSL certs may just need to be updated, but I was unable to do that via CLI. Based on history from HA, all sensors attached to this resource stopped reporting at 10/16/24 at or around 5:45a.
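The stale-root-certs theory can be checked outside Home Assistant with a short stdlib-only sketch (the host is taken from the issue; the function name and everything else here are illustrative assumptions). It performs the same verified TLS handshake Python's `ssl` module does, so it should fail with the same `CERTIFICATE_VERIFY_FAILED` error if the local CA bundle cannot build the chain:

```python
import socket
import ssl

def check_tls(host: str, port: int = 443, timeout: float = 10.0) -> str:
    """Do a verified TLS handshake and return the issuer CN of the leaf cert.

    Raises ssl.SSLCertVerificationError (the same failure the log shows)
    if the local CA store cannot verify the chain.
    """
    ctx = ssl.create_default_context()  # CERT_REQUIRED + hostname checking
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            cert = tls.getpeercert()
    issuer = {k: v for pair in cert["issuer"] for (k, v) in pair}
    return issuer.get("commonName", "<unknown>")

if __name__ == "__main__":
    try:
        print("issuer:", check_tls("archiveofourown.org"))
    except ssl.SSLCertVerificationError as err:
        print("verify failed:", err.verify_message)
    except OSError as err:  # no network, DNS failure, timeout, etc.
        print("connection failed:", err)
```

Running this in the same environment as Home Assistant (not on a separate Windows machine) narrows down whether the failure is host-specific.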
Hey there @fabaff, @gjohansson-ST, mind taking a look at this issue as it has been labeled with an integration (scrape) you are listed as a code owner for? Thanks!
Code owner commands
Code owners of scrape can trigger bot actions by commenting:

@home-assistant close
    Closes the issue.
@home-assistant rename Awesome new title
    Renames the issue.
@home-assistant reopen
    Reopens the issue.
@home-assistant unassign scrape
    Removes the current integration label and assignees on the issue; add the integration domain after the command.
@home-assistant add-label needs-more-information
    Adds a label (needs-more-information, problem in dependency, problem in custom component) to the issue.
@home-assistant remove-label needs-more-information
    Removes a label (needs-more-information, problem in dependency, problem in custom component) from the issue.
(message by CodeOwnersMention)
scrape documentation
scrape source
(message by IssueLinks)
You could try connecting to the page via the CLI; that would tell us whether it's an OS certificate problem or something in the libraries used by this integration.
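One way to separate those two possibilities without curl (which does its own certificate handling, e.g. schannel on Windows) is to run the same verified handshake from Python twice: once against the system's default store and once against the certifi bundle that requests/httpx typically ship with. A hedged sketch (the helper name is an assumption, and certifi may not be installed):

```python
import socket
import ssl

HOST = "archiveofourown.org"  # resource from the issue

def handshake_ok(ctx: ssl.SSLContext, host: str = HOST) -> bool:
    """Return True if a verified TLS handshake succeeds with this context."""
    try:
        with socket.create_connection((host, 443), timeout=10) as sock:
            with ctx.wrap_socket(sock, server_hostname=host):
                return True
    except ssl.SSLCertVerificationError:
        return False  # chain could not be verified: bad/stale CA bundle
    except OSError:
        return False  # network problem rather than a certificate problem

if __name__ == "__main__":
    # 1) whatever OpenSSL default store Python finds on this system
    print("default store ok:", handshake_ok(ssl.create_default_context()))
    # 2) the certifi bundle, if available
    try:
        import certifi
        certifi_ctx = ssl.create_default_context(cafile=certifi.where())
        print("certifi bundle ok:", handshake_ok(certifi_ctx))
    except ImportError:
        print("certifi not installed; skipping bundle comparison")
```

If the default store fails but the certifi bundle succeeds, the OS certificates are likely stale; if both fail, the problem is more likely in the library stack or the network path.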
Sorry for the delay in getting back to this. I successfully scraped the URL using curl and everything works as it should. Here is the output of `curl -v` from Windows PowerShell in case it helps. Something to note: the "You are being redirected" at the end is expected.
```
curl -v https://archiveofourown.org/works/51298045
* Host archiveofourown.org:443 was resolved.
* IPv6: (none)
* IPv4: 104.20.28.24, 104.20.29.24
*   Trying 104.20.28.24:443...
* Connected to archiveofourown.org (104.20.28.24) port 443
* schannel: disabled automatic use of client certificate
* ALPN: curl offers http/1.1
* ALPN: server accepted http/1.1
* using HTTP/1.x
> GET /works/51298045 HTTP/1.1
> Host: archiveofourown.org
> User-Agent: curl/8.9.1
> Accept: */*
>
* Request completely sent off
* schannel: remote party requests renegotiation
* schannel: renegotiating SSL/TLS connection
* schannel: SSL/TLS connection renegotiated
< HTTP/1.1 302 Found
< Date: Mon, 21 Oct 2024 16:42:21 GMT
< Content-Type: text/html; charset=utf-8
< Transfer-Encoding: chunked
< Connection: keep-alive
< Location: /works/51298045/chapters/129614635
< CF-Ray: 8d62b34cb9dcbf9d-ATL
< CF-Cache-Status: DYNAMIC
< Cache-Control: no-cache
< Set-Cookie: view_adult=true; path=/; SameSite=Lax
< content-security-policy: frame-ancestors 'self'
< potential_upstream: unicorn_bots
< referrer-policy: strict-origin-when-cross-origin
< x-ao3-priority: 0
< x-aooo-debug1: Archive Unicorn
< x-clacks-overhead: GNU Terry Pratchett
< x-content-type-options: nosniff
< x-download-options: noopen
< x-frame-options: SAMEORIGIN
< x-hostname: ao3-front10
< x-permitted-cross-domain-policies: none
< x-request-id: c9a89c0a-fd93-4b49-864d-651ebb84e568
< x-runtime: 0.025693
< x-sentry-rate: 0.01
< x-xss-protection: 1; mode=block
< Set-Cookie: _otwarchive_session=eyJfcmFpbHMiOnsibWVzc2FnZSI6ImV5SnpaWE56YVc5dVgybGtJam9pTnpoalpqTmpaREE1WldVMVpUTmlNakE0TW1Ga1lXWTJOVGt6WVdVME1tRWlMQ0p5WlhSMWNtNWZkRzhpT2lJdmQyOXlhM012TlRFeU9UZ3dORFUvZG1sbGQxOWhaSFZzZEQxMGNuVmxJbjA9IiwiZXhwIjoiMjAyNC0xMS0wNFQxNjo0MjoyMS42MTFaIiwicHVyIjoiY29va2llLl9vdHdhcmNoaXZlX3Nlc3Npb24ifX0%3D--beed74d07fc1dd6a7ab2fd507894080f87a8fd42; path=/; expires=Mon, 04 Nov 2024 16:42:21 GMT; HttpOnly; SameSite=Lax
< Set-Cookie: __cf_bm=BoK5IowXr8FptqQC6mSjiPwjKBNypObnj.Qa9DjQzDM-1729528941-1.0.1.1-t2U.HywKpsJjWLam_G1aXx0qu.uNt80oOFsKd6bmh47khXwsekQ_SHAOnmhLvFL0Cc8TNQl5ggrkcgzrZZNxUg; path=/; expires=Mon, 21-Oct-24 17:12:21 GMT; domain=.archiveofourown.org; HttpOnly; Secure; SameSite=None
< Set-Cookie: _cfuvid=j9Et.2xCXLYkM8RSHU6OlfFvHxy8c49hogDLwf1kGv4-1729528941632-0.0.1.1-604800000; path=/; domain=.archiveofourown.org; HttpOnly; Secure; SameSite=None
< Server: cloudflare
< alt-svc: h3=":443"; ma=86400
<
```
I also tried the curl command in the HA Terminal add-on, successfully connected to the site, and was able to retrieve the stats I am attempting to scrape.
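Note that curl succeeding does not rule out a stale Python-side bundle, since curl resolves trust through its own CA handling. If the root cause does turn out to be an outdated bundle, Python can be pointed at a specific CA file explicitly. A minimal sketch, assuming a Debian/Alpine-style bundle path (the actual path on Home Assistant OS may differ):

```python
import ssl

# Common Linux bundle location; this exact path is an assumption and
# may differ on Home Assistant OS.
CA_BUNDLE = "/etc/ssl/certs/ca-certificates.crt"

ctx = ssl.create_default_context()
try:
    ctx.load_verify_locations(cafile=CA_BUNDLE)
    print("loaded bundle; CA count:", len(ctx.get_ca_certs()))
except FileNotFoundError:
    print(f"{CA_BUNDLE} not found on this system")
```

This is only a diagnostic aid for a context you control yourself; the integration's own verification path would still need the OS store fixed (or Home Assistant updated) to pick up fresh roots.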