pycsw data.gov failing
Closed this issue · 10 comments
I'm trying to see whether I can harvest from data.gov, but it is failing:
http://catalog.data.gov/csw
https://catalog.data.gov/csw
http://catalog.data.gov/csw-all
https://catalog.data.gov/csw-all
Hi David, we have seen that this CSW has also started failing in Geoportal Server 1.2.7. I have reached out to GSA (they operate Data.gov).
As a hint, neither of these two calls delivers any record content:
No other combination of values for ElementSetName, typenames, outputformat, etc. helps either.
Adding &resultType=results works, though the output is not pretty:
https://catalog.data.gov/csw?request=GetRecords&service=CSW&version=2.0.2&ElementSetName=full&typenames=csw:Record&resultType=results
Not to mention that these records include many http-only links, which will start to fail given the https-only policy in place for the federal government, or when Geoportal Server is used over https.
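For anyone scripting this check, the working GetRecords request above can be assembled with the Python standard library. This is only a minimal sketch: the helper names `build_getrecords_url` and `fetch_records` are made up for illustration and are not part of pycsw or Geoportal Server. The parameters are exactly the ones from the URL in this thread.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

CSW_ENDPOINT = "https://catalog.data.gov/csw"  # endpoint quoted in this thread


def build_getrecords_url(endpoint=CSW_ENDPOINT, result_type="results"):
    """Build a CSW 2.0.2 GetRecords KVP URL (hypothetical helper)."""
    params = {
        "request": "GetRecords",
        "service": "CSW",
        "version": "2.0.2",
        "ElementSetName": "full",
        "typenames": "csw:Record",
        # CSW 2.0.2 defaults resultType to "hits", which returns counts only,
        # so "results" has to be requested explicitly to get record content.
        "resultType": result_type,
    }
    # safe=":" keeps "csw:Record" readable instead of encoding the colon.
    return endpoint + "?" + urlencode(params, safe=":")


def fetch_records(url, timeout=30):
    """Fetch the raw XML response. Needs network access; not called here."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


if __name__ == "__main__":
    print(build_getrecords_url())
```

Keeping URL construction separate from the fetch makes it easy to compare the generated request against one that is known to work before involving the network.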
Currently, the harvester has a dedicated "Data.gov" input broker for acquiring metadata from that particular source. Since pycsw keeps failing, the "Data.gov" broker uses a combination of CKAN and WAF methods to get the job done. The broker itself requires no configuration besides a name; all other properties are optional.
So if we want to filter based on an organization, how might we do that?
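One possible approach outside the broker, assuming the catalog exposes the standard CKAN action API at `/api/3/action/package_search`, is to filter with a Solr `fq` query on the `organization` field. The sketch below is not how the "Data.gov" broker works internally; the endpoint constant, the helper names, and the `noaa-gov` slug in the usage line are illustrative assumptions.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed CKAN endpoint; verify that it is available before relying on it.
CKAN_SEARCH = "https://catalog.data.gov/api/3/action/package_search"


def build_org_search_url(organization, rows=10, start=0):
    """Build a package_search URL restricted to one organization slug."""
    params = {
        # fq is a filter query; the value is the organization's CKAN slug.
        "fq": f'organization:"{organization}"',
        "rows": rows,    # page size
        "start": start,  # offset, for paging through results
    }
    return CKAN_SEARCH + "?" + urlencode(params)


def search_by_org(organization, rows=10, start=0, timeout=30):
    """Return (total_count, dataset_names). Needs network access."""
    url = build_org_search_url(organization, rows, start)
    with urlopen(url, timeout=timeout) as resp:
        payload = json.load(resp)
    result = payload["result"]
    return result["count"], [pkg["name"] for pkg in result["results"]]


if __name__ == "__main__":
    # "noaa-gov" is only an example slug.
    print(build_org_search_url("noaa-gov"))
```

Paging with `rows` and `start` would let a harvest job walk through one organization's datasets in batches.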
Hi @tomkralidis. Geoportal Server can do federated search against CSW endpoints, and there are users who want to do this. We're seeing intermittent success. The most common issue appears to be 403 responses.
This request shows the response error:
https://gptogc.esri.com/geoportal/rest/distributed?rid=local&ridName=This%20Site&rids=local%2CdataGov&searchText=water&start=1&max=10&orderBy=relevance&f=atom
Exception when Posting CSW query to https://catalog.data.gov/csw-all: HTTP Request failed: HTTP/1.1 403 Forbidden
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<HTML>
<HEAD>
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=iso-8859-1">
<TITLE>ERROR: The request could not be satisfied</TITLE>
</HEAD>
<BODY>
<H1>403 ERROR</H1>
<H2>The request could not be satisfied.</H2>
<HR noshade size="1px">
Request blocked.
<BR clear="all">
<HR noshade size="1px">
<PRE>
Generated by cloudfront (CloudFront)
Request ID: fZtHiItoRuNkv0Wx9yrLUUXENvYEatwAkc0nyjPlEzOFmkKi0b2eJg==
</PRE>
<ADDRESS></ADDRESS>
</BODY>
</HTML>
The request is blocked.
This is not a response from pycsw but from a proxy/caching layer in front of data.gov.
We know, but we haven't been able to get GSA to resolve this.
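For clients that want to report which layer rejected a request, the difference between a CSW service error and a block by the proxy layer can be detected heuristically. The sketch below assumes the service reports its own errors as an XML `ExceptionReport` document and that the proxy page mentions CloudFront, as the 403 page quoted in this thread does. The function name `classify_error_body` and its return labels are hypothetical.

```python
import xml.etree.ElementTree as ET


def classify_error_body(body):
    """Guess which layer produced an error body (heuristic only)."""
    try:
        root = ET.fromstring(body)
    except ET.ParseError:
        # HTML error pages such as the one above are usually not well-formed XML.
        root = None

    if root is not None:
        # Drop the "{namespace}" prefix so any OWS version matches.
        local_name = root.tag.rsplit("}", 1)[-1]
        if local_name == "ExceptionReport":
            return "csw-service"

    if "cloudfront" in body.lower():
        return "cdn-proxy"
    return "unknown"


if __name__ == "__main__":
    blocked = (
        "<HTML><BODY><H1>403 ERROR</H1>"
        '<HR noshade size="1px">Request blocked.'
        "<PRE>Generated by cloudfront (CloudFront)</PRE></BODY></HTML>"
    )
    print(classify_error_body(blocked))
```

A "cdn-proxy" result suggests the request never reached the catalogue service, so changing CSW parameters is unlikely to help.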