AbsoluteUrls: print warning when URL can't be parsed
chrismytton commented
Problem
At the moment, if `AbsoluteUrls` can't parse a URL, this `rescue` block silently swallows the exception. This makes problems tricky to debug, such as the one we hit while working on everypolitician-scrapers/denmark-folketing#3, where image URLs containing a space weren't being parsed.
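For illustration, here is a minimal standalone reproduction using only Ruby's standard `uri` library. The `silent_absolute_url` helper, the base URL and the file names are hypothetical; the helper only mirrors the shape of `absolute_url` in `lib/scraped/response/decorator/absolute_urls.rb`, with the base URL passed in explicitly. A relative URL containing a space makes `URI.join` raise `URI::InvalidURIError`, and the rescue hands the original string back with nothing reported.

```ruby
require 'uri'

# Hypothetical stand-in for AbsoluteUrls#absolute_url: the base URL is passed
# in explicitly instead of coming from the scraped response.
def silent_absolute_url(base, relative_url)
  URI.join(base, relative_url) unless relative_url.to_s.empty?
rescue URI::InvalidURIError
  # The exception is discarded, so the caller just gets the input back.
  relative_url
end

base = 'http://example.com/members/'

ok = silent_absolute_url(base, 'photo.jpg')
broken = silent_absolute_url(base, 'photo 1.jpg') # the space makes URI.join raise

puts ok      # prints http://example.com/members/photo.jpg
puts broken  # prints photo 1.jpg (still relative, and nothing was reported)
```

Nothing is printed for the failing call, so the only symptom is a URL that is still relative.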
Proposed solution
Something like this:
```diff
diff --git a/lib/scraped/response/decorator/absolute_urls.rb b/lib/scraped/response/decorator/absolute_urls.rb
index 268a695..66902af 100644
--- a/lib/scraped/response/decorator/absolute_urls.rb
+++ b/lib/scraped/response/decorator/absolute_urls.rb
@@ -16,7 +16,8 @@ module Scraped

         def absolute_url(relative_url)
           URI.join(url, relative_url) unless relative_url.to_s.empty?
-        rescue URI::InvalidURIError
+        rescue URI::InvalidURIError => e
+          warn "Could not make #{relative_url.inspect} absolute: #{e.message}" if ENV['VERBOSE']
           relative_url
         end
       end
```
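A standalone sketch of the proposed change, again assuming only the standard library. `absolute_url_with_warning` and `capture_stderr` are hypothetical helpers written for this example; the rescue clause matches the `+` lines in the diff, and `StringIO` is used only to capture what `Kernel#warn` writes to `$stderr`.

```ruby
require 'uri'
require 'stringio'

# Hypothetical standalone version of the patched method. When the VERBOSE
# environment variable is set, the parse failure is reported on stderr
# before the unchanged relative URL is returned.
def absolute_url_with_warning(base, relative_url)
  URI.join(base, relative_url) unless relative_url.to_s.empty?
rescue URI::InvalidURIError => e
  warn "Could not make #{relative_url.inspect} absolute: #{e.message}" if ENV['VERBOSE']
  relative_url
end

# Temporarily swaps $stderr for a StringIO and returns whatever was written.
def capture_stderr
  original = $stderr
  $stderr = StringIO.new
  yield
  $stderr.string
ensure
  $stderr = original
end

base = 'http://example.com/members/'

ENV['VERBOSE'] = '1'
result = nil
message = capture_stderr { result = absolute_url_with_warning(base, 'photo 1.jpg') }
puts result   # prints photo 1.jpg
puts message  # prints Could not make "photo 1.jpg" absolute: <URI error message>

ENV.delete('VERBOSE')
quiet = capture_stderr { absolute_url_with_warning(base, 'photo 1.jpg') }
puts quiet.empty? # prints true
```

Gating the message on `ENV['VERBOSE']` keeps normal scraper runs quiet. Any non-nil value enables it, including `VERBOSE=0`, because every string is truthy in Ruby.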