w3c-validators/w3c_validators

Incorrect handling of HTTP redirect responses

Malesio opened this issue · 2 comments

The feed validator class fails to validate RSS files because the W3C website has dropped support for plain HTTP requests, while the gem still targets:

FEED_VALIDATOR_URI = 'http://validator.w3.org/feed/check.cgi'

POSTing any kind of data to http://validator.w3.org/feed/check.cgi results in a 301 Moved Permanently (even when sending a valid RSS feed file):

~ cat sample_feed.rss 
<?xml version="1.0" encoding="UTF-8" ?>
<rss version="2.0">

<channel>
  <title>W3Schools Home Page</title>
  <link>https://www.w3schools.com</link>
  <description>Free web building tutorials</description>
  <item>
    <title>RSS Tutorial</title>
    <link>https://www.w3schools.com/xml/xml_rss.asp</link>
    <description>New RSS tutorial on W3Schools</description>
  </item>
  <item>
    <title>XML Tutorial</title>
    <link>https://www.w3schools.com/xml</link>
    <description>New XML tutorial on W3Schools</description>
  </item>
</channel>

</rss>
~ gem list w3c

*** LOCAL GEMS ***

w3c_validators (1.3.6)
~ irb
irb(main):001:0> require 'w3c_validators'
=> true
irb(main):002:0> v = W3CValidators::FeedValidator.new
=> #<W3CValidators::FeedValidator:0x000055c69221b598 @validator_uri=#<URI::HTTP http://validator.w3.org/feed/check.cgi>, @options={:proxy_host=>nil, :proxy_po...
irb(main):003:0> v.validate_file("sample_feed.rss")
Traceback (most recent call last):
       11: from /usr/bin/irb:23:in `<main>'
       10: from /usr/bin/irb:23:in `load'
        9: from /usr/lib/ruby/gems/2.7.0/gems/irb-1.2.6/exe/irb:11:in `<top (required)>'
        8: from (irb):3
        7: from /var/lib/gems/2.7.0/gems/w3c_validators-1.3.6/lib/w3c_validators/feed_validator.rb:48:in `validate_file'
        6: from /var/lib/gems/2.7.0/gems/w3c_validators-1.3.6/lib/w3c_validators/feed_validator.rb:33:in `validate_text'
        5: from /var/lib/gems/2.7.0/gems/w3c_validators-1.3.6/lib/w3c_validators/feed_validator.rb:59:in `validate'
        4: from /var/lib/gems/2.7.0/gems/w3c_validators-1.3.6/lib/w3c_validators/validator.rb:110:in `send_request'
        3: from /var/lib/gems/2.7.0/gems/w3c_validators-1.3.6/lib/w3c_validators/validator.rb:113:in `send_request'
        2: from /usr/lib/ruby/2.7.0/net/http/response.rb:133:in `value'
        1: from /usr/lib/ruby/2.7.0/net/http/response.rb:124:in `error!'
Net::HTTPRetriableError (301 "Moved Permanently")

Upon further inspection, it became clear that the method send_request is at fault:

  • options[:url] = response['location']
    • options[:url], which holds the new URI, is never used by the subsequent recursive call to send_request: that call only reads @validator_uri, which still holds the old URI that triggers the redirect. A simple fix would be to replace this line with something like @validator_uri = URI.parse(response['location']) so that the instance variable is updated.
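
To illustrate the idea, here is a minimal sketch of a request helper that actually uses the Location header on the retry. It is not the gem's send_request: post_following_redirect and its block parameter are hypothetical names introduced for this example, and the block only exists so the request can be stubbed without touching the network.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper, NOT the gem's actual send_request.
# POSTs `body` to `uri`; on a redirect it retries once against the URI
# taken from the Location header instead of reusing the old one.
def post_following_redirect(uri, body, following_redirect = false, &post)
  # Default transport (assumption): a plain Net::HTTP.post call.
  post ||= ->(target, data) { Net::HTTP.post(target, data) }

  response = post.call(uri, body)

  if response.is_a?(Net::HTTPRedirection) && response['location'] && !following_redirect
    # Key point: build the next request from the *new* location.
    new_uri = URI.join(uri.to_s, response['location'])
    return post_following_redirect(new_uri, body, true, &post)
  end

  response
end
```

With the default transport, a call such as post_following_redirect(URI('http://validator.w3.org/feed/check.cgi'), data) would issue the second request against whatever URI the 301 response points to.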

I also noticed a variable that is probably poorly named:

  • if response.kind_of?(Net::HTTPRedirection) and response['location'] and not following_redirect
    • Read literally, the condition should test following_redirect rather than not following_redirect in order to actually follow redirects. The default value in the method signature would then also need to become true so that redirects are followed by default, if that is the desired behaviour.

doc75 commented

@Malesio thanks for reporting this bug and for the analysis you have made.

Indeed, the issue is that the feed validator is now only accessible over HTTPS, not HTTP.
I will publish a new version with the fix.

Regarding your last comment: following_redirect is only there to tell whether the current call was triggered by a redirect, which avoids a kind of infinite loop.
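
A small sketch of that guard, using a stubbed endpoint that always redirects. FakeResponse, request_with_guard and the log argument are hypothetical names made up for this illustration, not part of the gem; the point is only that the flag limits the call chain to a single extra hop.

```ruby
# Hypothetical stand-in for an HTTP response: only what the guard needs.
FakeResponse = Struct.new(:redirect, :location)

# Illustrative only, not the gem's code. `following_redirect` is true when
# this call was itself triggered by a redirect, so a second redirect is
# returned to the caller instead of being followed again.
def request_with_guard(uri, following_redirect = false, log = [], &fetch)
  log << uri
  response = fetch.call(uri)

  if response.redirect && response.location && !following_redirect
    return request_with_guard(response.location, true, log, &fetch)
  end

  [response, log]
end

# A stub that redirects forever still produces only two requests.
always_redirect = ->(uri) { FakeResponse.new(true, "#{uri}/again") }
_response, requested = request_with_guard('http://example.test', &always_redirect)
```

After this runs, requested holds the original URI and one redirect target, and the second redirect response is returned as-is rather than followed.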

doc75 commented

Version 1.3.7 published on Rubygems.org