zendframework/zend-feed

Reader::findFeedLinks() failing with protocol-relative URLs


I've noticed a small bug in the Zend\Feed\Reader\Reader::findFeedLinks method:

When a website provides its feed URL as a 'protocol-relative' URL, the final URI returned by the Reader\FeedSet::absolutiseUri method is invalid.

Example:

The Engadget website's feed URL is //www.engadget.com/rss.xml (it starts with // rather than http://). When the Reader component resolves it, the URL returned is http://www.engadget.com/www.engadget.com/rss.xml, and any attempt to read data from that URL fails.

<?php
$feedLinks = Zend\Feed\Reader\Reader::findFeedLinks('http://www.engadget.com');

echo $feedLinks->rss;
// Output: http://www.engadget.com/www.engadget.com/rss.xml

For now, I have worked around the problem in Reader\FeedSet::absolutiseUri by checking whether the link starts with // and, if it does, prepending the missing protocol:

src/Reader/FeedSet.php:65
...

    protected function absolutiseUri($link, $uri = null)
    {
        /*
         * Fix: check if $link is a protocol-relative URL and prepend protocol to it
         */
        if (substr($link, 0, 2) === '//') {
            $link = 'http:' . $link;
        }

        $linkUri = Uri::factory($link);
        if (!$linkUri->isAbsolute() or !$linkUri->isValid()) {
            if ($uri !== null) {
                $uri = Uri::factory($uri);
...
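
One caveat with the snippet above: it always prepends http:, so a protocol-relative link found on an https:// page would be downgraded. Below is a minimal standalone sketch of a variation that takes the scheme from the page URI when one is available. The helper name resolveProtocolRelative() is hypothetical and not part of zend-feed; it only uses PHP's built-in parse_url() and assumes the page URI is passed in as a plain string.

```php
<?php
// Hypothetical helper for illustration only; not part of zend-feed.
// Turns a protocol-relative link ("//host/path") into an absolute URL by
// reusing the scheme of the page it was found on, falling back to "http".
function resolveProtocolRelative(string $link, ?string $baseUri = null): string
{
    // Anything that does not start with "//" is returned unchanged.
    if (strncmp($link, '//', 2) !== 0) {
        return $link;
    }

    // parse_url() returns null when the component is missing and false on
    // seriously malformed input, so only accept a non-empty string.
    $scheme = $baseUri !== null ? parse_url($baseUri, PHP_URL_SCHEME) : null;
    if (!is_string($scheme) || $scheme === '') {
        $scheme = 'http';
    }

    return $scheme . ':' . $link;
}

echo resolveProtocolRelative('//www.engadget.com/rss.xml', 'http://www.engadget.com'), PHP_EOL;
// Output: http://www.engadget.com/rss.xml
echo resolveProtocolRelative('//www.engadget.com/rss.xml', 'https://www.engadget.com'), PHP_EOL;
// Output: https://www.engadget.com/rss.xml
```

Inside absolutiseUri the same idea would mean reading the scheme from $uri before calling Uri::factory($link), but I have not tested that against the rest of the method.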

Can you confirm this bug? Once confirmed, I can open a PR with this fix.

Thanks!