Ruby Gem to parse sitemaps.org compliant sitemaps

Primary language: Ruby. License: MIT.

Sitemap Parser

Usage

Create a new instance of the Parser:

sitemap = SitemapParser.new "http://ben.balter.com/sitemap.xml"

Extract the URLs of the sitemap:

sitemap.urls # => Array of Nokogiri::XML::Node objects
sitemap.to_a # => Array of url strings

Options

Recurse nested sitemaps:

sitemap = SitemapParser.new('http://ben.balter.com/sitemap.xml', {recurse: true})

Or, if you want to extract only sitemap URLs matching a given pattern, you can provide a regex that will be used to match each page.

sitemap = SitemapParser.new('http://ben.balter.com/sitemap.xml', {recurse: true, url_regex: /sitemapregex/})
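To get a feel for what a pattern will keep before passing it as url_regex, it can help to try it on a few strings first. The following is a minimal plain-Ruby sketch, not the gem's implementation: the example.com URLs and the /blog/ pattern are made up for illustration, and the array only stands in for the kind of strings sitemap.to_a returns.

```ruby
# Hypothetical URL strings, standing in for what `sitemap.to_a` might return.
urls = [
  "http://example.com/",
  "http://example.com/blog/first-post/",
  "http://example.com/blog/second-post/",
  "http://example.com/about/"
]

# Hypothetical pattern; any Regexp could be supplied as url_regex.
url_regex = %r{/blog/}

# Array#grep keeps the elements for which `url_regex === element` is true.
blog_urls = urls.grep(url_regex)

puts blog_urls
```

Only the two /blog/ entries survive the filter. Trying a pattern this way makes it easier to spot one that is too broad or too narrow before running it against a real sitemap.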

Typhoeus Options

sitemap = SitemapParser.new('http://ben.balter.com/sitemap.xml', { userpwd: "username:password" })

Roadmap

  • sitemap_index support