Ruby Gem to parse sitemaps.org compliant sitemaps
Create a new instance of the Parser:
sitemap = SitemapParser.new "http://ben.balter.com/sitemap.xml"
Extract the URLs of the sitemap:
sitemap.urls # => Array of Nokogiri::XML::Node objects
sitemap.to_a # => Array of url strings
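Because `to_a` returns plain strings, the result can be processed with ordinary Ruby. The sketch below groups URLs by their first path segment. The `urls` array is made-up sample data standing in for `sitemap.to_a`, and `group_by_section` is a hypothetical helper, not part of the gem.

```ruby
require "uri"

# Hypothetical helper (not part of the gem): group URL strings by their
# first path segment, so "/blog/first-post/" falls under "blog".
def group_by_section(urls)
  urls.group_by do |url|
    segment = URI.parse(url).path.to_s.split("/").reject(&:empty?).first
    segment || "(root)"
  end
end

# Made-up sample data standing in for the result of sitemap.to_a.
urls = [
  "http://example.com/",
  "http://example.com/about/",
  "http://example.com/blog/first-post/",
  "http://example.com/blog/second-post/"
]

group_by_section(urls).each do |section, members|
  puts "#{section}: #{members.size}"
end
```

Grouping like this is one quick way to sanity-check that a sitemap covers the sections you expect.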
To parse a sitemap index and recursively fetch the sitemaps it lists, pass the recurse option:
sitemap = SitemapParser.new('http://ben.balter.com/sitemap.xml', {recurse: true})
Or, if you want to extract only the sitemap URLs matching a given pattern, provide a regex that will be used to match each page:
sitemap = SitemapParser.new('http://ben.balter.com/sitemap.xml', {recurse: true, url_regex: /sitemapregex/})
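Before passing a pattern as `url_regex`, it can help to try it against a few sample strings with Ruby's own `Regexp`. This is only a sketch: the URLs are made-up examples, `matching_urls` is a hypothetical helper, and it assumes the gem tests the pattern against each URL string, which the description above implies but does not spell out.

```ruby
# Hypothetical helper: keep only the URL strings that match the pattern.
def matching_urls(urls, pattern)
  urls.grep(pattern)
end

# Made-up sample URLs, not taken from any real sitemap index.
candidates = [
  "http://example.com/sitemap-posts.xml",
  "http://example.com/sitemap-pages.xml",
  "http://example.com/sitemap-tags.xml"
]

# An unanchored pattern matches anywhere in the string.
puts matching_urls(candidates, /posts|pages/)

# Escape the dot and anchor the end when the exact suffix matters.
puts matching_urls(candidates, /sitemap-tags\.xml\z/)
```

Anchoring and escaping matter here because an unescaped `.` matches any character, so a loose pattern may select more URLs than intended.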
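To fetch a sitemap that requires a username and password, pass the credentials with the userpwd option:
sitemap = SitemapParser.new('http://ben.balter.com/sitemap.xml', { userpwd: "username:password" })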
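For reference, the `username:password` string has the same form as HTTP Basic authentication credentials. The sketch below shows how such a string maps to an `Authorization` header value under that scheme. `basic_auth_header` is a hypothetical helper, and whether the gem sends exactly this header depends on the HTTP client it uses, so treat this as an illustration rather than a description of the gem's internals.

```ruby
# Hypothetical helper: build the value of an HTTP Basic Authorization
# header from a "username:password" string (the scheme in RFC 7617).
def basic_auth_header(userpwd)
  # The "m0" pack directive produces Base64 without line breaks.
  "Basic " + [userpwd].pack("m0")
end

puts basic_auth_header("username:password")
```

Basic authentication only encodes the credentials and does not encrypt them, so an https URL is the safer choice for a protected sitemap.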