support sitemap files
rockdaboot opened this issue · 2 comments
rockdaboot commented
Download the sitemap URLs listed in robots.txt (gzip compressed and uncompressed).
Parse these files with Mget's XML parser to fetch all URLs.
Respect additional information/schemas from 'urlset', e.g. http://www.google.com/schemas/sitemap-image/1.1.
See http://www.sitemaps.org/protocol.html for more information.
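To make the request concrete, here is a minimal Python sketch of the two steps: collecting `Sitemap:` lines from robots.txt and pulling page and image URLs out of a `urlset`. It is only an illustration of the protocol, not Mget's implementation; the function names `sitemap_urls_from_robots` and `parse_urlset` are hypothetical, and downloading and decompression are left out.

```python
import xml.etree.ElementTree as ET

# Namespaces from the sitemaps.org protocol and the image extension named above.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
IMAGE_NS = "http://www.google.com/schemas/sitemap-image/1.1"


def sitemap_urls_from_robots(robots_txt):
    """Return the URLs named by 'Sitemap:' lines in a robots.txt body (hypothetical helper)."""
    urls = []
    for line in robots_txt.splitlines():
        # Split only at the first colon so the URL's own "://" stays intact.
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "sitemap" and value.strip():
            urls.append(value.strip())
    return urls


def parse_urlset(xml_bytes):
    """Return (page_urls, image_urls) found in a <urlset> document (hypothetical helper)."""
    root = ET.fromstring(xml_bytes)
    pages, images = [], []
    for url in root.iter(f"{{{SITEMAP_NS}}}url"):
        loc = url.find(f"{{{SITEMAP_NS}}}loc")
        if loc is not None and loc.text:
            pages.append(loc.text.strip())
        # Image entries live in their own namespace below each <url>.
        for img_loc in url.iter(f"{{{IMAGE_NS}}}loc"):
            if img_loc.text:
                images.append(img_loc.text.strip())
    return pages, images
```

Keeping the image URLs in a separate list lets the caller decide whether extension data should be followed at all.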
rockdaboot commented
Mget now supports sitemap index files and sitemap files in 'sitemap' format (gzip compressed and uncompressed) and in plain text format. Scanning of RSS and Atom feed formats for sitemap files and within HTML will be supported soon.
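For readers unfamiliar with the formats, the following Python sketch shows one way to tell them apart: gzip data is recognized by its magic bytes, XML is classified by its root element, and anything that does not parse as XML is treated as the one-URL-per-line text format. This is an assumption about how such detection could work, not a description of Mget's code; `maybe_gunzip` and `read_sitemap` are hypothetical names.

```python
import gzip
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def maybe_gunzip(data):
    """Decompress data if it starts with the gzip magic bytes, else return it unchanged."""
    if data[:2] == b"\x1f\x8b":
        return gzip.decompress(data)
    return data


def read_sitemap(data):
    """Classify a sitemap body and return (kind, urls).

    kind is 'index' for <sitemapindex>, 'urlset' for any other XML root,
    and 'text' for the plain text format with one URL per line.
    """
    data = maybe_gunzip(data)
    try:
        root = ET.fromstring(data)
    except ET.ParseError:
        # Not well-formed XML: fall back to the plain text format.
        lines = data.decode("utf-8", errors="replace").splitlines()
        return "text", [ln.strip() for ln in lines if ln.strip()]
    locs = [el.text.strip() for el in root.iter(f"{{{SITEMAP_NS}}}loc") if el.text]
    kind = "index" if root.tag == f"{{{SITEMAP_NS}}}sitemapindex" else "urlset"
    return kind, locs
```

URLs returned with kind `index` point at further sitemap files, so a caller would fetch each of them and run `read_sitemap` again.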
rockdaboot commented
Added parsing of RSS 2.0 and Atom 1.0 feeds.
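As a rough illustration of what feed parsing involves, this Python sketch collects item links from an RSS 2.0 feed and entry links from an Atom 1.0 feed. It is not taken from Mget; the name `feed_links` is hypothetical, and the sketch assumes that only `<link>` elements are of interest, which is how the sitemaps.org protocol describes feeds used as sitemaps.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"


def feed_links(xml_bytes):
    """Return the item/entry links of an RSS 2.0 or Atom 1.0 feed (hypothetical helper)."""
    root = ET.fromstring(xml_bytes)
    links = []
    if root.tag == "rss":
        # RSS 2.0 elements carry no namespace; the URL is the text of <link>.
        for link in root.iterfind("channel/item/link"):
            if link.text:
                links.append(link.text.strip())
    elif root.tag == f"{{{ATOM_NS}}}feed":
        # Atom 1.0 stores the URL in the href attribute of <link>.
        for link in root.iterfind("a:entry/a:link", {"a": ATOM_NS}):
            href = link.get("href")
            if href:
                links.append(href)
    return links
```

Any other root element yields an empty list, so a caller can try this after the sitemap formats have been ruled out.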