
Ultimate Website Sitemap Parser



Website sitemap parser for Python 3.5+.

Features

Installation

pip install ultimate-sitemap-parser

Usage

from usp.tree import sitemap_tree_for_homepage

tree = sitemap_tree_for_homepage('https://www.nytimes.com/')
print(tree)

sitemap_tree_for_homepage() returns a tree of AbstractSitemap subclass objects that represents the sitemap hierarchy found on the website; see the reference documentation for the AbstractSitemap subclasses.
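
To make the shape of that tree easier to picture, here is a minimal sketch of a recursive walk over it. It assumes that every sitemap object exposes a url attribute and that index sitemaps additionally expose a sub_sitemaps list; check the AbstractSitemap reference before relying on either name. The helper sitemap_outline() and the example.com stand-in objects are hypothetical and exist only so the snippet runs without network access.

```python
from types import SimpleNamespace


def sitemap_outline(sitemap, depth=0):
    """Return one indented line per sitemap in the tree, parents before children.

    Assumption: each node has a `url` attribute, and index sitemaps have a
    `sub_sitemaps` list. Nodes without `sub_sitemaps` are treated as leaves.
    """
    lines = ["  " * depth + sitemap.url]
    for child in getattr(sitemap, "sub_sitemaps", []):
        lines.extend(sitemap_outline(child, depth + 1))
    return lines


# Hypothetical stand-in for a fetched tree, so the sketch runs offline.
demo_tree = SimpleNamespace(
    url="https://example.com/sitemap_index.xml",
    sub_sitemaps=[
        SimpleNamespace(url="https://example.com/sitemap_news.xml"),
        SimpleNamespace(url="https://example.com/sitemap_pages.xml"),
    ],
)

demo_outline = sitemap_outline(demo_tree)
print("\n".join(demo_outline))
```

With a real tree, the same helper would be called as sitemap_outline(tree) on the object returned by sitemap_tree_for_homepage().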

If you just want to list all the pages found in all of the website's sitemaps, use the all_pages() method:

# all_pages() returns an Iterator
for page in tree.all_pages():
    print(page)

The all_pages() method returns an iterator yielding SitemapPage objects; see the reference documentation for SitemapPage.
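
Because all_pages() yields objects lazily, a common follow-up is to reduce the stream to a plain list of URLs. The sketch below does that while dropping repeats, on the assumption that each SitemapPage exposes a url attribute and that a page may be listed in more than one sitemap. The helper unique_page_urls() and the example.com stand-in pages are hypothetical, used here so the snippet runs without fetching anything.

```python
from types import SimpleNamespace


def unique_page_urls(pages):
    """Collect page URLs in first-seen order, skipping duplicates.

    Assumption: every item yielded by `pages` has a `url` attribute.
    The iterator is consumed exactly once.
    """
    seen = set()
    urls = []
    for page in pages:
        if page.url not in seen:
            seen.add(page.url)
            urls.append(page.url)
    return urls


# Hypothetical stand-in for tree.all_pages(), so the sketch runs offline.
demo_pages = iter([
    SimpleNamespace(url="https://example.com/"),
    SimpleNamespace(url="https://example.com/about"),
    SimpleNamespace(url="https://example.com/"),
])

demo_urls = unique_page_urls(demo_pages)
print(demo_urls)
```

With a real tree, the call would be unique_page_urls(tree.all_pages()). Keeping the result in a list trades memory for convenience; for very large sites, processing pages one at a time inside the loop may be preferable.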