The Website Sitemap Scraper is a Python script that fetches a website's sitemap and extracts the links it lists. It is useful for getting an overview of a site's structure and the pages it publishes.
- Fetches sitemap links from a specified website.
- Saves the sitemap links to a text file for future reference.
Before you begin, ensure you have met the following requirements:
- Python 3.7 or higher installed on your system.
- The following Python libraries installed:
  - httpx: Used for making asynchronous HTTP requests.
  - selectolax: Used for parsing HTML/XML content.
You can install the required libraries using pip:
pip install -r requirements.txt
- Clone this repository to your local machine:
git clone https://github.com/your-username/Sitemap-Postlink-Scraper.git
- Navigate to the project directory:
cd Sitemap-Postlink-Scraper
- Run the script:
python sitemap_post_scraper.py
- Follow the on-screen instructions to provide the URL of the website you want to scrape.
- If a sitemap is found on the website, the script will fetch and save the sitemap links to a text file named _sitemap_links.txt.
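
For readers curious about what the extract-and-save step might look like, here is a minimal sketch. It is not the script's actual code: the function names `extract_sitemap_links` and `save_links` and the `SAMPLE` document are hypothetical, and it uses the standard library's `xml.etree.ElementTree` in place of selectolax so it runs without extra dependencies. It assumes the sitemap follows the usual format in which each URL sits in a `<loc>` element.

```python
import xml.etree.ElementTree as ET
from pathlib import Path
from typing import List


def extract_sitemap_links(xml_text: str) -> List[str]:
    """Return the text of every <loc> element in a sitemap document."""
    root = ET.fromstring(xml_text)
    links = []
    for element in root.iter():
        # Namespaced tags look like "{namespace}loc"; compare the local name only.
        local_name = element.tag.rsplit("}", 1)[-1]
        if local_name == "loc" and element.text:
            links.append(element.text.strip())
    return links


def save_links(links: List[str], path: str) -> None:
    """Write one link per line to a text file (hypothetical helper)."""
    Path(path).write_text("\n".join(links) + "\n", encoding="utf-8")


# Illustrative sitemap content; a real run would download this from the site.
SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/about</loc></url>
</urlset>"""

if __name__ == "__main__":
    for link in extract_sitemap_links(SAMPLE):
        print(link)
```

Calling `save_links(extract_sitemap_links(SAMPLE), "links.txt")` would write the two example URLs to `links.txt`, one per line. Matching on the local tag name also picks up `<loc>` entries in sitemap index files, which point to further sitemaps rather than to pages.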