This Python script takes the URL of a webpage as input, extracts all the text from that page, and writes it to a JSON file.
To use the script, run it from the command line and enter the URL of the webpage you want to scrape when prompted:
python3 scrape.py
Enter the URL of the webpage: [Enter URL here]
The script will then extract all the text from the webpage and write it to a file named output.json.
The script requires the following Python libraries:
requests
beautifulsoup4
You can install these libraries using pip:
pip3 install requests beautifulsoup4
The script works as follows:
- Prompt the user to enter the URL of the webpage they want to scrape.
- Use the requests library to make a GET request to the URL and get the HTML content of the webpage.
- Use the BeautifulSoup library to parse the HTML content and extract all the text from the webpage.
- Create a dictionary to store the data we want to output to the JSON file. This dictionary contains the URL of the webpage and the text of the webpage.
- Use the json library to write the data to a JSON file named output.json.
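The steps above can be sketched roughly as follows. This is a minimal illustration, not necessarily the exact contents of scrape.py: the function names, the "url" and "text" dictionary keys, the 10-second timeout, and the HTTP status check are all assumptions made for the example.

```python
import json

import requests
from bs4 import BeautifulSoup


def fetch_html(url):
    """Make a GET request to the URL and return the page's HTML."""
    # The timeout and status check are assumptions, not described in the steps above.
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    return response.text


def extract_text(html):
    """Parse the HTML and return the text it contains."""
    soup = BeautifulSoup(html, "html.parser")
    # Join the text fragments with single spaces and trim surrounding whitespace.
    return soup.get_text(separator=" ", strip=True)


def build_record(url, html):
    """Build the dictionary to output; the key names are hypothetical."""
    return {"url": url, "text": extract_text(html)}


def save_json(data, path="output.json"):
    """Write the dictionary to a JSON file."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(data, f, ensure_ascii=False, indent=2)


def main():
    """Prompt for a URL, scrape the page, and write output.json."""
    url = input("Enter the URL of the webpage: ")
    save_json(build_record(url, fetch_html(url)))
```

Calling main() runs the whole flow interactively. Keeping the parsing step in its own function, separate from the network request, makes it possible to check the text extraction against a fixed HTML string without fetching a live page.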