Simple Python Code to Extract Text From a Website

About the code

This code scrapes the text content of websites using Python 3.x. It requires the urllib and Beautiful Soup libraries. The URLs to scrape must be placed in a separate file named links.txt, with each URL on its own line. If you don't want to keep the links in a separate file, replace line 8 of scrape.py with

urls = ['http://www.xyz.com','http://www.abc.com', ...]
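For comparison, the default file-based input could be read roughly as follows. This is a minimal sketch, not the actual contents of scrape.py; the helper name read_links is hypothetical and the file is assumed to be UTF-8 encoded.

```python
def read_links(path="links.txt"):
    """Return the non-empty lines of the links file as a list of URLs.

    Hypothetical helper for illustration; scrape.py may do this differently.
    """
    with open(path, encoding="utf-8") as handle:
        # Strip surrounding whitespace and skip blank lines.
        return [line.strip() for line in handle if line.strip()]
```

Calling urls = read_links() then yields the same kind of list as the hard-coded version above.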

Beautiful Soup parses each page and cleans it by removing scripts and design (style) elements. The remaining text is then written to the output file data.txt.
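The fetch, clean, and write steps can be sketched as below. This is an illustrative outline under stated assumptions, not the code in scrape.py: the function names fetch_html, extract_text, and scrape are hypothetical, pages are assumed to be decodable as text, and the built-in "html.parser" backend is assumed for Beautiful Soup.

```python
from urllib.request import urlopen

from bs4 import BeautifulSoup


def fetch_html(url):
    """Download a page and decode it (falls back to UTF-8 if no charset is sent)."""
    with urlopen(url) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def extract_text(html):
    """Return the visible text of an HTML document, one fragment per line."""
    soup = BeautifulSoup(html, "html.parser")
    # Drop script and style elements so their contents do not end up in the text.
    for tag in soup(["script", "style"]):
        tag.decompose()
    text = soup.get_text(separator="\n")
    # Trim each line and discard the empty ones.
    lines = (line.strip() for line in text.splitlines())
    return "\n".join(line for line in lines if line)


def scrape(urls, out_path="data.txt"):
    """Write the cleaned text of every URL to the output file (hypothetical driver)."""
    with open(out_path, "w", encoding="utf-8") as out:
        for url in urls:
            out.write(extract_text(fetch_html(url)) + "\n")
```

Only extract_text depends on Beautiful Soup; it can be tried on any HTML string without network access.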

Usage

Clone this repository or download its contents as a zip file.
Replace the links in the links.txt file with your own.
Run python scrape.py
The scraped data will be available in data.txt.

Library Installation

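If you have both Python 2 and Python 3 installed, use the command python3 to run the file.
urllib is part of the Python 3 standard library and needs no separate installation. If scrape.py imports urllib3, which is a separate third-party package, install it by
- pip3 install urllib3
Install Beautiful Soup by
- pip3 install beautifulsoup4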
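To confirm the libraries are importable before running the scraper, a quick check along these lines may help. It is only a sketch; the helper name is_installed is hypothetical, and bs4 is the import name of the beautifulsoup4 package.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module can be found by the current interpreter."""
    return importlib.util.find_spec(module_name) is not None


# Report the status of the modules the scraper relies on.
for name in ("urllib", "bs4"):
    print(name, "found" if is_installed(name) else "missing")
```

Run it with the same interpreter (python or python3) that will run scrape.py, since each interpreter has its own set of installed packages.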
