This project uses Python and the Scrapy framework to scrape news websites for their latest articles. It extracts data from each article and stores it in a CSV file for further analysis.
├── spiders              # Contains spiders
│   └── news.py          # Contains main logic of extracting data
├── LICENSE
├── README.md            # Documentation
├── items.py
├── middlewares.py
├── pipelines.py
├── requirements.txt
└── settings.py          # Configuration file for the Scrapy project.
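The layout lists a pipelines.py, which is where a Scrapy project usually handles scraped items after the spider yields them. The following is only a minimal sketch of how items could be written to a CSV file there, not the project's actual code. The class name `CsvExportPipeline`, the output file `articles.csv`, and the column names `title`, `url`, and `published` are all assumptions made for illustration; the real fields would come from items.py.

```python
import csv


class CsvExportPipeline:
    """Hypothetical item pipeline that writes each scraped item to a CSV file."""

    # Assumed column names; the project's real fields are defined in items.py.
    fieldnames = ["title", "url", "published"]

    def __init__(self, path="articles.csv"):
        self.path = path  # assumed output file name

    def open_spider(self, spider):
        # Called once when the spider starts: open the file and write the header.
        # newline="" lets the csv module control line endings itself.
        self.file = open(self.path, "w", newline="", encoding="utf-8")
        self.writer = csv.DictWriter(
            self.file, fieldnames=self.fieldnames, extrasaction="ignore"
        )
        self.writer.writeheader()

    def process_item(self, item, spider):
        # Called for every item; fields not listed in fieldnames are ignored.
        self.writer.writerow(dict(item))
        return item  # hand the item on to any later pipeline

    def close_spider(self, spider):
        # Called once when the spider finishes.
        self.file.close()
```

Scrapy only calls a pipeline that is registered in the `ITEM_PIPELINES` setting in settings.py, using the dotted path of the class inside the project's package.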
- Clone the repository:
git clone https://github.com/mahendra-shah/news_scrapy.git
- Install the required dependencies from the root of the cloned repository, where requirements.txt lives:
pip install -r requirements.txt
- Navigate to the project directory.
- Start the scraper by running the news spider:
scrapy crawl news
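Where the CSV file ends up depends on how the project is configured, which this README does not spell out. If you want to control the output location without touching the spider, Scrapy's built-in feed exports are one option. The fragment below is a sketch of what could be added to settings.py, assuming Scrapy 2.4 or later; the file name `articles.csv` is a placeholder, not something the project defines.

```python
# Hypothetical addition to settings.py: export every scraped item to a CSV file.
FEEDS = {
    "articles.csv": {        # assumed output path, relative to where the command is run
        "format": "csv",     # use Scrapy's built-in CSV exporter
        "encoding": "utf8",
        "overwrite": True,   # replace the file on each run instead of appending
    },
}
```

With a setting like this in place, the same scrapy crawl news command writes its results to the configured file.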
For any questions or feedback, feel free to contact the project owner at mahendra21@navgurukul.org.