Analyze news content from different media
- Install Python 3.7+
- Create virtual environment:
python -m venv venv
- Activate virtual environment:
- Windows:
venv\Scripts\activate
- macOS/Linux:
source venv/bin/activate
- Install requirements:
pip install -r requirements.txt
- MySQL Setup:
- Install MySQL Server and MySQL Connector for Python (the mysql-connector-python package).
- Create config.py. See config.example.py for a template of the values to put in config.py.
- (For MySQL Server) Create database:
python create_database.py
- Connect to MySQL:
Project not finished! Instructions will be available when the project is completed.
This project contains a script to scrape headlines from one of the following websites.
- https://abcnews.go.com/
- https://www.cnn.com/
- https://www.foxnews.com/
- https://www.nbcnews.com/
- https://www.reuters.com/
- Create virtual environment:
python -m venv venv
- Activate virtual environment:
- Windows:
venv\Scripts\activate
- macOS/Linux:
source venv/bin/activate
- Install requirements:
pip install -r requirements.txt
- Run the scraper:
python general_scrape.py
This project also contains a JavaScript script to scrape headlines from one of the following websites.
Note: This script may be outdated. We recommend you use the Python script to scrape websites.
- Install Node.js and npm
cd repository
npm install cheerio axios
cd scrape_web
node scrape1.js
Description of our process:
- Grabbed the HTML from a given website
- Used the BeautifulSoup package to parse the HTML and find headlines
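The two steps above can be sketched roughly as follows. This is a minimal illustration, not the code in general_scrape.py: the sample markup, the `h2`/`headline` tag and class, and the `extract_headlines` helper are all assumptions made for the example, and each real site uses its own markup. A static string stands in for the downloaded page so the sketch runs offline; in practice the HTML would come from an HTTP request (for example `requests.get(url).text`).

```python
from bs4 import BeautifulSoup

# Sample markup standing in for a downloaded page (hypothetical structure).
SAMPLE_HTML = """
<html><body>
  <h2 class="headline"><a href="/story-1">First   sample headline</a></h2>
  <h2 class="headline"><a href="/story-2">Second sample headline</a></h2>
  <h2 class="headline"><a href="/story-1">First sample headline</a></h2>
  <p class="teaser">Not a headline</p>
</body></html>
"""


def extract_headlines(html, tag_name="h2", css_class="headline"):
    """Return unique, whitespace-normalized headline texts in page order."""
    soup = BeautifulSoup(html, "html.parser")
    headlines = []
    for tag in soup.find_all(tag_name, class_=css_class):
        # Collapse runs of whitespace left over from the page layout.
        text = " ".join(tag.get_text().split())
        if text and text not in headlines:
            headlines.append(text)
    return headlines


print(extract_headlines(SAMPLE_HTML))
# ['First sample headline', 'Second sample headline']
```

The tag name and class are parameters because every news site marks up its headlines differently, so each site would need its own values.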
Resources that we used:
- https://www.dataquest.io/blog/web-scraping-tutorial-python/
- https://towardsdatascience.com/web-scraping-news-articles-in-python-9dd605799558
- https://www.crummy.com/software/BeautifulSoup/bs4/doc/#tag
Description of our process:
- Set up the MySQL server (see Install section)
- Used the mysql-connector-python package to create the database (see Install section)
- Used the mysql-connector-python package to insert headline data into the database
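The insert step might look something like the sketch below. The table name `headlines`, its columns, the credentials, and the helper names `insert_headlines` and `save_to_mysql` are all hypothetical; the project's actual schema and the values in config.py may differ. The sketch uses a parameterized query with `%s` placeholders, which is the placeholder style mysql-connector-python expects.

```python
from datetime import date

# Hypothetical table and column names; the project's real schema may differ.
INSERT_SQL = (
    "INSERT INTO headlines (source, headline, scraped_on) VALUES (%s, %s, %s)"
)


def insert_headlines(connection, source, headlines, scraped_on=None):
    """Insert one row per headline and return the number of rows sent."""
    scraped_on = scraped_on or date.today()
    rows = [(source, text, scraped_on) for text in headlines]
    if not rows:
        return 0
    cursor = connection.cursor()
    try:
        # Values are passed separately from the SQL text, never formatted in.
        cursor.executemany(INSERT_SQL, rows)
        connection.commit()
    finally:
        cursor.close()
    return len(rows)


def save_to_mysql(source, headlines):
    """Open a MySQL connection, store the headlines, and close the connection."""
    import mysql.connector  # imported lazily so insert_headlines has no hard dependency

    # Placeholder credentials; in the project these would come from config.py.
    connection = mysql.connector.connect(
        host="localhost", user="news_user", password="change-me", database="news"
    )
    try:
        return insert_headlines(connection, source, headlines)
    finally:
        connection.close()
```

Keeping `insert_headlines` independent of how the connection is opened means it can be exercised with a stand-in connection object, without a running MySQL server.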
Resources that we used:
- https://www.edureka.co/blog/mysql-tutorial/
- https://www.cis.uni-muenchen.de/~hs/teach/14s/ir/rdbms.pdf
- https://dev.mysql.com/doc/refman/8.0/en/sql-statements.html
Description of our process:
- Used the nltk package for natural language processing
Resources that we used:
Description of our process:
- Set up a cron job (we are running on an Ubuntu machine). On Windows, you may need to use Task Scheduler instead.
Resources that we used: