Python microservice for web scraping reviews and determining the likelihood that a review is fake.
Scraping is done using beautiful soup, fake likelihood calculation is done using proprietary website information and nltk sentiment processing on the review text.
Clone the repo
git clone https://github.com/natethinks/review-scraper.git
Download data required for NLP (natural language processing)
python3
import nltk
nltk.download()
Build the image and tag it as scraper, name can be changed to anything
docker build -t scraper .
Mount the volume with the nltk_data for NLP
docker run -v /Users/$USER/nltk_data/:/root/nltk_data/ scraper
pip3 install -r requirements.txt
python3 scrape_test.py
- In app health reporting (prometheus)
- Support retrieving addresses to be scraped from kafka or rabbitmq
- Multiple stratgies for determining fake reviews
- Custom NLP model for reviews
- More error catches and handling