A tool for scraping and visualizing search results from Reddit.
Json-Server is used as the back-end for visualizing the scraped data, and Node.js is required to run it. The following bash commands install Node.js on Debian-based distributions. For other platforms, please refer to the official installation guide.
curl -sL https://deb.nodesource.com/setup_8.x | sudo -E bash -
sudo apt-get install -y nodejs
Inside the jsonserver directory, run the following bash command, leaving the existing files unmodified:
npm install --save json-server
Beautiful Soup 4 must be installed along with the lxml parser. The requests library is also required to fetch the HTML content from Reddit.
pip3 install beautifulsoup4
pip3 install requests
pip3 install lxml
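Once the three packages above are installed, a quick way to confirm that Beautiful Soup can use the lxml parser is to parse a small HTML fragment (the fragment below is only an illustration, not actual Reddit markup):

```python
from bs4 import BeautifulSoup

# Parse a tiny HTML snippet with the lxml parser to verify the install.
html = '<div class="title"><a href="https://reddit.com">Hello Reddit</a></div>'
soup = BeautifulSoup(html, "lxml")

link = soup.find("a")
print(link.text)      # Hello Reddit
print(link["href"])   # https://reddit.com
```

If this runs without an error about a missing parser, the scraper's dependencies are in place.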
This tool is made up of two parts: a web scraper and a dynamic web page for visualizing the results.
The scraped data is stored in a file called product.json, which is served by Json-Server to the front-end for visualization.
You can limit the scraper's search to a specific subreddit, or let it search all subreddits.
For example, in order to search for the keyword uzay in all subreddits, run the command below inside the root project folder:
python3 scraper.py --keyword="uzay"
In order to search for the keyword ayn rand in the subreddit r/objectivism, run the command below inside the root project folder:
python3 scraper.py --keyword="ayn rand" --subreddit="objectivism"
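The two flags shown above can be handled with argparse. This is only a sketch of how such a command-line interface could be parsed; the actual scraper.py defines its own argument handling:

```python
import argparse

# Hypothetical sketch of the CLI used above; scraper.py's real
# argument handling may differ.
parser = argparse.ArgumentParser(description="Scrape Reddit search results")
parser.add_argument("--keyword", required=True, help="search term")
parser.add_argument("--subreddit", default=None,
                    help="limit the search to one subreddit (all if omitted)")

# Simulate: python3 scraper.py --keyword="ayn rand" --subreddit="objectivism"
args = parser.parse_args(["--keyword", "ayn rand", "--subreddit", "objectivism"])
print(args.keyword, args.subreddit)   # ayn rand objectivism
```

When --subreddit is omitted, args.subreddit stays None, which corresponds to searching all subreddits.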
If a product.json file already exists, the scraper appends the search results for a new keyword to the end of the file. If the keyword already exists in product.json, the scraper resumes searching from the date of the most recent post and appends the new content after the posts previously saved for that keyword.
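The append-and-resume behaviour described above can be sketched as follows. This assumes product.json holds a flat list of post records, each with hypothetical "keyword" and "date" fields; the file layout the real scraper uses may differ:

```python
import json
import os

def latest_date_for(keyword, path="product.json"):
    """Return the most recent saved post date for a keyword, or None.

    None means there is nothing to resume from: either the file does
    not exist yet or the keyword has never been searched.
    """
    if not os.path.exists(path):
        return None
    with open(path) as f:
        posts = json.load(f)
    dates = [p["date"] for p in posts if p["keyword"] == keyword]
    return max(dates) if dates else None

def append_posts(new_posts, path="product.json"):
    """Append new post records to the end of the existing file."""
    posts = []
    if os.path.exists(path):
        with open(path) as f:
            posts = json.load(f)
    posts.extend(new_posts)  # new results go after the saved ones
    with open(path, "w") as f:
        json.dump(posts, f, indent=2)
```

With ISO-formatted dates, max() on the strings picks the most recent post, which gives the scraper its resume point.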
When the product.json file is ready for visualization, run the following command inside the jsonserver directory to start Json-Server:
npm run json:server
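The command above relies on a json:server script being defined in the jsonserver directory's package.json. A typical entry is sketched below; the exact path to product.json is an assumption and may differ in the actual repository:

```json
{
  "scripts": {
    "json:server": "json-server --watch ../product.json"
  }
}
```

The --watch flag makes json-server reload the file when the scraper adds new results.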
Then, open the reddit.html file in a browser.