ReddiTagger: Advanced NER and Sentiment Analysis with spaCy & Flair
Dive deep into Reddit's data with ReddiTagger. The project uses spaCy's transformer-based en_core_web_trf pipeline to extract named entities and Flair to analyze sentiment, then presents the results in an interactive Dash interface.
Features
- Entity Extraction: Uses spaCy's transformer-based model, en_core_web_trf, for high-accuracy named entity recognition (NER).
- Sentiment Determination: Uses Flair's sentiment analysis to score the sentiment of the analyzed text.
- Interactive Data Visualization: Present the gathered insights through a vibrant Dash dashboard.
- Secure Authentication: Leverage Reddit's OAuth for seamless and secure data access.
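The features above combine two model outputs: the entities spaCy finds in a text and the sentiment Flair assigns to it. The sketch below shows one way those outputs could be rolled up into a per-entity score for the dashboard. It is not ReddiTagger's actual code: it assumes NER has already produced `(entity_text, spacy_label)` pairs for each text and that sentiment arrives as a POSITIVE/NEGATIVE label with a confidence, and the `signed_score` and `aggregate_entity_sentiment` helpers (and the signed-score convention) are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def signed_score(label, confidence):
    """Fold a POSITIVE/NEGATIVE label and its confidence into one value in [-1, 1].

    Assumption: sentiment is a binary label plus a confidence; mapping it to a
    signed number is this sketch's own convention, not Flair's API.
    """
    return confidence if label == "POSITIVE" else -confidence


def aggregate_entity_sentiment(records):
    """Average the signed sentiment of every text in which each entity appears.

    Each record is (entities, label, confidence), where entities is a list of
    (entity_text, spacy_label) pairs found in one text, e.g. ("Apple", "ORG").
    Returns {(entity_text, spacy_label): mean signed score}.
    """
    scores = defaultdict(list)
    for entities, label, confidence in records:
        score = signed_score(label, confidence)
        # set() counts an entity once per text even if it is mentioned repeatedly.
        for entity in set(entities):
            scores[entity].append(score)
    return {entity: mean(values) for entity, values in scores.items()}


# Hypothetical pre-computed NER + sentiment results for two texts.
records = [
    ([("Apple", "ORG"), ("Canada", "GPE")], "POSITIVE", 0.9),
    ([("Apple", "ORG"), ("Apple", "ORG")], "NEGATIVE", 0.5),
]
print(aggregate_entity_sentiment(records))
```

Averaging per text (rather than per mention) keeps one heavily repeated name in a single post from dominating an entity's score; weighting by mention count would be an equally valid design choice.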
Installation & Setup
Prerequisites
Ensure Python 3.x is installed on your machine.
Step-by-step Installation
- Clone the Repository:
  git clone https://github.com/BetikuOluwatobi/ReddiTagger.git
- Navigate to the Project Directory:
  cd ReddiTagger
- Set up a Virtual Environment and Activate It:
  python3 -m venv myenv
  source myenv/bin/activate
- Install Required Libraries:
  pip install -r requirements.txt
- Fetch the Essential spaCy Model:
  python -m spacy download en_core_web_trf
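- Environment Variables: Set CLIENT_ID, CLIENT_SECRET, and REDIRECT_URI in your environment so the app can access the Reddit API.
- Reddit App Configuration: In your Reddit app's settings, set the redirect URI to http://localhost:5000/callback. Use the same value for REDIRECT_URI, because Reddit rejects authorization requests whose redirect URI does not match the one registered for the app.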
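To show how these environment variables feed Reddit's OAuth2 flow, here is a minimal sketch that reads CLIENT_ID and REDIRECT_URI and builds the URL that sends a user to Reddit's consent page (https://www.reddit.com/api/v1/authorize). It is an illustration under assumptions, not ReddiTagger's implementation: the helper names `require_env` and `build_authorize_url`, the default `read` scope, and the `temporary` duration are choices made for this example.

```python
import os
import secrets
from urllib.parse import urlencode

# Reddit's OAuth2 authorization endpoint.
AUTHORIZE_URL = "https://www.reddit.com/api/v1/authorize"


def require_env(name):
    """Read a required environment variable, failing with a clear message."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"Missing required environment variable: {name}")
    return value


def build_authorize_url(scope="read"):
    """Return (url, state) for redirecting the user to Reddit's consent page.

    The state token should be stored (e.g. in the session) and compared with
    the value Reddit sends back to the callback, to guard against CSRF.
    """
    state = secrets.token_urlsafe(16)
    params = {
        "client_id": require_env("CLIENT_ID"),
        "response_type": "code",
        "state": state,
        # Must exactly match the redirect URI registered for the Reddit app.
        "redirect_uri": require_env("REDIRECT_URI"),
        "duration": "temporary",  # "permanent" would also request a refresh token
        "scope": scope,  # space-separated Reddit scopes, e.g. "identity read"
    }
    return f"{AUTHORIZE_URL}?{urlencode(params)}", state
```

Failing fast on a missing variable surfaces a configuration mistake at startup instead of as an opaque error page from Reddit after the user clicks "log in".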
Launching the Application
- Run the following command:
  python app.py
- Open a browser and visit http://localhost:5000/ to use the ReddiTagger dashboard.
How to Use
- Homepage: Start by authenticating with Reddit through OAuth2.
- Authenticate: Once signed in, pick a subreddit and the entity type to analyze (e.g., Organization, Location, Country/State).
- Dashboard: Explore interactive visualizations of the sentiment associated with each entity, and adjust the sentiment score slider to refine the view.
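The entity-type choice and the sentiment slider amount to a filter over per-entity scores. The sketch below illustrates that step under stated assumptions: the mapping from the dashboard's options to spaCy's English entity labels (ORG, LOC, GPE) is my guess at how the options line up, the slider is treated as a minimum score, and `ENTITY_TYPES` and `filter_entities` are hypothetical names, not part of ReddiTagger.

```python
# Hypothetical mapping from the dashboard's entity-type options to the
# entity labels produced by spaCy's English pipelines such as en_core_web_trf.
ENTITY_TYPES = {
    "Organization": "ORG",
    "Location": "LOC",
    "Country/State": "GPE",  # spaCy's GPE covers countries, cities, and states
}


def filter_entities(entity_scores, entity_type, min_score):
    """Keep entities of the chosen type whose sentiment is at least min_score.

    entity_scores maps (entity_text, spacy_label) -> average sentiment score.
    Assumption: the slider sets a lower bound on that score.
    """
    label = ENTITY_TYPES[entity_type]
    selected = {
        text: score
        for (text, ent_label), score in entity_scores.items()
        if ent_label == label and score >= min_score
    }
    # Most positive first, so a bar chart shows the strongest sentiment on top.
    return dict(sorted(selected.items(), key=lambda item: item[1], reverse=True))


scores = {("Apple", "ORG"): 0.2, ("Google", "ORG"): 0.8, ("Canada", "GPE"): 0.9}
print(filter_entities(scores, "Organization", 0.5))
```

In a Dash app, a function like this would typically run inside a callback whose inputs are the entity-type dropdown and the slider value.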
Docker Setup
Heads-up: The Docker image is large (~7 GB), so expect the build to take a while.
- Build the Docker Image:
  docker build -t redditagger .
- Run the Docker Container, substituting your own CLIENT_ID and CLIENT_SECRET:
  docker run -d -p 5000:5000 -e CLIENT_ID=<YOUR_CLIENT_ID> -e CLIENT_SECRET=<YOUR_CLIENT_SECRET> redditagger
Tip: Get your CLIENT_ID and CLIENT_SECRET from your Reddit app preferences at reddit.com/prefs/apps. If this is your first time, create a Reddit app there first; its credentials are listed in the app's details.
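In Reddit's web-app OAuth2 flow, the client ID and secret are used when the callback exchanges the authorization code for an access token: a POST to https://www.reddit.com/api/v1/access_token with HTTP Basic authentication. The sketch below only builds that request with the standard library and does not send it. It assumes nothing about how ReddiTagger itself performs this step (it may use a client library instead), and the `build_token_request` helper and the example User-Agent string are hypothetical.

```python
import base64
from urllib.parse import urlencode
from urllib.request import Request

# Reddit's OAuth2 token endpoint.
TOKEN_URL = "https://www.reddit.com/api/v1/access_token"


def build_token_request(client_id, client_secret, code, redirect_uri, user_agent):
    """Build (but do not send) the POST that trades an auth code for a token."""
    # HTTP Basic auth: client ID as the username, client secret as the password.
    credentials = base64.b64encode(f"{client_id}:{client_secret}".encode()).decode()
    body = urlencode({
        "grant_type": "authorization_code",
        "code": code,  # the ?code=... value Reddit passed to the callback
        "redirect_uri": redirect_uri,  # must match the one used to authorize
    }).encode()
    return Request(
        TOKEN_URL,
        data=body,
        headers={
            "Authorization": f"Basic {credentials}",
            "User-Agent": user_agent,  # Reddit expects a descriptive User-Agent
            "Content-Type": "application/x-www-form-urlencoded",
        },
        method="POST",
    )


request = build_token_request(
    "id", "secret", "abc", "http://localhost:5000/callback",
    "python:reddi-tagger-sketch:v0.1",  # hypothetical User-Agent string
)
```

Sending it with `urllib.request.urlopen(request)` should return JSON that includes an `access_token`; keep CLIENT_SECRET on the server side and out of any page served to the browser.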
Comprehensive Video Walkthrough
For an extensive tutorial on ReddiTagger, we've curated a video series to assist you:
Contribution
Your insights can shape ReddiTagger's future! Feel free to fork, tweak, and submit pull requests.
Licensing
ReddiTagger is open-sourced under the MIT license.
Dive into the ocean of Reddit data with ReddiTagger and unearth exciting insights! 🚀