Dropbox-Sentiment-Analysis

This project is part of the "Airbyte + MotherDuck Hackathon". It uses Airbyte and MotherDuck to showcase sentiment analysis of Dropbox app user reviews.

📊 Dropbox User Sentiment Analysis Dashboard

This project demonstrates how to set up a Dropbox User Sentiment Analysis Dashboard using Airbyte for data extraction, Motherduck (DuckDB) for storage and querying, and Streamlit for visualization. 🚀

🌟 Overview of the Project

The goal is to analyze user reviews of the Dropbox app using sentiment analysis techniques. Here's the workflow breakdown:

  1. Dataset Source: A CSV dataset of Dropbox app user reviews (from Kaggle).
  2. Preprocessing: Data uploaded to Google Sheets for formatting.
  3. Airbyte Integration: Google Sheets (source) connected to Motherduck (destination) via Airbyte.
  4. Destination Setup: Motherduck stores data in DuckDB.
  5. Sentiment Analysis: Python (TextBlob) scores each review, and a Streamlit dashboard visualizes the results.

🔗 Live Demo: Streamlit Dashboard

🔗 Blog Post: Detailed Guide on Sentiment Analysis


🛠️ Tech Stack

  • Airbyte: Data Integration
  • Motherduck (DuckDB): Database Management
  • Streamlit: Data Visualization
  • Python: Backend and Sentiment Analysis
  • TextBlob: Sentiment Analysis Library
  • Plotly: Data Visualization Library

📁 Folder Structure Overview

DROPBOX-REVIEWS-ANALYSIS
├── .devcontainer
│   ├── devcontainer.json
├── .streamlit
│   ├── config.toml
├── assets
│   ├── main.png
├── dropbox-reviews-analytics
│   ├── src
│   │   ├── config
│   │   │   ├── __init__.py
│   │   │   ├── config.py
│   │   ├── utils
│   │   │   ├── __init__.py
│   │   │   ├── database.py
│   │   ├── app.py
├── .env
├── venv
├── .gitignore
├── LICENSE.md
├── README.md
├── requirements.txt

👉 Key Files Explained:

  • .streamlit/config.toml: UI customization for Streamlit.
  • src/config/config.py: Handles environment variables.
  • src/utils/database.py: Database queries with Motherduck.
  • src/app.py: Core Streamlit app logic.
  • .env: Stores secret environment variables such as the Motherduck token; keep it out of version control.

🚀 Setup and Installation

1️⃣ Clone the Repository

git clone https://github.com/abhirajadhikary06/Dropbox-Sentiment-Analysis.git
cd Dropbox-Sentiment-Analysis

2️⃣ Create and Activate Virtual Environment

python -m venv venv
source venv/bin/activate  # On macOS/Linux
venv\Scripts\activate    # On Windows

3️⃣ Install Dependencies

pip install -r requirements.txt

4️⃣ Setup Environment Variables

Create a .env file in the root directory and add:

MOTHERDUCK_TOKEN=your_motherduck_api_key
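
The contents of src/config/config.py are not reproduced in this README. As a rough sketch of how a .env file like the one above could be read using only the standard library, the snippet below parses KEY=VALUE lines into a dict. The helper name `load_env_file` is hypothetical, and the real project may use a package such as python-dotenv instead.

```python
import os
from pathlib import Path


def load_env_file(path=".env"):
    """Parse KEY=VALUE lines from a .env file into a dict (hypothetical helper)."""
    values = {}
    env_path = Path(path)
    if not env_path.exists():
        return values
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and lines without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


# Prefer a variable already set in the process environment, then fall back to .env.
MOTHERDUCK_TOKEN = os.environ.get("MOTHERDUCK_TOKEN") or load_env_file().get("MOTHERDUCK_TOKEN")
```

Checking the process environment first means a hosted deployment can supply the token without a .env file on disk.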

5️⃣ Run the Streamlit App

streamlit run src/app.py

The app will be available at http://localhost:8501.


🧠 Core Components

πŸ“Š Sentiment Analysis Logic

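Using TextBlob, we calculate each review's sentiment polarity (from -1, most negative, to 1, most positive) and subjectivity (from 0, objective, to 1, subjective).

from textblob import TextBlob

def get_sentiment(text, sentiment_type="Polarity"):
    # sentiment_type selects which score to return: "Polarity" or "Subjectivity"
    blob = TextBlob(str(text))
    return blob.sentiment.polarity if sentiment_type == "Polarity" else blob.sentiment.subjectivity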
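
Raw polarity values are often easier to summarize on a dashboard once they are bucketed into labels. The following is a minimal sketch, not the project's code: the function name `label_polarity` and the 0.1 threshold are assumptions chosen for illustration.

```python
def label_polarity(polarity, threshold=0.1):
    """Map a polarity score in [-1, 1] to a coarse sentiment label.

    The threshold is a hypothetical cut-off, not a value taken from the project.
    """
    if polarity > threshold:
        return "Positive"
    if polarity < -threshold:
        return "Negative"
    return "Neutral"


# Example polarity scores of the kind TextBlob returns.
for score in (0.8, 0.05, -0.4):
    print(score, label_polarity(score))
```

A wider threshold sends more mildly worded reviews to "Neutral", so the cut-off is worth tuning against a sample of real reviews.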

🦆 Motherduck Database Integration

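import duckdb
from config.config import MOTHERDUCK_TOKEN

def get_connection():
    # Open a Motherduck connection using the token loaded from .env
    return duckdb.connect(f"md:?token={MOTHERDUCK_TOKEN}")

def get_reviews_for_sentiment():
    # Fetch review text and score, skipping rows that have no text
    conn = get_connection()
    query = """
    SELECT content, score FROM dropbox_reviews WHERE content IS NOT NULL
    """
    try:
        return conn.execute(query).fetch_df()
    finally:
        # Release the connection once the DataFrame has been materialized
        conn.close()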

📈 Visualization Example

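import plotly.express as px
import streamlit as st

# reviews_df must already contain a 'sentiment' column (e.g. computed with get_sentiment)
fig = px.histogram(reviews_df, x='sentiment', title='Sentiment Distribution')
st.plotly_chart(fig)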

⚠️ Deployment Notes

  • Avoid pinning exact library versions in requirements.txt.
  • Ensure the .env file defines MOTHERDUCK_TOKEN.
  • Validate the Motherduck token at runtime before querying the database.

Deployment Steps:

  1. Load .env variables.
  2. Connect securely to Motherduck.
  3. Serve the dashboard via Streamlit.
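
The first two deployment steps can be sketched as a startup check. This is an illustrative outline under stated assumptions: `build_motherduck_dsn` is a hypothetical helper, and the `md:?token=` connection-string format is copied from the `get_connection` example in this README rather than checked against current Motherduck documentation.

```python
import os


def build_motherduck_dsn(env=None):
    """Validate MOTHERDUCK_TOKEN and return a connection string (hypothetical helper)."""
    env = os.environ if env is None else env
    token = (env.get("MOTHERDUCK_TOKEN") or "").strip()
    if not token:
        # Fail fast at startup instead of on the first dashboard query.
        raise RuntimeError("MOTHERDUCK_TOKEN is missing or empty")
    # Same format as get_connection() in this README (an assumption, see above).
    return f"md:?token={token}"
```

The returned string would then be passed to `duckdb.connect(...)` before the Streamlit app starts rendering, so a missing token surfaces as one clear error.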

🎯 What’s Next?

  • Improve dashboard interactivity.
  • Add real-time review updates.
  • Expand to analyze multiple datasets.

🔗 Complete Project on GitHub: GitHub Repository

🔗 Live Demo: Streamlit Dashboard

🔗 Motherduck Instance:

-- Run this snippet to attach database
ATTACH 'md:_share/abhiraj_db/275eb3cc-2d8b-4705-a787-39c8010e8b2f';

📜 License

This project is licensed under the Creative Commons Zero v1.0 Universal license.