Portfolio of personal data science projects

COVID-19 Interactive Web Dashboard

Built an interactive dashboard with Plotly and Dash to monitor the COVID-19 pandemic at three scopes: worldwide, the United States, and Europe. Data is updated nightly from a source provided by the Johns Hopkins University Center for Systems Science and Engineering. The app is live and hosted on Heroku at covid-19-raffg.herokuapp.com.

Technologies used: Python, Pandas, Plotly, Dash, Heroku


Who's Tweeting from the Oval Office?

Machine learning project to classify whether Trump is truly the author of any given tweet on his account or whether it was written and posted by an aide. Deployed via a Twitter bot that made predictions in real time and posted the estimated probability that Trump or an aide was the author of each tweet.

Technologies used:
Python, scikit-learn, Pandas, Tweepy, AWS, Twitter API


A/B testing with Multi-Armed Bandits

Project applying several multi-armed bandit algorithms to Bayesian A/B testing, using Monte Carlo simulations to compare how the algorithms perform under various circumstances.
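
As a rough illustration of the idea, here is a minimal sketch of one common Bayesian bandit algorithm, Thompson sampling, wrapped in a small Monte Carlo loop. It is not taken from the project itself: the algorithm choice, the function names `thompson_sampling` and `monte_carlo`, the uniform Beta(1, 1) priors, and the conversion rates are all assumptions made for the example.

```python
import random


def thompson_sampling(true_rates, n_rounds, seed=0):
    """Simulate one Bernoulli bandit run with a Beta(1, 1) prior on each arm.

    `true_rates` are hypothetical conversion rates used only to simulate
    visitor responses; the algorithm never reads them directly.
    """
    rng = random.Random(seed)
    n_arms = len(true_rates)
    successes = [0] * n_arms
    failures = [0] * n_arms
    for _ in range(n_rounds):
        # Draw one plausible conversion rate per arm from its Beta posterior.
        samples = [
            rng.betavariate(successes[i] + 1, failures[i] + 1)
            for i in range(n_arms)
        ]
        # Play the arm whose sampled rate is highest.
        arm = max(range(n_arms), key=lambda i: samples[i])
        # Simulate the response and update that arm's posterior counts.
        if rng.random() < true_rates[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1
    pulls = [s + f for s, f in zip(successes, failures)]
    return pulls, successes


def monte_carlo(true_rates, n_rounds, n_runs):
    """Fraction of independent runs in which the truly best arm was played most."""
    best = max(range(len(true_rates)), key=lambda i: true_rates[i])
    wins = 0
    for run in range(n_runs):
        pulls, _ = thompson_sampling(true_rates, n_rounds, seed=run)
        if max(range(len(pulls)), key=lambda i: pulls[i]) == best:
            wins += 1
    return wins / n_runs


if __name__ == "__main__":
    # Made-up conversion rates for two page variants.
    print(monte_carlo([0.10, 0.60], n_rounds=500, n_runs=20))
```

Repeating the simulation over many seeds is what makes it possible to compare algorithms: swapping `thompson_sampling` for another strategy and rerunning `monte_carlo` with the same rates gives a like-for-like comparison.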

Technologies used:
Python, Pandas, Bayesian and classical statistics, Monte Carlo simulations, Matplotlib


Spotify sentiment analysis

Used the Spotify API and web scraping to download the valence scores for all 68,000+ songs in Spotify's Daily Top 200 charts, covering all available countries and dates, and analyzed trends over time and by region. Discovered a mistake made by The Economist in their analysis and notified the editor.

Technologies used:
Python, Pandas, Spotify API, Spotipy, web scraping, Matplotlib


Forecasting in Python with Facebook Prophet

Used advanced forecasting techniques in Facebook's Prophet package to forecast some tricky edge cases using data from Instagram, Divvy bike share, and annual airline passengers.

Technologies used:
Python, Facebook Prophet, Pandas, forecasting, Instagram API


Harry Potter NLP

Project applying LDA topic modelling, sentiment analysis, and text summarization to the texts of the Harry Potter books.

Technologies used:
Python, regular expressions, Gensim, spaCy, NLTK, Matplotlib


@Natgeo Instagram anomaly

Discovered a sudden, temporary increase in average likes per photo on National Geographic's Instagram account during August 2016 and used t-tests to investigate how likely it was to be due to random chance.
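
To show the kind of comparison involved, the sketch below computes Welch's two-sample t statistic and its approximate degrees of freedom using only the standard library. Which t-test variant the project used is not stated here, so Welch's version is an assumption, as are the function name `welch_t` and the like counts, which are made up and are not the real @natgeo data.

```python
import math
import statistics


def welch_t(sample_a, sample_b):
    """Return Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    mean_a, mean_b = statistics.mean(sample_a), statistics.mean(sample_b)
    # statistics.variance is the sample variance (n - 1 in the denominator).
    var_a, var_b = statistics.variance(sample_a), statistics.variance(sample_b)
    n_a, n_b = len(sample_a), len(sample_b)
    # Squared standard error contributed by each sample.
    se_a, se_b = var_a / n_a, var_b / n_b
    t = (mean_a - mean_b) / math.sqrt(se_a + se_b)
    df = (se_a + se_b) ** 2 / (se_a ** 2 / (n_a - 1) + se_b ** 2 / (n_b - 1))
    return t, df


# Hypothetical likes per photo (in thousands) for a normal month and a spike month.
baseline = [410, 395, 402, 388, 420, 399, 405, 391]
spike = [520, 498, 545, 510, 530, 505, 515, 525]

t_stat, dof = welch_t(spike, baseline)
print(f"t = {t_stat:.2f}, df = {dof:.1f}")
```

Turning the statistic into a p-value requires the t distribution's CDF, which the standard library does not provide; in practice a statistics package would handle that step.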

Technologies used:
Python, statsmodels, classical statistics, Seaborn


Forecasting with Python and Tableau

Developed an interactive dashboard that runs Python code within Tableau to build a time-series forecast. The original project forecast medicine demand for a client in the pharmaceutical industry, but I have anonymized the dashboard here by using the common Air Passengers dataset in order to demonstrate Tableau's new capability of running Python.

Technologies used:
Python, statsmodels, Tableau


The Top 50 Most Followed Instagrammers

Used the Instagram API to collect all image metadata for the top 50 most followed users of Instagram and mined the data for insights.

Technologies used:
Instagram API, Tableau


Steganography

Project to hide an image or text inside another image and to decode it again.
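
One common way to do this is least-significant-bit (LSB) encoding, sketched below for the text case. The project's actual technique is not described here, so LSB is an assumption, as are the function names `hide_text` and `reveal_text` and the 4-byte length header. To stay self-contained, the sketch works on a plain `bytearray` of 8-bit channel values rather than on a real image; with the Python Imaging Library those bytes would come from the image itself.

```python
def hide_text(pixels, message):
    """Return a copy of `pixels` with `message` stored in each byte's lowest bit."""
    data = message.encode("utf-8")
    # Prefix a 4-byte big-endian length so the decoder knows where to stop.
    payload = len(data).to_bytes(4, "big") + data
    bits = [(byte >> shift) & 1 for byte in payload for shift in range(7, -1, -1)]
    if len(bits) > len(pixels):
        raise ValueError("message is too long for this image")
    out = bytearray(pixels)
    for i, bit in enumerate(bits):
        # Clear the lowest bit, then set it to the payload bit.
        out[i] = (out[i] & 0xFE) | bit
    return out


def reveal_text(pixels):
    """Recover a message previously written by hide_text."""

    def read_bytes(start, count):
        # Rebuild `count` bytes from the lowest bits, most significant bit first.
        value = bytearray()
        for b in range(count):
            byte = 0
            for bit_index in range(8):
                byte = (byte << 1) | (pixels[start + b * 8 + bit_index] & 1)
            value.append(byte)
        return bytes(value)

    length = int.from_bytes(read_bytes(0, 4), "big")
    return read_bytes(32, length).decode("utf-8")


# Stand-in for the flattened channel values of a cover image.
cover = bytearray(range(256)) * 2
stego = hide_text(cover, "hello")
print(reveal_text(stego))
```

Because only the lowest bit of each byte changes, every channel value shifts by at most 1, which is why the altered image looks unchanged to the eye.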

Technologies used:
Python, Python Imaging Library


Tableau Web Data Connector

Update: No longer actively supported. After the Cambridge Analytica scandal, Instagram changed their API permissions.

Built a public Web Data Connector for Tableau to connect to Instagram's API and download data directly.

In Tableau, add a new data source and select Web Data Connector under the "To a Server" section. For the URL, use https://raffg.github.io:443/. Follow the on-screen instructions to access the data. The rate limit is 25,000 posts per hour.

Technologies used:
JavaScript, Instagram API, Tableau