Built an interactive dashboard with Plotly and Dash to monitor the COVID-19 pandemic in three regions: worldwide, the United States, and Europe. Data is updated nightly from a source provided by the Johns Hopkins University Center for Systems Science and Engineering. The app is live and hosted on Heroku at covid-19-raffg.herokuapp.com.
Technologies used:
Python, Pandas, Plotly, Dash, Heroku
Machine learning project to classify whether any given tweet on Trump's account was truly written by Trump himself or was written and posted by an aide. Deployed via a Twitter bot that made predictions in real time and posted the estimated probability that Trump or an aide wrote each tweet.
- GitHub repo
- Twitter bot account (no longer active)
- Medium post discussing procedure and results
- Medium post discussing Twitter bot development
Technologies used:
Python, scikit-learn, Pandas, Tweepy, AWS, Twitter API
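The project lists scikit-learn for the model. As a dependency-free stand-in (not the project's actual model), here is a minimal multinomial Naive Bayes sketch showing how a classifier can turn a tweet into the kind of "Trump vs. aide" probability the bot posted. The training phrases, the `trump`/`aide` labels, the tokenizer, and the function names are all hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical, invented training phrases; NOT data from the original project.
TRAIN = [
    ("Great rally tonight!!! Tremendous crowd!", "trump"),
    ("Very unfair! Total disgrace!", "trump"),
    ("Watch live at 7pm: tune in", "aide"),
    ("Join us tonight. Watch live", "aide"),
]


def tokenize(text):
    # Lowercased words plus "!" and "?" kept as stylistic tokens.
    return re.findall(r"[a-z']+|[!?]", text.lower())


def train(examples):
    word_counts = {label: Counter() for _, label in examples}
    doc_counts = Counter(label for _, label in examples)
    for text, label in examples:
        word_counts[label].update(tokenize(text))
    vocab = {w for counts in word_counts.values() for w in counts}
    return word_counts, doc_counts, vocab


def predict_proba(model, text):
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    log_scores = {}
    for label, counts in word_counts.items():
        total_words = sum(counts.values())
        score = math.log(doc_counts[label] / total_docs)  # log prior
        for w in tokenize(text):
            if w not in vocab:
                continue  # ignore words never seen in training
            # Laplace (add-one) smoothed log likelihood of the word given the label
            score += math.log((counts[w] + 1) / (total_words + len(vocab)))
        log_scores[label] = score
    # Normalize log scores into probabilities (numerically stable softmax).
    top = max(log_scores.values())
    exps = {label: math.exp(s - top) for label, s in log_scores.items()}
    z = sum(exps.values())
    return {label: v / z for label, v in exps.items()}


model = train(TRAIN)
probs = predict_proba(model, "Huge crowd tonight!!")
print(f"P(Trump) = {probs['trump']:.2f}, P(aide) = {probs['aide']:.2f}")
```

A real model would be trained on thousands of labeled tweets and richer features; the point here is only the shape of the output, a probability per author that sums to 1.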
Project using several multi-armed bandit algorithms and Monte Carlo simulations to perform Bayesian A/B testing, comparing how the algorithms perform under various circumstances.
Technologies used:
Python, Pandas, Bayesian and classical statistics, Monte Carlo simulations, Matplotlib
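Thompson sampling is one widely used Bayesian bandit algorithm; the list above does not say which algorithms the project compares, so treat this as an illustrative sketch only. It assumes Bernoulli rewards (a visitor converts or not) and uniform Beta(1, 1) priors, simulates traffic against hypothetical true conversion rates, and adds a Monte Carlo estimate of the probability that variant B beats variant A. All function names and parameters are hypothetical.

```python
import random


def thompson_sampling(true_rates, n_rounds, seed=0):
    """Simulate Bernoulli Thompson sampling; return pulls per arm and total reward."""
    rng = random.Random(seed)
    k = len(true_rates)
    successes = [0] * k
    failures = [0] * k
    for _ in range(n_rounds):
        # Draw a plausible conversion rate for each arm from its Beta posterior.
        samples = [rng.betavariate(1 + successes[i], 1 + failures[i]) for i in range(k)]
        arm = max(range(k), key=samples.__getitem__)
        # The simulated visitor converts with the arm's true rate (hidden from the algorithm).
        if rng.random() < true_rates[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1
    pulls = [successes[i] + failures[i] for i in range(k)]
    return pulls, sum(successes)


def prob_b_beats_a(s_a, f_a, s_b, f_b, draws=20_000, seed=1):
    """Monte Carlo estimate of P(rate_B > rate_A) under Beta(1, 1) priors."""
    rng = random.Random(seed)
    wins = sum(
        rng.betavariate(1 + s_b, 1 + f_b) > rng.betavariate(1 + s_a, 1 + f_a)
        for _ in range(draws)
    )
    return wins / draws


pulls, reward = thompson_sampling([0.05, 0.15], n_rounds=10_000)
print("pulls per arm:", pulls, "total conversions:", reward)
print("P(B > A):", prob_b_beats_a(s_a=10, f_a=90, s_b=30, f_b=70))
```

Because each arm is chosen in proportion to how likely it is to be best, traffic shifts toward the stronger variant as evidence accumulates, which is the main contrast with a fixed 50/50 A/B split.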
Used the Spotify API and web scraping to download valence scores for all 68,000+ songs in Spotify's Daily Top 200 charts across all available countries and dates, then analyzed trends over time and by region. Discovered a mistake in The Economist's analysis and notified the editor.
Technologies used:
Python, Pandas, Spotify API, Spotipy, web scraping, Matplotlib
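Spotify's valence score runs from 0.0 to 1.0, with higher values sounding more positive. A minimal sketch of the "trends over time and by region" step, assuming each chart entry has already been reduced to a hypothetical `(country, date, valence)` row, is a mean per country and month. It uses only the standard library; the project itself lists Pandas, where a `groupby` would do the same job.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical rows: (country code, chart date "YYYY-MM-DD", valence in [0, 1]).
rows = [
    ("us", "2017-01-01", 0.62),
    ("us", "2017-01-02", 0.48),
    ("br", "2017-01-01", 0.71),
    ("us", "2017-02-01", 0.40),
]


def mean_valence_by_country_month(rows):
    """Average valence of chart entries, grouped by (country, "YYYY-MM")."""
    groups = defaultdict(list)
    for country, date, valence in rows:
        groups[(country, date[:7])].append(valence)  # bucket by year-month
    return {key: mean(values) for key, values in sorted(groups.items())}


for (country, month), avg in mean_valence_by_country_month(rows).items():
    print(country, month, round(avg, 3))
```

Each chart entry counts once here, so a song that stays on the chart for weeks contributes once per day it appears; weighting by streams would be a different, equally defensible choice.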
Used advanced features of Facebook's Prophet package to handle some tricky forecasting edge cases, with data from Instagram, Divvy bike share, and annual airline passenger counts.
Technologies used:
Python, Facebook Prophet, Pandas, forecasting, Instagram API
Project applying LDA topic modelling, sentiment analysis, and text summarization to the text of the Harry Potter books.
- GitHub repo
- Medium posts:
Technologies used:
Python, regular expressions, Gensim, spaCy, NLTK, Matplotlib
Discovered a sudden and temporary increase in average likes per photo on National Geographic's Instagram account during August 2016, and used t-tests to evaluate how likely an increase of that size would be under random chance alone.
- GitHub repo
- Tableau Public storyboard (on web)
Technologies used:
Python, statsmodels, classical statistics, Seaborn
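The comparison above can be sketched as a two-sample t-test of likes per photo in August 2016 against a baseline period. This sketch assumes Welch's unequal-variance version and uses hypothetical like counts; it computes the t statistic and the Welch-Satterthwaite degrees of freedom with the standard library. To get a p-value, statsmodels' `statsmodels.stats.weightstats.ttest_ind(x1, x2, usevar="unequal")` returns the statistic, p-value, and degrees of freedom together.

```python
import math
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Welch's t statistic for mean(a) - mean(b), plus Welch-Satterthwaite df."""
    n_a, n_b = len(sample_a), len(sample_b)
    se2_a = variance(sample_a) / n_a  # squared standard error of mean(a)
    se2_b = variance(sample_b) / n_b  # squared standard error of mean(b)
    t = (mean(sample_a) - mean(sample_b)) / math.sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1))
    return t, df


# Hypothetical likes per photo (in thousands), not the project's data.
baseline = [100, 110, 90, 105, 95]
august = [150, 160, 140, 155, 145]
t, df = welch_t(august, baseline)
print(f"t = {t:.2f}, df = {df:.2f}")
```

A large t relative to the t distribution with `df` degrees of freedom means a jump that size would be very unlikely if August's photos came from the same distribution as the baseline.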
Developed an interactive dashboard that runs Python code within Tableau to build a time-series forecast. The original project forecast medicine demand for a client in the pharmaceutical industry; I have anonymized the dashboard here by substituting the common Air Passengers dataset, to demonstrate Tableau's new capability of running Python.
- GitHub repo
- Medium post
- Tableau Public dashboard (on web)
- Full Tableau dashboard (requires Tableau installation)
Technologies used:
Python, statsmodels, Tableau
Used the Instagram API to collect all image metadata for the 50 most-followed Instagram users and mined the data for insights.
- Tableau Public dashboard (on web)
- Medium post
Technologies used:
Instagram API, Tableau
Project to hide an image or text inside another image and later decode it.
Technologies used:
Python, Python Imaging Library
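One common way to hide data inside an image is least-significant-bit (LSB) steganography; the project's exact scheme is not described here, so this is a minimal sketch under that assumption. It works on raw channel bytes, hides a UTF-8 message behind a 32-bit length prefix, and changes each carrier byte by at most 1. Hiding an image instead of text would work the same way by embedding the image's bytes. Function names are hypothetical.

```python
def encode_text(pixel_bytes: bytes, message: str) -> bytes:
    """Hide a UTF-8 message in the least significant bit of each channel byte."""
    payload = message.encode("utf-8")
    # Prefix a 32-bit big-endian length so the decoder knows when to stop.
    data = len(payload).to_bytes(4, "big") + payload
    bits = [(byte >> shift) & 1 for byte in data for shift in range(7, -1, -1)]
    if len(bits) > len(pixel_bytes):
        raise ValueError("carrier image too small for this message")
    out = bytearray(pixel_bytes)
    for i, bit in enumerate(bits):
        out[i] = (out[i] & 0xFE) | bit  # clear the LSB, then set it to the message bit
    return bytes(out)


def _read_bytes(pixel_bytes: bytes, start_bit: int, n_bytes: int) -> bytes:
    """Reassemble n_bytes from the LSBs, starting at channel index start_bit."""
    result = bytearray()
    for b in range(n_bytes):
        value = 0
        for i in range(8):
            value = (value << 1) | (pixel_bytes[start_bit + b * 8 + i] & 1)
        result.append(value)
    return bytes(result)


def decode_text(pixel_bytes: bytes) -> str:
    """Read the 32-bit length prefix, then the message bytes that follow it."""
    length = int.from_bytes(_read_bytes(pixel_bytes, 0, 4), "big")
    return _read_bytes(pixel_bytes, 32, length).decode("utf-8")


carrier = bytes(range(256)) * 4  # stand-in for real pixel data
stego = encode_text(carrier, "Hello, wörld")
print(decode_text(stego))
```

With the Python Imaging Library (Pillow), the carrier bytes could come from `Image.open(path).convert("RGB").tobytes()` and be written back with `Image.frombytes("RGB", img.size, stego)`. The result must be saved in a lossless format such as PNG, since JPEG compression would destroy the hidden bits.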
Update: No longer actively supported. After the Cambridge Analytica scandal, Instagram changed its API permissions.
Built a public Web Data Connector for Tableau to connect to Instagram's API and download data directly.
In Tableau, add a new data source and select Web Data Connector under the "To a Server" section. For the URL, use https://raffg.github.io:443/. Follow the on-screen instructions to access data. The rate limit is 25,000 posts per hour.
Technologies used:
JavaScript, Instagram API, Tableau