Learning Data Science

In this repository, I'll keep the code I write as I learn about Data Science.

I write about what I am learning here: https://medium.com/@gabrieltseng/

For all notebooks that require a GPU (anything that uses Keras or TensorFlow), I use an AWS P2 instance.

I approached the projects in the following order (latest to earliest):

Natural Language Processing/TwitterDisasters

I build a tweet summarizer (COWTS), with the goal of providing a useful summary of tweets to a rescue team in a disaster scenario. This involves experimenting with Integer Linear Programming, term frequency-inverse document frequency (tf-idf) scores and word graphs.
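As a rough illustration of the tf-idf scoring mentioned above, here is a minimal sketch in plain Python. It is not the code from the notebook: the function name `tfidf_scores`, the whitespace tokenizer, the unsmoothed `log(N / df)` weighting and the example tweets are all assumptions made for the example.

```python
import math
from collections import Counter


def tfidf_scores(tweets):
    """Return one {term: tf-idf weight} dict per tweet (hypothetical helper)."""
    # Assumption: lowercase + whitespace split stands in for real tokenization.
    tokenized = [tweet.lower().split() for tweet in tweets]
    n_docs = len(tokenized)

    # Document frequency: the number of tweets that contain each term.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))

    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        scores.append({
            # tf = relative frequency in this tweet, idf = log(N / df).
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores


# Made-up tweets, for illustration only.
tweets = [
    "bridge collapsed near river",
    "river flooding near school",
    "need water near shelter",
]
scores = tfidf_scores(tweets)
```

A word that appears in every tweet ("near") gets a weight of zero, while a word unique to one tweet ("bridge") gets the highest weight, which is why tf-idf is a reasonable first signal for picking out informative content words.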

Natural Language Processing/Detecting Bullies

I train machine learning algorithms on a small dataset (~3000 datapoints) to recognize bullying in online discussions, as part of Kaggle's Detecting Insults in Social Commentary competition. By implementing word embeddings, I significantly improve on the competition's best result.

Style Neural Network

I experiment with generative neural networks by building a style neural network, which takes two images as input and outputs an image with the content of the first image and the style of the second. I improve on the original neural style network (A Neural Algorithm of Artistic Style) by implementing two additional papers (Incorporating Long-Range Consistency in CNN-Based Texture Generation and Stable and Controllable Neural Texture Synthesis and Style Transfer Using Histogram Losses).
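To give a sense of how "style" is measured in this family of methods, here is a minimal sketch of a Gram-matrix style loss in plain Python. It is an illustration, not the notebook's Keras code: the names `gram_matrix` and `style_loss`, the list-of-lists feature layout and the simple mean-squared normalization are assumptions, and the toy feature maps are made up.

```python
def gram_matrix(features):
    """Channel-by-channel correlations of a feature map.

    `features` is a list of channels, each a flat list of activations
    over spatial positions (an assumed layout for this sketch).
    """
    n_positions = len(features[0])
    return [
        [sum(a * b for a, b in zip(ch_i, ch_j)) / n_positions for ch_j in features]
        for ch_i in features
    ]


def style_loss(generated, style):
    """Mean squared difference between the two Gram matrices.

    Assumption: a plain mean is used here instead of the exact scaling
    constants from the original paper.
    """
    g_gen = gram_matrix(generated)
    g_sty = gram_matrix(style)
    n = len(g_gen)
    total = sum(
        (g_gen[i][j] - g_sty[i][j]) ** 2 for i in range(n) for j in range(n)
    )
    return total / (n * n)


# Toy 2-channel feature maps with 4 spatial positions each.
style_feats = [[1.0, 0.0, 1.0, 0.0], [0.0, 1.0, 0.0, 1.0]]
permuted = [[0.0, 1.0, 1.0, 0.0], [1.0, 0.0, 0.0, 1.0]]  # same positions, reordered
other = [[1.0, 1.0, 1.0, 1.0], [1.0, 1.0, 1.0, 1.0]]

loss_permuted = style_loss(permuted, style_feats)
loss_other = style_loss(other, style_feats)
```

Reordering the spatial positions leaves the Gram matrix unchanged, so `loss_permuted` is zero. This is the property that lets the loss capture texture while ignoring layout, and it is also the limitation that the long-range consistency and histogram-loss extensions set out to address.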

Natural Language Processing/Quora

I build a recurrent neural network based on the GloVe word embeddings to recognize the intent of questions posted on Quora, as part of Kaggle's Quora Question Pairs competition.

Recommender System

In this project, I use the MovieLens dataset to explore a variety of data science tools, including dimensionality reduction and word embeddings. I build a recommender system using a recurrent neural network, and implement Google's Wide & Deep recommender neural network.

Image Recognition

In this project, I fine-tune and ensemble a variety of pretrained convolutional neural networks in Keras to identify invasive plant species in images, as part of Kaggle's Invasive Species Monitoring competition.
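One simple way to ensemble several classifiers is to average their predicted probabilities. The sketch below assumes that approach purely for illustration; the notebook may combine models differently. The function name `ensemble_average`, the probabilities and the 0.5 threshold are all hypothetical.

```python
def ensemble_average(model_predictions):
    """Average per-image probabilities across models.

    `model_predictions` holds one list of probabilities per model,
    all in the same image order (an assumed layout for this sketch).
    """
    n_models = len(model_predictions)
    return [sum(probs) / n_models for probs in zip(*model_predictions)]


# Made-up "invasive" probabilities for three images from three models.
predictions = [
    [0.9, 0.2, 0.6],  # hypothetical model A
    [0.8, 0.4, 0.4],  # hypothetical model B
    [0.7, 0.3, 0.8],  # hypothetical model C
]
averaged = ensemble_average(predictions)

# Assumption: 0.5 is used as the decision threshold.
labels = [int(p >= 0.5) for p in averaged]
```

On the third image the models disagree (0.6, 0.4, 0.8), and averaging settles the disagreement in favour of the majority. Averaging tends to help most when the individual networks make different kinds of mistakes.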