data_science_task

Text classification and tabular data tasks

Primary language: HTML

Data Science tasks

This repository contains notebooks and code for three tasks:

  1. Text classification: given a file of article headlines, classify each headline as sarcastic or not sarcastic. Techniques tried:
  • TF-IDF and Logistic Regression
  • word counts and Naive Bayes
  • deep neural network with an LSTM layer and pretrained GloVe embeddings
  2. Tabular data exploration, cleansing and classification:
  • analyze a corrupted dataset
  • perform the necessary cleaning
  • visualize the data
  • apply a classifier
  3. SQL query: given three tables (student data, student marks, and the classes taught at the university), write a simple query to find the students who scored well in algebra classes.
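The TF-IDF and logistic regression approach from the text classification task can be sketched in plain Python to make each step visible. This is a minimal illustration, not the notebook's code: the toy headlines and labels are hypothetical, the tokenizer is a simple regex, and the IDF uses the same smoothing as scikit-learn's `TfidfVectorizer` default (`smooth_idf=True`), which together with `LogisticRegression` provides production versions of both steps.

```python
import math
import re
from collections import Counter

# Hypothetical toy data: 1 = sarcastic, 0 = not sarcastic.
HEADLINES = [
    "area man thrilled to attend fourth meeting of the day",
    "nation celebrates as monday finally ends",
    "local man shocked that his plan actually worked",
    "senate passes annual budget bill",
    "scientists publish new climate report",
    "city council approves new park funding",
]
LABELS = [1, 1, 1, 0, 0, 0]


def tokenize(text):
    # Lowercase and keep runs of letters and apostrophes.
    return re.findall(r"[a-z']+", text.lower())


def fit_idf(docs):
    # Smoothed IDF: log((1 + n) / (1 + df)) + 1.
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))
    return {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}


def tfidf(doc, idf):
    # Raw term counts times IDF, then L2-normalised; terms unseen in training are dropped.
    counts = Counter(t for t in tokenize(doc) if t in idf)
    vec = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    return {t: v / norm for t, v in vec.items()}


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def score(x, weights, bias):
    # Linear score over the sparse TF-IDF vector.
    return bias + sum(weights.get(t, 0.0) * v for t, v in x.items())


def train_logreg(vectors, labels, lr=0.5, epochs=200, l2=1e-3):
    # Stochastic gradient descent on the L2-regularised log loss (bias not regularised).
    weights, bias = {}, 0.0
    for _ in range(epochs):
        for x, y in zip(vectors, labels):
            grad = sigmoid(score(x, weights, bias)) - y
            bias -= lr * grad
            for t, v in x.items():
                w = weights.get(t, 0.0)
                weights[t] = w - lr * (grad * v + l2 * w)
    return weights, bias


def predict(doc, idf, weights, bias):
    # Label 1 when the predicted probability of "sarcastic" is at least 0.5.
    return int(sigmoid(score(tfidf(doc, idf), weights, bias)) >= 0.5)


idf = fit_idf(HEADLINES)
weights, bias = train_logreg([tfidf(h, idf) for h in HEADLINES], LABELS)
print([predict(h, idf, weights, bias) for h in HEADLINES])
```

Learning rate, epoch count and L2 strength are illustrative values, not tuned settings from the repository.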
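The word counts and Naive Bayes approach corresponds to a multinomial Naive Bayes model. A minimal standard-library sketch, assuming regex tokenization, add-one (Laplace) smoothing and hypothetical toy headlines; scikit-learn's `CountVectorizer` followed by `MultinomialNB` (default `alpha=1.0`) fits the same kind of model.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    # Lowercase and keep runs of letters and apostrophes.
    return re.findall(r"[a-z']+", text.lower())


def train_nb(docs, labels, alpha=1.0):
    # Class priors plus per-class word likelihoods estimated from word counts
    # with additive (Laplace) smoothing.
    class_docs = Counter(labels)
    word_counts = defaultdict(Counter)
    for doc, label in zip(docs, labels):
        word_counts[label].update(tokenize(doc))
    vocab = {w for counts in word_counts.values() for w in counts}
    log_prior = {c: math.log(n / len(docs)) for c, n in class_docs.items()}
    log_lik = {}
    for c in class_docs:
        denom = sum(word_counts[c].values()) + alpha * len(vocab)
        log_lik[c] = {w: math.log((word_counts[c][w] + alpha) / denom) for w in vocab}
    return log_prior, log_lik


def predict_nb(doc, log_prior, log_lik):
    # Pick the class with the highest log posterior; out-of-vocabulary words are ignored.
    scores = {
        c: prior + sum(log_lik[c][w] for w in tokenize(doc) if w in log_lik[c])
        for c, prior in log_prior.items()
    }
    return max(scores, key=scores.get)


# Hypothetical toy data: 1 = sarcastic, 0 = not sarcastic.
docs = [
    "area man thrilled to attend fourth meeting of the day",
    "nation celebrates as monday finally ends",
    "local man shocked that his plan actually worked",
    "senate passes annual budget bill",
    "scientists publish new climate report",
    "city council approves new park funding",
]
labels = [1, 1, 1, 0, 0, 0]

prior, lik = train_nb(docs, labels)
print(predict_nb("area man thrilled", prior, lik))  # prints 1
```

Working in log space avoids underflow when many small word probabilities are multiplied.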
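For the LSTM model, pretrained GloVe vectors are commonly copied into an embedding matrix whose rows line up with the tokenizer's word indices; that matrix then initializes the embedding layer in front of the LSTM. The sketch covers only this preparation step and makes assumptions the README does not state: the function names are hypothetical, the word index is assumed to be 1-based (as produced by Keras' `Tokenizer`) with row 0 reserved for padding, and words missing from GloVe get zero vectors.

```python
def load_glove(lines, dim):
    # Parse GloVe's text format: a word followed by `dim` space-separated floats.
    vectors = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        if len(parts) != dim + 1:
            continue  # skip malformed lines
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


def build_embedding_matrix(word_index, vectors, dim):
    # Row i holds the vector of the word with index i. Row 0 (padding) and
    # words absent from GloVe stay all-zero.
    size = max(word_index.values(), default=0) + 1
    matrix = [[0.0] * dim for _ in range(size)]
    for word, i in word_index.items():
        if word in vectors:
            matrix[i] = list(vectors[word])
    return matrix


# With a real file one would pass an open file handle, e.g. for glove.6B.100d.txt:
#     with open("glove.6B.100d.txt", encoding="utf8") as f:
#         vectors = load_glove(f, dim=100)
# Here a tiny in-memory stand-in is used instead.
glove_lines = ["the 0.1 0.2", "man 0.3 0.4"]
vectors = load_glove(glove_lines, dim=2)
matrix = build_embedding_matrix({"the": 1, "man": 2, "sarcasm": 3}, vectors, dim=2)
print(matrix)  # [[0.0, 0.0], [0.1, 0.2], [0.3, 0.4], [0.0, 0.0]]
```

Zero rows for unknown words are one common choice; small random vectors are another.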
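The README does not describe how the tabular dataset is corrupted, so the cleaning step can only be sketched under assumptions. This example assumes a CSV with hypothetical columns `age`, `income` and `city`, and treats stray whitespace, unparsable or non-finite numbers, and exact duplicate rows as the corruption to fix. With pandas, the same steps would typically use `pd.to_numeric(..., errors="coerce")`, `dropna` and `drop_duplicates`.

```python
import csv
import io
import math


def to_float(value):
    # Parse a numeric cell; return None for missing, unparsable ("n/a", "??") or non-finite values.
    try:
        number = float(value.strip())
    except (AttributeError, ValueError):
        return None
    return number if math.isfinite(number) else None


def clean_rows(csv_text, numeric_columns):
    cleaned, seen = [], set()
    for raw in csv.DictReader(io.StringIO(csv_text)):
        # Strip stray whitespace; drop overflow cells that DictReader stores under the None key.
        row = {k.strip(): (v.strip() if isinstance(v, str) else v)
               for k, v in raw.items() if k is not None}
        for col in numeric_columns:
            row[col] = to_float(row.get(col))
        if any(row[col] is None for col in numeric_columns):
            continue  # drop rows whose numeric fields are missing or corrupted
        key = tuple(sorted(row.items()))
        if key in seen:
            continue  # drop exact duplicates (after whitespace normalisation)
        seen.add(key)
        cleaned.append(row)
    return cleaned


# Hypothetical corrupted sample: padded cells, a duplicate row, "n/a" and "??" values.
SAMPLE = (
    "age,income,city\n"
    "34,52000,Berlin\n"
    " 34 , 52000 ,Berlin\n"
    "n/a,41000,Paris\n"
    "29,??,Rome\n"
    "41,61000,Madrid\n"
)
rows = clean_rows(SAMPLE, numeric_columns=["age", "income"])
print(rows)
```

Dropping corrupted rows is the simplest policy; imputing missing values may be preferable when many rows are affected.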
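A possible shape for the SQL task, using Python's built-in `sqlite3` module so it runs without a database server. The schema (`students`, `classes`, `marks`), the demo rows, and the meaning of "scored well" (an average mark of at least 80 across classes whose title contains "algebra") are all assumptions, since the README gives none of them.

```python
import sqlite3

# Hypothetical schema: the README only says there are three tables.
SCHEMA = """
CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE classes  (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
CREATE TABLE marks (
    student_id INTEGER NOT NULL REFERENCES students(id),
    class_id   INTEGER NOT NULL REFERENCES classes(id),
    mark       REAL NOT NULL
);
"""

# "Scored well" is assumed to mean an average algebra mark of at least :threshold.
# SQLite's LIKE is case-insensitive for ASCII letters, so 'Linear Algebra' matches.
QUERY = """
SELECT s.name, AVG(m.mark) AS avg_mark
FROM students AS s
JOIN marks   AS m ON m.student_id = s.id
JOIN classes AS c ON c.id = m.class_id
WHERE c.title LIKE '%algebra%'
GROUP BY s.id, s.name
HAVING AVG(m.mark) >= :threshold
ORDER BY avg_mark DESC
"""


def make_demo_db():
    # In-memory database filled with hypothetical demo rows.
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO students VALUES (?, ?)",
                     [(1, "Ana"), (2, "Ben"), (3, "Chen")])
    conn.executemany("INSERT INTO classes VALUES (?, ?)",
                     [(1, "Linear Algebra"), (2, "Abstract Algebra"), (3, "Calculus")])
    conn.executemany("INSERT INTO marks VALUES (?, ?, ?)",
                     [(1, 1, 90), (1, 2, 85), (1, 3, 40),
                      (2, 1, 70), (2, 2, 75), (2, 3, 95),
                      (3, 1, 88), (3, 3, 60)])
    return conn


def top_algebra_students(conn, threshold=80):
    # Named parameter binding keeps the threshold out of the SQL string.
    return conn.execute(QUERY, {"threshold": threshold}).fetchall()


print(top_algebra_students(make_demo_db()))  # [('Chen', 88.0), ('Ana', 87.5)]
```

With the demo data, Ben's strong Calculus mark does not count because only algebra classes pass the `WHERE` filter, so his algebra average of 72.5 falls below the threshold.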