A repository containing some of my Kaggle notebooks, which could be helpful as starter code. My Kaggle Profile: https://www.kaggle.com/parulpandey
This notebook explains the concepts of NLP with respect to this current competition. NLP is the field of study that focuses on the interactions between human language and computers. NLP sits at the intersection of computer science, artificial intelligence, and computational linguistics [source]. NLP is a way for computers to analyze, understand, and derive meaning from human language in a smart and useful way. This Kaggle notebook with basic codes t
Table of Contents
- Importing the necessary libraries
- Reading the datasets
- Basic EDA
- Text data processing
- Transforming tokens to vectors
- Building a Text Classification model
This notebook is the second part of the Getting started with NLP Notebooks. In this notebook we study the various ways of vectorizing text data. Vectorization converts text data into feature vectors.
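As a rough illustration of what vectorization means, here is a minimal bag-of-words sketch in plain Python. It is not the code from the notebook. The helper names `build_vocabulary` and `vectorize` are hypothetical, and the sentences are toy examples made up for this README rather than competition data.

```python
from collections import Counter


def build_vocabulary(documents):
    """Map each distinct lowercase token to a column index, in sorted order."""
    tokens = sorted({tok for doc in documents for tok in doc.lower().split()})
    return {tok: idx for idx, tok in enumerate(tokens)}


def vectorize(document, vocabulary):
    """Return a token-count vector for one document; unknown tokens are ignored."""
    counts = Counter(document.lower().split())
    vector = [0] * len(vocabulary)
    for tok, n in counts.items():
        if tok in vocabulary:
            vector[vocabulary[tok]] = n
    return vector


# Toy sentences (assumption: illustrative only, not from the dataset).
docs = ["fire in the forest", "the forest is calm", "fire fire everywhere"]
vocab = build_vocabulary(docs)
vectors = [vectorize(d, vocab) for d in docs]

for doc, vec in zip(docs, vectors):
    print(f"{doc!r} -> {vec}")
```

Each document becomes a fixed-length list of counts, one slot per vocabulary word, which is the form most classifiers expect as input.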
Real or Not? NLP with Disaster Tweets - A Getting Started Competition on Kaggle
Time series data is a sequence of data points in chronological order that is used by businesses to analyze past data and make future predictions.
NIFTY-50 Stock Market Data (2000-2019)
The data is the price history and trading volumes of the fifty stocks in the NIFTY 50 index from NSE (National Stock Exchange) India. All datasets are at day level, with pricing and trading values split across .csv files for each stock, along with a metadata file with some macro-information about the stocks themselves. The data spans from 1st January, 2000 to 31st December, 2019.
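One common first step with day-level price data is smoothing it with a trailing moving average. The sketch below is only an illustration of that idea in plain Python. The function name `rolling_mean` and the closing prices are hypothetical and are not taken from the dataset files.

```python
from collections import deque


def rolling_mean(values, window):
    """Return the mean of each full trailing window over a chronological series."""
    if window <= 0:
        raise ValueError("window must be a positive integer")
    buf = deque(maxlen=window)
    total = 0.0
    means = []
    for value in values:
        if len(buf) == window:
            total -= buf[0]  # this value is about to be evicted from the window
        buf.append(value)
        total += value
        if len(buf) == window:
            means.append(total / window)
    return means


# Hypothetical closing prices in chronological order (not real NIFTY-50 data).
closes = [100.0, 102.0, 101.0, 105.0, 107.0]
smoothed = rolling_mean(closes, window=3)
print(smoothed)
```

The output has `len(values) - window + 1` entries because the first `window - 1` days do not yet have a full window behind them.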
Extracting human-understandable insights from any Machine Learning model. Some techniques explained in this notebook are:
- Permutation Importance using ELI5 library
- Partial Dependence Plots
- SHAP Values
- Advanced Uses of SHAP Values
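To give a feel for the first technique in the list, here is a minimal plain-Python sketch of permutation importance: shuffle one feature column at a time and measure how much the model's accuracy drops. This does not use the ELI5 API and is not the notebook's code. The toy model, the data, and the function names are all assumptions made for illustration.

```python
import random


def accuracy(model, rows, labels):
    """Fraction of rows for which the model's prediction matches the label."""
    correct = sum(1 for row, y in zip(rows, labels) if model(row) == y)
    return correct / len(rows)


def permutation_importance(model, rows, labels, n_features, n_repeats=20, seed=0):
    """Mean drop in accuracy when each feature column is shuffled in turn."""
    rng = random.Random(seed)
    baseline = accuracy(model, rows, labels)
    importances = []
    for j in range(n_features):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in rows]
            rng.shuffle(column)
            # Rebuild the rows with only column j replaced by its shuffled values.
            shuffled = [row[:j] + [v] + row[j + 1:] for row, v in zip(rows, column)]
            drops.append(baseline - accuracy(model, shuffled, labels))
        importances.append(sum(drops) / n_repeats)
    return importances


def toy_model(row):
    """Hypothetical model that looks only at feature 0."""
    return 1 if row[0] > 0.5 else 0


# Toy data: feature 0 drives the label, feature 1 is unrelated noise.
rows = [[i / 10, ((i * 7) % 10) / 10] for i in range(10)]
labels = [toy_model(r) for r in rows]

importances = permutation_importance(toy_model, rows, labels, n_features=2)
print(importances)
```

Shuffling feature 1 leaves accuracy untouched because the toy model never reads it, so its importance is zero, while shuffling feature 0 breaks the link between inputs and labels and produces a positive drop.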
Pima Indians Diabetes Database This dataset is originally from the National Institute of Diabetes and Digestive and Kidney Diseases. The objective of the dataset is to diagnostically predict whether or not a patient has diabetes, based on certain diagnostic measurements included in the dataset. Several constraints were placed on the selection of these instances from a larger database. In particular, all patients here are females at least 21 years old of Pima Indian heritage.
A 3-part series on dimensionality reduction techniques using the Kannada MNIST dataset. In this series of notebooks, we study three dimensionality reduction techniques: PCA, t-SNE and UMAP.
Kannada MNIST The goal of this competition is to provide a simple extension to the classic MNIST competition we're all familiar with. Instead of using Arabic numerals, it uses a recently released dataset of Kannada digits: an MNIST-like dataset of Kannada handwritten digits.
The beauty of using Python is that it offers libraries for every data visualisation need. One such library is Folium, which comes in handy for visualising geographic data (geo data). Geo data science is a subset of data science that deals with location-based data, i.e., descriptions of objects and their relationships in space.