Python for Text Classification

Python for Text Classification with Machine Learning in Python 3.6.

Primary language: Jupyter Notebook. License: MIT.

Installation Guide

Set up your environment with either pipenv (recommended) or virtualenv. Short guides for both are below.

Using pipenv

  1. Initialize pipenv (setup guide):
cd path/to/your/dev/folder
mkdir text-classify
cd text-classify
pipenv install --three

Once pipenv has finished creating the environment, activate it (the command is the same on all systems):

pipenv shell
  2. Install the project requirements:
pip install numpy scipy scikit-learn jupyter

Using virtualenv

  1. Initialize virtualenv
cd path/to/your/dev/folder
mkdir text-classify
cd text-classify
virtualenv --python=python3 .

Once the virtualenv has been created, activate it:

Mac / Linux

source bin/activate

Windows

.\Scripts\activate
  2. Install the project requirements:
pip install numpy scipy scikit-learn jupyter
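
Whichever tool you used, a quick way to confirm the requirements landed in the active environment is to check that each package can be found. This is a minimal sketch, not part of the course material; the `check_packages` helper and the `REQUIRED` list are names made up for this example, and it assumes the `jupyter` install pulled in `jupyter_core`.

```python
import importlib.util
import sys

# Import names for the packages installed above (hypothetical list for this sketch).
# scikit-learn is imported as "sklearn"; jupyter is checked through "jupyter_core".
REQUIRED = ["numpy", "scipy", "sklearn", "jupyter_core"]


def check_packages(names):
    """Return a dict mapping each import name to True if it can be located."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


status = check_packages(REQUIRED)
print("Python", sys.version.split()[0])
for name, found in status.items():
    print(f"{name}: {'ok' if found else 'MISSING'}")
```

Save it as a script or paste it into a notebook cell and run it with the environment activated. Any package reported as MISSING was most likely installed into a different environment.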

Lessons

1 - Introduction (no code)

2 - Initialize Virtual Environment with Pipenv

3 - Sublime Text & Jupyter Notebooks (no code)

4 - Bag of Words

5 - One Hot Array

6 - Bag of Words Function

7 - One Hot Array Function

8 - One Hot Array Back to Text

9 - Bag of Words with External Data

10 - One Hot Array with External Data

11 - Training Data and Labels as Numpy Arrays

12 - Train and Predict with Sklearn SVM

13 - Text Prediction Recap

14 - Reusable Sklearn Classifier

15 - Missing Bow

16 - Pickles (no code)

17 - Good Data In, Good Data Out (no code)

18 - Dataset Resources (no code; blog post)

19 - Grab and Parse Dataset

20 - Prepare Training Module for Spam + Not Spam

21 - Train Spam Classifier with SVC

22 - Clean and Predict

23 - Scoring Classifier Accuracy

24 - One Hot Encoding Classification Recap

25 - Preprocessing with a Keras Tokenizer

26 - Pad Sequences

27 - Convert Our Text Data into Sequences

28 - Labels and LabelEncoder

29 - Reusable Text-Label Utility

30 - Split Training and Validation Data

31 - Tokenized Text Classifier

32 - Cross Validation

33 - Sensitivity vs Specificity
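
To give a feel for what the early lessons (4 to 8) build toward, here is a rough sketch of a bag of words and a one hot array in plain Python. It is not the course's code: the function names, the tokenizing rule, and the sample sentences are all assumptions made for illustration, and "one hot array" is taken here to mean a 0/1 list with one slot per word in the bag.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into word tokens (letters, digits, apostrophes)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def build_bag_of_words(sentences):
    """Return a sorted list of the unique tokens found across all sentences."""
    vocab = set()
    for sentence in sentences:
        vocab.update(tokenize(sentence))
    return sorted(vocab)


def to_one_hot(sentence, bag):
    """Return a 0/1 list marking which words of the bag occur in the sentence."""
    present = set(tokenize(sentence))
    return [1 if word in present else 0 for word in bag]


def one_hot_to_words(vector, bag):
    """Map a 0/1 list back to the words it marks, in bag order (sentence order is lost)."""
    return [word for word, flag in zip(bag, vector) if flag]


# Made-up sample data for this sketch.
sentences = ["Free money now", "Meeting at noon", "Free lunch at noon"]
bag = build_bag_of_words(sentences)
vectors = [to_one_hot(s, bag) for s in sentences]

print(bag)
print(vectors[0])
print(one_hot_to_words(vectors[0], bag))
```

Lists like `vectors` are the kind of input that, once converted to numpy arrays alongside their labels, can be passed to a scikit-learn classifier in the later lessons. Note that going back from the array to text recovers only which words were present, not their original order or count.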