Resources for upskilling on NLP.
- Featurize free-form text data using `mmlspark` on top of primitives in SparkML via a single transformer in this official `mmlspark` notebook
- Good NLTK tutorial, albeit with some fun Python code fixes needed :) - NLTK Tutorial: Natural Language Processing w/ Python (for a working notebook see notebooks folder)
- Document classification with `pyspark` with HDInsight on Azure
- NLP with `MLLib` from official Spark Docs
- NLTK and `pyspark` from Anaconda Docs
- Python/scikit-learn: Calculating TF/IDF on How I met your mother transcripts for TF/IDF with scikit-learn
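
For a quick feel of what TF/IDF measures before diving into the last resource, here is a minimal from-scratch sketch in plain Python. It is not the scikit-learn implementation: scikit-learn's `TfidfVectorizer` uses a smoothed IDF and normalizes each row by default, so its numbers will differ. The `tf_idf` helper and the toy corpus are made up for illustration.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document (plain, unsmoothed TF-IDF)."""
    # Naive tokenization (assumption): lowercase and split on whitespace.
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)

    # Document frequency: how many documents contain each term at least once.
    df = Counter(term for tokens in tokenized for term in set(tokens))

    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            # Term frequency times log inverse document frequency.
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical toy corpus, not taken from any real transcript.
docs = ["ted met robin", "ted met barney", "lily met marshall"]
scores = tf_idf(docs)
```

A term that appears in every document (here `met`) gets weight zero, while a term unique to one document (here `robin`) scores highest within it.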