- Text_Classification_Pytorch.ipynb: AG News data from pytorch, classified by news type. Use a simple MLP with the raw tokens. Result is quite good > 90% accuracy, probably because this is an easy task
- Sentiment_Analysis_Using_Universal_Sentence_Embedder: Twitter sentiments, labels look rather arbitrary . Using Universal Sentence Embedder and an MLP. Poor result currently 40% accuracy.
- sentiment-analysis-marketing/: using Amazon reviews data, and nltk python library. Rule based method
- Resume parsing using spaCy, done in this repo
TODO:
- project: Hackernews Progressive Web App with a search bar using fuzzy search and semantic search
- pick one dataset, maybe Twitter dataset and try different methods with it
- Try ULMFit
- play with this gensim-based movie genre plot project (comes with slides)
- play with the Annotated Transformers
doc = nlp( "In 1990, more than 60% of people in East Asia were in extreme poverty. " "Now less than 4% are." ) When two strings are placed together in python inside a function call, they are concatenated together. This is a convenience method to make code easy to read
- Transformer_walthrough.ipynb: code walk through of Transformers model.
- Building block of Transformers: Attention, Layer Normalization, Residual Connection, Collapsing then Expanding Linear connections
- When creating embedding with torch.Embedding() they has N(0, 1) distribution. Embeddings are scaled by the factor of sqrt(emb_size) before adding with positional embedding. This helps to reduce the impact of positional embedding in proportion to the word embedding. Inside Attention layers, we did a step of rescaling the embedding by 1/sqrt(emb_size) which get the mean and standard deviation to N(0, 1) back again
- a library for NER, dependency parsing, and pattern matching.
- This is good book with project example for spaCy. PROGRESS: done!
- an optimized library for loading text into different file formats, word2vec and document similarity