This repo contains all 4 projects I did in my last semester.
Implemented here
This notebook contains various insights and interesting patterns of binge-watching that people have adopted since 2008, including:
- Penetration of Netflix globally since 2008.
- Preferred movie duration of people in different countries.
- Countries for which Netflix makes most of its content.
.... and other very interesting insights.
This project was personally my favorite: instead of using Doc2Vec from Gensim, one day I thought of creating my own document embeddings.
Implemented here
In this project I tried to use TF-IDF (Term Frequency–Inverse Document Frequency) and Word2Vec from Google. I didn't train Word2Vec on my training data (I just wanted to see the results).
To read about TF-IDF, refer here
To understand Word2Vec, read this blog here
For a more detailed explanation, check this
These techniques were (a rough sketch of both follows the list):
1. Using Word2Vec to get an embedding for each word in a document, then summing those embeddings to get the document embedding for that document.
2. Using TF-IDF to find the weight of each word in a document, multiplying each word's Word2Vec vector by that weight, and finally summing all the weighted word vectors to get the document embedding for that document.
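
A minimal sketch of both techniques is below. The pretrained model name (`word2vec-google-news-300` via Gensim's downloader) and the use of scikit-learn's `TfidfVectorizer` are assumptions for illustration; the notebook itself may load the vectors and compute TF-IDF differently.

```python
# Sketch of the two document-embedding techniques described above.
import numpy as np
import gensim.downloader as api
from sklearn.feature_extraction.text import TfidfVectorizer

docs = [
    "forest fire near la ronge sask canada",
    "residents asked to shelter in place",
]

# Pretrained Word2Vec (not trained on my data, just used as-is).
w2v = api.load("word2vec-google-news-300")
dim = w2v.vector_size

def doc_embedding_sum(doc):
    """Technique 1: sum the Word2Vec vectors of all in-vocabulary words."""
    vecs = [w2v[w] for w in doc.split() if w in w2v]
    return np.sum(vecs, axis=0) if vecs else np.zeros(dim)

# Technique 2: weight each word vector by its TF-IDF weight before summing.
tfidf = TfidfVectorizer()
tfidf_matrix = tfidf.fit_transform(docs)
vocab = tfidf.vocabulary_

def doc_embedding_tfidf(doc, row):
    emb = np.zeros(dim)
    for word in doc.split():
        if word in w2v and word in vocab:
            weight = tfidf_matrix[row, vocab[word]]
            emb += weight * w2v[word]
    return emb

sum_embeddings = np.vstack([doc_embedding_sum(d) for d in docs])
tfidf_embeddings = np.vstack([doc_embedding_tfidf(d, i) for i, d in enumerate(docs)])
print(sum_embeddings.shape, tfidf_embeddings.shape)  # (2, 300) (2, 300)
```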
Surprisingly, these techniques got me into the top 30% of the competition.
Twitter has become an important communication channel in times of emergency. The ubiquitousness of smartphones enables people to announce an emergency they’re observing in real-time. Because of this, more agencies are interested in programmatically monitoring Twitter (i.e. disaster relief organizations and news agencies).
But, it’s not always clear whether a person’s words are actually announcing a disaster. Take this example:
The author explicitly uses the word “ABLAZE” but means it metaphorically. This is clear to a human right away, especially with the visual aid. But it’s less clear to a machine.
For data visualisation, and for beginners, I made a notebook here
And this was my improved attempt, shown below
Implemented here
The code can be downloaded here
Data available here
I want to improve this approach and turn it into multilingual text classification for tweets.
Recommender systems have been used by various companies like Netflix and Amazon to recommend products to their customers. Similarly, we are building a news recommender system. We don't have any user data, only news article data, so to generate user data we used various statistical distributions.
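
A rough sketch of how synthetic user data could be generated is below. The specific distributions (Poisson for per-user activity, uniform for article choice and ratings) are my own illustrative choices, not necessarily the ones used in the project.

```python
# Generate synthetic user-article interactions, since no real user data exists.
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
n_users, n_articles = 100, 500

# Number of articles each user reads, drawn from a Poisson distribution.
reads_per_user = rng.poisson(lam=10, size=n_users)

rows = []
for user_id, n_reads in enumerate(reads_per_user):
    # Articles chosen uniformly at random, ratings uniform on 1..5.
    article_ids = rng.choice(n_articles, size=n_reads, replace=False)
    ratings = rng.integers(1, 6, size=n_reads)
    for article_id, rating in zip(article_ids, ratings):
        rows.append({"user_id": user_id, "article_id": article_id, "rating": rating})

interactions = pd.DataFrame(rows)
print(interactions.head())
```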
Have you ever wondered:
- What features (words) were most important for your text classification system, or for other #nlp tasks? (see the sketch below)
- Which pixels are most important for your computer vision task?
- Which columns (in the case of tabular data) are most important, and how much do they contribute to your classification or regression problem?
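
As a tiny illustration of the first question, one common way to see which words a linear text classifier relies on is to inspect its learned coefficients. This is just a sketch of that general idea, not the method used in this project.

```python
# Inspect which words push a linear classifier toward the positive class.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

texts = ["forest fire near the town", "what a lovely sunny day",
         "earthquake damages several buildings", "enjoying coffee this morning"]
labels = [1, 0, 1, 0]  # 1 = disaster, 0 = not

vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(texts)
clf = LogisticRegression().fit(X, labels)

# Words with the largest positive coefficients matter most for the "disaster" class.
feature_names = np.array(vectorizer.get_feature_names_out())
top = np.argsort(clf.coef_[0])[::-1][:5]
print(list(zip(feature_names[top], clf.coef_[0][top].round(3))))
```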