Pinned Repositories
365psd
A Python script to download free PSD files from 365psd
Algorithms
Algorithms that I'm learning from Bob Sedgewick's class
django-rango
Tango with Django - Rango app
EmailAuthorPrediction
To predict the author of an email, we used a unigram language model. We started by identifying the features that would help model the solution. The features that looked important were:
• N-grams of the email
• Frequency of each N-gram
• Out-of-vocabulary words (spelling mistakes)
The first two features together describe how a particular author chooses the vocabulary for their writing. This combination can be treated as the author's signature, since writers tend to draw their words from a limited subset of the full vocabulary. The out-of-vocabulary words, which are mostly spelling mistakes made by the author, also reflect writing style and are therefore an important part of the solution. The solution thus amounts to computing, for each candidate author, the total probability that the N-grams in the email were written by that author.
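The idea described above can be sketched roughly as follows. This is a minimal illustration, not the repository's actual code: the function names (`tokenize`, `train`, `log_prob`, `predict`), the whitespace tokenizer, and the add-one smoothing are all assumptions made for the example. Misspelled words are not handled separately here; they are counted like any other token, so an author's habitual misspellings still contribute to that author's score.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenization; the original may differ.
    return text.lower().split()


def train(emails_by_author):
    """Build per-author unigram counts from {author: [email text, ...]}."""
    models = {}
    for author, emails in emails_by_author.items():
        counts = Counter()
        for email in emails:
            counts.update(tokenize(email))
        models[author] = counts
    return models


def log_prob(tokens, counts, vocab_size):
    """Log-probability of the tokens under one author's unigram model."""
    total = sum(counts.values())
    # Assumption: add-one smoothing, with one extra slot for unseen words.
    denom = total + vocab_size + 1
    return sum(math.log((counts[t] + 1) / denom) for t in tokens)


def predict(email, models):
    """Return the author whose model gives the email the highest score."""
    vocab = set()
    for counts in models.values():
        vocab.update(counts)
    tokens = tokenize(email)
    return max(models, key=lambda a: log_prob(tokens, models[a], len(vocab)))


if __name__ == "__main__":
    # Hypothetical toy data, for illustration only.
    models = train({
        "alice": ["please review the quarterly budget", "budget meeting tomorrow"],
        "bob": ["lets grab lunch", "lunch at noon sounds good"],
    })
    print(predict("budget review", models))
```

Summing log-probabilities rather than multiplying raw probabilities avoids numeric underflow on long emails.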
GraphComponents-MapReduce
ideas
invertedIndex
Inverted-index-based search engine
MinHash
Estimating how similar two sets are using MinHash (Jaccard similarity coefficient)
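As a rough illustration of the technique, the sketch below estimates the Jaccard similarity of two sets by comparing MinHash signatures. It is not taken from the repository: the helper names (`_hash`, `signature`, `estimate_jaccard`, `exact_jaccard`), the use of seeded BLAKE2b as the hash family, and the default of 256 hash functions are assumptions chosen for the example.

```python
import hashlib


def _hash(item, seed):
    # Assumption: a seeded 64-bit BLAKE2b digest stands in for one member
    # of a family of independent hash functions.
    data = f"{seed}:{item}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def signature(items, num_hashes=256):
    """MinHash signature: the minimum hash of the set under each seed.

    The set must be non-empty.
    """
    return [min(_hash(x, seed) for x in items) for seed in range(num_hashes)]


def estimate_jaccard(sig_a, sig_b):
    """Fraction of signature positions where the two minima agree."""
    matches = sum(1 for a, b in zip(sig_a, sig_b) if a == b)
    return matches / len(sig_a)


def exact_jaccard(a, b):
    """Exact Jaccard coefficient |A ∩ B| / |A ∪ B|, for comparison."""
    return len(a & b) / len(a | b)


if __name__ == "__main__":
    a = set(range(0, 100))
    b = set(range(50, 150))
    print("exact:   ", exact_jaccard(a, b))
    print("estimate:", estimate_jaccard(signature(a), signature(b)))
```

For a single hash function, the probability that two sets share the same minimum equals their Jaccard coefficient, so the fraction of matching positions is an estimate of it. More hash functions tighten the estimate at the cost of a longer signature.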
QA-System
Question Answering system
Youtube-source-code
rahularora's Repositories
rahularora/MinHash
Estimating how similar two sets are using MinHash (Jaccard similarity coefficient)
rahularora/Youtube-source-code
rahularora/EmailAuthorPrediction
To predict the author of an email, we used a unigram language model. We started by identifying the features that would help model the solution. The features that looked important were:
• N-grams of the email
• Frequency of each N-gram
• Out-of-vocabulary words (spelling mistakes)
The first two features together describe how a particular author chooses the vocabulary for their writing. This combination can be treated as the author's signature, since writers tend to draw their words from a limited subset of the full vocabulary. The out-of-vocabulary words, which are mostly spelling mistakes made by the author, also reflect writing style and are therefore an important part of the solution. The solution thus amounts to computing, for each candidate author, the total probability that the N-grams in the email were written by that author.
rahularora/365psd
A Python script to download free PSD files from 365psd
rahularora/GraphComponents-MapReduce
rahularora/ideas
rahularora/invertedIndex
Inverted-index-based search engine
rahularora/QA-System
Question Answering system
rahularora/Algorithms
Algorithms that I'm learning from Bob Sedgewick's class
rahularora/django-rango
Tango with Django - Rango app
rahularora/k-means
K-means clustering
rahularora/latentSemanticIndexing
Search engine that indexes full text using Latent Semantic Indexing
rahularora/PageRank
Mini web search engine using PageRank
rahularora/print-envelopes
Printing envelopes
rahularora/python-trees
Python tree implementation
rahularora/smartass
rahularora/WordSenseDisambiguation
Word Sense Disambiguation