Shikhar0605
Senior undergraduate at IIT (BHU) | Machine Learning | Data Analytics | Data Science | Seeking an internship
Pinned Repositories
Applying-Regression-model-on-house-sales-data
Applied a Turi Create linear regression model to a house sales dataset and examined the effect of feature selection on model accuracy.
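To illustrate how feature choice changes model accuracy, here is a minimal sketch using only the standard library rather than Turi Create. It fits a closed-form simple linear regression on one feature at a time and compares the residual sum of squares (RSS). The feature names and prices are hypothetical toy values, not data from the repository.

```python
from statistics import mean

def fit_simple_linear(xs, ys):
    """Return (intercept, slope) minimizing squared error for y = a + b*x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

def rss(xs, ys, intercept, slope):
    """Residual sum of squares of the fitted line (lower is better)."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))

# Hypothetical toy data: price tracks living area closely, bedrooms much less so.
houses = {
    "sqft_living": [1000, 1500, 2000, 2500, 3000],
    "bedrooms": [3, 2, 4, 3, 3],
}
prices = [200_000, 290_000, 410_000, 500_000, 610_000]

for feature, xs in houses.items():
    a, b = fit_simple_linear(xs, prices)
    print(f"{feature}: RSS = {rss(xs, prices, a, b):.3e}")
```

The same comparison extends to multi-feature models; the point is that the RSS (or a held-out error) is what tells you whether adding or swapping a feature helped.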
Classifier-model-for-Cancer-Detection-Malignant-or-Benign
Applied scikit-learn's k-nearest neighbors classification algorithm to build a model for breast cancer diagnosis.
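The k-nearest neighbors idea can be sketched in a few lines of standard-library Python. This is not scikit-learn's `KNeighborsClassifier`, and the two features and their values below are hypothetical stand-ins for the real dataset's measurements.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, query, k=3):
    """Label `query` by majority vote among its k nearest training points."""
    # Euclidean distance from the query to every training sample, nearest first.
    dists = sorted(
        (math.dist(x, query), label) for x, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]

# Hypothetical 2-feature samples (e.g. mean radius, mean texture).
train_X = [(12.0, 14.0), (11.5, 15.0), (13.0, 13.5),
           (20.0, 25.0), (21.5, 23.0), (19.0, 27.0)]
train_y = ["benign", "benign", "benign", "malignant", "malignant", "malignant"]

print(knn_predict(train_X, train_y, (12.5, 14.5)))
```

In practice features should be scaled first, since Euclidean distance is dominated by whichever feature has the largest range.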
Creating-and-manipulating-graph-using-Networkx
Credit-card-fraud-detection
Applied SVC and logistic regression classifiers to a credit card transaction dataset to detect fraudulent transactions.
Custom-data-visualization-using-Matplotlib
Created a dynamic graph using Matplotlib to better judge probabilistic data generated from the election dataset. The graph changes its colour as the selected y-axis value changes.
Documents-similarity-prediction-using-Wikipedia-s-People-Dataset
Applied Turi Create's nearest neighbors model to predict the similarity between any two documents from Wikipedia's People dataset.
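A minimal sketch of document nearest-neighbor search, assuming TF-IDF word weights and cosine similarity; Turi Create's model may be configured with different features and distances. The three short "documents" are hypothetical stand-ins for Wikipedia People entries.

```python
import math
from collections import Counter

def tf_idf_vectors(docs):
    """Map each document to a dict of word -> tf-idf weight."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents each word appears in.
    df = Counter(word for tokens in tokenized for word in set(tokens))
    return [
        {w: c * math.log(n_docs / df[w]) for w, c in Counter(tokens).items()}
        for tokens in tokenized
    ]

def cosine_similarity(u, v):
    dot = sum(weight * v.get(word, 0.0) for word, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def nearest_neighbor(vectors, i):
    """Index of the document most similar to document i (excluding itself)."""
    return max((j for j in range(len(vectors)) if j != i),
               key=lambda j: cosine_similarity(vectors[i], vectors[j]))

# Hypothetical toy documents.
docs = [
    "barack obama president united states senator illinois",
    "joe biden vice president united states senator delaware",
    "lionel messi footballer argentina barcelona forward",
]
vectors = tf_idf_vectors(docs)
print(nearest_neighbor(vectors, 0))
```

TF-IDF down-weights words that appear in every document, so shared rare words drive the similarity score.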
Hypothesis-testing-using-T-test
Hypothesis: mean housing prices in university towns are less affected by recessions. Performed a t-test on the ratio of the mean house price in university towns in the quarter before the recession starts to the mean price at the recession bottom.
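The core of a two-sample t-test can be sketched with the standard library. This computes Welch's t statistic (which does not assume equal variances) and its degrees of freedom; it assumes the comparison group is non-university towns, and the ratio values are hypothetical. For a p-value, `scipy.stats.ttest_ind(a, b, equal_var=False)` performs the same test.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(sample_a, sample_b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    na, nb = len(sample_a), len(sample_b)
    # Squared standard errors of each sample mean.
    va, vb = variance(sample_a) / na, variance(sample_b) / nb
    t = (mean(sample_a) - mean(sample_b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, df

# Hypothetical price ratios (quarter before recession / recession bottom).
# A ratio closer to 1 means prices fell less.
university_ratios = [1.02, 1.04, 1.01, 1.03, 1.05]
other_ratios = [1.08, 1.10, 1.07, 1.12, 1.09]

t, df = welch_t(university_ratios, other_ratios)
print(f"t = {t:.2f}, df = {df:.1f}")
```

A strongly negative t here would support the hypothesis, because university towns would have the smaller price ratio.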
Long-Term-Stock-Price-Growth-Prediction-using-NLP-on-10-K-Financial-Reports
A 10-K financial report is a comprehensive report on a company's financial performance that every publicly traded company must file annually with the U.S. Securities and Exchange Commission (SEC). It is even more detailed than a company's annual report. 10-K documents contain information about the business's operations, risk factors, selected financial data, management's discussion and analysis (MD&A), and financial statements and supplementary data. I was tasked with building an NLP pipeline that ingests the 10-K reports of various publicly traded companies, and a machine learning model that uses the ‘Loughran-McDonald Master Dictionary’ to uncover hidden signals in these documents that predict a company's long-term stock performance. The dictionary contains words curated specifically for the context of financial reports.
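One common way to turn a sentiment dictionary into model features is to measure the share of a document's words that fall into each category. The sketch below assumes that approach; the tiny word lists are hypothetical placeholders, since the real Loughran-McDonald Master Dictionary has far larger lists per category.

```python
import re
from collections import Counter

# Hypothetical miniature word lists standing in for dictionary categories.
LM_SAMPLE = {
    "negative": {"loss", "impairment", "litigation", "decline"},
    "positive": {"growth", "improve", "strong", "profitable"},
    "uncertainty": {"may", "uncertain", "approximately"},
}

def sentiment_frequencies(text, word_lists=LM_SAMPLE):
    """Fraction of tokens in `text` that fall into each sentiment category."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid dividing by zero on empty text
    return {
        category: sum(counts[w] for w in words) / total
        for category, words in word_lists.items()
    }

excerpt = "Litigation may cause a decline in revenue and an impairment loss."
print(sentiment_frequencies(excerpt))
```

Computed per filing (and per year), these frequencies can serve as input features for a model that predicts long-term returns.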
Social-Media-Sentiment-Analysis
Pre-processed 50k tweets using text mining and natural language processing techniques. Visualized the impact of hashtags on tweet sentiment using Seaborn. Applied several machine learning models, compared their F1 scores, and used the best-performing model for sentiment prediction.
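The model-selection step described above can be sketched as follows: compute the F1 score (the harmonic mean of precision and recall) for each candidate and keep the highest. The labels and model names are hypothetical; `sklearn.metrics.f1_score` computes the same binary F1.

```python
def f1_score(y_true, y_pred, positive=1):
    """Binary F1 = 2 * precision * recall / (precision + recall)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Hypothetical gold labels and predictions from two candidate models.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
predictions = {
    "model_a": [1, 0, 0, 1, 0, 1, 1, 0],
    "model_b": [1, 1, 1, 0, 0, 1, 0, 0],
}
scores = {name: f1_score(y_true, pred) for name, pred in predictions.items()}
best = max(scores, key=scores.get)
print(scores, "->", best)
```

F1 is preferred over accuracy when one sentiment class is much rarer than the other.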
Temperature-analysis-using-NCEI-Dataset
Analyzed temperature variations in Ann Arbor, Michigan, U.S., over 2005-2014 using the NCEI dataset.
Shikhar0605's Repositories
Shikhar0605/Movie-Recommender-Engine
Created a movie recommender engine based on cosine similarity.
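A cosine-similarity recommender ranks items by the angle between their feature vectors. This sketch assumes each movie is described by a hypothetical genre-indicator vector; the actual engine may build its vectors from text features instead.

```python
import math

def cosine(u, v):
    """Cosine of the angle between vectors u and v (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def recommend(title, features, top_n=2):
    """Return the top_n titles most similar to `title`."""
    target = features[title]
    scored = [(cosine(target, vec), other)
              for other, vec in features.items() if other != title]
    scored.sort(reverse=True)  # highest similarity first
    return [name for _, name in scored[:top_n]]

# Hypothetical genre indicators: [action, sci-fi, romance, comedy]
features = {
    "Movie A": [1, 1, 0, 0],
    "Movie B": [1, 1, 0, 1],
    "Movie C": [0, 0, 1, 1],
    "Movie D": [1, 0, 0, 0],
}
print(recommend("Movie A", features))
```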
Shikhar0605/Network-Connectivity
Imported and analyzed an internal email communication network between employees of a mid-sized manufacturing company. Each node represents an employee, and each directed edge between two nodes represents an individual email: the left node is the sender and the right node is the recipient.
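Because each edge is one email, repeated sender-recipient pairs must be counted separately. The sketch below uses a hypothetical edge list to compute out-degree (emails sent) and in-degree (emails received). In NetworkX the equivalent structure is a `MultiDiGraph`, whose `out_degree` and `in_degree` views give the same counts.

```python
from collections import Counter

# Hypothetical edge list: (sender, recipient), one tuple per email.
edges = [
    ("alice", "bob"), ("alice", "carol"), ("bob", "carol"),
    ("carol", "alice"), ("dave", "carol"), ("alice", "bob"),
]

def degree_counts(edge_list):
    """Out-degree (emails sent) and in-degree (emails received) per employee."""
    out_deg = Counter(sender for sender, _ in edge_list)
    in_deg = Counter(recipient for _, recipient in edge_list)
    return out_deg, in_deg

out_deg, in_deg = degree_counts(edges)
print("Most emails sent:", out_deg.most_common(1))
print("Most emails received:", in_deg.most_common(1))
```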
Shikhar0605/NLP-on-10K-Documents
This project scrapes and analyzes the 10-K and 10-Q documents filed by publicly traded companies with the SEC.
Shikhar0605/Pandas-for-Data-Science-Assignment-1
Basics of Pandas for Data Analysis on Census Data
Shikhar0605/Predicting-Property-Maintenance-Fines-using-Logistic-Regression
Applied scikit-learn's logistic regression algorithm to predict whether a given blight ticket will be paid on time.
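To show what logistic regression learns, here is a from-scratch sketch trained by batch gradient descent on log loss; it is not scikit-learn's `LogisticRegression`. The two ticket features and all values are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(w, xi):
    """Probability of class 1 given weights w (bias first) and features xi."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi)))

def train_logistic(X, y, lr=0.1, epochs=2000):
    """Fit weights (bias first) by batch gradient descent on log loss."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = predict_proba(w, xi) - yi  # gradient of log loss w.r.t. z
            grad[0] += err
            for j, xj in enumerate(xi, start=1):
                grad[j] += err * xj
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad)]
    return w

# Hypothetical features per ticket: [fine amount in $100s, prior unpaid tickets]
X = [[2.5, 0], [1.0, 0], [3.0, 1], [5.0, 3], [6.0, 4], [4.5, 2]]
y = [1, 1, 1, 0, 0, 0]  # 1 = paid on time
w = train_logistic(X, y)
print(round(predict_proba(w, [1.5, 0]), 3))
```

Outputting a probability rather than a hard label is useful here, because ticket-compliance tasks are often scored by ranking quality (e.g. AUC).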
Shikhar0605/ProgrammingAssignment2
Repository for Programming Assignment 2 for R Programming on Coursera
Shikhar0605/Project-Report
This project aims to determine whether there is a relationship between world university rankings and each country's expenditure on its education system.
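One simple way to test for a linear relationship is Pearson's correlation coefficient; the project may have used a different method, so treat this as an assumption. Python 3.10+ also provides `statistics.correlation` for the same calculation. Note that a better ranking is a smaller number, which flips the sign you would expect.

```python
from math import sqrt
from statistics import mean

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    x_bar, y_bar = mean(xs), mean(ys)
    cov = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - x_bar) ** 2 for x in xs))
    sy = sqrt(sum((y - y_bar) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical values: education spending (% of GDP) vs. best university rank.
spending = [3.5, 4.2, 5.0, 5.8, 6.1]
best_rank = [150, 120, 60, 40, 25]
print(round(pearson_r(spending, best_rank), 3))
```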
Shikhar0605/Regex-Assignment
The goal of this assignment is to correctly identify all of the different date variants encoded in this dataset and to properly normalize and sort the dates.
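A minimal sketch of regex-based date normalization, covering only a few assumed formats (`04/20/2009`, `Mar 20, 2009`, `20 March 2009`, `Feb 2009`, `6/2008`, `2009`). Missing days or months default to 1, and two-digit years are assumed to mean 19xx; both are illustrative choices, not necessarily the assignment's rules.

```python
import re

MONTHS = {m: i for i, m in enumerate(
    ["jan", "feb", "mar", "apr", "may", "jun",
     "jul", "aug", "sep", "oct", "nov", "dec"], start=1)}

# Tried in order: more specific patterns must come before looser ones.
PATTERNS = [
    (re.compile(r"\b(\d{1,2})[/-](\d{1,2})[/-](\d{2,4})\b"), ("m", "d", "y")),
    (re.compile(r"\b([A-Za-z]{3})[a-z]*\.?[ -](\d{1,2})(?:st|nd|rd|th)?,?[ -](\d{4})\b"),
     ("mon", "d", "y")),
    (re.compile(r"\b(\d{1,2}) ([A-Za-z]{3})[a-z]*\.?,? (\d{4})\b"), ("d", "mon", "y")),
    (re.compile(r"\b([A-Za-z]{3})[a-z]*,? (\d{4})\b"), ("mon", "y")),
    (re.compile(r"\b(\d{1,2})/(\d{4})\b"), ("m", "y")),
    (re.compile(r"\b((?:19|20)\d{2})\b"), ("y",)),
]

def normalize(text):
    """Return the first date found in `text` as 'YYYY-MM-DD', or None."""
    for pattern, fields in PATTERNS:
        for match in pattern.finditer(text):
            parts = dict(zip(fields, match.groups()))
            if "mon" in parts:
                month = MONTHS.get(parts["mon"][:3].lower())
                if month is None:
                    continue  # an ordinary word followed by a number
            else:
                month = int(parts.get("m", 1))
            day = int(parts.get("d", 1))
            year = int(parts["y"])
            if year < 100:
                year += 1900  # assumption: two-digit years are 19xx
            if 1 <= month <= 12 and 1 <= day <= 31:
                return f"{year:04d}-{month:02d}-{day:02d}"
    return None

print(normalize("Seen on Mar 20, 2009 for follow-up"))
```

Because the output is ISO-formatted, records can be sorted chronologically with `sorted(notes, key=normalize)`, provided every note yields a date.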
Shikhar0605/Salary-and-new-connections-predictions-using-Networkx
Using NetworkX and ML algorithms, built a model to predict whether employees in a given company receive a management-position salary. Also predicted future connections between employees in the network.
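Future-connection prediction often starts from neighborhood-overlap scores. This sketch computes the Jaccard coefficient for every unconnected pair in a hypothetical graph; NetworkX offers the same score via `nx.jaccard_coefficient(G, ebunch)`, and such scores are typically fed as features into a classifier.

```python
from itertools import combinations

# Hypothetical undirected graph as adjacency sets.
graph = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b", "e"},
    "d": {"a", "e"},
    "e": {"c", "d"},
}

def jaccard_scores(adj):
    """|N(u) & N(v)| / |N(u) | N(v)| for every non-adjacent pair (u, v)."""
    scores = {}
    for u, v in combinations(sorted(adj), 2):
        if v in adj[u]:
            continue  # already connected; nothing to predict
        union = adj[u] | adj[v]
        scores[(u, v)] = len(adj[u] & adj[v]) / len(union) if union else 0.0
    return scores

scores = jaccard_scores(graph)
print(max(scores, key=scores.get))
```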
Shikhar0605/SEC-10K-item-1a-ML-Kmean-Clustering
This was my final project for the UOFM data analytics certificate program: I used ML to cluster text files and validated the clusters using stock market data.
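The clustering step can be sketched with Lloyd's k-means algorithm. Real text clustering would run on high-dimensional vectors such as TF-IDF; the 2-D points below are hypothetical document features kept small for readability.

```python
import math
import random

def kmeans(points, k, iterations=100, seed=0):
    """Lloyd's algorithm: returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initial centroids drawn from the data
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(
                    tuple(sum(dim) / len(members) for dim in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return centroids, labels

# Hypothetical 2-D document features.
points = [(0.10, 0.20), (0.15, 0.22), (0.12, 0.18),
          (0.80, 0.90), (0.85, 0.88), (0.82, 0.95)]
centroids, labels = kmeans(points, k=2)
print(labels)
```

Validating against stock data would then mean checking whether, for example, returns differ systematically between the resulting clusters.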
Shikhar0605/Sentiment-Analysis
Exploratory sentiment analysis of a firm's management discussion from its 10-K annual SEC filing.
Shikhar0605/Spelling-Recommender
Created three different spelling recommenders, each of which takes a list of misspelled words and recommends a correctly spelled word for every word in the list. Each recommender uses a different Jaccard distance metric. For each misspelled word, the recommender finds the word among the correct spellings that has the shortest distance and starts with the same letter as the misspelled word, and returns that word as its recommendation.
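The procedure above can be sketched for one variant, assuming Jaccard distance over character trigrams; the other recommenders would change the n-gram size or metric. The small vocabulary is hypothetical. NLTK's `nltk.metrics.distance.jaccard_distance` computes the same distance on two sets.

```python
def char_ngrams(word, n):
    """Set of character n-grams in `word` (the word itself if shorter than n)."""
    if len(word) < n:
        return {word}
    return {word[i:i + n] for i in range(len(word) - n + 1)}

def jaccard_distance(a, b):
    """1 - |a & b| / |a | b| for two sets."""
    union = a | b
    return 1 - len(a & b) / len(union) if union else 0.0

def recommend(misspelled, correct_spellings, n=3):
    """Closest same-initial word for each misspelling (None if no candidate)."""
    recommendations = []
    for word in misspelled:
        candidates = [w for w in correct_spellings if w[:1] == word[:1]]
        if not candidates:
            recommendations.append(None)
            continue
        grams = char_ngrams(word, n)
        best = min(candidates,
                   key=lambda w: jaccard_distance(grams, char_ngrams(w, n)))
        recommendations.append(best)
    return recommendations

vocabulary = ["incidence", "incident", "validate", "valid", "involve", "cormorant"]
print(recommend(["cormulent", "incendenece", "validrate"], vocabulary))
```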