katliyx
M.S. in Business Analytics, University of Notre Dame | University of Michigan - Ann Arbor Alum
Pinned Repositories
attrition-viz-R
This short project conducts exploratory and descriptive analysis on a built-in dataset: attrition.
effectiveleader-regression-diagnosis-sampling-R
This short project runs a regression analysis on a data set of factors that may affect how effective a leader is. It also includes the relevant regression diagnostics, with corresponding resampling solutions such as bootstrapping.
flightdelayanalysis-python
The classic flight delay analysis with Python.
glassdoor-textanalytics-R
This project is based on a dataset from Glassdoor. It addresses several questions: (1) What do unigram term frequency, tf-idf, and bigram presence reveal about the text in pro reviews, con reviews, and advice? (2) Which rating type shows a pattern that stands out from the rest and merits further inspection? (3) How do the sentiment scores in pro reviews, con reviews, and advice affect the rating of interest? (4) What topics emerge from topic modeling of pro reviews, con reviews, and advice? (5) For the topics modeled, how does rating affect the mapping of those topics?
HP-unstructureddataanalytics-objectdetectionR
This is a fun unstructured data analytics project based on a data set containing the full text of the Harry Potter book series. Questions to answer include, but are not limited to: Does sentiment analysis on single words tell the true story? How does sentiment change as readers (in this case, RStudio and certain lexicon packages) progress through the chapters of each book? If sentiment is analyzed on a bigram basis, how does the overall result change? What about analyzing on a sentence basis? How do the results vary across different lexicons? The project ends with a section on image/object detection that examines some famous movie scenes from Harry Potter.
katliyx.github.io
A repository documenting my data science work samples using R, Python, and SQL.
machinelearning-R
This repository includes some short machine learning projects using techniques and models including k-means clustering, decision trees, and logistic regression.
playappstore-hypotheses-regression
Google Play App Store Analysis: Many factors affect the reviews and ratings of the applications available in Google's Play App Store market. To name a few: the genre, the download size, the content rating, the frequency of updates, and the pricing. These could contribute individually, or they could affect the final ratings and reviews through interactions with one another. This project investigates the issue by conducting regression analysis and hypothesis tests, such as ANOVA and the t-test, on the data. The limitations of the techniques applied are presented, and further insights are then derived.
winequality-python
A project that aims to predict red wine quality (ratings) from selected features. Includes preliminary analysis, data visualization, and classification with cross-validation using logistic regression, support vector machines, decision trees, and random forests on ordinal, binary, and categorical data.
WWEearningscall-textanalytics-R
This project starts with text analytics, investigating term frequency and sentiment in the text of the earnings call transcripts. Because the stock price is a major indicator of a public company’s performance over a financial period, the text analytics results are then combined with the fluctuation of stock prices over time.
katliyx's Repositories
katliyx/HP-unstructureddataanalytics-objectdetectionR
This is a fun unstructured data analytics project based on a data set containing the full text of the Harry Potter book series. Questions to answer include, but are not limited to: Does sentiment analysis on single words tell the true story? How does sentiment change as readers (in this case, RStudio and certain lexicon packages) progress through the chapters of each book? If sentiment is analyzed on a bigram basis, how does the overall result change? What about analyzing on a sentence basis? How do the results vary across different lexicons? The project ends with a section on image/object detection that examines some famous movie scenes from Harry Potter.
katliyx/katliyx.github.io
A repository documenting my data science work samples using R, Python, and SQL.
katliyx/attrition-viz-R
This short project conducts exploratory and descriptive analysis on a built-in dataset: attrition.
katliyx/effectiveleader-regression-diagnosis-sampling-R
This short project runs a regression analysis on a data set of factors that may affect how effective a leader is. It also includes the relevant regression diagnostics, with corresponding resampling solutions such as bootstrapping.
katliyx/flightdelayanalysis-python
The classic flight delay analysis with Python.
katliyx/glassdoor-textanalytics-R
This project is based on a dataset from Glassdoor. It addresses several questions: (1) What do unigram term frequency, tf-idf, and bigram presence reveal about the text in pro reviews, con reviews, and advice? (2) Which rating type shows a pattern that stands out from the rest and merits further inspection? (3) How do the sentiment scores in pro reviews, con reviews, and advice affect the rating of interest? (4) What topics emerge from topic modeling of pro reviews, con reviews, and advice? (5) For the topics modeled, how does rating affect the mapping of those topics?
katliyx/machinelearning-R
This repository includes some short machine learning projects using techniques and models including k-means clustering, decision trees, and logistic regression.
katliyx/playappstore-hypotheses-regression
Google Play App Store Analysis: Many factors affect the reviews and ratings of the applications available in Google's Play App Store market. To name a few: the genre, the download size, the content rating, the frequency of updates, and the pricing. These could contribute individually, or they could affect the final ratings and reviews through interactions with one another. This project investigates the issue by conducting regression analysis and hypothesis tests, such as ANOVA and the t-test, on the data. The limitations of the techniques applied are presented, and further insights are then derived.
katliyx/winequality-python
A project that aims to predict red wine quality (ratings) from selected features. Includes preliminary analysis, data visualization, and classification with cross-validation using logistic regression, support vector machines, decision trees, and random forests on ordinal, binary, and categorical data.
katliyx/WWEearningscall-textanalytics-R
This project starts with text analytics, investigating term frequency and sentiment in the text of the earnings call transcripts. Because the stock price is a major indicator of a public company’s performance over a financial period, the text analytics results are then combined with the fluctuation of stock prices over time.
katliyx/manhattanpropertyanalysis-R
This project is based on a very large data set of properties in Manhattan. It begins with cleaning and processing the data, and continues with regression analysis, visualization, and correlation analysis.
katliyx/movieswebscraping-python
This project scrapes data from some movie rating websites.
katliyx/pic
katliyx/piccc
katliyx/wine-R
This short project creates several visualizations using a data set of worldwide wine information.