DM competitions

DATA MINING

Data mining examples. This repository contains three examples of machine learning techniques implemented in R. The main library used is caret, which provides a clean interface for controlling and tuning model parameters. The examples address three problems: two regression problems and one classification problem.
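To illustrate what tuning a model with caret can look like, here is a minimal sketch. It is not taken from the repository: the built-in mtcars data set, the k-nearest-neighbours model and the grid of candidate values are all assumptions chosen only to keep the example self-contained.

```r
library(caret)

set.seed(42)  # make the cross-validation folds reproducible

# Resampling scheme: 5-fold cross-validation
ctrl <- trainControl(method = "cv", number = 5)

# Candidate values for k, the only tuning parameter of the "knn" method
# (illustrative values, not the ones used in the repository)
grid <- expand.grid(k = c(3, 5, 7, 9))

# Fit one model per candidate k and keep the one with the best resampled RMSE
fit <- train(
  mpg ~ .,
  data       = mtcars,                # stand-in data set, an assumption
  method     = "knn",
  preProcess = c("center", "scale"),  # kNN is distance based, so scale inputs
  trControl  = ctrl,
  tuneGrid   = grid
)

print(fit$bestTune)  # the value of k selected by cross-validation
print(fit$results)   # resampled metrics for every k in the grid
```

Swapping the `method` string and the columns of `tuneGrid` is usually enough to tune a different model with the same resampling setup.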

  1. Boston Ames House: a data set in which we want to predict the price at which a house was sold in the Boston area. The best-performing approach combined the predictions of gradient boosting, an SVM and a lasso-penalized regression. Results are visible at https://federicomelograna.github.io/DMcompetitions/RMD_Housepricing_final.html

  2. Bike Sharing: the goal is to forecast the number of bikes rented in the city of San Francisco on different days. The main challenge was feature engineering: we had a relatively large amount of data but only a small number of features, so creating new features was essential. Results are visible at https://federicomelograna.github.io/DMcompetitions/RMD_BikeSharingfinal.html

  3. OKCupid: the last data set, in which we want to classify whether a client works in a STEM field. Results are visible at https://federicomelograna.github.io/DMcompetitions/OkCupidGitHub.html
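The feature engineering mentioned for the Bike Sharing example can be sketched in base R as follows. This is only an illustration of the general idea, not the features actually built in the linked report: the data frame, its column names and the rental counts are made up for the example.

```r
# Hypothetical daily rental data; column names and values are invented
bikes <- data.frame(
  date  = as.Date(c("2014-07-04", "2014-07-05", "2014-12-24")),
  count = c(820, 1130, 310)
)

# Calendar features derived from the date alone
bikes$month      <- as.integer(format(bikes$date, "%m"))
bikes$weekday    <- as.integer(format(bikes$date, "%u"))  # 1 = Monday ... 7 = Sunday
bikes$is_weekend <- bikes$weekday >= 6

# Cyclical encoding so that December and January end up close together
bikes$month_sin <- sin(2 * pi * bikes$month / 12)
bikes$month_cos <- cos(2 * pi * bikes$month / 12)

print(bikes)
```

Each derived column gives a model access to a pattern (seasonality, weekday versus weekend demand) that a raw date does not expose directly.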