Theory and implementation of techniques for data analytics and mining with emphasis on big data. Topics include data cleaning, exploratory data analysis, data visualization, feature engineering, classification, clustering, association rule mining, predictive model evaluation, parameter tuning, natural language processing, and selected advanced data mining topics. Design and implementation of systems using contemporary data analysis and mining programming libraries for automatic discovery of patterns and knowledge.
CSC 177: Data Warehousing and Data Mining - Group Project
- Difference-between-scaling-and-normalization.pdf (good reading)
- sklearn-data-preprocessing.pdf (very useful)
- very-gentle-introduction-to-statistical-distributions.pdf
- common-probability-distributions.pdf
- Exploratory-data-analysis-and-insights.pdf
- handling-imbalanced-classes.pdf
- How-to-identifty-outliers.pdf
- how-when-why-should-you-standardize-normalize-rescale-data.pdf
- interpreting-key-results-for-correlation.pdf (good reading)
- interpreting-p-values-and-coefficients.pdf (good reading)
- Jim-Frost-part-1-curve-fitting-using-linear-and-non-linear-regression.pdf (good reading)
- Jim-Frost-part-2-when-to-standardize-variables.pdf (good reading)
- regularization with regression (and machine learning).pdf (important for avoiding overfitting)
- ridge-and-lasso-regression.pdf
- linear, ridge and lasso regression comprehensive guide for beginners.pdf
- good-example-of-lasso-and-ridge-regression.pdf
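Several of the readings above (scaling vs. normalization, when to standardize variables) turn on the difference between two common rescaling schemes. The sketch below contrasts them using only the Python standard library. The helper names `min_max_scale` and `standardize` are hypothetical and chosen for illustration. It assumes the usual definitions: min-max scaling maps a column linearly onto [0, 1], and standardization converts values to z-scores using the population standard deviation. Terminology varies between sources, so check which meaning a given reading intends.

```python
from statistics import mean, pstdev


def min_max_scale(values):
    """Rescale values linearly into [0, 1] (min-max scaling)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Constant column: there is no spread to rescale, so map everything to 0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def standardize(values):
    """Convert values to z-scores: mean 0, unit population standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        # Constant column: every value equals the mean, so every z-score is 0.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Hypothetical example column, for illustration only.
incomes = [20.0, 30.0, 40.0, 50.0, 60.0]
print(min_max_scale(incomes))  # [0.0, 0.25, 0.5, 0.75, 1.0]
print(standardize(incomes))    # symmetric z-scores centered on 0
```

Min-max scaling preserves the shape of the distribution but is sensitive to outliers, because a single extreme value stretches the range. Standardization has no fixed bounds, but it puts features on a comparable scale. Comparable scales matter for distance-based methods and for penalized models such as ridge and lasso regression. In scikit-learn, `MinMaxScaler` and `StandardScaler` perform these two transformations on whole feature matrices.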