Data Mining Lab - B.Tech CSE | Third Year

To run the code, download the file and open it in Google Colab, VS Code, Jupyter, or a similar notebook environment.

Find the mean, median, mode, and standard deviation of a dataset, and draw a box plot using them,
Univariate, Bivariate, and Multivariate Analysis using Histograms, Q-Q Plots, Bar Graphs, Scatter Plots, and Heatmaps,
Perform EDA on a dataset, addressing missing values, to enhance modeling readiness,
Explore and implement diverse data transformation techniques (Z-score, Min-Max, Mean normalization, Max Absolute, Robust scaling) in Python, understanding their impact on data distribution for effective preprocessing,
Demonstrate the following Similarity and Dissimilarity Measures using Python: a) Euclidean Distance, b) Manhattan Distance, c) Minkowski Distance, d) Cosine Similarity,
Demonstrate the usage of the following Association Rule Mining algorithms using the attached dataset: a) Apriori algorithm without any libraries, b) Apriori algorithm using the mlxtend library,
Perform Market Basket Analysis on the given dataset using the FP-Growth algorithm,
Perform classification on the given dataset using a Decision Tree,
i) Perform Logistic Regression on the given dataset, ii) Plot the confusion matrix as a heatmap for the model generated in step (i) and print the accuracy, precision, recall, and F1 score,
Compare various boosting methods (Gradient Boosting, XGBoost, AdaBoost, CatBoost) on the given dataset,
a) Perform Classification using a Naïve Bayes Classifier on the given dataset, b) Perform Regression using a Regression Tree on the given dataset,
Perform different clustering techniques on the given dataset.
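
As a quick illustration of the similarity and dissimilarity measures listed above, here is a minimal sketch in plain Python with no external libraries. It is not taken from the lab notebooks; the function names and the sample vectors `u` and `v` are assumptions chosen for this example only.

```python
import math

def minkowski(a, b, p):
    """Minkowski distance of order p between two equal-length vectors."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1 / p)

def euclidean(a, b):
    """Euclidean distance (Minkowski with p = 2)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    """Manhattan distance (Minkowski with p = 1)."""
    return sum(abs(x - y) for x, y in zip(a, b))

def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical sample vectors, used only to exercise the functions.
u = [1.0, 2.0, 3.0]
v = [4.0, 6.0, 3.0]

print("Euclidean:", euclidean(u, v))
print("Manhattan:", manhattan(u, v))
print("Minkowski (p=3):", minkowski(u, v, 3))
print("Cosine similarity:", cosine_similarity(u, v))
```

The Minkowski function generalizes the other two distances: calling it with p = 1 gives the Manhattan distance and with p = 2 gives the Euclidean distance, which is a handy way to check an implementation.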