Analytics and Systems of Big Data Lab Work
- Course Code : COM406P
- Course Faculty : Dr. Sivaselvan B
Batch No : 08
- Viraj Sonatkar (CED15I015)
- Gowtham Munukutla (CED15I019)
- Akshay Kumar (CED15I031)
- Use Python / R library for
- Apriori (ARM) : by testing it for at least 5 measures of pattern evaluation / interestingness other than Support and Confidence
- Selecting the right interestingness measure for association patterns
- Lift
- Conviction
- Leverage
- Collective Strength
- Added Value
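The five measures listed above can all be derived from the rule's joint and marginal supports. A minimal sketch in plain Python (the supports in the example call are made-up numbers for illustration, not from any dataset):

```python
# Sketch: five interestingness measures for a rule A -> B, computed
# from the joint/marginal supports (treated as probabilities).

def interestingness(p_a, p_b, p_ab):
    """Return a dict of measures for the rule A -> B."""
    conf = p_ab / p_a                        # confidence P(B|A)
    lift = p_ab / (p_a * p_b)                # > 1 means positive correlation
    leverage = p_ab - p_a * p_b              # deviation from independence
    added_value = conf - p_b                 # confidence gain over P(B)
    # Conviction is undefined (infinite) when confidence == 1
    conviction = (1 - p_b) / (1 - conf) if conf < 1 else float("inf")
    # Collective strength uses the "good events" P(AB) + P(neither)
    p_nanb = 1 - p_a - p_b + p_ab            # P(neither A nor B)
    good = p_ab + p_nanb
    expected = p_a * p_b + (1 - p_a) * (1 - p_b)
    collective_strength = (good / expected) * ((1 - expected) / (1 - good))
    return {"lift": lift, "conviction": conviction, "leverage": leverage,
            "added_value": added_value,
            "collective_strength": collective_strength}

m = interestingness(p_a=0.4, p_b=0.5, p_ab=0.3)
```

The same formulas can then be checked against whatever the chosen library (e.g. mlxtend's `association_rules` in Python) reports for lift, leverage and conviction.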
- Bayes or Decision Tree (Classifier) : All measures of classifier accuracy
- F1 Score
- Specificity
- Sensitivity
- Recall
- Precision
- AUC (Area Under Curve)
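All of the accuracy measures above follow from the confusion matrix (plus scores, for AUC). A hand-rolled sketch on a toy label vector, useful for cross-checking library output:

```python
# Sketch: classifier accuracy measures computed by hand.
# The labels/scores below are toy data, not from any particular classifier.

def confusion(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn

def metrics(y_true, y_pred):
    tp, tn, fp, fn = confusion(y_true, y_pred)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)              # recall == sensitivity
    specificity = tn / (tn + fp)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, specificity, f1

def auc(y_true, scores):
    """AUC via the rank-sum (Mann-Whitney) formulation; no tie handling."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = {i: r + 1 for r, i in enumerate(order)}   # 1-based ranks
    pos = [i for i, t in enumerate(y_true) if t == 1]
    n_pos, n_neg = len(pos), len(y_true) - len(pos)
    rank_sum = sum(ranks[i] for i in pos)
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

y_true = [1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 0, 1, 0, 1]
scores = [0.9, 0.4, 0.3, 0.8, 0.2, 0.6]
```

In practice `sklearn.metrics` (`precision_score`, `recall_score`, `f1_score`, `roc_auc_score`) gives the same numbers; the sketch just makes the definitions explicit.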
- K-Means (Clustering) : at least 3 parameters of cluster quality
- Radius
- Clustering
- Parity of Clusters
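Since the quality parameters listed above are named loosely, here is a sketch of three concrete ones for a single cluster: radius and diameter (the usual definitions) plus within-cluster SSE as a common third choice. The example points are arbitrary:

```python
# Sketch: three cluster-quality numbers for one cluster of 2-D points.
import math

def centroid(points):
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))

def quality(points):
    c = centroid(points)
    radius = max(math.dist(p, c) for p in points)    # farthest member from centroid
    diameter = max(math.dist(p, q)                   # farthest pair of members
                   for p in points for q in points)
    sse = sum(math.dist(p, c) ** 2 for p in points)  # within-cluster scatter
    return radius, diameter, sse

cluster = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]
r, d, sse = quality(cluster)
```

Applied per cluster after K-Means, these numbers (and the cluster-size balance, if "parity" is read as size evenness) give a quick quality report.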
- Explore all FIM (Frequent Itemset Mining) library support in Python / R : at least 5 algorithms other than Apriori
- Implement DIC (Dynamic Itemset Counting) in Python / R
- Implement efficient version of K-Means / Hierarchical (Dendrogram)
- Clue : Min Heap data structure
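Following the min-heap clue, one way to structure the hierarchical (dendrogram) variant is to push all pairwise distances onto a heap and pop the closest live pair instead of rescanning a distance matrix each round. A single-linkage sketch with lazy deletion of stale heap entries (all names here are my own, not a library API):

```python
# Sketch: agglomerative clustering where the closest pair comes from a
# min-heap; merged cluster ids are simply absent from `clusters`, so
# stale heap entries are skipped when popped.
import heapq
import math

def hierarchical(points, target_clusters):
    clusters = {i: [p] for i, p in enumerate(points)}   # id -> member points
    next_id = len(points)

    def d(ca, cb):  # single-linkage distance between two clusters
        return min(math.dist(a, b)
                   for a in clusters[ca] for b in clusters[cb])

    heap = [(d(i, j), i, j) for i in clusters for j in clusters if i < j]
    heapq.heapify(heap)
    merges = []                                         # dendrogram record
    while len(clusters) > target_clusters:
        dist_ij, i, j = heapq.heappop(heap)
        if i not in clusters or j not in clusters:
            continue                                    # stale entry, skip
        merged = clusters.pop(i) + clusters.pop(j)
        clusters[next_id] = merged
        merges.append((i, j, dist_ij))
        for k in list(clusters):                        # distances to new cluster
            if k != next_id:
                heapq.heappush(heap, (d(k, next_id), k, next_id))
        next_id += 1
    return list(clusters.values()), merges

pts = [(0, 0), (0, 1), (5, 5), (5, 6)]
final, merges = hierarchical(pts, 2)
```

The `merges` list (pair merged, at what distance) is exactly what a dendrogram plot needs.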
- Implement any one ARBC (Association Rule Based Classifiers) algorithm
- Explore all information evaluation measures of Decision Tree (at least 3)
- Shannon's Entropy (Information Gain)
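Shannon entropy and the information gain of a candidate split are short enough to compute by hand; a sketch on a toy class-label column (the labels are invented):

```python
# Sketch: entropy of a label set, and the information gain of a split.
from math import log2
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(labels, groups):
    """Gain of partitioning `labels` into the sub-lists in `groups`."""
    n = len(labels)
    remainder = sum(len(g) / n * entropy(g) for g in groups)
    return entropy(labels) - remainder

labels = ["yes"] * 4 + ["no"] * 4          # perfectly mixed: entropy 1 bit
split = [["yes", "yes", "yes", "no"],      # a candidate attribute split
         ["yes", "no", "no", "no"]]
gain = information_gain(labels, split)
```

Gain ratio and the Gini index, two further measures the assignment could cover, reuse the same partition bookkeeping with a different impurity formula.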
- Explore data preprocessing support in Python / R (at least 5)
- Data Preprocessing Techniques for Data Mining - IASRI
- Data Smoothing
- Data Binning
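Binning and smoothing are easy to demonstrate together: equal-frequency (equi-depth) bins, then smoothing by bin means. The price list below is the sorted-prices example familiar from the Han & Kamber text:

```python
# Sketch: equi-depth binning followed by smoothing by bin means.

def smooth_by_bin_means(values, n_bins):
    data = sorted(values)
    size = len(data) // n_bins
    smoothed = []
    for b in range(n_bins):
        # Last bin absorbs any remainder when len(data) % n_bins != 0
        bin_ = (data[b * size:(b + 1) * size]
                if b < n_bins - 1 else data[(n_bins - 1) * size:])
        mean = sum(bin_) / len(bin_)
        smoothed.extend([mean] * len(bin_))   # replace members by bin mean
    return smoothed

prices = [4, 8, 15, 21, 21, 24, 25, 28, 34]
result = smooth_by_bin_means(prices, 3)
```

Library equivalents worth exploring for the same task include `pandas.cut` / `pandas.qcut` and scikit-learn's `KBinsDiscretizer`.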
- Explore Python / R library support for ECLAT (Equivalence CLAss Transformation)
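Alongside library support (pyECLAT in Python, `eclat` in R's arules), the core ECLAT idea is compact enough to sketch directly: store a vertical tid-list per item and grow itemsets by intersecting tid-lists. The function names here are plain Python, not any library's API:

```python
# Sketch of ECLAT: vertical layout + tid-list intersection.

def eclat(transactions, min_support):
    # Vertical layout: item -> set of transaction ids containing it
    tids = {}
    for tid, items in enumerate(transactions):
        for item in items:
            tids.setdefault(item, set()).add(tid)

    frequent = {}

    def recurse(prefix, candidates):
        for i, (item, tidset) in enumerate(candidates):
            if len(tidset) >= min_support:
                itemset = prefix + (item,)
                frequent[itemset] = len(tidset)
                # Extend with the remaining items via intersection
                rest = [(o, tidset & otids)
                        for o, otids in candidates[i + 1:]]
                recurse(itemset, rest)

    recurse((), sorted(tids.items()))
    return frequent

db = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}]
freq = eclat(db, min_support=2)
```

Comparing this output against the chosen library on the same transactions is a quick correctness check for the exercise.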
- Try out all efficient variants of Apriori
- Hashing
- Transaction Reduction
- Partitioning
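To make the hashing variant concrete, here is a pass-1 sketch in the DHP/PCY style: while counting 1-itemsets, every pair in each transaction is hashed into a bucket, and a pair survives as a pass-2 candidate only if both members are frequent and its bucket count reached the support threshold. The bucket count is an illustrative choice:

```python
# Sketch: pass 1 of the hashing (DHP/PCY-style) variant of Apriori.
from itertools import combinations

def pass1_with_hashing(transactions, min_support, n_buckets=7):
    item_counts = {}
    buckets = [0] * n_buckets
    for t in transactions:
        for item in t:
            item_counts[item] = item_counts.get(item, 0) + 1
        for pair in combinations(sorted(t), 2):
            buckets[hash(pair) % n_buckets] += 1   # hash pair into a bucket
    frequent_items = {i for i, c in item_counts.items() if c >= min_support}
    # Candidate pairs: frequent members AND a frequent bucket.  A bucket
    # count always >= the pair's true support, so no frequent pair is lost.
    candidates = {
        pair
        for t in transactions
        for pair in combinations(sorted(t), 2)
        if set(pair) <= frequent_items
        and buckets[hash(pair) % n_buckets] >= min_support
    }
    return frequent_items, candidates

db = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "d"}]
items, cand = pass1_with_hashing(db, min_support=2)
```

Transaction reduction and partitioning can be bolted onto the same loop: drop transactions with no candidate pairs, or run this pass per partition and union the local candidates.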
- Implement A-Close as well as Pincer Search. Look at 2 more algorithms for the same and implement them (or use a library if found), e.g. CHARM & MAFIA.
- Test the DEAP package in Python
- Implement a variant of the Decision Tree classification algorithm which uses a Simple Genetic Algorithm to prioritize the selection of paths when generating the class label. You may redirect the tree output of a built-in Decision Tree classifier as if-then rules, and then perform the GA operations using an appropriate fitness measure.
- Test Drive the problem in (1) using the bucket brigade strategy of fitness apportionment.
- Test Drive the BPN classification algorithm for a large data set of your choice (use of built-in support / user-defined functions is fine).
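Before moving to a large dataset with a built-in MLP (e.g. scikit-learn's `MLPClassifier` or R's nnet), the BPN mechanics can be checked on the XOR toy problem with a from-scratch one-hidden-layer network; layer size, learning rate and epoch count below are arbitrary choices:

```python
# Sketch: backpropagation network (one hidden layer, sigmoid units) on XOR.
import math
import random

random.seed(0)

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

X = [[0, 0], [0, 1], [1, 0], [1, 1]]
Y = [0, 1, 1, 0]                                  # XOR targets

H = 4                                             # hidden units
w1 = [[random.uniform(-1, 1) for _ in range(2)] for _ in range(H)]
b1 = [0.0] * H
w2 = [random.uniform(-1, 1) for _ in range(H)]
b2 = 0.0
lr = 0.5

def forward(x):
    h = [sigmoid(sum(w * xi for w, xi in zip(w1[j], x)) + b1[j])
         for j in range(H)]
    o = sigmoid(sum(w2[j] * h[j] for j in range(H)) + b2)
    return h, o

def loss():
    return sum((forward(x)[1] - y) ** 2 for x, y in zip(X, Y))

initial_loss = loss()
for _ in range(5000):                             # online (per-sample) updates
    for x, y in zip(X, Y):
        h, o = forward(x)
        d_o = (o - y) * o * (1 - o)               # output-layer delta
        for j in range(H):
            d_h = d_o * w2[j] * h[j] * (1 - h[j]) # hidden delta (pre-update w2)
            w2[j] -= lr * d_o * h[j]
            for i in range(2):
                w1[j][i] -= lr * d_h * x[i]
            b1[j] -= lr * d_h
        b2 -= lr * d_o
final_loss = loss()
```

The same loop, vectorised, is what library implementations do at scale; on a large dataset the built-in classifiers are the practical choice.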
- Test Drive the other variants of Neural Network classifiers supported in Python / R and analyse the results in comparison to (3).
- Kaggle
- Randomly Generated Dataset