Analytics-and-Systems-of-Big-Data

Analytics and Systems of Big Data Lab Work

Batch No : 08

Practice Set 01

Use a Python / R library for:
- Apriori (ARM) : test it with at least 5 measures of pattern evaluation / interestingness other than Support and Confidence
  - Selecting the right interestingness measure for association patterns
  - Lift
  - Conviction
  - Leverage
  - Collective Strength
  - Added Value
- Bayes or Decision Tree (Classifier) : All measures of classifier accuracy
  - F1 Score
  - Specificity
  - Sensitivity
  - Recall
  - Precision
  - AUC (Area Under Curve)
- K-Means (Clustering) : at least 3 parameters of cluster quality
  - Radius
  - Clustering
  - Parity of Clusters
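
For the Apriori item in the list above, the five interestingness measures named there can all be derived from four counts per rule. The following is a minimal sketch, not a library API: `rule_measures` and its argument names are hypothetical, and the collective strength formula follows the common "observed agreement versus expected agreement" definition.

```python
import math


def rule_measures(n, n_a, n_b, n_ab):
    """Interestingness measures for a rule A -> B from transaction counts.

    n: total transactions; n_a, n_b: transactions containing A, B;
    n_ab: transactions containing both. Assumes 0 < n_a < n and 0 < n_b < n.
    """
    p_a, p_b, p_ab = n_a / n, n_b / n, n_ab / n
    confidence = p_ab / p_a

    # Share of transactions where A and B agree (both present or both absent),
    # observed versus expected under independence.
    agree = p_ab + (1 - p_a - p_b + p_ab)
    expected = p_a * p_b + (1 - p_a) * (1 - p_b)

    return {
        "lift": p_ab / (p_a * p_b),
        "conviction": math.inf if confidence == 1 else (1 - p_b) / (1 - confidence),
        "leverage": p_ab - p_a * p_b,
        "added_value": confidence - p_b,
        "collective_strength": (
            math.inf if agree == 1
            else (agree / expected) * ((1 - expected) / (1 - agree))
        ),
    }


# Illustrative counts only: 10 transactions, A in 5, B in 4, both in 3.
measures = rule_measures(n=10, n_a=5, n_b=4, n_ab=3)
```

A lift above 1 and a positive leverage both indicate that A and B occur together more often than independence would predict.
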
Explore all FIM (Frequent Itemset Mining) library support in Python / R : at least 5 algorithms other than Apriori
Implement DIC (Dynamic Itemset Counting) in Python / R
Implement an efficient version of K-Means / Hierarchical clustering (Dendrogram)
- Clue : Min Heap data structure
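
One way to read the min-heap clue is single-linkage agglomerative clustering. All pairwise distances go into a heap once, and the closest pair whose points still belong to different clusters is merged. The sketch below assumes single linkage and a stop condition of `k` clusters; `single_linkage` is a hypothetical name, and other linkages would need the heap entries updated after each merge.

```python
import heapq
import math
from itertools import combinations


def single_linkage(points, k):
    """Merge the closest clusters until k remain; returns sorted index lists."""
    owner = list(range(len(points)))             # cluster id of each point
    clusters = {i: {i} for i in range(len(points))}

    # Min heap of (distance, i, j) over all point pairs.
    heap = [
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(len(points)), 2)
    ]
    heapq.heapify(heap)

    while len(clusters) > k and heap:
        _, i, j = heapq.heappop(heap)
        ci, cj = owner[i], owner[j]
        if ci == cj:
            continue                              # stale entry: already merged
        if len(clusters[ci]) < len(clusters[cj]):
            ci, cj = cj, ci                       # merge the smaller into the larger
        for p in clusters[cj]:
            owner[p] = ci
        clusters[ci] |= clusters.pop(cj)

    return sorted(sorted(c) for c in clusters.values())


# Illustrative points only.
pts = [(0, 0), (0, 1), (5, 5), (5, 6), (10, 0)]
three = single_linkage(pts, 3)
two = single_linkage(pts, 2)
```

Recording the popped distance at each merge would give the heights needed to draw a dendrogram.
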
Implement any one ARBC (Association Rule Based Classifiers) algorithm
Explore all information evaluation measures of Decision Tree (at least 3)
- Shannon's Entropy Theorem (Information Gain)
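
As a starting point for this item, the sketch below computes Shannon entropy, the information gain of a categorical split, and the Gini index as a further measure. The function names are hypothetical and the labels are made-up illustrative data.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gini(labels):
    """Gini index of a list of class labels."""
    n = len(labels)
    return 1 - sum((c / n) ** 2 for c in Counter(labels).values())


def information_gain(values, labels):
    """Entropy reduction from splitting `labels` by the attribute `values`."""
    n = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


labels = ["yes", "yes", "no", "no"]
perfect_split = information_gain(["a", "a", "b", "b"], labels)
useless_split = information_gain(["a", "b", "a", "b"], labels)
```

An attribute that separates the classes perfectly recovers the full entropy of the labels as gain, while one that leaves each branch as mixed as the parent gains nothing.
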
Explore data preprocessing support in Python / R (at least 5 techniques)
- Data Preprocessing Techniques for Data Mining - IASRI
- Data Smoothing
- Data Binning
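
Binning and smoothing can be tried without any library first. The sketch below assumes equal-frequency (equal-depth) bins followed by smoothing by bin means; both function names are hypothetical and the values are illustrative.

```python
import math


def equal_frequency_bins(values, n_bins):
    """Sort the values and cut them into bins of (nearly) equal size."""
    data = sorted(values)
    size = math.ceil(len(data) / n_bins)
    return [data[i:i + size] for i in range(0, len(data), size)]


def smooth_by_bin_means(bins):
    """Replace every value in a bin with that bin's mean."""
    return [[sum(b) / len(b)] * len(b) for b in bins]


prices = [4, 8, 15, 21, 21, 24, 25, 28, 34]
bins = equal_frequency_bins(prices, 3)
smoothed = smooth_by_bin_means(bins)
```

Library equivalents (for example `pandas.qcut` for equal-frequency bins) are worth comparing against this baseline.
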
Explore Python / R library support for ECLAT (Equivalence CLAss Transformation)
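
Before comparing libraries, it can help to have a small reference implementation. The sketch below shows the core ECLAT idea: store each item's set of transaction ids (vertical format) and grow itemsets depth-first by intersecting those sets. `eclat` is a hypothetical name, `min_support` is an absolute count, and no optimisations such as diffsets are attempted.

```python
def eclat(transactions, min_support):
    """Return {itemset tuple: support count} for all frequent itemsets."""
    # Vertical layout: item -> set of ids of the transactions containing it.
    tidsets = {}
    for tid, items in enumerate(transactions):
        for item in items:
            tidsets.setdefault(item, set()).add(tid)

    frequent = {}

    def extend(prefix, candidates):
        for i, (item, tids) in enumerate(candidates):
            itemset = prefix + (item,)
            frequent[itemset] = len(tids)
            # Intersect with the remaining items of the same equivalence class.
            next_candidates = []
            for other, other_tids in candidates[i + 1:]:
                common = tids & other_tids
                if len(common) >= min_support:
                    next_candidates.append((other, common))
            if next_candidates:
                extend(itemset, next_candidates)

    start = sorted(
        ((item, tids) for item, tids in tidsets.items() if len(tids) >= min_support),
        key=lambda pair: pair[0],
    )
    extend((), start)
    return frequent


# Illustrative transactions only.
baskets = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
result = eclat(baskets, min_support=3)
```

The output of a library implementation on the same transactions should match this dictionary, which makes it a convenient cross-check.
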

Try out all efficient variants of Apriori
- Hashing
- Transaction Reduction
- Partitioning
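
As one illustration of the hashing variant, the sketch below counts frequent pairs in two passes. The first pass hashes every pair into a small bucket table, and the second pass counts only pairs whose bucket reached the support threshold. This is a simplified sketch under stated assumptions: `frequent_pairs_hashed` and `n_buckets` are hypothetical names, Python's built-in `hash` stands in for a designed hash function, and only 2-itemsets are handled.

```python
from collections import Counter
from itertools import combinations


def frequent_pairs_hashed(transactions, min_support, n_buckets=11):
    """Frequent 2-itemsets using a hash-bucket filter to prune candidates."""
    item_counts = Counter()
    bucket_counts = [0] * n_buckets

    def bucket(pair):
        return hash(pair) % n_buckets

    # Pass 1: count single items and hash every pair into a bucket.
    for t in transactions:
        items = sorted(set(t))
        item_counts.update(items)
        for pair in combinations(items, 2):
            bucket_counts[bucket(pair)] += 1

    frequent_items = {i for i, c in item_counts.items() if c >= min_support}

    # Pass 2: count only pairs of frequent items that fall in a frequent bucket.
    pair_counts = Counter()
    for t in transactions:
        items = sorted(set(t) & frequent_items)
        for pair in combinations(items, 2):
            if bucket_counts[bucket(pair)] >= min_support:
                pair_counts[pair] += 1

    return {p: c for p, c in pair_counts.items() if c >= min_support}


# Illustrative transactions only.
baskets = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
pairs = frequent_pairs_hashed(baskets, min_support=3)
```

A bucket count is an upper bound on the count of every pair hashed into it, so the filter can only discard infrequent pairs and the result does not depend on the hash function chosen.
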
Implement A-Close as well as Pincer Search. Look at 2 more algorithms for the same purpose, e.g. CHARM and MAFIA, and implement them (or use a library if one is found).
Test the DEAP package in Python

Implement a variant of the Decision Tree Classification algorithm which uses a Simple Genetic Algorithm to prioritize the selection of paths to generate the class label. You may redirect the tree output of a built-in Decision Tree classifier as if-then rules and then perform GA operations using an appropriate fitness measure.
Test drive the problem in item 1 (the GA-guided Decision Tree above) using the bucket brigade strategy of fitness apportionment.
Test drive the BPN (Back Propagation Network) classification algorithm on a large data set of your choice (use of built-in support / user-defined functions is fine).
Test drive the other variants of Neural Network classifiers supported in Python / R and analyse the results in comparison to (3), the BPN classifier.
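
Comparing classifiers as in the last two items needs a common set of accuracy measures. The sketch below computes the ones listed in Practice Set 01 for the binary case, from confusion-matrix counts and from predicted scores. The function names are hypothetical, the counts are illustrative, and the AUC uses the pairwise-ranking definition, which is quadratic in the number of samples.

```python
def binary_metrics(tp, fp, fn, tn):
    """Accuracy measures from binary confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)              # recall is also called sensitivity
    specificity = tn / (tn + fp)
    f1 = 2 * precision * recall / (precision + recall)
    return {
        "precision": precision,
        "recall": recall,
        "sensitivity": recall,
        "specificity": specificity,
        "f1": f1,
    }


def auc(labels, scores):
    """Probability that a random positive is scored above a random negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(
        1.0 if p > q else 0.5 if p == q else 0.0
        for p in pos for q in neg
    )
    return wins / (len(pos) * len(neg))


m = binary_metrics(tp=40, fp=10, fn=20, tn=30)
area = auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])
```

Running the same two functions on the predictions of each classifier gives directly comparable numbers.
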