Solutions for the course ID2222 Data Mining at KTH. The course covers data mining techniques for analysing large-scale datasets. For more information, please refer to the course webpage. The homework solutions were mostly implemented in Python.
homework 1 - Similar Items: Find similar documents using minhashing and LSH techniques
homework 2 - Association Rules: Find frequent itemsets and association rules using the Apriori algorithm
homework 3 - Data Streams: Estimate triangle counts in a streaming graph of edge insertions using TRIEST
homework 4 - Graph Spectra: Implementation of the spectral graph clustering algorithm described in this paper
homework 5 - Graph Partitioning: Implementation of the JABEJA algorithm for K-way graph partitioning in a distributed environment
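
To give a flavour of the first homework topic, here is a minimal sketch of minhashing for estimating the Jaccard similarity of two documents. It is not the code from this repository: the function names (`shingles`, `minhash_signature`, `estimate_similarity`), the shingle length `k=5`, the signature length of 128, and the use of seeded `hashlib.blake2b` digests as the hash family are all assumptions made for illustration. The LSH banding step is left out.

```python
import hashlib


def shingles(text, k=5):
    """Return the set of k-character shingles of a whitespace-normalised string."""
    text = " ".join(text.lower().split())
    # Texts shorter than k yield a single shingle holding the whole text.
    return {text[i:i + k] for i in range(max(1, len(text) - k + 1))}


def _seeded_hash(shingle, seed):
    """Hypothetical hash family: a 64-bit BLAKE2b digest of 'seed:shingle'."""
    data = f"{seed}:{shingle}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_signature(shingle_set, num_hashes=128):
    """For each seeded hash function, keep the minimum hash over the set."""
    return [
        min(_seeded_hash(s, seed) for s in shingle_set)
        for seed in range(num_hashes)
    ]


def jaccard(a, b):
    """Exact Jaccard similarity of two non-empty sets."""
    return len(a & b) / len(a | b)


def estimate_similarity(sig_a, sig_b):
    """Fraction of signature positions that agree; estimates Jaccard similarity."""
    matches = sum(x == y for x, y in zip(sig_a, sig_b))
    return matches / len(sig_a)


if __name__ == "__main__":
    doc_a = shingles("data mining techniques for analysing large-scale datasets")
    doc_b = shingles("data mining methods for analysing large-scale datasets")
    sig_a = minhash_signature(doc_a)
    sig_b = minhash_signature(doc_b)
    print("exact Jaccard:   ", jaccard(doc_a, doc_b))
    print("minhash estimate:", estimate_similarity(sig_a, sig_b))
```

The estimate converges to the exact Jaccard similarity as the number of hash functions grows. The signatures are much smaller than the shingle sets, which is what makes comparing many documents feasible.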