🔥 CISC873 Data Mining 🔥

[Current meme winner: Tyler Mainguy @tylermainguy - Check out the competition]

Originally created by Prof. David Skillicorn in 2000

Instructor: Dr. Steven Ding

Friday 10:00-11:15 and 11:45-13:00 EST: Miner Party - Live Streaming [Zoom link on Onq]

✨ [JOIN our discord channel HERE]✨

✨ TAs: Weihan Ou (git: WeihanO)  ✨

Course Intro Video

Study of the extraction of concepts from large high-dimensional datasets. Statistical foundations; techniques such as supervised neural networks, unsupervised neural networks, decision trees, association rules, Bayesian classifiers, inductive logic programming, genetic algorithms, singular value decomposition, hierarchical clustering.

Topics:

  • Introduction to Data Mining. Overview of data mining, functions in data mining, the epistemology of data science, experimental protocol, performance assessment, ethical & privacy implications, human bias in data mining, introduction to Python for data science.
  • Tabular and Relational Data Mining. Experiment design, data preprocessing, data visualization, data exploration, linear models, performance evaluation, overfitting, bias-variance analysis, error estimation, tree-based methods, ensemble methods, instance-based learning, graph-based models, introduction to neural network & autograph, model tuning workflow, clustering.
  • Time Series, Transactional, and Sequential Data Mining. Time-series forecasting, association rule mining, sequential data mining, text data mining, recurrent neural networks, attention mechanisms, transformers, explainability, interpretability, topic embedding, representation learning.
  • Graph and Web Data Mining. Social network analysis, heterogeneous information network, graph-based representation learning, link prediction, graph alignment, meta-path analysis, recommendation systems, recursive neural network, graph neural network, multi-task learning.

Textbooks and references:

  • [Optional] Data Mining: Concepts and Techniques, 3rd Edition, by Jiawei Han, Micheline Kamber, and Jian Pei, Morgan Kaufmann, 2012. ISBN 978-0-12-381479-1.
  • [Optional] Zaki, M. J., & Meira, W. (2014). Data mining and analysis: fundamental concepts and algorithms. Cambridge University Press.
  • [Optional] VanderPlas, J. (2016). Python data science handbook: Essential tools for working with data. O'Reilly Media, Inc.
  • [Required] PowerPoint slides, coding templates, and other online resources provided by the instructor.

Where to Ask a Question? ❓

There is an issues tab right up there ☝️. Cannot find it? Click HERE. To ask a question, follow two steps:

  • First, search for a related issue using the Filters box.
  • If no one has asked before, create an issue with a title that summarizes your whole question, then fill in the body of your question. You can also copy a screenshot and paste it into the text box.

Assignments 🔨

Project Meeting Sign-up

(!!!Zoom Meeting link in the above shared sheet!!!)

[Timelines are tentative; please let me know if you have any conflicts!]

Assignments (Option #1 100% total) | Due Date | Individual Project (Option #2 100% total) | Due Date
Assignment #0 | 10-07 | Project Proposal | 10-02
Assignment #1 | 10-17 | Phase 1 source code and results | 10-23
Assignment #2 | 11-06 | Phase 2 source code and results | 11-13
Assignment #3 | 11-15 | Phase 3 source code and results | 11-27
Assignment #4 | on Kaggle | Final report | 12-18
Assignment #5 | on Kaggle | |

Course Materials. 📚

Lecture Recordings & Notes 💪

Week | Slides | Videos
Week #1 Introduction | Slides | L1-1 L1-2
Week #2 Regression | Slides | L2-1 L2-2
Week #3 Classification | Slides | TBA
TBA | TBA | TBA
Week #9 | Slides | L9
Week #10 | Slides | L10
Week #11 | Slides | L11
Week #12 | | L12
Week #13 | | L13

How to Learn Data Mining

credits: AmirHossein Sojoodi @amirsojoodi

Contributors ✨

Thanks go to these wonderful people (emoji key):


  • Steven Ding 💻 📖
  • WeihanO 💻 📖
  • Li Tao Li 💻 📖
  • Gitsamshi 💻 📖
  • Sal Choueib 💻 📖
  • strixy16 💻 📖

This project follows the all-contributors specification. Contributions of any kind welcome!