[Current meme winner: Tyler Mainguy @tylermainguy - Check out the competition]
Originally created by Prof. David Skillicorn in 2000
Instructor: Dr. Steven Ding
Friday 10:00-11:15 and 11:45-13:00 EST, Miner Party - Live Streaming [Zoom link on Onq]
✨ [JOIN our discord channel HERE]✨
✨ TAs: Weihan Ou (git: WeihanO) ✨
Study of the extraction of concepts from large high-dimensional datasets. Statistical foundations; techniques such as supervised neural networks, unsupervised neural networks, decision trees, association rules, Bayesian classifiers, inductive logic programming, genetic algorithms, singular value decomposition, hierarchical clustering.
Topics:
- Introduction to Data Mining. Overview of data mining, functions in data mining, the epistemology of data science, experimental protocol, performance assessment, ethical & privacy implications, human bias in data mining, introduction to Python for data science.
- Tabular and Relational Data Mining. Experiment design, data preprocessing, data visualization, data exploration, linear models, performance evaluation, overfitting, bias-variance analysis, error estimation, tree-based methods, ensemble methods, instance-based learning, graph-based models, introduction to neural networks & autograd, model tuning workflow, clustering.
- Time Series, Transactional, and Sequential Data Mining. Time-series forecasting, association rule mining, sequential data mining, text data mining, recurrent neural networks, attention mechanisms, transformers, explainability, interpretability, topic embedding, representation learning.
- Graph and Web Data Mining. Social network analysis, heterogeneous information networks, graph-based representation learning, link prediction, graph alignment, meta-path analysis, recommendation systems, recursive neural networks, graph neural networks, multi-task learning.
Textbooks and references:
- [Optional] Data Mining: Concepts and Techniques, 3rd Edition, by Jiawei Han, Micheline Kamber, and Jian Pei, Morgan Kaufmann, 2012. ISBN 978-0-12-381479-1.
- [Optional] Zaki, M. J., & Meira, W. (2014). Data mining and analysis: fundamental concepts and algorithms. Cambridge University Press.
- [Optional] VanderPlas, J. (2016). Python data science handbook: Essential tools for working with data. O'Reilly Media, Inc.
- [Required] PowerPoint slides, coding templates, and other online resources provided by the instructor.
There is an Issues tab right up there ☝️. Cannot find it? Click HERE. To ask a question, follow two steps:
- First, search for a related issue using the Filters box.
- If no one has asked before, create an issue with a title that summarizes your whole question. Then fill in the body of your question. You can also copy a screenshot and paste it into the text box.
(!!!Zoom Meeting link in the above shared sheet!!!)
[Timelines are tentative; please let me know if you have any conflicts!]
Assignments (Option #1 100% total) | Due Date | Individual Project (Option #2 100% total) | Due Date |
---|---|---|---|
Assignment #0 | 10-07 | Project Proposal | 10-02 |
Assignment #1 | 10-17 | Phase 1 source code and results | 10-23 |
Assignment #2 | 11-06 | Phase 2 source code and results | 11-13 |
Assignment #3 | 11-15 | Phase 3 source code and results | 11-27 |
Assignment #4 | on Kaggle | Final report | 12-18 |
Assignment #5 | on Kaggle | | |
Week | Slides | Videos |
---|---|---|
Week #1 Introduction | Slides | L1-1 L1-2 |
Week #2 Regression | Slides | L2-1 L2-2 |
Week #3 Classification | Slides | TBA |
TBA | TBA | TBA |
Week #9 | Slides | L9 |
Week #10 | Slides | L10 |
Week #11 | Slides | L11 |
Week #12 | L12 | |
Week #13 | L13 | |
credits: AmirHossein Sojoodi @amirsojoodi
Thanks go to these wonderful people (emoji key):
Steven Ding 💻 📖 |
WeihanO 💻 📖 |
Li Tao Li 💻 📖 |
Gitsamshi 💻 📖 |
Sal Choueib 💻 📖 |
strixy16 💻 📖 |
This project follows the all-contributors specification. Contributions of any kind welcome!