- Lecture Time: Tuesday/Thursday 10-11:50am
- Classroom: WG Young CS24
- Office hours: Monday 2-3pm and Tuesday 4:15-5:00pm @ Zoom
- Zongyue Qin (qinzongyue at cs.ucla.edu), office hours: Monday 9-11am @ BH 3551 (row M)
- Yewen Wang (wyw10804@gmail.com; please check Yewen's Email Policy before emailing her), office hours: Wednesday 9-10am @ BH 3551 Conference Room, 10-11am @ Zoom
- Shichang Zhang (myfirstname@cs.ucla.edu), office hours: Friday 10am-12pm @ BH 3551 Conference Room (email me if you can't find the place)
This course introduces the basic concepts, algorithms, and techniques of data mining for different types of data, including (1) vector data, (2) set data, (3) sequence data, (4) text data, and (5) graph data. The class project provides hands-on practice in mining useful knowledge from large datasets. This is an undergraduate-level computer science course, but it may also interest students from other disciplines who need to understand, develop, and use data mining techniques to analyze large amounts of data.
- You are expected to have background knowledge in data structures, algorithms, basic linear algebra, and basic statistics.
- You should also be comfortable with at least one programming language and have prior programming experience.
- Know what data mining is and learn the basic algorithms
- Develop skills to apply data mining algorithms to solve real-world applications
- Gain initial experience in conducting research on data mining
- Homework: 30%
- Midterm exam: 20%
- Final exam: 15%
- Course project: 25%
- Participation: 10%
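The final grade is a weighted sum of the five components above, and the weights add up to 100%. The short Python sketch below shows that arithmetic. It assumes each component score is recorded as a fraction between 0 and 1. The function name `final_grade` and the input format are illustrative only and are not part of any official course grading tool. The sketch does not apply the late-submission penalty.

```python
# Weights taken from the grading breakdown (percent of the final grade).
WEIGHTS = {
    "homework": 30,
    "midterm": 20,
    "final": 15,
    "project": 25,
    "participation": 10,
}


def final_grade(scores: dict[str, float]) -> float:
    """Return the weighted course grade on a 0-100 scale.

    Hypothetical helper: `scores` maps each component name to the
    fraction earned (0.0 to 1.0), which is an assumed input format.
    """
    missing = WEIGHTS.keys() - scores.keys()
    if missing:
        raise ValueError(f"missing component scores: {sorted(missing)}")
    # Each component contributes weight * fraction earned.
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


if __name__ == "__main__":
    example = {
        "homework": 0.9,
        "midterm": 0.8,
        "final": 0.7,
        "project": 0.95,
        "participation": 1.0,
    }
    print(final_grade(example))
```

With these example scores the result is about 87.25 (27 + 16 + 10.5 + 23.75 + 10). Floating-point rounding may add a tiny error in the last digits.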
*All deadlines are at 11:59 PM (midnight) on the due date.
*Late submission policy: you will get original score * , if you are t hours late.
*No copying or sharing of homework!
- You can discuss general challenges and ideas with others.
- Suspicious cases will be reported to The Office of the Dean of Students.
- We will be using Piazza for class discussion. The system is designed to get you help quickly and efficiently from classmates, the TAs, and myself. Rather than emailing questions to the teaching staff, I encourage you to post your questions on Piazza.
- Sign up for Piazza here: piazza.com/ucla/fall2021/cs145
- Tip: Answering other students' questions will increase your participation score.
"With its status as a world-class research institution, it is critical that the University uphold the highest standards of integrity both inside and outside the classroom. As a student and member of the UCLA community, you are expected to demonstrate integrity in all of your academic endeavors. Accordingly, when accusations of academic dishonesty occur, The Office of the Dean of Students is charged with investigating and adjudicating suspected violations. Academic dishonesty, includes, but is not limited to, cheating, fabrication, plagiarism, multiple submissions or facilitating academic misconduct." For more information, please refer to the guidance.
*Book refers to: Jiawei Han, Micheline Kamber, and Jian Pei, Data Mining: Concepts and Techniques, 3rd edition.
Week | Date | Topic | Further Reading | Discussion Session | Homework | Course Project |
---|---|---|---|---|---|---|
Week 0 | 9/23 | Introduction [Slides] and Know Your Data [Slides] | | Week 0 Slides | | |
Week 1 | 9/28 | Linear Regression [Slides] | https://cs229.stanford.edu/notes2021fall/cs229-notes1.pdf | | | |
Week 1 | 9/30 | Logistic Regression [Slides] | https://cs229.stanford.edu/notes2021fall/cs229-notes1.pdf | Week 1 Slides | HW1 Released | |
Week 2 | 10/5 | Tree-based Models [Slides] | | | | |
Week 2 | 10/7 | Neural Networks [Slides] | | Week 2 Slides | HW1 Due (10/6 11:59pm), HW2 Released | |
Week 3 | 10/12 | Continue with Neural Networks | | | | |
Week 3 | 10/14 | Practical Issues of Classification [Slides] and K-Means [Slides] | | Week 3 Slides | | |
Week 4 | 10/19 | Mixture Models [Slides] and Practical Issues of Clustering [Slides] | | | HW2 Due (10/18 11:59pm), HW3 Released | |
Week 4 | 10/21 | Text Data: Naive Bayes [Slides] | http://www.ccs.neu.edu/home/yzsun/classes/2014Fall_CS6220/Slides/NB.pdf | Week 4 Slides | | |
Week 5 | 10/26 | Text Data: Topic Models [Slides] | | | HW3 Due (10/25 11:59pm), HW4 Released | |
Week 5 | 10/28 | Time Series Data [Slides] | https://online.stat.psu.edu/stat510 | Week 5 Slides | | |
Week 6 | 11/2 | Continue with Time Series | | | | |
Week 6 | 11/4 | Midterm Exam | | Week 6 Slides | HW4 Due | 11/7 Midterm Report Due |
Week 7 | 11/9 | Set Data: Frequent Pattern Mining and Association Rules [Slides] | Book Chapter 6 | | | |
Week 7 | 11/11 | Veterans Day holiday (No Class) | | | | |
Week 8 | 11/16 | Set Data: Frequent Pattern Mining and Association Rules (same as above) | Book Chapter 6 | | | |
Week 8 | 11/18 | Set Data: Frequent Pattern Mining and Association Rules (same as above) | Book Chapter 6 | | HW5 Due (11/18 11:59pm), HW6 Released | |
Week 9 | 11/23 | Sequence Data: Sequential Pattern Mining [Slides] | Book Chapter 8 | Week 8 Slides | | |
Week 9 | 11/25 | Thanksgiving holiday (No Class) | | | | |
Week 10 | 11/30 | Graph Data: Random Walk [Slides], Classification and Clustering [Slides] | | Week 10 Slides | | |
Week 10 | 12/2 | Bias, Privacy, and Ethics [Slides] | | | | 12/5 Kaggle Submission Stop |
Week 11 | 12/9 | Final Exam | | | | 12/10 Final Report Due |