Term dates: June 2022 to August 2022
- Repository of the student Juliano Tiago Rinaldi
- Coursework for the master's-level course
- CAIA003-DAT003-DataMining, UTFPR Curitiba.
Professors (UTFPR, 2022):
- Heitor S. Lopes
- Thiago H. Silva
- Course taken as an external student
- Master's student in Electrical Engineering
- Area: Artificial Intelligence
- Affiliated with UTFPR - Pato Branco, 2021/2022.
Moodle URL: https://moodle.dainf.ct.utfpr.edu.br/course/view.php?id=932
-
Motivations: The huge availability of data of all types stored everywhere (databases, data warehouses, bank transactions, e-commerce, web pages, social networks, etc.) has fostered research in a specific area of Computer Science: Knowledge Discovery in Databases (KDD). One of the main steps of KDD is the application of Data Mining (DM) algorithms. The objective of DM is to discover human-comprehensible patterns hidden in large amounts of data. DM and KDD are multidisciplinary, overlapping with many areas such as Databases, Computational Intelligence, Machine Learning, Pattern Recognition, Information Science, and Statistics. These methods are now ubiquitous, which has led to growing interest from researchers, companies, and governments.
-
Objective: This course aims to present the knowledge discovery process using data mining methods. The main data mining tasks are studied along with the best-known algorithms, always with a focus on real-world applications.
-
Evaluation: Students' grades will be computed from three evaluations: weekly homework (10%), Challenge (40%), and final project (teams of two, 50%). Deliverables: each homework assignment requires a short report with results; the final project requires a full "paper-like" report with motivations, objective, background, methods, experiments, and results. A seminar for the oral presentation of the projects will be held on Oct 25th/26th, 2022.
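The weighting above can be sketched as a small calculation. This is only an illustration: the weights come from the Evaluation paragraph, while the function name `final_grade`, the component keys, and the 0-10 grading scale are assumptions, not something the course material specifies.

```python
# Weights as stated in the Evaluation paragraph.
# Assumption: each component is graded on the same scale (e.g. 0-10).
WEIGHTS = {"homework": 0.10, "challenge": 0.40, "project": 0.50}


def final_grade(scores):
    """Return the weighted average of the three evaluation components.

    `scores` maps each key in WEIGHTS to the grade obtained in it.
    (Hypothetical helper, not part of the course material.)
    """
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing components: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


if __name__ == "__main__":
    example = {"homework": 8.0, "challenge": 7.0, "project": 9.0}
    print(round(final_grade(example), 2))
```

Under these assumptions the final project alone accounts for half of the grade, so a one-point change in the project grade moves the final grade five times as much as a one-point change in the homework grade.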
-
Textbook: P-N. Tan; M. Steinbach; V. Kumar. Introduction to Data Mining, 2nd edition. Pearson, 2018.
- June 7th - Introduction: the data mining & knowledge discovery process. Presentation of real-world case studies. Data collection (web crawling & web scraping). Dataset construction.
- June 14th - Types of data and their analysis. Data visualization. Recommended tools.
- June 21st - Classification task: decision trees. Models, concepts, and evaluation metrics.
- June 28th - Classification task: kNN, neural networks. Bagging and boosting. Regression: linear regression.
- July 5th - Association analysis task: frequent and infrequent pattern discovery.
- Aug 9th - No class.
- Aug 16th - Clustering task: K-means, hierarchical clustering, cluster quality.
- Aug 23rd - Feature selection, dimensionality reduction, Principal Component Analysis (PCA).
- Aug 30th - Text mining.
- Sep 6th - Image mining. Anomaly detection. Discussion of the final project.
- Sep 13th - PROJECT PROPOSAL DUE: Video with a short presentation and discussion of the proposal for the final project, including objective, dataset construction, methods, and analysis. Proposals will be reviewed, and approval or a request for resubmission will be communicated to the students by e-mail.
- Oct 18th - FINAL PROJECT REPORT DUE: Full "paper-like" report along with code and data. CHALLENGE BRIEF REPORT DUE: Brief report explaining the approaches used.
- Oct 25th - VIDEO OF PRESENTATION: Video presenting the results of the final project.
Moodle URL: https://moodle.dainf.ct.utfpr.edu.br/mod/page/view.php?id=47244