CAIA003-DAT003-DataMining

Primary language: Jupyter Notebook. License: Creative Commons Zero v1.0 Universal (CC0-1.0).


Trimester: June 2022 to August 2022


  • Repository of the student Juliano Tiago Rinaldi
  • Coursework for the master's-level course
  • CAIA003-DAT003-DataMining, UTFPR Curitiba.

Instructors (UTFPR, 2022):

  • Heitor S. Lopes
  • Thiago H. Silva

  • Course taken as an external student
  • Master's student in Electrical Engineering
    • Area: Artificial Intelligence
  • Affiliated with UTFPR - Pato Branco, 2021/2022.

Moodle URL: https://moodle.dainf.ct.utfpr.edu.br/course/view.php?id=932

Course information

  • Motivations: The huge availability of data of all types stored everywhere (databases, data warehouses, bank transactions, e-commerce, web pages, social networks, etc.) has fostered research in a specific area of Computer Science: Knowledge Discovery in Databases (KDD). One of the main steps of KDD is the application of Data Mining (DM) algorithms. The objective of DM is to discover human-comprehensible patterns hidden in large amounts of data. DM and KDD are multidisciplinary, overlapping many areas such as Databases, Computational Intelligence, Machine Learning, Pattern Recognition, Information Science, and Statistics. Such methods are omnipresent, leading to growing interest from researchers, companies, and governments.

  • Objective: This course aims at presenting the knowledge discovery process using data mining methods. The main data mining tasks are studied along with the most well-known algorithms, always focusing on real-world applications.

  • Evaluation: Students' grades will be computed from three evaluations: weekly homework (10%), Challenge (40%), and final project (teams of two, 50%). Deliverables: each homework requires a short report with results; the final project requires a full "paper-like" report with motivations, objective, background, methods, experiments, and results. A seminar for the oral presentation of the projects will be set for Oct 25th/26th, 2022.

  • Textbook: P-N. Tan; M. Steinbach; V. Kumar. Introduction to Data Mining, 2nd edition. Pearson, 2018.
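
The evaluation weights listed above combine into a final grade as a weighted sum. The following is a minimal sketch only: the 0-10 scoring scale, the function name `final_grade`, and the example scores are assumptions for illustration and are not stated on the course page.

```python
# Weights taken from the Evaluation item: homework 10%, Challenge 40%, final project 50%.
WEIGHTS = {"homework": 0.10, "challenge": 0.40, "project": 0.50}


def final_grade(scores):
    """Weighted sum of the three component scores.

    Hypothetical helper: assumes every component is scored on the same
    scale (0-10 here), which the course page does not specify.
    """
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Illustrative scores only, not real results.
example = {"homework": 8.0, "challenge": 7.0, "project": 9.0}
print(round(final_grade(example), 2))
```

Because the final project carries half of the weight, a one-point change in the project score moves the final grade five times as much as the same change in the homework score.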


Schedule overview (2022)

  • June 7th - Introduction: the data mining and knowledge discovery process. Presentation of real-world case studies. Data collection (web crawling and web scraping). Dataset construction.
  • June 14th - Types of data and their analysis. Data visualization. Recommended tools.
  • June 21st - Classification task: decision trees. Models, concepts, and evaluation metrics.
  • June 28th - Classification task: kNN, neural networks. Bagging and boosting. Regression: linear regression.
  • July 5th - Associative analysis task: frequent and infrequent pattern discovery.
  • Aug 9th - No class.
  • Aug 16th - Clustering task: K-means, hierarchical clustering, cluster quality.
  • Aug 23rd - Feature selection, dimensionality reduction, Principal Component Analysis (PCA).
  • Aug 30th - Text mining.
  • Sep 6th - Image mining. Anomaly detection. Discussion about the final project.
  • Sep 13th - PROJECT PROPOSAL DUE: Video with a short presentation and discussion of the final project proposal, including objective, dataset construction, methods, and analysis. Proposals will be reviewed, and approval or a request for resubmission will be sent to the students by e-mail.
  • Oct 18th - FINAL PROJECT REPORT DUE: Full "paper-like" report along with code and data. CHALLENGE BRIEF REPORT DUE: Brief report explaining the approaches used.
  • Oct 25th - VIDEO OF PRESENTATION: Video presenting the results of the final project.

Final project

Moodle URL: https://moodle.dainf.ct.utfpr.edu.br/mod/page/view.php?id=47244


Challenge


Activities

Atividade1


Atividade3


Atividade4


Atividade5