With the rise of high-throughput sequencing technologies, the volume of omics data has grown exponentially. A major challenge is mining useful knowledge from these heterogeneous collections of data. The analysis of complex high-volume data is not trivial, and classical tools cannot exploit its full potential. Machine Learning (ML) is a discipline in which computers learn from data without being explicitly programmed, helping humans make sense of large and complex data sets. It can therefore be very useful in mining large omics datasets to uncover new insights that advance the field of bioinformatics.
This hands-on workshop will introduce participants to the ML taxonomy and the applications of common ML algorithms to health data. The workshop will cover the foundational concepts and common methods used to analyse omics data sets, providing a practical context through the use of basic but widely used R libraries. Participants will acquire an understanding of the standard ML processes, as well as practical skills in applying them to familiar problems and publicly available real-world data sets.
By the end of the workshop you should be able to:
- Understand key ML concepts, common algorithms and terminology
- Understand the importance of ML in analysing complex, high-volume health-related data
- Use R packages to implement an ML workflow on a real-world dataset, from data preparation to model application and evaluation
Dr Fotis Psomopoulos, Senior Researcher
Institute of Applied Biosciences (INAB)
Center for Research and Technology, Hellas (CERTH)
The workshop will be run as a series of code-along sessions, with some additional activities for participants to complete throughout the sessions. All participants will stay in the main room, unless they are experiencing technical difficulties and require 1:1 support from a trainer.
Date/Time: Monday 9 December 2024, 1 - 5 pm AEDT / 12 - 4 pm AEST / 12:30 - 4:30 pm ACDT / 10 am - 2 pm AWST
Location: Online
This workshop is for Australian researchers who are applying, or will apply, ML to the analysis of omics data as part of their projects. It is suitable for beginners in ML. You must be associated with an Australian organisation for your application to be considered.
- No previous knowledge of ML is required or expected (please note that this will be an introductory course to ML).
- Familiarity with the R programming language. If you need a refresher on R/RStudio, try the Introduction to R and RStudio section of this online tutorial.
This workshop is free but participation is subject to application with selection.
Applications close at 11:59pm AEST, 24 November 2024.
Applications are reviewed by the organising committee and all applicants will be informed of the status of their application (successful, waiting list, unsuccessful). Successful applicants will be provided with a Zoom meeting link closer to the date. More information on the selection process is provided in our Advice on applying for Australian BioCommons workshops.
If you finish all the exercises and wish to practise further, here are a couple of good starting points to help you become more familiar with the different ML techniques and packages.
- RNASeq Analysis in R
- Use the Iris R built-in data set to run clustering and also some supervised classification and compare results obtained by different methods.
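
For the Iris exercise, one possible starting point is sketched below. It is a minimal illustration rather than a reference solution from the workshop material. The choice of k-means, Ward hierarchical clustering and linear discriminant analysis (via the `MASS` package), as well as the seed, the three clusters and the 70/30 train/test split, are all assumptions made for this example.

```r
# Minimal sketch: clustering vs. supervised classification on iris.
# The methods, seed, k = 3 and the 70/30 split are illustrative assumptions.
library(MASS)  # provides lda(); MASS ships with standard R installations

set.seed(42)
features <- scale(iris[, 1:4])  # standardise the four numeric measurements
species  <- iris$Species

# Unsupervised: k-means and hierarchical clustering, both asked for 3 groups
km <- kmeans(features, centers = 3, nstart = 25)
hc <- cutree(hclust(dist(features), method = "ward.D2"), k = 3)

# Cross-tabulate cluster assignments against the known species labels
km_table <- table(cluster = km$cluster, species = species)
hc_table <- table(cluster = hc, species = species)
print(km_table)
print(hc_table)

# Supervised: linear discriminant analysis evaluated on a held-out test set
train_idx <- sample(nrow(iris), size = round(0.7 * nrow(iris)))
train <- iris[train_idx, ]
test  <- iris[-train_idx, ]

lda_fit  <- lda(Species ~ ., data = train)
lda_pred <- predict(lda_fit, newdata = test)$class

# Confusion matrix and overall accuracy on the test set
confusion <- table(predicted = lda_pred, actual = test$Species)
accuracy  <- mean(lda_pred == test$Species)
print(confusion)
cat("LDA test accuracy:", round(accuracy, 3), "\n")
```

Comparing the two cluster tables with the confusion matrix shows how well the unsupervised groupings line up with the species labels that the supervised model was given during training. Swapping in other methods (for example a decision tree or k-nearest neighbours) follows the same pattern.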
The material in the workshop has been based on the following resources:
- ELIXIR CODATA Advanced Bioinformatics Workshop
- Machine Learning in R, by Hugo Bowne-Anderson and Jorge Perez de Acha Chavez
- Practical Machine Learning in R, by Kyriakos Chatzidimitriou, Themistoklis Diamantopoulos, Michail Papamichail, and Andreas Symeonidis.
- Linear models in R, by the Monash Bioinformatics Platform
- Relevant blog posts from the R-Bloggers website.
- Predicting the breast cancer by characteristics of the cell nuclei present in the image
Relevant literature includes:
- Pattern Recognition and Machine Learning by Christopher M. Bishop.
- Machine learning in bioinformatics, by Pedro Larrañaga et al.
- Ten quick tips for machine learning in computational biology, by Davide Chicco
- Statistics versus machine learning
- Machine learning and systems genomics approaches for multi-omics data
- A review on machine learning principles for multi-view biological data integration
- Generalized Linear Model
This material is made available under the Creative Commons Attribution 4.0 International license. Please see LICENSE for more details.
Wandrille Duchemin, Crhistian Cardona, Pedro L. Fernandes, & Fotis E. Psomopoulos. (2021). Introduction to Machine Learning (v1.0.0). Zenodo. https://doi.org/10.5281/zenodo.5752486
Additionally, we would like to acknowledge that this training material draws heavily on:
Shakuntala Baichoo, Wandrille Duchemin, Geert van Geest, Thuong Van Du Tran, Fotis E. Psomopoulos, & Monique Zahn. (2020, July 23). Introduction to Machine Learning (Version v1.0.0). Zenodo. http://doi.org/10.5281/zenodo.3958880