2024-11-ml-biocommons-au: A repository from npechl

Introduction to Machine Learning in R - from data to knowledge

With the rise of high-throughput sequencing technologies, the volume of omics data has grown exponentially. A major challenge is mining useful knowledge from these heterogeneous collections of data. Analysing complex, high-volume data is not trivial, and classical tools cannot exploit its full potential. Machine Learning (ML) is a discipline in which computers learn from data without being explicitly programmed, helping humans make sense of large and complex data sets. It can therefore be very useful for mining large omics data sets and uncovering new insights that can advance the field of bioinformatics.

This hands-on workshop will introduce participants to the ML taxonomy and to applications of common ML algorithms to health data. The workshop will cover the foundational concepts and common methods used to analyse omics data sets, providing practical context through basic but widely used R libraries. Participants will gain an understanding of standard ML processes, as well as the practical skills to apply them to familiar problems and publicly available real-world data sets.

Learning outcomes

By the end of the workshop you should be able to:

Understand key ML concepts, common algorithms and terminology
Understand the importance of ML in analysing complex, high-volume health-related data
Use R packages to implement an ML workflow on a real-world data set, from data preparation to model application and evaluation

Lead Trainer

Dr Fotis Psomopoulos, Senior Researcher

Institute of Applied Biosciences (INAB)
Center for Research and Technology, Hellas (CERTH)

Workshop Format

The workshop will be run as a series of code-along sessions, with some additional activities for participants to complete throughout the sessions. All participants will stay in the main room, unless they are experiencing technical difficulties and require 1:1 support from a trainer.

Date/Time: Monday 9 December 2024, 1 - 5 pm AEDT / 12 - 4 pm AEST / 12:30 - 4:30 pm ACDT / 10 am - 2 pm AWST

Location: Online

Who the workshop is for

This workshop is for Australian researchers who are applying, or will apply, ML to the analysis of omics data as part of their projects. It is suitable for beginners in ML. You must be associated with an Australian organisation for your application to be considered.

Prerequisites

No previous knowledge of ML is required or expected (please note that this is an introductory course to ML).
Familiarity with the R programming language. If you need a refresher on R/RStudio, try the Introduction to R and RStudio section of this online tutorial.

How to apply

This workshop is free but participation is subject to application with selection.

Applications close at 11:59pm AEST, 24 November 2024.

Applications are reviewed by the organising committee and all applicants will be informed of the status of their application (successful, waiting list, unsuccessful). Successful applicants will be provided with a Zoom meeting link closer to the date. More information on the selection process is provided in our Advice on applying for Australian BioCommons workshops.

Apply here.

Other examples

If you finish all the exercises and wish to practise on more examples, here are a couple of further examples to help you become more familiar with the different ML techniques and packages.

RNASeq Analysis in R
Use R's built-in iris data set to run clustering and some supervised classification, and compare the results obtained by different methods.

Sources / References

The workshop material is based on the following resources:

ELIXIR CODATA Advanced Bioinformatics Workshop
Machine Learning in R, by Hugo Bowne-Anderson and Jorge Perez de Acha Chavez
Practical Machine Learning in R, by Kyriakos Chatzidimitriou, Themistoklis Diamantopoulos, Michail Papamichail, and Andreas Symeonidis.
Linear models in R, by the Monash Bioinformatics Platform
Relevant blog posts from the R-Bloggers website.
Predicting the breast cancer by characteristics of the cell nuclei present in the image

Relevant literature includes:

Pattern Recognition and Machine Learning, by Christopher M. Bishop.
Machine learning in bioinformatics, by Pedro Larrañaga et al.
Ten quick tips for machine learning in computational biology, by Davide Chicco
Statistics versus machine learning
Machine learning and systems genomics approaches for multi-omics data
A review on machine learning principles for multi-view biological data integration
Generalized Linear Model

License

This material is made available under the Creative Commons Attribution 4.0 International license. Please see LICENSE for more details.

Citation

Wandrille Duchemin, Crhistian Cardona, Pedro L. Fernandes, & Fotis E. Psomopoulos. (2021). Introduction to Machine Learning (v1.0.0). Zenodo. https://doi.org/10.5281/zenodo.5752486

Additionally, we would like to acknowledge that this training material draws heavily from:

Shakuntala Baichoo, Wandrille Duchemin, Geert van Geest, Thuong Van Du Tran, Fotis E. Psomopoulos, & Monique Zahn. (2020, July 23). Introduction to Machine Learning (Version v1.0.0). Zenodo. http://doi.org/10.5281/zenodo.3958880
