cruX Machine Learning Workshop 2023

By cruX: The Programming and Computing Club of BITS Pilani, Hyderabad

Introduction

Hello, I'm Arunachala from the 21 batch and a member of the machine learning division of cruX. I'd have liked to hold at least the introductory session offline, but that would've either meant letting it clash with your compres or holding it after your vacations. I don't need to explain why I didn't take the former option, and I don't want to waste the holidays when you're free and enthusiastic to learn something new. I am not an expert in the domain, but I have been learning it for quite some time now and will be able to help you learn from a student's perspective.

Before I outline how this whole thing works, I want to let you know that if you, at any point, have any questions related to ML or even in general outside this workshop, feel free to contact me.

Discord server invite link

Outline & Prerequisites

Overview

At a high level, the workshop will have a weekly/bi-weekly to-do list followed by a small(ish) competition on what you learned in that period. You'll start by learning basic ML concepts like regression, classification, and clustering, later move on to neural network architectures, and (hopefully) by the end of this workshop, you should be comfortable picking up newer techniques from modern-day advancements. In fact, since a good chunk of you (74%) voted for NLP in the survey, we'll be going over the 'Transformer', a modern and powerful neural network architecture that currently powers language models like OpenAI's ChatGPT and Bing's 'Sydney'.

I understand that the people coming for the workshop are aiming for different things, and I hope the workshop's structure allows you to experiment at your own pace. If you're learning ML as an additional skill on top of something like web dev, and you're worried that the clash with that workshop (also online) will make you miss out on a few things, don't be. Most likely, if you fall in that category, you aren't entirely interested in learning all the math behind everything and just want to expand your knowledge in this domain. The mathy sections are marked and are optional if all you want to know is how to implement the models. Working along those lines should give you enough time for one more domain (though I advise you not to juggle more than that XD).

Breaking Down The Structure

The to-do list:

Every week will have a set of topics to be covered within the given period. I'll link articles, YouTube videos, playlists, etc., relevant to those topics (intuition and application) so you can learn at your own pace and time. Additionally, if required, I'll provide code files/snippets, either in the main GitHub repository if they relate directly to the workshop or through GitHub gists if they don't. If the content isn't much, you'll be expected to learn everything attached/linked within 5 to 6 days, so you have time to give the weekend's competition a shot. If it's a little heavier or if there are other events during that period, it can go up to 2 weeks tops. As mentioned before, the math-related sections will be marked and are optional if you initially just want to learn how to implement machine learning algorithms. But if you see yourself getting increasingly interested in machine learning, the content attached to the list should be a good enough place to start.

Weekly competitions:

The competitions will start as simple ML problem statements so you can familiarize yourself with the libraries and fundamental ML concepts. They will be hosted on Kaggle, where you can see the results of your submissions live (i.e., a live leaderboard). After the competition deadline has passed, for the sake of documentation, to incentivize good coding practices, and to help people who are learning later, you will need to create a PR with the code you wrote in the workshop's GitHub repository mentioned above. So the basic outline of the process will be as follows:

Join the contest on Kaggle and understand the problem statement.
Solve the question and submit the CSV file in a predefined format (it will be sent to you before the competition).
Format and comment your code (everyone can see and judge your code, so doing this is up to you :) ).
Create a PR in the repository with your code.
A shared leaderboard will be visible in the same repo.
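To give a rough idea of what the "submit the CSV file" step can look like, here is a minimal sketch using pandas. The column names `id` and `prediction`, the file name `submission.csv`, and the helper `make_submission` are all placeholders made up for this example; the actual format will be the one sent to you before each competition.

```python
import pandas as pd


def make_submission(ids, predictions, path="submission.csv"):
    """Write one row per test example to a CSV file and return the table.

    The column names "id" and "prediction" are hypothetical; replace them
    with whatever the competition's predefined format asks for.
    """
    if len(ids) != len(predictions):
        raise ValueError("ids and predictions must have the same length")
    submission = pd.DataFrame({"id": ids, "prediction": predictions})
    # index=False keeps pandas' row index out of the file, so the CSV
    # contains only the columns defined above.
    submission.to_csv(path, index=False)
    return submission


# Toy values standing in for real test IDs and model predictions.
submission = make_submission([1, 2, 3], [0, 1, 0])
print(submission)
```

In a real contest, the IDs would come from the test file on Kaggle and the predictions from your model; the length check is just a cheap guard against submitting a misaligned file.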

The exact details of how you will do this will be explained during your first contest/submission, and I'll be able to clear all your doubts at the same time. This weekly gamified version of the workshop is an experiment that will hopefully incentivize you to learn faster. Still, if most of you feel it's getting a bit too stressful at some point, we can reduce the frequency of these contests or slow down the workshop in general.

The Discord server:

To improve accountability among yourselves and for easier future communication, all announcements will be made on the Discord server created for this. Though I will be updating this doc with everything I put on Discord (for future reference if anyone needs it), it'll be much more organized there, so I recommend you join it even if you are using this resource later. The only thing I request from you guys is that you keep it active and be enthusiastic in the discussions. This will keep all of us motivated throughout the workshop, and it will reduce my burden if you help each other out in the chats (#discussion channel).

Topics to be covered in the workshop (tentative, but mostly fixed):

Fundamental ML concepts and classification basics
Intermediate classification and working with hyperparameters
Regression and boosting algorithms
Unsupervised ML algorithms
Introduction to Neural Networks (ANNs and how they work)
Convolutional Neural Networks (CNNs) and their applications
Recurrent Neural Networks (RNNs) and their applications
Understanding some new architectures relevant to modern developments (Transformers, diffusion models, etc.)
Transformers Deep Dive

Aims of this workshop:

While you won't be able to call yourself an ML expert by the end of this workshop with just its content, I'm sure if you follow along, you will at least be confident enough to pick up, understand and implement new ML algos whenever the need arises.
Get familiar with Kaggle and ML competitions so you can participate in other college fests/events, etc.
Get a common resource people can refer to later if they are interested in ML.
Hopefully be able to start at least a small community of people interested in ML, encouraging discussions, doubt solving, etc.

Prerequisites

As mentioned in the previous doc, the prerequisites for this workshop are proficiency in Python, NumPy, Pandas, and any data visualization library of your choice. While the last one isn't essential, and you can learn it on the go, at least try to be comfortable with the first three before the workshop starts, so that this doesn't become the reason we need to slow down. It'll also be helpful if you familiarize yourself with Google Colab notebooks/Kaggle notebooks so people can try your code out quickly without any virtual environment hassle. They also give you access to free GPUs for a certain amount of time daily.
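If you want a quick, unofficial self-check for the NumPy and Pandas part, the sketch below touches a few everyday operations: handling a missing value, grouping, and vectorised arithmetic. The data and column names are invented for this example and are not part of any workshop material; if every line reads naturally to you, you're probably in good shape.

```python
import numpy as np
import pandas as pd

# Toy data invented for this example; one mark is missing on purpose.
scores = pd.DataFrame(
    {
        "student": ["a", "b", "c", "d"],
        "section": ["x", "x", "y", "y"],
        "marks": [72.0, np.nan, 88.0, 95.0],
    }
)

# Pandas: fill the missing value with the column mean (NaNs are skipped
# when the mean is computed), then average the marks per section.
scores["marks"] = scores["marks"].fillna(scores["marks"].mean())
section_means = scores.groupby("section")["marks"].mean()

# NumPy: standardise the marks (zero mean, unit variance) with
# vectorised arithmetic instead of a Python loop.
marks = scores["marks"].to_numpy()
standardised = (marks - marks.mean()) / marks.std()

print(section_means)
print(standardised)
```

This runs as-is in a Google Colab or Kaggle notebook, since both come with NumPy and Pandas preinstalled.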

kira1433/crux-ml-workshop