100DaysofML

Inspired by #100DaysofCode and #100DaysofMLCode, I am starting my own on 9 July 2020, along with some resources I find useful.

Day 1 (9 July 2020) - Continued Google's Machine Learning Crash Course (ML Engineering section) here and watched a video on Rules of Machine Learning. Continued Coursera's Biology Meets Programming; it's evident that my Python fundamentals are not strong enough.

Day 2 (10 July 2020) - Google's Machine Learning Crash Course: Data Dependencies and Fairness; started Khan Academy Linear Algebra.

Day 3 (11 July 2020) - Completed the Fairness section of ML Engineering in Google's ML Crash Course; read up on SVD in an easy-to-understand Medium article, You Don’t Know SVD (Singular Value Decomposition).

Day 4 (13 July 2020) - Completed Google's ML Crash Course (notes here); started Stanford Online's Databases & SQL course by Jennifer Widom on edX.

Day 5 (14 July 2020) - Started the next Google AI course, Data Preparation and Feature Engineering (my notes here); continued Stanford Online's Databases & SQL on edX (querying relational databases).

Day 6 (15 July 2020) - Continued with sampling and splitting data in the ML data preparation course (notes); continued Jennifer Widom's database course YouTube series, on well-formed XML.

Day 7 (16 July 2020) - Learned about randomization (make your data generation pipeline reproducible) and data transformation; XSD (XML Schema Definition) basics and a bit on JSON data.
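
A minimal sketch of the two reproducibility ideas, written in Python as my own illustration rather than the course's code: seed the random number generator, and decide the train/test split from a hash of a stable key instead of a fresh random draw. The function names, the `user_` keys and the 80/20 split are assumptions made up for the example.

```python
import hashlib
import random


def seeded_shuffle(items, seed=42):
    """Return a shuffled copy; the same seed always gives the same order."""
    rng = random.Random(seed)  # private generator, global random state untouched
    shuffled = list(items)
    rng.shuffle(shuffled)
    return shuffled


def assign_split(key, train_percent=80):
    """Map a stable key (hypothetical example: a user ID) to 'train' or 'test'."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100  # deterministic bucket in 0..99
    return "train" if bucket < train_percent else "test"


if __name__ == "__main__":
    # Two runs with the same seed produce the same order.
    print(seeded_shuffle(range(5), seed=1) == seeded_shuffle(range(5), seed=1))
    # The same key lands in the same split on every run.
    print([assign_split(f"user_{i}") for i in range(5)])
```

Hashing a key means an example stays in the same split even when the dataset is regenerated or grows, which a plain reshuffle would not guarantee.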

Day 8 (17 July 2020) - Learned about transforming numeric data by normalization or bucketing; JSON data, JSON schema, syntactically and semantically valid JSON
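
To make the two numeric transformations concrete, here is a small plain-Python sketch (my own illustration, not the course's code). It shows min-max scaling and z-score as two common forms of normalization, plus bucketing with fixed boundaries. The function names and the sample boundaries are assumptions for the example.

```python
import bisect
import statistics


def min_max_scale(values):
    """Rescale values linearly into the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("all values are equal, cannot scale")
    return [(v - lo) / (hi - lo) for v in values]


def z_score(values):
    """Centre on the mean and divide by the population standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        raise ValueError("zero standard deviation, cannot standardise")
    return [(v - mean) / std for v in values]


def bucketize(values, boundaries):
    """Replace each value by the index of the bucket it falls into.

    boundaries must be sorted; a value equal to a boundary goes to the
    bucket on the upper side of that boundary.
    """
    return [bisect.bisect_right(boundaries, v) for v in values]


if __name__ == "__main__":
    ages = [10, 20, 30]
    print(min_max_scale(ages))
    print(z_score(ages))
    print(bucketize(ages, [15, 25]))
```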

Day 9 (20 July 2020) - Did a bit of learning over the previous two days, but didn't commit a full hour to it. Today's database topic: relational algebra (natural join, theta join, and the union, difference and intersection operators). Google ML data prep course: Intro to Modeling. Honestly, I didn't even know there was a choice of optimizer before; I guess I need more background reading.
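
A rough sketch of these relational algebra operators in plain Python, to help me remember what each one does. This is my own illustration, not the course's material: relations are lists of dicts for the joins and sets of tuples for the set operators, and the function names and toy data are made up.

```python
def natural_join(r, s):
    """Combine rows that agree on every attribute name the two rows share."""
    result = []
    for a in r:
        for b in s:
            shared = a.keys() & b.keys()
            if all(a[k] == b[k] for k in shared):
                result.append({**a, **b})
    return result


def theta_join(r, s, predicate):
    """Cross product filtered by an arbitrary condition on the row pair.

    Assumes r and s use distinct attribute names, so merging rows loses nothing.
    """
    return [{**a, **b} for a in r for b in s if predicate(a, b)]


def union(r, s):
    """Rows that appear in r, in s, or in both (duplicates collapse)."""
    return r | s


def difference(r, s):
    """Rows that appear in r but not in s."""
    return r - s


def intersection(r, s):
    """Rows that appear in both r and s."""
    return r & s


if __name__ == "__main__":
    # Hypothetical toy relations.
    students = [{"sid": 1, "name": "Amy"}, {"sid": 2, "name": "Bob"}]
    applications = [{"sid": 1, "college": "A"}, {"sid": 3, "college": "B"}]
    print(natural_join(students, applications))
    print(theta_join([{"x": 1}, {"x": 5}], [{"y": 3}], lambda a, b: a["x"] < b["y"]))
    print(union({(1,), (2,)}, {(2,), (3,)}))
```

The set operators only make sense when both relations have the same schema, which is why they work on whole tuples here.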

Day 10 (21 July 2020) - Browsed around scikit-learn; seem to be stuck on the last part of the Intro to Modeling section. Not sure I can even count it as learning, because I'm doing nothing.

Day 11 (22 July 2020) - Finally read the review article on using deep learning in bioinformatics. Glad that I managed to understand some of the terms (I had seen them before in Google's ML Crash Course). Machine learning is literally everywhere now, regardless of what field you are in. As long as you generate data, machine learning can have its foot in it.

Day 12 (24 July 2020) - Skipped so many days; honestly don't know what I am doing. Today I just did a quick read of Intro to Machine Learning in Bioinformatics. There is still a lot that I don't understand, especially the R code and packages used to run the analysis. Continued the Stanford database course: relational algebra done, moving back to SQL.

Day 13 (25 July 2020) - Devoted 3 hours to ML today (can I count it towards the following 2 days, hah). Completed Google ML Data Prep and Feature Engineering, although I have to admit that I still don't understand the last programming exercise (Intro to Modeling). I have been stuck there for many days and I need to move on. Read up a bit on data reshaping in R. Continued edX SQL queries (SELECT, FROM, WHERE). Read up a bit on machine learning in R (most resources I have used so far focus on Python).

Day 14 (27 July 2020) - Mostly focused on learning SQL from the Stanford database course on edX (I know it's not machine learning though). Got the general feeling that SQL query syntax is very similar to the dplyr package in R. Thinking of starting the famous Machine Learning course on Coursera.

Day 15 (28 July 2020) - Went back and forth between different resources: Introduction to Data Science - rafalab, Intro to Machine Learning in Bioinformatics, An Introduction to Machine Learning in R, Hands-On Machine Learning with R, and Statistical Modeling by Daniel Kaplan, all in bookdown format. This is the problem you have when there are too many resources on the internet and they all seem equally good for the subject at hand.

Day 16 (29 July 2020) - Continued Statistical Modeling (0.5 h). Tried sentiment analysis in R based on this tutorial on Trump tweets. Realised that I am still very weak in ggplot, although I have been reading about it for quite some time.

Day 17 (31 July 2020) - Did Statistical Modeling yesterday, but only for 0.5 h, so couldn't count it as a day. Today started Google's ML Clustering course for almost 1 hour, then continued with the rest of Statistical Modeling to finish off one unit.

Day 18 (1 August 2020) - Continuing with Google's ML clustering (0.5 h) & EdX SQL database course (0.5 h), a relatively unproductive day it seems.

Day 19 (2 August 2020) - Continued Google ML Clustering (0.7 h). Starting to get confused again, with a lot of unknown or forgotten terms (autoencoder, cross entropy, log loss). Tried the edX SQL assignment and couldn't get through even one question, sigh. Not in good condition today, though.
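
As a reminder of one of those terms: log loss, which is the binary form of cross entropy, is the average of -[y log(p) + (1 - y) log(1 - p)] over all examples, where y is the true label (0 or 1) and p is the predicted probability of class 1. A minimal Python sketch of that formula follows. It is my own illustration, not the course's code; the function name and the clipping constant `eps` are assumptions.

```python
import math


def log_loss(y_true, y_pred, eps=1e-15):
    """Average binary cross entropy between 0/1 labels and probabilities."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1 - eps)  # clip so log(0) never happens
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)


if __name__ == "__main__":
    labels = [1, 0, 1]
    print(log_loss(labels, [0.9, 0.1, 0.8]))  # confident and right: small loss
    print(log_loss(labels, [0.1, 0.9, 0.2]))  # confident and wrong: large loss
```

The loss punishes confident wrong predictions much harder than hesitant ones, which is why it works well as a training signal for probabilities.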

Day 20 (4 August 2020) - Skipped a day again. Spent the whole hour today on Google ML's Clustering - Similarity Measure: manual vs supervised, with k-means clustering as the example. Euclidean (distance only), cosine and dot product. Couldn't understand the Colab programming exercise. Realised I get stuck at the exercise almost every single time. And I need a project to work on.
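
A small plain-Python sketch of the three measures, to see how they differ. This is my own illustration, not the Colab exercise; the function names and sample vectors are made up.

```python
import math


def dot_product(a, b):
    """Sum of element-wise products; grows with both angle agreement and length."""
    return sum(x * y for x, y in zip(a, b))


def euclidean_distance(a, b):
    """Straight-line distance between two points; smaller means more similar."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; ignores their lengths."""
    norm_a = math.sqrt(dot_product(a, a))
    norm_b = math.sqrt(dot_product(b, b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot_product(a, b) / (norm_a * norm_b)


if __name__ == "__main__":
    short, long_same_direction = [1.0, 0.0], [2.0, 0.0]
    # Same direction, different length: cosine sees them as identical,
    # while Euclidean distance and dot product both react to the length.
    print(cosine_similarity(short, long_same_direction))
    print(euclidean_distance(short, long_same_direction))
    print(dot_product(short, long_same_direction))
```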

Day 21 (5 August 2020) - k-means generalization in the Google ML Clustering course. Note: use generalized k-means, with varying cluster widths and dimensions, when dealing with data of varying size and density. Finally managed to solve two more SQL questions on edX, after re-watching the lecture video. We really only learn when we do things ourselves, not by passively listening to lectures.

Day 22 (6 August 2020) - Tried the fast.ai course: A Code-First Introduction to Natural Language Processing · fast.ai. Decided to skip the Google ML Clustering course for now since I really don't get it at this point. Continued with decision trees in the ML bioinformatics intro. Continued a bit with Statistical Modeling: describing variation, shapes of distributions.

Day 23 (9 August 2020) - Didn't spend a full hour on ML (10 minutes short), but decided to clock in anyway. Completed Chapter 4 of Statistical Modeling. Watched 25 minutes of the first lecture of the fast.ai course: A Code-First Introduction to Natural Language Processing · fast.ai. Couldn't engage with it because it was designed as a very classroom-based course.

Day 24 (10 August 2020) - Completed Chapter 5 of Statistical Modeling. Starting to get bored. Fiddled around on the Analytics Vidhya site, thinking about registering for some courses. Wanted to start a tutorial on scikit-learn. Still short of one hour today. I promise I will make it up some day.

Day 25 (11 August 2020) - Just half an hour today, but still decided to log it. Mostly on Coursera's Machine Learning course by Andrew Ng from Stanford University. Not getting enough time. Will probably stop for a while.

Day 26 (21 January 2021) - It has been a while. Could not get the hang of linear algebra in a machine learning context. Ended up discovering this very useful and clear article on it! Linear Algebra explained in the context of deep learning | by laxman vijay | Towards Data Science

Day 27 (22 January 2021) - Tried to read the Mathematics for Machine Learning (mml) book, but found it too technical and maths-heavy, which makes it extremely hard for me (without an undergraduate maths background) to follow. Watching videos from Khan Academy helps a bit. Returned to try the fastai course again, Practical Deep Learning for Coders, and found the top-down approach appealing.