100-Days-Of-ML-Code
By Harshit Ahluwalia
100 Days of Machine Learning Coding, as proposed by Siraj Raval. My journey towards Machine Learning up to 2018.
How to Learn Machine Learning
While I don't want to overstate the complexity of the field, 30 days is awfully short.
- Spend most of my time on the basics of statistics
- Then have a look at 1 or 2 very common techniques (e.g., linear regression and logistic regression).
- Take a dataset that interests you and do some descriptive statistics on it (counts, max, min, median, plots, etc.) and discover as many weird things in the data as possible. Weird meaning stuff that does not seem right.
- Now try to answer a question for yourself on the above dataset. Do this by (A) solving the weird stuff, (B) getting the data into a format that works for (C) one of the common techniques you studied. It is okay if you hack the code together with lots of googling. (do sanity checks on your results though ;)
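The descriptive-statistics step above can be sketched with pandas. The tiny DataFrame here is made up for illustration (including the suspicious 999 value); in practice you would load your own dataset, e.g. with `pd.read_csv`.

```python
# Minimal descriptive-statistics pass over a small, hypothetical dataset.
import pandas as pd

df = pd.DataFrame({
    "age":    [34, 41, 29, 999, 37, 41],          # 999 looks suspicious...
    "income": [52000, 61000, 48000, 55000, None, 61000],
})

print(df.describe())             # count, mean, min, max, quartiles
print(df["age"].median())        # a robust measure of central tendency
print(df["age"].value_counts())  # spot hugely overrepresented values
print(df.isna().sum())           # count missing values per column
```

Even on six rows, `describe()` and `value_counts()` surface the oddities (the 999, the missing income) that the next steps will have to deal with.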
Basic Statistics
One of the easiest pitfalls is to just take off-the-shelf implementations of algorithms and throw them against your problem. But most algorithms are based on assumptions, and all of them have some limitations. A good grasp of basic statistics will help you:
- Determine whether the assumptions hold.
- Understand what they mean for your choice of algorithm.
- Reason about the limitations they imply.
- Judge the impact when they are not met (which is not always dramatic).
- Any time spent here will pay dividends every time you have a look at a new algorithm. So no worries if this takes up nearly all your time.
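As a small illustration of why these basics matter (the numbers below are made up): the mean is dragged around by a single extreme value, while the median is not. An algorithm that assumes roughly symmetric, outlier-free data can be thrown off in exactly the same way.

```python
# Mean vs. median in the presence of one extreme value.
import statistics

clean = [10, 12, 11, 13, 12]
with_outlier = clean + [999]   # e.g. a miscoded missing value

print(statistics.mean(clean), statistics.median(clean))
print(statistics.mean(with_outlier), statistics.median(with_outlier))
```

The median barely moves; the mean explodes. Knowing which of your summary statistics and algorithms are robust to this is exactly the kind of dividend the basics pay.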
Common Techniques
- Early on, you are actually better off going deep than broad, because many concepts/elements recur anyway in other algorithms.
- I mention two types of regression because in many cases you'll get a decent answer with these techniques. Also, it is in some sense amazing how something that is basically 'draw trendline' in Excel actually goes so deep. Not all of it is taken into account that heavily in practice, but it is still good to have in the back of your head, especially for those times when you get weird results.
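That 'draw trendline' can be sketched in a few lines: ordinary least squares on a tiny, made-up dataset, fit with NumPy's `polyfit`.

```python
# A minimal least-squares trendline, with a quick sanity check.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 4.0, 6.2, 7.9, 10.1])   # roughly y = 2x

slope, intercept = np.polyfit(x, y, deg=1)  # degree-1 polynomial = a line
print(f"y ~ {slope:.2f} * x + {intercept:.2f}")

# Sanity check: residuals should be small if the linear fit is sensible.
residuals = y - (slope * x + intercept)
print("max |residual|:", np.abs(residuals).max())
```

Checking the residuals, rather than just accepting the fitted line, is the kind of sanity check the advice above keeps coming back to.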
Weird Data Stuff
- This is the largest timesink, always. And it is very important, hence the mantra 'garbage in, garbage out'. Take any real-world dataset which has not been pre-cleaned and you'll find weird things:
- A hugely overrepresented value (e.g., companies that like to code missing values as 999...)
- Duplicate IDs
- A variable which is actually an ID (amazing how many student dreams are shattered by pointing this one out when they have a nearly perfect model ;))
- Missing values
- Mislabeled cases, misspellings...
- Everything is on state level, except for this one state for which they are reporting counties instead.
- You need to experience it to acknowledge it. And almost any real-world dataset + a critical eye will make you do just that. ;)
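Several of the cases above can be handled in a few lines of pandas. The column names, the 999 sentinel, and the inconsistent labels below are hypothetical stand-ins for what you'll actually find.

```python
# Cleaning up a few of the 'weird data' cases on a hypothetical table.
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "customer_id": [1, 2, 2, 3],              # duplicate ID
    "age":         [34, 999, 999, 29],        # 999 coding "missing"
    "state":       ["CA", "ny", "ny", "TX"],  # inconsistent casing
})

df = df.drop_duplicates(subset="customer_id")  # keep first row per ID
df["age"] = df["age"].replace(999, np.nan)     # sentinel -> real missing
df["state"] = df["state"].str.upper()          # normalize the labels
print(df)
```

None of this is glamorous, but as the text says, it is the largest timesink, and skipping it is exactly how garbage gets in.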
Try It
- Well, you didn't learn all this not to use it, right? Also, making sense of your results is important, and being critical of them as well. It is so easy to make a logical mistake which is not a programming mistake, i.e., the software will run, but the result will be very wrong.
- If you want to go all the way, take your results to a friend or family member and try to explain at a high level what you did, what the results are, and what they mean. Again speaking from teaching experience, there are people who are really good at the technical stuff but cannot transfer the relevant implications of it to a non-technical person.
So you want to learn Machine Learning in 30 days? You need to devote real time to ML and work hard; the field covers a great many concepts. It is better to take an online course, note down all the course content, and prepare a schedule for the 30 days. Below are some of the best online Machine Learning courses.
Machine Learning by Stanford University, Mentor - Andrew Ng
Topics include:
(i) Supervised learning (parametric/non-parametric algorithms, support vector machines, kernels, neural networks).
(ii) Unsupervised learning (clustering, dimensionality reduction, recommender systems, deep learning).
(iii) Best practices in machine learning (bias/variance theory; innovation process in machine learning and AI).
The course will also draw from numerous case studies and applications, so that you'll also learn how to apply learning algorithms to building smart robots (perception, control), text understanding (web search, anti-spam), computer vision, medical informatics, audio, database mining, and other areas.
Machine Learning A-Z™: Hands-On Python & R In Data Science
This course is fun and exciting, but at the same time we dive deep into Machine Learning. It is structured the following way:
- Part 1 - Data Preprocessing
- Part 2 - Regression: Simple Linear Regression, Multiple Linear Regression, Polynomial Regression, SVR, Decision Tree Regression, Random Forest Regression
- Part 3 - Classification: Logistic Regression, K-NN, SVM, Kernel SVM, Naive Bayes, Decision Tree Classification, Random Forest Classification
- Part 4 - Clustering: K-Means, Hierarchical Clustering
- Part 5 - Association Rule Learning: Apriori, Eclat
- Part 6 - Reinforcement Learning: Upper Confidence Bound, Thompson Sampling
- Part 7 - Natural Language Processing: Bag-of-words model and algorithms for NLP
- Part 8 - Deep Learning: Artificial Neural Networks, Convolutional Neural Networks
- Part 9 - Dimensionality Reduction: PCA, LDA, Kernel PCA
- Part 10 - Model Selection & Boosting: k-fold Cross Validation, Parameter Tuning, Grid Search, XGBoost
Moreover, the course is packed with practical exercises based on live examples. So not only will you learn the theory, but you will also get hands-on practice building your own models.
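As a taste of the Part 10 topics, here is a hedged sketch of k-fold cross validation combined with a grid search, using scikit-learn's bundled iris dataset and a k-nearest-neighbors classifier (the model and parameter grid are illustrative choices, not the course's exact exercises).

```python
# Grid search over k for a KNN classifier, scored with 5-fold CV.
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

grid = GridSearchCV(
    KNeighborsClassifier(),
    param_grid={"n_neighbors": [1, 3, 5, 7]},  # candidate values of k
    cv=5,                                      # 5-fold cross validation
)
grid.fit(X, y)
print(grid.best_params_, round(grid.best_score_, 3))
```

Each candidate k is scored by averaging accuracy over the 5 folds, so the winning parameter is chosen on held-out data rather than on the training fit.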
You can Download all the required books from here
Mastering Feature Engineering
R for Data Science
Python for Data Analysis
The Elements of Statistical Learning
An Introduction to Statistical Learning
Machine Learning with R