ml1 notebooks improvements
stas00 opened this issue · 1 comments
I'm not sure how to contribute improvements to the course notebooks.
Here are a few corrections/improvements:
- courses/ml1/lesson1-rf.ipynb
replace:
*todo* define r^2
with:
In statistics, the coefficient of determination, denoted R^2 or r^2 and pronounced "R squared", is the proportion of the variance in the dependent variable that is predictable from the independent variable(s). https://en.wikipedia.org/wiki/Coefficient_of_determination
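To make the definition above concrete, here is a minimal sketch of how r^2 could be computed by hand. The helper name `r_squared` and the toy numbers are made up for illustration and are not taken from the notebook.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    # Residual sum of squares: error of the model's predictions.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: error of always predicting the mean.
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Illustrative data only.
y = [3.0, 5.0, 7.0, 9.0]
perfect = r_squared(y, y)            # predictions match the targets exactly
baseline = r_squared(y, [6.0] * 4)   # always predicting the mean of y
worse = r_squared(y, [9.0, 7.0, 5.0, 3.0])  # worse than predicting the mean
```

Under this definition a perfect model scores 1, always predicting the mean scores 0, and a model that does worse than the mean gets a negative value.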
- before:
df_raw.UsageBand = df_raw.UsageBand.cat.codes
add explanation:
"Normally pandas will continue displaying the text categories, while treating them as numerical data internally. Optionally we can replace the text categories with numbers, which will make this variable non-categorical, like so:"
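The behaviour described in that explanation can be sketched on a toy frame. The data and the category order (High, Medium, Low) below are assumptions chosen for illustration; only the `UsageBand` column name and the `.cat.codes` call come from the notebook line quoted above.

```python
import pandas as pd

# Toy stand-in for df_raw; the values are illustrative only.
df = pd.DataFrame({"UsageBand": ["Low", "High", None, "Medium", "Low"]})

# As a categorical, pandas still displays the text labels
# while storing integer codes internally.
df["UsageBand"] = df["UsageBand"].astype("category")
df["UsageBand"] = df["UsageBand"].cat.set_categories(
    ["High", "Medium", "Low"], ordered=True
)

# Replacing the column with its codes makes it a plain integer column.
# Missing values get the code -1.
codes = df["UsageBand"].cat.codes
df["UsageBand"] = codes
```

After the last assignment the column no longer has a categorical dtype, so the text labels are gone from the frame.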
- courses/ml1/lesson2-rf_interpretation.ipynb
In the "One-hot encoding" section there is no explanation of what it does. Here is my attempt to explain:
"Using proc_df's *max_n_cat* argument we can turn some categorical variables into new one-hot encoded columns.
For example, MyCol with categories (small, medium, large) will turn into 3 new columns,
MyCol_small, MyCol_medium and MyCol_large, and the original column is removed.
This only happens to columns whose number of categories is no more than max_n_cat.
Some of the new columns may then turn out to be more important features than the original column was when all its categories were in one column."
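The effect described above can be sketched with plain pandas. This is not proc_df's actual implementation; `one_hot_small_cats` is a hypothetical helper that only mimics the described behaviour, and the toy frame is made up for illustration.

```python
import pandas as pd

def one_hot_small_cats(df, max_n_cat):
    """Hypothetical helper: one-hot encode categorical columns
    that have at most max_n_cat categories."""
    small = [
        col for col in df.columns
        if df[col].dtype.name == "category"
        and len(df[col].cat.categories) <= max_n_cat
    ]
    if not small:
        return df.copy()
    # get_dummies drops each original column and adds one column per category.
    return pd.get_dummies(df, columns=small)

# Illustrative data only.
df = pd.DataFrame({
    "MyCol": pd.Categorical(
        ["small", "medium", "large", "small"],
        categories=["small", "medium", "large"],
    ),
    "Other": pd.Categorical(["a", "b", "c", "d"]),  # 4 categories, left alone
    "x": [1, 2, 3, 4],
})

out = one_hot_small_cats(df, max_n_cat=3)
new_cols = [c for c in out.columns if c.startswith("MyCol_")]
```

Here `MyCol` has 3 categories, so it is replaced by `MyCol_small`, `MyCol_medium` and `MyCol_large`, while `Other` has 4 categories and stays a single column.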
- courses/ml1/lesson3-rf_foundations.ipynb
a small fix here:
http://forums.fast.ai/t/another-treat-early-access-to-intro-to-machine-learning-videos/6826/615
Thanks.
If there is a better way to do it please let me know how (link?)
I'm currently trying to figure out how to use nbdime for notebook diff patches. I'll post more once nbdime's developer has resolved all the issues.
moved to fastai/fastai#566