An odyssey of math, stats, data science (DS), and programming.
- Spark SQL basic
- Spark SQL tips
- Python basic
- Python tips
- Python DNA (numpy, pandas, & plt)
- R DNA (data.table & ggplot2)
- Algo master portal
- All DS cheat sheets repo
- statthinking21 from Stanford
- Linear algebra reference - Interactive Linear Algebra
- Python (technical + pd & np)
  - Technical: arrays & strings, linked lists, stacks & queues, trees & graphs, sorting & searching, recursion & DP... These are enough for now.
  - Data manipulation: get comfortable with pandas and numpy (e.g., how to select all rows that contain a null value in any column, or how to compute the interquartile range of a list), and do some easy LeetCode questions on lists, strings, matrices, and math.
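A minimal sketch of the two pandas/numpy tasks named above, using a small made-up DataFrame (the column names and values are hypothetical). `np.percentile` uses linear interpolation by default; other quantile conventions can give slightly different IQR values.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with a few missing cells.
df = pd.DataFrame({
    "user": ["a", "b", "c", "d"],
    "age": [25, np.nan, 31, 40],
    "country": ["US", "FR", None, "DE"],
})

# Rows with a null in ANY column: isna() flags each missing cell,
# any(axis=1) marks a row True if at least one of its cells is missing.
rows_with_null = df[df.isna().any(axis=1)]

def iqr(values):
    """Interquartile range: 75th percentile minus 25th percentile."""
    q1, q3 = np.percentile(values, [25, 75])
    return q3 - q1

print(rows_with_null["user"].tolist())  # ['b', 'c']
print(iqr([1, 2, 3, 4, 5, 6, 7, 8]))    # 3.5
```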
- SQL
  - Do all the LeetCode SQL questions and you should be fine. Be comfortable with window functions and self-join problems.
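A small sketch of both patterns, run through Python's built-in `sqlite3` on an in-memory database so it is self-contained. The `employee` table is hypothetical, and window functions assume the bundled SQLite is version 3.25 or newer (true for recent Python builds). The window function ranks salaries per department without collapsing rows; the self-join compares each employee with their manager in the same table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee (
    id INTEGER PRIMARY KEY,
    name TEXT,
    dept TEXT,
    salary INTEGER,
    manager_id INTEGER
);
INSERT INTO employee VALUES
    (1, 'Ann', 'eng',   200, NULL),
    (2, 'Bob', 'eng',   210, 1),
    (3, 'Cid', 'eng',   150, 1),
    (4, 'Dee', 'sales', 120, NULL),
    (5, 'Eve', 'sales', 130, 4);
""")

# Window function: rank salaries within each department, keeping every row.
ranked = conn.execute("""
    SELECT name, dept, salary,
           DENSE_RANK() OVER (PARTITION BY dept ORDER BY salary DESC) AS rnk
    FROM employee
    ORDER BY dept, rnk
""").fetchall()

# Self-join: alias the same table twice to compare employees with their managers.
earns_more = conn.execute("""
    SELECT e.name
    FROM employee AS e
    JOIN employee AS m ON e.manager_id = m.id
    WHERE e.salary > m.salary
    ORDER BY e.name
""").fetchall()

print(ranked)
print(earns_more)  # [('Bob',), ('Eve',)]
```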
- Probability & Stats
  - Probability: good to know a little, especially the common distributions and conditional probability, but not a top priority.
  - Stats: know A/B tests, t-tests, power analysis, and other hypothesis tests, plus how to design an experiment and what the key decisions are. This comes up often. For example, you might be asked to design an experiment: talk about how you'd split observational units into control and treatment, which metrics you'd track, what effects you might observe, and what could go wrong; expect follow-up questions on how you'd choose the sample size and how long you'd run the experiment.
  - statthinking21 from Stanford
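A sketch of the sample-size and duration questions for a conversion-rate A/B test, using only the standard library. It assumes a two-sided two-proportion z-test with the normal approximation; the sample-size formula uses the unpooled variance, which is one common textbook choice (other formulas give slightly different numbers). The baseline rate, target lift, traffic, and counts are all hypothetical.

```python
import math
from statistics import NormalDist

def sample_size_per_group(p_control, p_treatment, alpha=0.05, power=0.8):
    """Users needed per arm to detect p_control -> p_treatment
    (two-sided two-proportion z-test, normal approximation)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for a two-sided test
    z_beta = z.inv_cdf(power)           # quantile matching the desired power
    variance = p_control * (1 - p_control) + p_treatment * (1 - p_treatment)
    effect = p_treatment - p_control
    return math.ceil((z_alpha + z_beta) ** 2 * variance / effect ** 2)

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value for H0: both arms have the same conversion rate."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)  # rate under H0
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z_stat = (p_b - p_a) / se
    return 2 * (1 - NormalDist().cdf(abs(z_stat)))

# Hypothetical plan: 10% baseline, want to detect a lift to 11%.
n = sample_size_per_group(0.10, 0.11)
daily_users_per_arm = 2_000  # hypothetical traffic after a 50/50 split
days = math.ceil(n / daily_users_per_arm)

# Hypothetical results once the experiment has run.
p_value = two_proportion_z_test(1_000, 10_000, 1_100, 10_000)
print(n, days, p_value)
```

Smaller detectable effects grow the required sample quadratically (the `effect ** 2` in the denominator), which is usually what drives the "how long do we run it" answer.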
- PM
  - Google some product management questions related to metrics design. Think of metrics that apply to the company you're applying to: how would you measure them, and are they long-term or short-term? If long-term, how would you track them over a short period? Could you come up with short-term proxy metrics that reflect the long-term metric you're interested in? I recommend getting the product questions book from "datamasked"; the bundle is expensive, but if you email the author, maybe he'll sell you the book alone. It really helped in my case.
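One way to make the proxy-metric question concrete: compute a short-window metric and a longer-window metric on the same cohort and check whether they move together. The sketch below uses day-7 retention as a hypothetical short-term proxy and day-30 retention as a stand-in for the long-term metric; the event log and metric choice are made up for illustration.

```python
from datetime import date

# Hypothetical data: signup date per user, plus (user, activity date) events.
signups = {"u1": date(2024, 1, 1), "u2": date(2024, 1, 1), "u3": date(2024, 1, 2)}
activity = [
    ("u1", date(2024, 1, 5)),
    ("u1", date(2024, 2, 20)),
    ("u2", date(2024, 1, 15)),
    ("u3", date(2024, 1, 3)),
]

def retained_within(days, signups, activity):
    """Share of signed-up users active again 1..days days after signup."""
    retained = {
        user for user, day in activity
        if 1 <= (day - signups[user]).days <= days
    }
    return len(retained) / len(signups)

d7 = retained_within(7, signups, activity)    # short-term proxy
d30 = retained_within(30, signups, activity)  # closer to the long-term goal
print(d7, d30)
```

In practice you would compute both metrics across many historical cohorts and only trust the short-term one if it tracks the long-term one.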
- ML algos
  - Basic algorithms and the underlying math: logistic regression, linear regression, and random forests. Also know the popular evaluation metrics and when to use each. Imbalanced data: how do you deal with it, and what decisions do you need to make?
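A plain-Python sketch of why metric choice matters on imbalanced data. The labels and both "models" are hypothetical: with 95 negatives and 5 positives, always predicting the majority class scores 0.95 accuracy but 0 recall, while a model that catches 4 of 5 positives (with 6 false alarms) has lower accuracy (0.93) yet 0.4 precision and 0.8 recall.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Hypothetical imbalanced labels: 95 negatives followed by 5 positives.
y_true = [0] * 95 + [1] * 5

# Baseline that always predicts the majority (negative) class.
always_negative = [0] * 100
# Model that flags 6 negatives by mistake and catches 4 of the 5 positives.
some_model = [0] * 89 + [1] * 6 + [1] * 4 + [0] * 1

baseline = classification_metrics(y_true, always_negative)
model = classification_metrics(y_true, some_model)
print(baseline)
print(model)
```

Common ways to deal with the imbalance itself include resampling, class weights, and moving the decision threshold; which one fits depends on the relative cost of false positives and false negatives.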