An example of a machine learning tech interview as practice.

Machine Learning Tech Interview

This repo contains a mock technical interview. Answers to the ML-related theoretical questions and a regression model built on the housing dataset can be found in the main script.

Theoretical questions

What are the assumptions of a linear model (or any other type of model)?
What’s the difference between K-Nearest Neighbors and K-means clustering?
How do you address overfitting?
Explain Naive Bayes algorithms.
When do you use an AUC-ROC score? What kind of information can you gather from it?
What is cross-validation?
What are confounding variables?
If an important metric for our company stopped appearing in our data source, how would you investigate the causes?
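
As a small illustration of the AUC-ROC question above, the score can be read as the probability that a randomly chosen positive example is ranked above a randomly chosen negative one. The sketch below computes it directly from that definition in plain Python. The function name `roc_auc` and the toy labels and scores are made up for illustration and are not taken from the repo's notebook.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    A tie between a positive and a negative score counts as half a pair.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative example")

    correct = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(positives) * len(negatives))


# Toy data (hypothetical): 3 of the 4 positive/negative pairs are ordered correctly.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(labels, scores))  # 0.75
```

A value of 0.5 corresponds to random ranking and 1.0 to a perfect ranking, which is why the score is useful when the decision threshold is not yet fixed. The pairwise loop is quadratic, so it is only meant for small examples.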

Practical exercise

In this challenge, you will showcase your knowledge of feature engineering, dimensionality reduction, model selection and evaluation, hyperparameter tuning, and any other machine learning techniques.

There isn't a single correct solution to this challenge. What we would like to learn is your thinking process, which demonstrates your knowledge, experience, and creativity in developing machine learning models. Therefore, in addition to developing the model and optimizing its performance, you should also elaborate on your thinking process and justify your decisions throughout the iterative problem-solving process.

The suggested time to spend on this challenge is 90-120 minutes. If you don't have time to finish all the tasks you plan to do, simply document the to-dos at the end of your response.

Instructions:

  • Download the housing prices data set (housing_prices.csv). The data set is big enough to showcase your thoughts but not so big that processing power will be a problem.
  • Using Python, analyze the features and determine which feature set to select for modeling.
  • Train and cross-validate several regression models, attempting to accurately predict the SalePrice target variable.
  • Evaluate all models and show a comparison of performance metrics.
  • State your thoughts on model performance, which model(s) you would select, and why.
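
The train, cross-validate, and compare steps above can be sketched in plain Python as follows. This is a minimal illustration and not the solution in the repo. It assumes a single numeric feature and uses synthetic data as a stand-in for housing_prices.csv, whose columns other than SalePrice are not described here. The helper names (`k_fold_indices`, `fit_mean`, `fit_line`, `cross_val_rmse`) are hypothetical.

```python
import random
from statistics import mean


def k_fold_indices(n_samples, k, seed=0):
    """Shuffle the indices once and split them into k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def fit_mean(xs, ys):
    """Baseline model: always predict the mean of the training target."""
    avg = mean(ys)
    return lambda x: avg


def fit_line(xs, ys):
    """Ordinary least squares with a single feature."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return lambda x: intercept + slope * x


def cross_val_rmse(fit, xs, ys, k=5):
    """Fit on k-1 folds and return the RMSE on each held-out fold."""
    fold_scores = []
    for held_out in k_fold_indices(len(xs), k):
        held = set(held_out)
        train = [i for i in range(len(xs)) if i not in held]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        squared_errors = [(model(xs[i]) - ys[i]) ** 2 for i in held_out]
        fold_scores.append(mean(squared_errors) ** 0.5)
    return fold_scores


# Synthetic stand-in for the real data (assumption: price grows with area).
rng = random.Random(42)
area = [rng.uniform(50, 250) for _ in range(200)]
sale_price = [1500 * a + 20000 + rng.gauss(0, 15000) for a in area]

# Same folds for every model, so the comparison is like for like.
candidates = {"mean baseline": fit_mean, "linear": fit_line}
results = {
    name: mean(cross_val_rmse(fit, area, sale_price))
    for name, fit in candidates.items()
}
for name, rmse in sorted(results.items(), key=lambda item: item[1]):
    print(f"{name}: mean RMSE = {rmse:,.0f}")
```

Including a trivial baseline makes the comparison easier to interpret: a candidate model is only worth discussing if its cross-validated error is clearly below the baseline's. In practice a library such as scikit-learn would replace these hand-written helpers.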