questions I have been asked during interviews

Apache License 2.0 (Apache-2.0)

data-science-interview-questions-and-tasks

questions

  • You have to train a 10 GB model on a machine with 8 GB of RAM. Imagine you have a neural net and an SVM: which different batching techniques would you use for each?

  • How would you approach cross-validation for time series data?

  • Can you use k-fold cross validation for time series data?

  • How does a recurrent neural net work?

  • Why is LightGBM only used with more than 10,000 data points?

  • What steps would you take to build a project that predicts the revenue of movies (see the second task below)?

  • If your business requires that your model never produce false positives, what should you do?

  • How would you create a machine learning model that works as a calculator? The calculator takes two features, sums them, and predicts the result.

  • What happens if your machine learning calculator model accepts only integers, but those integers have more than 100,000 digits? If that is a problem, how would you overcome it? Would seq2seq help? How?
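For the two time-series cross-validation questions above, a common line of answer is that plain shuffled k-fold lets the model train on observations that come after the ones it is tested on, so a forward-chaining (expanding-window) scheme is often used instead. The block below is a minimal sketch of that idea using only the standard library. The function name `expanding_window_splits` and its parameters are hypothetical, chosen for illustration; scikit-learn's `TimeSeriesSplit` is a ready-made option in the same spirit.

```python
def expanding_window_splits(n_samples, n_splits, test_size):
    """Yield (train_indices, test_indices) pairs in chronological order.

    Every training window starts at index 0 and ends right before its
    test window, so no future observation leaks into training.
    """
    first_test_start = n_samples - n_splits * test_size
    if first_test_start <= 0:
        raise ValueError("not enough samples for the requested splits")
    for k in range(n_splits):
        test_start = first_test_start + k * test_size
        train = list(range(0, test_start))
        test = list(range(test_start, test_start + test_size))
        yield train, test


# Ten time-ordered observations, three folds of two test points each.
splits = list(expanding_window_splits(n_samples=10, n_splits=3, test_size=2))
for train, test in splits:
    print(f"train={train} test={test}")
# train=[0, 1, 2, 3] test=[4, 5]
# train=[0, 1, 2, 3, 4, 5] test=[6, 7]
# train=[0, 1, 2, 3, 4, 5, 6, 7] test=[8, 9]
```

The training window grows with each fold while the test window always lies strictly after it, which is the property shuffled k-fold does not guarantee.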

tasks

  • Please implement GA/GTM (Google Analytics / Google Tag Manager) tracking and analytics for a given frontend, as well as a data warehouse in the cloud that collects the data. You have one week.

  • Create a project to predict the revenue of movies. It's a binary classification problem (high or low revenue) with two subtasks: list the 20 most important features for your model, and create a file with the predictions for a given test dataset. (4 hours duration)
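The two subtasks of the movie-revenue task (ranking features and writing a predictions file) can be sketched without committing to any particular model. The block below uses permutation importance, which is one model-agnostic option and not necessarily what the interviewers expected. Everything in it is an assumption made for illustration: the feature names `budget` and `runtime`, the toy rows and labels, the hard-coded `toy_model`, and the `row_id,prediction` output format.

```python
import csv
import io
import random


def accuracy(predict, rows, labels):
    """Fraction of rows for which predict(row) equals the true label."""
    hits = sum(predict(r) == y for r, y in zip(rows, labels))
    return hits / len(labels)


def permutation_importance(predict, rows, labels, n_repeats=5, seed=0):
    """Mean accuracy drop when one feature column is shuffled at a time."""
    rng = random.Random(seed)
    baseline = accuracy(predict, rows, labels)
    importances = {}
    for name in rows[0]:
        drops = []
        for _ in range(n_repeats):
            shuffled = [r[name] for r in rows]
            rng.shuffle(shuffled)
            permuted = [{**r, name: v} for r, v in zip(rows, shuffled)]
            drops.append(baseline - accuracy(predict, permuted, labels))
        importances[name] = sum(drops) / n_repeats
    return importances


def top_features(importances, k=20):
    """Names of the k features with the largest importance, best first."""
    return sorted(importances, key=importances.get, reverse=True)[:k]


def write_predictions(predict, rows, out):
    """Write one 'row_id,prediction' line per test row to a file-like object."""
    writer = csv.writer(out)
    writer.writerow(["row_id", "prediction"])
    for i, r in enumerate(rows):
        writer.writerow([i, predict(r)])


# Toy data, invented for illustration only.
rows = [
    {"budget": 10, "runtime": 95}, {"budget": 20, "runtime": 120},
    {"budget": 30, "runtime": 88}, {"budget": 40, "runtime": 101},
    {"budget": 60, "runtime": 99}, {"budget": 70, "runtime": 130},
    {"budget": 80, "runtime": 90}, {"budget": 90, "runtime": 110},
]
labels = ["low"] * 4 + ["high"] * 4


def toy_model(row):
    # Stand-in for a trained classifier; it only looks at the budget.
    return "high" if row["budget"] > 50 else "low"


importances = permutation_importance(toy_model, rows, labels)
ranking = top_features(importances, k=20)

buffer = io.StringIO()  # a real run would open a CSV file instead
write_predictions(toy_model, rows, buffer)
```

In a real submission `toy_model` would be replaced by the fitted classifier's prediction function, and the importance would ideally be measured on held-out data rather than on the training rows.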