IMPORTANT NOTE: This repository is about the old version of the DP-100 certification. Today's certification focus is much more on Azure Machine Learning Service instead of Studio, which is disappearing with time.

Azure Data Scientist

This repository contains my notes for preparing the exam Microsoft DP-100: Designing and Implementing a Data Science Solution on Azure.

The exam consists on 35 questions, 5 of which represent a use case. The length of the exam is 3 hours, which is largely enough.

Most of the questions of the exam are about Machine Learning Studio and its modules. The other questions are related to Azure services (such as Databricks, Data Factory, Virtual Machines, Cognitive Services etc) and some Machine Learning and Python-related questions.

Here is an overview of the study material I used and recommend.

Machine Learning Studio

Almost every question regarding ML Studio is about its modules. I suggest going through the list of modules, but not yet because there is a plethora of them and one can easily get lost. I recommend using this list as a reference when you have some doubts about a specific module.

First thing to do is to explore ML Studio and some of its modules. I thereby recommend following the Microsoft Learn Learning Path Publish a Machine Learning Experiment with Azure Machine Learning Studio, which provides an overview of how to access to ML Studio and clearly explains how to do your first Machine Learning experiment.

Once you understand how ML Studio works, one can start exploring other experiments on ML Studio and/or start doing its own experiment. It is recommended to start getting familiarized with the main modules. Here are some experiments you could follow:

Now it is time to discover other modules and try to implement them. Note that some modules are more important for this exam than others. In the file Machine Learning Studio/Relevant_modules.md, you will find a list of modules you must be familiar with, as there is a high probability they will be asked in the exam.

For the DP-100 exam, you will sometimes need to just know the module but in some cases, you might need to know how to use it and how to specify its parameters.

Optimally, one should go through the list of modules and understand every module's purpose, but you do not need all of them for the exam. A good advice is to do dump exams and understand / experiment with the modules you will find.

Azure Services

That was the trickiest part for me. Questions regarding Azure Services are generally basic but you need a good understanding on a lot of different Azure sevices. Got questions on Azure Databricks (and Data Factory orchestration), DSVM and DLVM, Azure Notebooks, Azure HDInsight etc. The tricky thing is that questions range from "what should you use if you want to run a python notebook without any installation nor Azure subscription" (Azure Notebooks) to "which virtual machine do you need if you want to connect to a PostGRE SQL database" (DSVM with Linux) and other very specific questions. Again, a good advice is to look at the dumps and understand / experiment with the different services.

Here are a non-exhaustie list of training recommendations:

Machine Learning and Python

There are as well some questions about Machine Learning (PCA, Cross-validation, performance metrics, class imbalance, model selection etc) but they are in general not very hard. If you followed the steps described in the previous two sections, you would have learned almost everything you need for this exam.

In addition, some questions about Python and scikit-learn might also arise (around 2 or 3 questions out of 35). They are not very hard but again, I refer the reader to the dumps for Python-related questions. For example, I remember a question about which values are expected as arguments of the scikit-learn PCA function.

Finally, there are some questions about PyTorch, TensorFlow and other frameworks. Some potential questions are already covered in the previous two sections and the rest, I refer again the reader to dump exams.

Conclusion

If you follow the steps described previously, you should be able to answer at least 75% of the questions correctly. I strongly encourage to do some dump exams before taking the exam, as they are very representative of the exam.

Finally, here are two nice articles articles about the DP-100 exam:

Thanks for reading and gluck !