Materials on relevant topics in Software Architecture, especially Object-Oriented Programming, as part of the learning content for the students of Data Science Retreat, Berlin. The material is designed to be taught in 2 consecutive sessions of 8 hours each.
It is assumed that, at the beginning of these sessions, the students have a decent level of Python and are familiar with the basics of machine learning, including data acquisition and preparation, feature extraction, modelling (including hyperparameter tuning) and model evaluation.
Many data scientists don't come from a Computer Science background. Even though they quickly learn how to code and how to properly use the different packages and frameworks for cleaning and preparing data and for modelling, they often lack a wider vision and tend to write code that is not only hard to maintain (spaghetti code) but also hard to deploy or scale up.
In the two days assigned to this part, we will discuss how to improve and reinforce that area of your learning path by giving you a basic understanding of the parts of software architecture that will impact the way you do Data Science (and thus your teamwork skills or your chances of success when applying for jobs in the field, to name just a couple of examples).
We will not be able to cover all the topics exhaustively in just two days, and to claim otherwise would be an insult to all the Software Architects and Computer Science professionals who have devoted years of their lives to mastering these skills. So our goal is not to teach you everything, but to teach you the basics so you can start working on your own and, at the same time, to make you aware of these topics so you can decide how important they are for you and keep researching and improving them at your own pace.
- Introduction
- Hands-On: Analyzing a Data Science Project
- Overview: What we will cover in these sessions
- Part 2: Notions of Software Architecture for Data Science
- Project Structure
- Logging
- Handling Exceptions
- Small Introduction to Tests
- Code versioning
- Data versioning
- Experiment tracking