Get started with ML Architecture at Scale by familiarizing yourself with some foundational concepts.
NOTE: Do not hesitate to change the playback speed on YouTube to suit your needs.
- The Python Program Incantation: "if name main" (MCoding) (VIDEO)
- Classes: The Difference between `__init__` and `__new__` (MCoding) (VIDEO)
- Optimizations: Cache and Branch-Predictability (MCoding) (VIDEO)
- Classes: Factory Pattern, Plugin Architecture (ArjanCodes) (VIDEO)
- Optimizations: Removing "code smell" / bad practices (ArjanCodes) (VIDEO)
- Optimizations: Speed up slow code: `async`/`await` (MCoding) (VIDEO)
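As a rough companion to the first two videos, here is a minimal sketch of how `__new__` and `__init__` differ, wrapped in the "if name main" guard. The `Connection` class and the `db://` URLs are hypothetical names invented for this illustration; they do not come from the videos.

```python
class Connection:
    """Hypothetical class that keeps one shared instance per URL."""

    _instances = {}

    def __new__(cls, url):
        # __new__ creates the object (or, in this sketch, reuses a cached one).
        if url not in cls._instances:
            cls._instances[url] = super().__new__(cls)
        return cls._instances[url]

    def __init__(self, url):
        # __init__ runs on every call, even when __new__ returned a cached object.
        self.url = url
        self.init_calls = getattr(self, "init_calls", 0) + 1


def main():
    a = Connection("db://primary")
    b = Connection("db://primary")
    # Same object both times, but __init__ ran twice.
    print(a is b, b.init_calls)  # prints: True 2


if __name__ == "__main__":
    # Runs only when the file is executed as a script, not when it is imported.
    main()
```

The point to notice is that caching in `__new__` does not stop `__init__` from running again, which is a common surprise when writing singletons this way.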
- mention a "core set" (`cat`, `find`, `sed`, `grep`, `awk`, `tee`)
- TODO: compile a couple "awesome lists"
- To avoid freezing when RAM is low, see Adding swap space
- not going to talk much about this ... just some things to keep in mind
- be prepared to work and deploy things on various operating systems
- Debian-based distributions like Ubuntu are very commonly chosen for deploying microservices
- some are lightweight and security-focused (e.g. Alpine), others are meant for enterprise-level stability (e.g. CentOS)
- using containers for development can allow you to match the runtime environment of your production code
- the more comfortable you are working across different operating systems, the easier time you will have when thrown into unfamiliar environments
Taking your prototypes to production.
TODO: fill in some helpful links on topics below
- postgres, mariadb, mysql: find something that discusses all of these and link to it here
- basic principles of querying
- star schema, normalization
- proprietary (snowflake, dynamo)
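Until links are filled in, the star schema idea can be sketched with Python's built-in `sqlite3` module. The table names (`fact_sales`, `dim_product`, `dim_date`) and the sample rows are made up for illustration; a real warehouse would have more dimensions and far more data.

```python
import sqlite3

# In-memory database so the sketch leaves nothing on disk.
conn = sqlite3.connect(":memory:")

# One central fact table referencing small, normalized dimension tables.
conn.executescript("""
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT, category TEXT);
CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
CREATE TABLE fact_sales (
    product_id INTEGER REFERENCES dim_product(product_id),
    date_id INTEGER REFERENCES dim_date(date_id),
    amount REAL
);
""")

conn.executemany(
    "INSERT INTO dim_product VALUES (?, ?, ?)",
    [(1, "widget", "hardware"), (2, "gizmo", "hardware"), (3, "ebook", "media")],
)
conn.executemany("INSERT INTO dim_date VALUES (?, ?, ?)", [(1, 2023, 1), (2, 2023, 2)])
conn.executemany(
    "INSERT INTO fact_sales VALUES (?, ?, ?)",
    [(1, 1, 10.0), (2, 1, 5.0), (3, 2, 7.5), (1, 2, 2.5)],
)


def sales_by_category(connection):
    """Join the fact table to a dimension and aggregate per category."""
    return connection.execute("""
        SELECT p.category, SUM(f.amount)
        FROM fact_sales AS f
        JOIN dim_product AS p ON p.product_id = f.product_id
        GROUP BY p.category
        ORDER BY p.category
    """).fetchall()


print(sales_by_category(conn))
```

Category names live only in `dim_product`, so renaming a category touches one row instead of every sale; that is the normalization trade-off in miniature.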
- object storage (S3)
- names of technologies for this (mongo)
- hot storage
  - expensive storage, cheap access
  - AWS, GCP, what are they called in these platforms?
  - 10 Things You Might Not Know About S3 (ARTICLE)
- cold storage
  - cheaper storage, more expensive access
- warm storage
  - middle ground between hot and cold
- provisioned vs elastic storage
  - the filesystem and OS have to live somewhere, usually on SSD hardware with fast I/O and no network bandwidth needed to reach it. This is very expensive to scale. It is usually provisioned as part of the compute class: you get what you get at that price point, and your app had better deal with it.
  - you can mount other storage types to your compute instance, with a separate pricing model and features like snapshotting and pre-allocated storage that you can scale as needed (e.g. EBS)
  - (unadvised) you can mount S3 storage as a filesystem
  - there may be options where you pay only for what you use, but at a higher rate (e.g. EFS on AWS, an elastic file storage service)
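The hot/warm/cold trade-off is easier to see with numbers. The sketch below uses invented per-GB prices (they are not real AWS or GCP rates) purely to show how the cheapest tier shifts as the amount of data read each month grows.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StorageTier:
    name: str
    storage_per_gb: float    # hypothetical price per GB stored per month
    retrieval_per_gb: float  # hypothetical price per GB read

    def monthly_cost(self, stored_gb, read_gb):
        return stored_gb * self.storage_per_gb + read_gb * self.retrieval_per_gb


# Made-up prices: hot is expensive to store and cheap to read, cold is the reverse.
HOT = StorageTier("hot", 0.020, 0.000)
WARM = StorageTier("warm", 0.010, 0.010)
COLD = StorageTier("cold", 0.004, 0.030)


def cheapest_tier(stored_gb, read_gb, tiers=(HOT, WARM, COLD)):
    """Return the tier with the lowest monthly cost for this access pattern."""
    return min(tiers, key=lambda tier: tier.monthly_cost(stored_gb, read_gb))


for read_gb in (0, 500, 2000):
    print(read_gb, cheapest_tier(1000, read_gb).name)
```

With these assumed prices, rarely read data lands in cold storage and heavily read data lands in hot storage; plugging in a provider's actual price sheet would move the crossover points.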
- GCP: "Cloud Run"
- AWS: "Elastic Beanstalk"
- Azure:
- AWS: EC2
- GCP: Compute Engine
- Azure:
- cross platform: Terraform
- AWS: CloudFormation
- GCP: Cloud Build IaC
- Azure: Azure Resource Manager IaC
Martin Fowler's Article on Continuous Delivery for ML
- pytest, unit test
- Types of Testing (Atlassian) (ARTICLE)
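For a sense of what a unit test looks like in practice, here is a small sketch in the style pytest expects: plain functions whose names start with `test_`, using bare `assert` statements. The `min_max_scale` helper is a hypothetical function written just for this example.

```python
def min_max_scale(values):
    """Scale numbers to the range [0, 1]; a constant list maps to all zeros."""
    if not values:
        return []
    lo, hi = min(values), max(values)
    if hi == lo:
        # Avoid dividing by zero when every value is identical.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def test_scales_to_unit_range():
    assert min_max_scale([2, 4, 6]) == [0.0, 0.5, 1.0]


def test_constant_input_does_not_divide_by_zero():
    assert min_max_scale([3, 3]) == [0.0, 0.0]


def test_empty_input():
    assert min_max_scale([]) == []
```

Saved in a file named something like `test_scaling.py`, running `pytest` in that directory should discover and run all three tests. Each test covers one behavior, including the edge cases (empty and constant input) that tend to break in production.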