/quickstart

Get started with ML Architecture at Scale by familiarizing yourself with some foundational concepts.


Technical Background


Software Development

git

python

NOTE: Feel free to change the playback speed on YouTube to suit your needs.

GNU/Linux Tooling

  • a "core set" of tools worth knowing: cat, find, sed, grep, awk, tee
  • TODO: compile a couple "awesome lists"
  • To avoid freezing when RAM is low, see Adding swap space

Operating Systems

  • not going to talk much about this ... just some things to keep in mind
  • be prepared to work and deploy things on various operating systems
  • Debian-based distributions like Ubuntu are very commonly chosen for deploying microservices
  • some distributions are lightweight and security-focused (e.g. Alpine), while others are meant for enterprise-level stability (e.g. CentOS)
  • using containers for development can allow you to match the runtime environment of your production code
  • the more comfortable you are working across different operating systems, the easier time you will have when thrown into unfamiliar environments
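One way to check whether a development container really matches production is to record a few facts about the runtime in both places and compare them. The sketch below uses only the Python standard library. The function names `describe_runtime` and `differing_keys` are hypothetical, chosen for illustration, and the set of facts collected is an assumption about what is worth comparing.

```python
import platform


def describe_runtime() -> dict:
    """Collect basic facts about the OS and interpreter this process runs on."""
    info = {
        "system": platform.system(),          # e.g. "Linux", "Darwin", "Windows"
        "release": platform.release(),        # kernel or OS release string
        "machine": platform.machine(),        # CPU architecture, e.g. "x86_64"
        "python": platform.python_version(),  # interpreter version
        "distro": "unknown",
    }
    # platform.freedesktop_os_release() was added in Python 3.10 and raises
    # OSError when no os-release file exists (for example on macOS or Windows).
    if hasattr(platform, "freedesktop_os_release"):
        try:
            os_release = platform.freedesktop_os_release()
            info["distro"] = os_release.get("PRETTY_NAME", "unknown")
        except OSError:
            pass
    return info


def differing_keys(a: dict, b: dict) -> list:
    """Return the sorted keys whose values differ between two runtime reports."""
    return sorted(k for k in a.keys() | b.keys() if a.get(k) != b.get(k))
```

Logging `describe_runtime()` at service startup in each environment, then passing two reports to `differing_keys`, gives a quick list of where development and production diverge.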

Operationalization

Taking your prototypes to production.

container technologies

docker, containerd, podman

kubernetes

data access

TODO: fill in some helpful links on topics below

relational

  • postgres, mariadb, mysql (TODO: find something that discusses all of these and link to it here)
  • basic principles of querying
  • star schema, normalization
  • proprietary (Snowflake; DynamoDB is also proprietary, but it is a non-relational key-value store)
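As a minimal sketch of querying a star schema, the example below builds one fact table and two dimension tables in an in-memory SQLite database, then joins and aggregates them. SQLite stands in for the servers listed above only because it ships with Python. All table, column, and function names are hypothetical, and the rows are made-up sample data.

```python
import sqlite3


def build_star_schema() -> sqlite3.Connection:
    """Create a tiny star schema: one fact table that references two dimensions."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE dim_model (model_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE dim_date  (date_id  INTEGER PRIMARY KEY, day  TEXT NOT NULL);
        CREATE TABLE fact_prediction (
            model_id      INTEGER NOT NULL REFERENCES dim_model(model_id),
            date_id       INTEGER NOT NULL REFERENCES dim_date(date_id),
            request_count INTEGER NOT NULL
        );
        """
    )
    # Made-up sample rows, for illustration only.
    conn.executemany("INSERT INTO dim_model VALUES (?, ?)",
                     [(1, "churn"), (2, "ranker")])
    conn.executemany("INSERT INTO dim_date VALUES (?, ?)",
                     [(1, "2024-01-01"), (2, "2024-01-02")])
    conn.executemany("INSERT INTO fact_prediction VALUES (?, ?, ?)",
                     [(1, 1, 100), (1, 2, 150), (2, 1, 40)])
    conn.commit()
    return conn


def requests_per_model(conn: sqlite3.Connection) -> list:
    """Join the fact table to a dimension and aggregate the measure per model."""
    query = """
        SELECT m.name, SUM(f.request_count) AS total_requests
        FROM fact_prediction AS f
        JOIN dim_model AS m ON m.model_id = f.model_id
        GROUP BY m.name
        ORDER BY m.name
    """
    return conn.execute(query).fetchall()
```

Calling `requests_per_model(build_star_schema())` returns `[('churn', 250), ('ranker', 40)]`. Normalization shows up in the fact table storing only keys and measures, while descriptive attributes such as the model name live once in their dimension table.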

non-relational

  • object storage (s3)

  • document databases (e.g. MongoDB)

  • hot storage

  • cold storage

    • cheaper storage, more expensive access
  • warm storage

    • middle ground between hot and cold
  • provisioned vs elastic storage

    • the filesystem and OS have to live somewhere, usually on SSD hardware with fast I/O that needs no network bandwidth to reach. This is very expensive to scale. It is usually provisioned as part of the compute class: you get what you get at that price point, and your app had better deal with it.
    • other storage types can be mounted to your compute instance, with a separate pricing model and features such as snapshotting and pre-allocated storage that you can scale as needed (e.g. EBS)
    • (not advised) S3 storage can be mounted as a filesystem
    • there may be options that charge only for what you use, but at a higher rate (e.g. EFS, the Elastic File System service on AWS)
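The hot, warm, and cold trade-off above comes down to simple arithmetic: colder tiers charge less to store data and more to read it back. The sketch below makes that concrete. The per-GB prices are invented for illustration and are not any provider's real rates, and `monthly_cost` and `cheapest_tier` are hypothetical helper names.

```python
# Hypothetical prices in dollars per GB per month. These numbers are made up
# to illustrate the shape of the trade-off and are NOT real provider pricing.
TIER_PRICES = {
    "hot":  {"storage": 0.020, "retrieval": 0.000},
    "warm": {"storage": 0.010, "retrieval": 0.010},
    "cold": {"storage": 0.002, "retrieval": 0.050},
}


def monthly_cost(tier: str, stored_gb: float, read_gb: float) -> float:
    """Monthly cost = storage charge plus retrieval charge for the given tier."""
    prices = TIER_PRICES[tier]
    return stored_gb * prices["storage"] + read_gb * prices["retrieval"]


def cheapest_tier(stored_gb: float, read_gb: float) -> str:
    """Pick the tier with the lowest monthly cost for this access pattern."""
    return min(TIER_PRICES, key=lambda tier: monthly_cost(tier, stored_gb, read_gb))
```

With these assumed prices and 1000 GB stored, reading 10 GB a month favors the cold tier, reading 500 GB favors warm, and reading 5000 GB favors hot. The break-even points shift with real prices, but the reasoning stays the same.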

microservices architecture

cloud providers

auto-scaling

  • GCP: "Cloud Run"
  • AWS: "Elastic Beanstalk"
  • Azure:
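To give a feel for what an auto-scaler decides, here is a minimal sketch of a proportional scaling rule: scale the replica count by the ratio of the observed metric to its target, then clamp to configured bounds. The formula follows the one documented for the Kubernetes Horizontal Pod Autoscaler. The managed services listed above apply their own policies, which this sketch does not model. The function name, parameter names, and default values are assumptions for illustration.

```python
import math


def desired_replicas(
    current_replicas: int,
    current_metric: float,
    target_metric: float,
    min_replicas: int = 1,
    max_replicas: int = 10,
    tolerance: float = 0.1,
) -> int:
    """Proportional scaling: replicas grow or shrink with metric / target."""
    if current_replicas < 1:
        raise ValueError("current_replicas must be at least 1")
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")

    ratio = current_metric / target_metric
    # Within the tolerance band, keep the current count to avoid flapping.
    if abs(ratio - 1.0) <= tolerance:
        desired = current_replicas
    else:
        desired = math.ceil(current_replicas * ratio)
    # Never scale outside the configured bounds.
    return max(min_replicas, min(max_replicas, desired))
```

For example, 4 replicas averaging 90% CPU against a 60% target scale out to 6, while the same 4 replicas at 30% scale in to 2.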

VM instances

  • AWS: EC2
  • GCP: Compute Engine
  • Azure:

infrastructure as code

  • cross platform: Terraform
  • AWS: CloudFormation
  • GCP: Cloud Build IaC
  • Azure: Azure Resource Manager IaC

ml lifecycle

The Continuous Delivery for Machine Learning (CD4ML) article on Martin Fowler's site

training

tracking

testing