Data_Science_Machine_Learning_Curriculum

Curriculum, Training, Certification, Hiring Guide for Data Science Machine Learning

MIT License

Note: for a DS standardization effort, see https://www.iadss.org/educational-programs-map

Workflow

Data Science Workflow is an area that can bring together pure vs. applied, general vs. specialized, and interdisciplinary topics in a pragmatic, project-based context, which makes it a useful guide to how a Data Science curriculum should be balanced:

https://docs.google.com/document/d/1Ib_CNXrukZ29A5fVodpH2xaUfOmjjhzK6eesmxjuDsg/edit?usp=sharing

Books in Particular

Possible Minds: Twenty-Five Ways of Looking at AI (Topic: History & Future of AI) by John Brockman (editor), et al. https://www.amazon.com/Possible-Minds-audiobook/dp/B07MQX54TW/

Artificial Intelligence: A Guide for Thinking Humans (Topic: History & Future of AI) by Melanie Mitchell, Pelican (October 15, 2019) https://www.amazon.com/Artificial-Intelligence-Guide-Thinking-Humans/dp/0241404827/

Deep Learning (Adaptive Computation and Machine Learning series) (Topic: Deep Learning) by Ian Goodfellow, Yoshua Bengio, et al. https://www.amazon.com/Deep-Learning-Adaptive-Computation-Machine/dp/0262035618/

Rebooting AI (Topic: Comparing AI model performance) by Gary Marcus and Ernest Davis https://www.amazon.com/Rebooting-AI-Building-Artificial-Intelligence/dp/052556604X

An Introduction to Statistical Learning: with Applications in R (Springer Texts in Statistics Book 103) (Topic: standard traditional textbook for non-deep-learning 'machine learning') by Gareth James, Daniela Witten, et al. (June 24, 2013) https://www.amazon.com/Introduction-Statistical-Learning-Applications-Statistics/dp/1461471370

Chris Albon Machine Learning Flash Cards https://machinelearningflashcards.com/

Background Reading List:

https://docs.google.com/document/d/1dDF40M5JjjrBsYYQbJplz3M738ktQBBYyNa6FXhzNFU/edit?usp=sharing

Areas Overall

  • General Curriculum Guidelines and Standards
  • Curriculum Tools for Educators
  • Curriculum Tools for Students
  • Curriculum Tools for Employers
  • Curriculum Content Maps
  • Curriculum Teaching Method Standards
  • Certification
  • DS ML Etc Specialization Areas

Pre-requisite skills

  • github
  • html
  • linear algebra
  • markdown (e.g. text display in github)
  • python (python3)
  • functional programming
  • terminals (linux/posix/unix/MacOS)
  • "Notebooks" (Jupyter Notebooks, Colab Notebooks, For: python, scala)
  • text editors
  • Code Development Environments / Kits: IDE/IDK
  • Environment Management
  • Command line Process: Bash etc & Unix/Posix
  • Networks
  • Deployment
  • Dashboarding

General DS Curriculum Areas (Not Specialist Skills)

  • Hypothesis Testing
  • Math, Statistics/Econometrics, Probability, Information Theory
  • DS Etc. Workflow
  • Portfolio
  • Linear Models
  • Deep Learning
  • Practical Programming
  • Computer Science Principles
  • History of Computation
  • History of "Data Science" AI etc.
  • Application Frameworks (Six Sigma, Lean, Agile, SCRUM)
  • Interdisciplinary Studies: Biological Neurons, Neural Networks & Plasticity
  • Presentation and Blogging Skills

Specialization Areas: Data Science Disambiguation

https://towardsdatascience.com/why-you-shouldnt-be-a-data-science-generalist-f69ea37cdd2c

(Note: Generalists are still valued, especially in small startups and on Agile teams built around generalists, which was the original Agile model.)

Programming Languages and Data Science

  • R (academic)
  • Python (general)
  • Spark (distributed)
  • C (robotics)

Main Three Specialization Branches of DS (Data Science)

    1. Data Engineering / Big Data Pipeline Engineering
    2. Data Analysis / Data Analytics
    3. Machine Learning Engineering

Other Specializations

  • SQL & Databases
  • "Data Mining" (seems to be an older, pre-"DS" term)
  • Software Engineering
  • Linear Specific Machine Learning
  • Statistical Analysis & Hypothesis Testing
  • Neural Networks

Domain Knowledge / Domain Specialization Areas

  • Biology / Medical (Genetics)
  • Banking & Finance

DS/ML Sub-Specialty/Focus Areas

Specialized Skills & Tools

  • SQL
  • various quasi-SQL (like HiveQL)
  • Various No-SQL
  • data engineering vs. analytics vs. AI models
  • Spark

General Skills

  • Project Management
  • Meetings
  • Presentations
  • Reports
  • Emails
  • Office Suites
  • Databases

History & Diversity of Data Science & AI

  • Cellular Automata
  • Genetic Algorithms
  • Expert Systems
  • Decision Trees
  • Ensemble Models

History Background Concepts

  • "Subsymbolic" AI

Terms and Disambiguation:

AI, Deep Learning, Machine Learning, Statistical Learning, Business-Intelligence, Data Mining, Statistics, Data Analysis, Hypothesis or A/B Testing, Perceptrons, Neurons, Neural Networks, Hidden Layers

Cross Validation
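
As a rough illustration of cross validation, here is a minimal sketch of k-fold splitting using only the Python standard library. The function name `k_fold_indices` and the lack of shuffling are assumptions made for this example, not part of the curriculum; real projects typically shuffle the data first or use a library splitter.

```python
def k_fold_indices(n_samples, k):
    """Split range(n_samples) into k contiguous folds.

    Returns a list of (train_indices, test_indices) pairs, one per fold.
    Earlier folds receive one extra sample when n_samples is not divisible by k.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    base, extra = divmod(n_samples, k)
    folds = []
    start = 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        stop = start + size
        test = list(range(start, stop))
        train = list(range(0, start)) + list(range(stop, n_samples))
        folds.append((train, test))
        start = stop
    return folds


for train, test in k_fold_indices(5, 2):
    print("train:", train, "test:", test)
```

Each sample lands in exactly one test fold, so every observation is used for evaluation once and for training in the remaining folds.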

Deep Learning

Artificial Neural Networks

  • History
  • Types
  • Ensembles
  • Hyperparameters
  • Activation Functions
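
To make the last item concrete, here is a small sketch of three commonly taught activation functions in plain Python. The choice of these three and the function names are assumptions for illustration; the curriculum itself does not specify which activations to cover.

```python
import math


def sigmoid(x):
    """Logistic function, mapping any real number into (0, 1)."""
    # Branch on the sign so math.exp never receives a large positive argument.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def relu(x):
    """Rectified linear unit: passes positive inputs, zeroes out the rest."""
    return max(0.0, x)


def tanh(x):
    """Hyperbolic tangent, mapping any real number into (-1, 1)."""
    return math.tanh(x)


for value in (-2.0, 0.0, 2.0):
    print(value, sigmoid(value), relu(value), tanh(value))
```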

Topics

  • Analogy
  • Geofencing

AI Tests

  • Turing Tests
  • SQuAD

Parametric & GLM vs. Nonparametric

  • Parameters & Coefficients in Parametric Models
  • Logistic Regression
  • Linear Regression

Linear Regression

  • Sum of Squared Residuals
  • R^2
  • p-value
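
The first two quantities above can be sketched for a one-variable linear regression in plain Python. The function names and the toy data are assumptions for this example only. The p-value is left out because computing it requires a t-distribution, which the standard library does not provide.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def sum_squared_residuals(xs, ys, intercept, slope):
    """Sum of squared differences between observed and predicted y."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))


def r_squared(xs, ys, intercept, slope):
    """Share of the variance in y explained by the fitted line."""
    mean_y = sum(ys) / len(ys)
    ss_total = sum((y - mean_y) ** 2 for y in ys)
    ss_residual = sum_squared_residuals(xs, ys, intercept, slope)
    return 1.0 - ss_residual / ss_total


# Toy data (hypothetical), roughly following y = 2x.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
intercept, slope = fit_line(xs, ys)
print("intercept:", intercept, "slope:", slope)
print("SSR:", sum_squared_residuals(xs, ys, intercept, slope))
print("R^2:", r_squared(xs, ys, intercept, slope))
```

A smaller sum of squared residuals means the line sits closer to the data, and R^2 rescales that quantity against the spread of y, so values near 1 indicate a close fit.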

Object Relationships

NLP

https://docs.google.com/document/d/19v8jMx60QTWfyRkp6VThJeiSfXNWqadB50FQCzaAOVM/edit?usp=sharing

  • Bag of Words
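
A bag-of-words representation turns each document into a vector of word counts and ignores word order. The sketch below uses only the standard library. The whitespace tokenizer, the function names, and the two sample sentences are assumptions for illustration; practical pipelines usually handle punctuation, stop words, and much larger vocabularies.

```python
from collections import Counter


def tokenize(text):
    """Lowercase the text and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def build_vocabulary(documents):
    """Sorted list of every distinct token across all documents."""
    return sorted({token for doc in documents for token in tokenize(doc)})


def bag_of_words(text, vocabulary):
    """Count how often each vocabulary word occurs in the text."""
    counts = Counter(tokenize(text))
    return [counts[word] for word in vocabulary]


documents = ["the cat sat", "the dog sat on the cat"]
vocabulary = build_vocabulary(documents)
print(vocabulary)
for doc in documents:
    print(bag_of_words(doc, vocabulary))
```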

Data Science Concepts

  • Baseline
  • Dimensionality (e.g. the curse of dimensionality)

Scores and Baselines

  • The Confusion Matrix
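
For a binary classifier, the confusion matrix is four counts from which common scores are derived. The following is a minimal sketch in plain Python. The function names, the label convention (1 = positive, 0 = negative), and the sample labels are assumptions made for this example.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels where 1 is the positive class."""
    tp = fp = fn = tn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == 1 and actual == 1:
            tp += 1
        elif predicted == 1 and actual == 0:
            fp += 1
        elif predicted == 0 and actual == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def scores(tp, fp, fn, tn):
    """Accuracy, precision, and recall; a ratio with a zero denominator is reported as 0.0."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall


# Hypothetical labels for eight samples.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
tp, fp, fn, tn = confusion_counts(y_true, y_pred)
print("TP:", tp, "FP:", fp, "FN:", fn, "TN:", tn)
print("accuracy, precision, recall:", scores(tp, fp, fn, tn))
```

Comparing these scores with those of a trivial model, such as always predicting the most common class, gives a baseline for judging whether a classifier adds anything.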

Tools and Platforms