/Math-for-ML-AI

The official markdown repository containing the math for ml/ai roadmap

MIT License

Math for ML/AI roadmap

Prerequisites

  • Precalculus
  • Algebra I

Math for ML

  • Calculus
    • Calculus I - Calculus I forms the foundation for understanding changes and motion in machine learning models. It introduces key concepts like limits, which help understand how functions behave as they approach specific points, and derivatives, which are essential for understanding rates of change. In machine learning, derivatives play a crucial role in optimization algorithms, like gradient descent, used for training models. This part of calculus also covers the basics of integration, providing a way to aggregate or sum up quantities, which is useful in areas like probability and data analysis. Understanding the fundamentals of Calculus I is vital for grasping more complex concepts in machine learning and for effectively implementing and improving ML algorithms.
      • Calculus I Khan Academy
      • Calculus I MIT course
      • The best calculus intuition on the internet
      • derivatives intuition - Derivatives are fundamental in machine learning for optimizing algorithms. They indicate how a function's output changes in response to variations in input, crucial for algorithms like gradient descent. By calculating derivatives, we can minimize a loss function, adjusting model parameters for better data fit. This process, especially in neural networks, involves backpropagation, which relies heavily on derivatives. Understanding derivatives is key for effective algorithm implementation and improvement in machine learning.
      • understanding what integrals are - Integrals are used in machine learning to aggregate or accumulate quantities, offering a way to sum up continuous data points or functions. This concept is important for calculating areas under curves, which in ML can relate to probabilities and distributions in statistics. Integrals also play a role in understanding the cumulative impact of small changes, crucial in algorithms that deal with continuous data. Mastery of integrals aids in grasping complex concepts in probability and data analysis within the machine learning field.
      • Derivatives
        • Why derivatives in machine learning
          • gradient descent - Derivatives are pivotal in gradient descent algorithms, which find the minimum of a function (often a loss function in ML models). By calculating the derivative, the algorithm determines the direction to adjust parameters to reduce the error.
          • Best line fit - In regression models, derivatives help in finding the best line that fits the given data. This involves minimizing the difference (error) between the predicted values and the actual values.
        • Real life applications of derivatives
        • derivatives rules - There are a few basic rules about derivatives that you have to know. They follow directly from the definition of a derivative; for example, the derivative of a constant is 0.
        • differentiation rules - Rules you need when differentiating combinations of functions, for example how to differentiate a product, quotient, or sum of functions.
        • chain rule - A technique for finding the derivative of composite functions. It is used in machine learning to calculate the derivative of a function composed of several functions, which is critical to neural networks and backpropagation.
        • Approximating with local linearity - Used in machine learning to approximate a function with a linear function, which is easier to work with. This is done by finding the tangent line at a point on the function, whose slope is the derivative at that point. In simpler words: approximating a function near a point.
        • L'Hôpital's rule - A method for evaluating limits of indeterminate forms, expressions like 0/0 that cannot be evaluated by substituting the limit value. It is useful when analyzing the limiting behavior of functions, such as loss or activation functions in machine learning.
        • Relative (local) extrema - A point on a function where the function's value is greater or less than at all nearby points. Finding local minima and maxima of a function is central to optimization algorithms like gradient descent.
        • Concavity - Describes whether a function curves upward or downward; points where the concavity changes are inflection points. The second derivative, which measures concavity, helps distinguish minima from maxima in optimization.
        • Optimization problems with derivatives - Using derivatives to find the minimum or maximum of a function, which is the core task when training machine learning models with algorithms like gradient descent.
      • Implicit differentiation - Implicit differentiation is a method used when the variables in an equation are intermixed and cannot be separated easily. It involves differentiating both sides of an equation with respect to an independent variable. This technique is useful in machine learning for handling complex relationships between variables, especially when standard differentiation rules are not directly applicable. It helps in understanding the effect of changes in one variable on another.
      • Integrals - Integrals in mathematics, particularly in the context of machine learning, are used for summing or accumulating quantities over an interval. They are essential for calculating areas under curves, which can represent probabilities and distributions in statistics, a key part of data analysis in ML. Integrals also aid in understanding the cumulative effects of continuous changes, crucial for handling continuous data and for algorithms involving probabilities and continuous optimization. Understanding integrals is vital for effective data analysis and model formulation in machine learning.
      • Differential equations basics - Differential equations are equations that involve an unknown function and its derivatives. In the basics of differential equations, you learn how to solve equations that describe the rate of change of a quantity. These equations are fundamental in modeling various phenomena where the rate of change of one variable is directly dependent on other variables.
    • Calculus II - Calculus II builds upon the foundations of Calculus I, focusing on advanced integration techniques, series, and sequences. It includes topics like integration by parts, partial fractions, and convergence of series. In machine learning, these concepts are essential for complex models, especially in probabilistic methods and continuous optimization. This knowledge is crucial for understanding and implementing more sophisticated ML algorithms.
    • Series - Series in mathematics refer to the sum of a sequence of numbers. In machine learning, series are important for understanding algorithms that involve summation over time or across data points, such as in time-series analysis or when dealing with sequential data. Grasping the concept of series, including convergence and divergence, is key for implementing and analyzing algorithms that aggregate information over sequences.
    • Calculus III (multivariable calculus) - Multivariable calculus extends the concepts of single-variable calculus to functions of several variables. It includes topics like partial derivatives, multiple integrals, and vector calculus. In machine learning, this area is crucial for dealing with high-dimensional data and for understanding how changes in multiple inputs simultaneously affect an output. Mastery of multivariable calculus is essential for advanced machine learning techniques such as optimization in high-dimensional spaces and modeling complex, multi-faceted systems.
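To make the role of derivatives in optimization concrete, here is a minimal sketch of gradient descent in one dimension. The function f(x) = (x - 3)², its derivative, the starting point, and the learning rate are all illustrative choices, not part of the roadmap:

```python
# Gradient descent on f(x) = (x - 3)**2, whose derivative is f'(x) = 2*(x - 3).
# The minimum is at x = 3, where the derivative equals zero.
def f_prime(x):
    return 2 * (x - 3)

x = 0.0           # arbitrary starting guess
lr = 0.1          # learning rate (step size), chosen for illustration
for _ in range(100):
    x -= lr * f_prime(x)   # step in the direction opposite the derivative

print(round(x, 6))  # x has moved very close to the minimum at 3
```

Each step shrinks the distance to the minimum by a constant factor (here 0.8), which is why the loop converges; too large a learning rate would make the steps overshoot and diverge.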
      • Khan academy modules
      • Visualizing 3d graphs
      • Parametric curves
      • Parametric surfaces
      • Vector fields - Vector fields represent the distribution of vectors in a multi-dimensional space, where each vector indicates a direction and magnitude at a point in that space. In machine learning, vector fields can be used to visualize and understand the behavior of gradients in optimization problems, like visualizing the direction of steepest descent in a loss landscape. They are also important in understanding dynamical systems and fluid dynamics models that are applicable in certain AI applications.
      • Transformations
      • Partial derivatives - Partial derivatives are used in multivariable calculus to measure how a function changes as one of its variables changes, holding the others constant. In machine learning, partial derivatives are crucial for gradient calculations in multivariate optimization problems, such as those encountered in training neural networks. They help in understanding the sensitivity of a function's output to changes in each input variable, which is key for fine-tuning model parameters.
      • Divergence and Curl - Divergence and curl are concepts from vector calculus dealing with vector fields. Divergence measures the magnitude of a field's source or sink at a given point, essentially quantifying how much a vector field spreads out or converges. Curl, on the other hand, measures the rotation or swirling strength of a field around a point. In machine learning, understanding divergence and curl can be important for advanced algorithms, especially those involving fluid dynamics, electromagnetism, or other applications where vector fields and their behaviors are relevant. These concepts help in analyzing the properties and behaviors of complex systems modeled in AI and ML.
      • Laplacian
      • Minima and maxima in multivariable functions - Minima and maxima in multivariable functions refer to points where a function reaches its lowest or highest value, respectively, in a given neighborhood. Finding these points is crucial in optimization problems such as minimizing a loss function in model training or maximizing performance metrics.
      • Line integrals - Line integrals involve integrating a function along a curve, calculating the sum of values over a path rather than an area. They are useful for measuring the total effect along a trajectory, such as total work done in a force field. In applied mathematics and related fields, line integrals can be used to solve problems involving cumulative effects along a path, which could be relevant in various data analysis and modeling scenarios.
      • Double and triple integrals - Double integrals extend the concept of integration to two-dimensional spaces, allowing for the calculation of volume under a surface or the accumulation of quantities over an area. Triple integrals further extend this to three-dimensional spaces, enabling the computation of quantities within a volume. These integrals are important in fields requiring volumetric analysis and in problems involving spatial dimensions. Understanding these integrals is essential for solving complex problems where integration over multiple dimensions is required.
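As a small illustration of partial derivatives and gradients, here is a sketch that estimates them numerically with central differences and uses them for gradient descent in two variables. The function f(x, y) = x² + 3y², the step size, and the learning rate are arbitrary choices for the example:

```python
# Estimate partial derivatives of f(x, y) = x**2 + 3*y**2 numerically,
# then descend the gradient toward the minimum at (0, 0).
def f(x, y):
    return x**2 + 3 * y**2

def grad(x, y, h=1e-5):
    # Central differences: vary one variable while holding the other constant.
    dfdx = (f(x + h, y) - f(x - h, y)) / (2 * h)
    dfdy = (f(x, y + h) - f(x, y - h)) / (2 * h)
    return dfdx, dfdy

x, y = 2.0, 1.0            # arbitrary starting point
for _ in range(200):
    gx, gy = grad(x, y)
    x, y = x - 0.05 * gx, y - 0.05 * gy   # step against the gradient

print(round(x, 6), round(y, 6))  # both coordinates approach 0
```

The exact partials here are 2x and 6y, so at the start (2, 1) the numerical gradient is close to (4, 6); in real ML libraries gradients are computed by automatic differentiation rather than finite differences, which are shown only for intuition.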
  • How neural networks work - At this point you should be able to understand all the internal workings of a basic neural network. Here I left some videos by 3blue1brown explaining exactly that
  • Probabilities - Probabilities measure the likelihood of an event occurring and are a fundamental concept in statistics and data analysis. They are essential for interpreting and predicting data patterns, evaluating risks, and making decisions under uncertainty. In data-driven fields, understanding probabilities enables the analysis of outcomes, the assessment of models, and the quantification of uncertainty in predictions and analyses.
  • Statistics
    • Statistics playlist
    • Probability and statistics course
    • Median, mode and mean (average) - These are the three basic measures of the center of a set of numbers. The mean is the most common one, but it is not always the best one. The median is the middle number of the set when sorted. The mode is the number that appears most often in the set.
    • Population and sample mean - The population mean is the average of all values in a population, while the sample mean is the average of values in a sample from that population. The sample mean is used as an estimate of the population mean.
    • Variance - Variance measures the spread of data points around the mean, indicating how much the data varies. It's calculated as the average of the squared differences from the mean.
    • Standard deviation - Standard deviation is the square root of variance and provides a measure of the spread of data points around the mean in the same units as the data.
    • Normal distribution - A normal distribution, also known as a Gaussian distribution, is a symmetric, bell-shaped distribution where most observations cluster around the central peak and the probabilities for values further away from the mean taper off equally in both directions.
    • Central limit theorem - This theorem states that the distribution of sample means approximates a normal distribution as the sample size becomes large, regardless of the shape of the population distribution.
    • Standard error of the mean - The standard error of the mean measures the variability or standard deviation of the sample mean estimate of a population mean. It decreases as the sample size increases.
    • Bayesian statistics - Bayesian statistics involves using Bayes' theorem to update the probability of a hypothesis as more evidence or information becomes available.
      • Bayesian statistics Course
      • Bayes theorem - Bayes' Theorem is a fundamental concept in probability theory. It describes the probability of an event, based on prior knowledge of conditions that might be related to the event. Formally, it calculates the conditional probability of an event A given another event B, based on the inverse probability of B given A and the individual probabilities of A and B. This theorem is widely used in various fields for making inferences and updating probabilities as new evidence is obtained. It's a cornerstone in Bayesian statistics, allowing for a mathematical framework to update beliefs based on new data.
      • Bayesian trap
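As a worked example of Bayes' theorem, here is the classic diagnostic-test calculation; the prevalence, sensitivity, and false-positive rate are made-up numbers chosen only to illustrate the formula P(A|B) = P(B|A)·P(A) / P(B):

```python
# Hypothetical diagnostic test (all numbers are illustrative).
p_disease = 0.01              # prior P(A): 1% of people have the disease
p_pos_given_disease = 0.99    # P(B|A): test sensitivity
p_pos_given_healthy = 0.05    # P(B|not A): false-positive rate

# Law of total probability: P(B), the overall chance of a positive test.
p_pos = (p_pos_given_disease * p_disease
         + p_pos_given_healthy * (1 - p_disease))

# Bayes' theorem: posterior probability of disease given a positive test.
p_disease_given_pos = p_pos_given_disease * p_disease / p_pos
print(round(p_disease_given_pos, 3))  # about 0.167
```

Despite a 99%-sensitive test, a positive result implies only roughly a 17% chance of disease, because the disease is rare; this counterintuitive effect is exactly what the "Bayesian trap" video above discusses.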
    • Margin of error and confidence intervals
    • Hypothesis testing and p-values - In hypothesis testing, the p-value is a measure used to assess the strength of the evidence against the null hypothesis. It represents the probability of observing a test statistic as extreme as, or more extreme than, the observed value under the assumption that the null hypothesis is true. A low p-value (typically less than 0.05) indicates that the observed data are unlikely under the null hypothesis and leads to the rejection of the null hypothesis, suggesting that the alternative hypothesis may be true. The p-value is a crucial tool in statistical significance testing, helping to determine whether the results of a study or experiment are likely due to chance or to some actual effect.
    • One-tailed and two-tailed tests
    • Regression - Regression is a statistical method used to examine the relationship between two or more variables. Essentially, it helps understand how the typical value of the dependent variable changes when any one of the independent variables is varied. In its simplest form, linear regression, it's like fitting a straight line through data points in a way that best expresses the relationship between those points. It's widely used for prediction and forecasting, where it helps in determining how changes in one variable can affect another.
    • z-score - A z-score is a measure of how many standard deviations below or above the population mean a raw score is. It's calculated as the difference between a raw score and the population mean, divided by the population standard deviation.
    • Correlation
    • Covariance
    • Sampling methods
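The descriptive statistics above (mean, median, mode, variance, standard deviation, z-score) can be tried out with Python's standard library; the data set here is an arbitrary example:

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]   # arbitrary sample for illustration

mean = statistics.mean(data)       # 40 / 8 = 5
median = statistics.median(data)   # middle of the sorted data: 4.5
mode = statistics.mode(data)       # most frequent value: 4
var = statistics.pvariance(data)   # population variance: 4
std = statistics.pstdev(data)      # population standard deviation: 2.0

# z-score: how many standard deviations a raw value lies from the mean.
z = (9 - mean) / std               # (9 - 5) / 2 = 2.0

print(mean, median, mode, var, std, z)
```

Note the `p` prefix in `pvariance`/`pstdev` means the population formulas (dividing by n); `statistics.variance` and `statistics.stdev` use the sample formulas (dividing by n - 1), matching the population-versus-sample distinction above.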
  • Linear algebra - Linear algebra is a branch of mathematics that deals with vectors, vector spaces, and linear mappings between these spaces. It includes the study of lines, planes, and subspaces, and is fundamental for many areas of mathematics. In linear algebra, the concepts of vector operations, matrices, determinants, eigenvalues, and eigenvectors are key. These concepts are essential for solving systems of linear equations, which is a common problem in various applied fields including computer science, engineering, physics, and economics. Linear algebra provides the language and framework for many modern areas of applied mathematics, including machine learning and data analysis, where it's used to model and solve complex problems.
  • Optimization theory course - Optimization theory involves finding the best solution from all feasible solutions for a given problem. It's used in various fields like economics, engineering, and computer science to maximize efficiency and minimize costs in different scenarios.
  • Information theory course - Information theory deals with the quantification, storage, and communication of information. It's foundational in areas like data compression, cryptography, and telecommunications, focusing on signal processing and data transmission efficiency.
  • Econometrics course - Econometrics applies statistical and mathematical methods to economic data to test hypotheses and forecast future trends. It's essential in economics and finance for analyzing economic relationships and informing economic policy decisions.
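As a taste of information theory, here is a sketch of Shannon entropy, the quantity at the heart of data compression; the example distributions are arbitrary:

```python
import math

def entropy(probs):
    """Shannon entropy H = -sum(p * log2(p)) of a discrete distribution, in bits."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(entropy([0.5, 0.5]))     # fair coin: 1.0 bit of uncertainty
print(entropy([0.25] * 4))     # uniform over 4 outcomes: 2.0 bits
print(entropy([0.9, 0.1]))     # biased coin: less than 1 bit
```

Entropy is maximized by the uniform distribution and drops as outcomes become more predictable, which is why highly predictable data compresses well.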