Prerequisites
- Precalculus
- Algebra I
Math for ML
- Calculus
- Calculus I - Calculus I forms the foundation for understanding change and motion in machine learning models. It introduces key concepts like limits, which describe how functions behave as they approach specific points, and derivatives, which are essential for understanding rates of change. In machine learning, derivatives play a crucial role in optimization algorithms, like gradient descent, used for training models. This part of calculus also covers the basics of integration, providing a way to aggregate or sum up quantities, which is useful in areas like probability and data analysis. Understanding the fundamentals of Calculus I is vital for grasping more complex concepts in machine learning and for effectively implementing and improving ML algorithms.
- Calculus I khan academy
- calculus I MIT course
- The best calculus intuition on the internet
- derivatives intuition - Derivatives are fundamental in machine learning for optimizing algorithms. They indicate how a function's output changes in response to variations in input, which is crucial for algorithms like gradient descent. By calculating derivatives, we can minimize a loss function, adjusting model parameters for a better fit to the data. This process, especially in neural networks, involves backpropagation, which relies heavily on derivatives. Understanding derivatives is key for effective algorithm implementation and improvement in machine learning.
- understanding what integrals are - Integrals are used in machine learning to aggregate or accumulate quantities, offering a way to sum up continuous data points or functions. This concept is important for calculating areas under curves, which in ML can relate to probabilities and distributions in statistics. Integrals also play a role in understanding the cumulative impact of small changes, crucial in algorithms that deal with continuous data. Mastery of integrals aids in grasping complex concepts in probability and data analysis within the machine learning field.
- Derivatives
- Why derivatives in machine learning
- gradient descent - Derivatives are pivotal in gradient descent algorithms, which find the minimum of a function (often a loss function in ML models). By calculating the derivative, the algorithm determines the direction in which to adjust parameters to reduce the error (a minimal sketch appears after this list).
- Best line fit - In regression models, derivatives help in finding the best line that fits the given data. This involves minimizing the difference (error) between the predicted values and the actual values.
- Real life applications of derivatives
- derivatives rules - There are a few basic derivative rules you have to know; they follow directly from the definition of the derivative. For example, the derivative of a constant is 0.
- differentiation rules - Rules for differentiating combinations of functions, for example how to differentiate a product, quotient, or sum of functions.
- chain rule - A technique for finding the derivative of composite functions. It is used in machine learning to differentiate a function composed of several functions, making it critical to neural networks and backpropagation.
- Approximating with local linearity - Used in machine learning to approximate a function with a linear function, which is easier to work with. This is done by finding the tangent line at a point on the function, whose slope is the derivative at that point. In simpler words: approximating a function near a point.
- L'Hôpital's rule - A method for evaluating limits of indeterminate forms such as 0/0, which cannot be evaluated by simply substituting the limit value. It is useful whenever you need to reason about how a function behaves near points where direct substitution fails.
- Relative (local) extrema - A point on a function where the function's value is greater (a local maximum) or smaller (a local minimum) than at all nearby points. Finding local minima of a loss function is exactly what optimization algorithms like gradient descent do.
- Concavity - Describes which way a function curves: concave up or concave down, determined by the sign of the second derivative. Points where the concavity changes are inflection points, which are useful for analyzing the shape of the functions that optimization algorithms like gradient descent navigate.
- Optimization problems derivatives - Classic problems where derivatives are used to find the minimum or maximum of a function; the same technique underlies training ML models by minimizing a loss function.
- Implicit differentiation - Implicit differentiation is a method used when variables in a function are intermixed and cannot be separated easily. It involves differentiating both sides of an equation with respect to an independent variable. This technique is useful in machine learning for handling complex relationships between variables, especially when standard differentiation rules are not directly applicable. It helps in understanding the effect of changes in one variable on another.
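To make the role of derivatives concrete, here is a minimal sketch of gradient descent fitting a best-fit line, as referenced above. The toy data, learning rate, and iteration count are invented for illustration; the gradient of the mean squared error follows from the differentiation and chain rules listed in this section.

```python
# A minimal sketch of gradient descent fitting a line y = w*x + b.
# The data and hyperparameters below are toy values for illustration.

# Points lying roughly on the line y = 2x + 1
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 2.9, 5.2, 6.8, 9.1]

w, b = 0.0, 0.0   # initial parameters
lr = 0.02         # learning rate (step size)
n = len(xs)

for step in range(2000):
    # Mean squared error: L = (1/n) * sum((w*x + b - y)^2).
    # Its partial derivatives (via the chain rule) tell us which way to move:
    dw = (2 / n) * sum((w * x + b - y) * x for x, y in zip(xs, ys))
    db = (2 / n) * sum((w * x + b - y) for x, y in zip(xs, ys))
    # Step against the gradient to reduce the error
    w -= lr * dw
    b -= lr * db

print(f"fitted line: y = {w:.2f}x + {b:.2f}")  # approximately y = 1.99x + 1.04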
- Integrals - Integrals in mathematics, particularly in the context of machine learning, are used for summing or accumulating quantities over an interval. They are essential for calculating areas under curves, which can represent probabilities and distributions in statistics, a key part of data analysis in ML. Integrals also aid in understanding the cumulative effects of continuous changes, crucial for handling continuous data and for algorithms involving probabilities and continuous optimization. Understanding integrals is vital for effective data analysis and model formulation in machine learning (a small numeric sketch follows below).
- Differential equations basics - Differential equations are equations that involve an unknown function and its derivatives. In the basics of differential equations, you learn how to solve equations that describe the rate of change of a quantity. These equations are fundamental in modeling various phenomena where the rate of change of one variable is directly dependent on other variables.
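Since computers usually approximate integrals numerically rather than symbolically, here is a minimal sketch of the core idea behind integration: summing thin slices under a curve. The function and interval are arbitrary choices for illustration.

```python
# Approximating the area under f(x) = x**2 on [0, 1] with a Riemann sum.
# The exact answer is 1/3; the function and interval are illustrative.

def f(x):
    return x ** 2

n = 100_000            # number of thin slices
width = 1.0 / n        # width of each slice
# Sum the areas of the thin rectangles under the curve
area = sum(f(i * width) * width for i in range(n))
print(area)  # ~0.33333, close to the exact value 1/3
```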
- Calculus II - Calculus II builds upon the foundations of Calculus I, focusing on advanced integration techniques, series, and sequences. It includes topics like integration by parts, partial fractions, and convergence of series. In machine learning, these concepts are essential for complex models, especially in probabilistic methods and continuous optimization. This knowledge is crucial for understanding and implementing more sophisticated ML algorithms.
- Calculus II khan academy
- Calculus II playlist
- Integration techniques - Integration techniques involve various methods to solve complex integrals. Key techniques include integration by parts, substitution, and partial fractions. These methods are useful in machine learning for solving problems related to areas under curves, probabilities, and in algorithms that require integration of functions. Understanding these techniques is important for dealing with continuous data and for certain optimization problems in ML.
- Integration techniques playlist
- Khan academy integration techniques
- u substitution - A method for solving integrals by substituting part of the integrand with a new variable u. It is used to simplify complex integrals, which appear in algorithms that deal with continuous data (a worked example appears after this list).
- trigonometric identities integration
- alternative khan academy
- trigonometric substitution
- integration by parts
- improper integrals
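As a concrete illustration of the u-substitution technique referenced above, here is a standard worked example:

$$\int 2x\cos(x^2)\,dx = \int \cos(u)\,du = \sin(u) + C = \sin(x^2) + C, \qquad \text{with } u = x^2,\ du = 2x\,dx.$$

The substitution works because the factor 2x is exactly the derivative of the inner function x², which is the pattern to look for when choosing u.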
- Differential equations - Differential equations involve relationships between functions and their derivatives, expressing how a quantity changes over time or space. They are crucial in modeling dynamic systems, where the rate of change is key. In machine learning, differential equations are used for time-series analysis, dynamic modeling, and in advanced neural network architectures. Understanding these equations is essential for applying ML to real-world problems that involve temporal or spatial dynamics.
- Applications of integrals
- More on equations and coordinates
- Polar coordinates - Polar coordinates represent points in a plane using a radius and an angle, offering an alternative to the traditional Cartesian coordinate system. In machine learning, polar coordinates can be useful for data representation and feature engineering, particularly in problems where the data's orientation and distance from a central point are important. Understanding polar coordinates can aid in visualizing and analyzing data that naturally fits into circular or spiral patterns.
- Parametric equations
- Calculating arc length
- Vector valued functions
- Planar motion problems
- Series - Series in mathematics refer to the sum of a sequence of numbers. In machine learning, series are important for understanding algorithms that involve summation over time or across data points, such as in time-series analysis or when dealing with sequential data. Grasping the concept of series, including convergence and divergence, is key for implementing and analyzing algorithms that aggregate information over sequences.
- Khan academy module
- Convergent and divergent infinite series - Convergent and divergent infinite series refer to the behavior of a series as the number of terms approaches infinity. A convergent series approaches a finite limit, while a divergent series does not settle to a finite value. In machine learning, understanding the nature of these series is important for algorithms that involve iterative processes or summations over large datasets. Knowing whether a series converges or diverges can impact the stability and efficiency of these algorithms.
- Infinite geometric series - A series whose successive terms have a constant ratio r. It converges exactly when |r| < 1 and diverges otherwise (see the formula after this list).
- n-th term test
- integral test
- p-series
- Direct comparison test
- Limit comparison test
- Ratio test
- Taylor and Maclaurin series
- Lagrange error bound
- Power series
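For reference, the infinite geometric series mentioned above has a simple closed form that also illustrates convergence and divergence:

$$\sum_{k=0}^{\infty} ar^k = \frac{a}{1-r} \quad \text{when } |r| < 1; \qquad \text{the series diverges when } |r| \ge 1.$$

For example, $1 + \tfrac{1}{2} + \tfrac{1}{4} + \cdots$ has $a = 1$ and $r = \tfrac{1}{2}$, so it converges to $2$.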
- Calculus III (multivariable calculus) - Multivariable calculus extends the concepts of single-variable calculus to functions of several variables. It includes topics like partial derivatives, multiple integrals, and vector calculus. In machine learning, this area is crucial for dealing with high-dimensional data and for understanding how changes in multiple inputs simultaneously affect an output. Mastery of multivariable calculus is essential for advanced machine learning techniques such as optimization in high-dimensional spaces and modeling complex, multi-faceted systems.
- Khan academy modules
- Visualizing 3d graphs
- Parametric curves
- Parametric surfaces
- Vector fields - Vector fields represent the distribution of vectors in a multi-dimensional space, where each vector indicates a direction and magnitude at a point in that space. In machine learning, vector fields can be used to visualize and understand the behavior of gradients in optimization problems, like visualizing the direction of steepest descent in a loss landscape. They are also important in understanding dynamical systems and fluid dynamics models that are applicable in certain AI applications.
- Transformations
- Partial derivatives - Partial derivatives are used in multivariable calculus to measure how a function changes as one of its variables changes, holding the others constant. In machine learning, partial derivatives are crucial for gradient calculations in multivariate optimization problems, such as those encountered in training neural networks. They help in understanding the sensitivity of a function's output to changes in each input variable, which is key for fine-tuning model parameters (a numeric sketch follows this list).
- Divergence and Curl - Divergence and curl are concepts from vector calculus dealing with vector fields. Divergence measures the magnitude of a field's source or sink at a given point, essentially quantifying how much a vector field spreads out or converges. Curl, on the other hand, measures the rotation or swirling strength of a field around a point. In machine learning, understanding divergence and curl can be important for advanced algorithms, especially those involving fluid dynamics, electromagnetism, or other applications where vector fields and their behaviors are relevant. These concepts help in analyzing the properties and behaviors of complex systems modeled in AI and ML.
- Laplacian
- Minima and maxima in multivariable functions - Minima and maxima in multivariable functions refer to points where a function reaches its lowest or highest value, respectively, in a given neighborhood. Finding these points is crucial in optimization problems such as minimizing a loss function in model training or maximizing performance metrics.
- Line integrals - Line integrals involve integrating a function along a curve, calculating the sum of values over a path rather than an area. They are useful for measuring the total effect along a trajectory, such as total work done in a force field. In applied mathematics and related fields, line integrals can be used to solve problems involving cumulative effects along a path, which could be relevant in various data analysis and modeling scenarios.
- Double and triple integrals - Double integrals extend the concept of integration to two-dimensional spaces, allowing for the calculation of volume under a surface or the accumulation of quantities over an area. Triple integrals further extend this to three-dimensional spaces, enabling the computation of quantities within a volume. These integrals are important in fields requiring volumetric analysis and in problems involving spatial dimensions. Understanding these integrals is essential for solving complex problems where integration over multiple dimensions is required.
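Here is a minimal sketch of the partial derivatives mentioned above, estimated numerically with central finite differences. The function and step size are arbitrary illustrative choices; ML frameworks obtain the same gradients via automatic differentiation.

```python
# Estimating partial derivatives with central finite differences.
# The function f and step size h are toy choices for illustration.

def f(x, y):
    return x ** 2 * y + y ** 3

def gradient(fn, x, y, h=1e-6):
    # Vary one variable at a time while holding the other constant
    df_dx = (fn(x + h, y) - fn(x - h, y)) / (2 * h)
    df_dy = (fn(x, y + h) - fn(x, y - h)) / (2 * h)
    return df_dx, df_dy

# Analytically: df/dx = 2xy = 4 and df/dy = x^2 + 3y^2 = 13 at (1, 2)
print(gradient(f, 1.0, 2.0))  # ~(4.0, 13.0)
```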
- How neural networks work - At this point you should be able to understand all the internal workings of a basic neural network. Here I left some videos by 3blue1brown explaining exactly that.
- Probabilities - Probabilities measure the likelihood of an event occurring and are a fundamental concept in statistics and data analysis. They are essential for interpreting and predicting data patterns, evaluating risks, and making decisions under uncertainty. In data-driven fields, understanding probabilities enables the analysis of outcomes, the assessment of models, and the quantification of uncertainty in predictions and analyses.
- Khan academy course
- Probability and statistics course
- Probability basics explained
- More on probabilities
- Binomial coefficients in probability - Binomial coefficients, represented as "n choose k", are used in probability to calculate the number of ways to choose k successes out of n trials. They are central to the binomial probability distribution, which models the number of successes in a fixed number of independent Bernoulli trials. Understanding binomial coefficients is crucial for calculating probabilities in scenarios where there are two possible outcomes (like success or failure) in each trial and where each outcome has a fixed probability.
- Dependent and independent events - Dependent probability refers to scenarios where the outcome of one event affects the outcome of another. In these cases, the probability of one event depends on the occurrence of the previous event; independent events, by contrast, do not influence each other's probabilities. This concept is crucial for understanding complex probability scenarios, where events are not independent and require conditional probability calculations. It's key in analyzing situations where events or processes are interlinked, influencing each other's outcomes.
- Permutations and Combinations - Permutations and combinations are two fundamental concepts in probability and combinatorics, dealing with the arrangement of items.
- Conditional probability - Conditional probability measures the likelihood of an event occurring, given that another event has already occurred. It's represented as P(A|B), the probability of event A occurring given that B has happened. This concept is essential in situations where the occurrence of one event affects the likelihood of another. For example, if you have a deck of cards and you know that a card drawn is red, the probability of it being a heart is different (and higher) than if you had no information about the card's color. Understanding conditional probability is key for analyzing scenarios where events are interrelated (the deck example is checked numerically after these items).
- Birthday problem
- Random variables
- Binomial distribution
- Expected value
- Types of distributions
- Law of large numbers
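Because the deck in the conditional-probability example above is small, the probability can be checked by brute-force enumeration. This sketch simply counts cards; the suit and rank labels are arbitrary.

```python
# Conditional probability by enumeration over a standard 52-card deck:
# computing P(heart | red) from the example above.

from fractions import Fraction
from itertools import product

suits = ["hearts", "diamonds", "clubs", "spades"]
ranks = range(1, 14)                 # ace through king
deck = list(product(suits, ranks))   # all 52 (suit, rank) pairs

red = [card for card in deck if card[0] in ("hearts", "diamonds")]
hearts_among_red = [card for card in red if card[0] == "hearts"]

# P(heart | red) = P(heart and red) / P(red), here just a ratio of counts
print(Fraction(len(hearts_among_red), len(red)))  # 1/2, versus 13/52 = 1/4 unconditionally
```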
- Statistics
- Statistics playlist
- Probability and statistics course
- Median, mode and mean (average) - These are the three basic measures of the center of a set of numbers (computed in the snippet after these items). The mean is the most common one, but it is sensitive to outliers, so it is not always the best choice. The median is the middle number of the sorted set. The mode is the number that appears most often in the set.
- Population and sample mean - The population mean is the average of all values in a population, while the sample mean is the average of the values in a sample drawn from that population. The sample mean is used as an estimate of the population mean.
- Variance - Variance measures the spread of data points around the mean, indicating how much the data varies. It's calculated as the average of the squared differences from the mean.
- Standard deviation - Standard deviation is the square root of variance and provides a measure of the spread of data points around the mean in the same units as the data.
- Normal distribution - A normal distribution, also known as a Gaussian distribution, is a symmetric, bell-shaped distribution where most observations cluster around the central peak and the probabilities for values further away from the mean taper off equally in both directions.
- Central limit theorem - This theorem states that the distribution of sample means approximates a normal distribution as the sample size becomes large, regardless of the shape of the population distribution.
- Standard error of the mean - The standard error of the mean measures the variability or standard deviation of the sample mean estimate of a population mean. It decreases as the sample size increases.
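A quick sketch of the descriptive statistics above, using Python's standard library; the dataset is a toy example chosen so the answers come out round.

```python
# Mean, median, mode, variance, and standard deviation of a toy dataset.

import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]

print(statistics.mean(data))       # 5    (sum divided by count)
print(statistics.median(data))     # 4.5  (middle of the sorted values)
print(statistics.mode(data))       # 4    (most frequent value)
print(statistics.pvariance(data))  # 4    (average squared deviation from the mean)
print(statistics.pstdev(data))     # 2.0  (square root of the variance)
```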
- Bayesian statistics - Bayesian statistics involves using Bayes' theorem to update the probability of a hypothesis as more evidence or information becomes available.
- Bayesian statistics Course
- Bayes theorem - Bayes' Theorem is a fundamental concept in probability theory. It describes the probability of an event, based on prior knowledge of conditions that might be related to the event. Formally, it calculates the conditional probability of an event A given another event B, from the inverse probability of B given A and the individual probabilities of A and B (the formula is stated after these items). This theorem is widely used in various fields for making inferences and updating probabilities as new evidence is obtained. It's a cornerstone in Bayesian statistics, providing a mathematical framework to update beliefs based on new data.
- Bayesian trap
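Stated formally, Bayes' theorem reads:

$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$$

As a worked example with invented numbers: suppose a test detects a disease 99% of the time, gives a false positive 5% of the time, and the disease affects 1% of the population. Then P(disease | positive) = (0.99 × 0.01) / (0.99 × 0.01 + 0.05 × 0.99) ≈ 0.17, far lower than intuition suggests; this is the "Bayesian trap" covered in the video above.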
- Margin of error and confidence intervals
- Hypothesis testing and p-values - In hypothesis testing, the p-value is a measure used to assess the strength of the evidence against the null hypothesis. It represents the probability of observing a test statistic as extreme as, or more extreme than, the observed value under the assumption that the null hypothesis is true. A low p-value (typically less than 0.05) indicates that the observed data are unlikely under the null hypothesis and leads to the rejection of the null hypothesis, suggesting that the alternative hypothesis may be true. The p-value is a crucial tool in statistical significance testing, helping to determine whether the results of a study or experiment are likely due to chance or to some actual effect.
- One-tailed and two-tailed tests
- Regression - Regression is a statistical method used to examine the relationship between two or more variables. Essentially, it helps understand how the typical value of the dependent variable changes when any one of the independent variables is varied. In its simplest form, linear regression, it's like fitting a straight line through data points in a way that best expresses the relationship between those points. It's widely used for prediction and forecasting, where it helps in determining how changes in one variable can affect another.
- z-score - A z-score is a measure of how many standard deviations below or above the population mean a raw score is. It's calculated as the difference between a raw score and the population mean, divided by the population standard deviation (see the short snippet after these items).
- Correlation
- Covariance
- Sampling methods
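A minimal sketch of the z-score formula above; the test-score numbers are made up for illustration.

```python
# z-score: how many standard deviations a value lies above (positive)
# or below (negative) the population mean. Toy numbers for illustration.

def z_score(x, mu, sigma):
    return (x - mu) / sigma

# A score of 85 in a population with mean 70 and standard deviation 10
print(z_score(85, 70, 10))  # 1.5 -> 1.5 standard deviations above the mean
```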
- Linear algebra - Linear algebra is a branch of mathematics that deals with vectors, vector spaces, and linear mappings between these spaces. It includes the study of lines, planes, and subspaces, and is fundamental for many areas of mathematics. In linear algebra, the concepts of vector operations, matrices, determinants, eigenvalues, and eigenvectors are key. These concepts are essential for solving systems of linear equations, which is a common problem in various applied fields including computer science, engineering, physics, and economics. Linear algebra provides the language and framework for many modern areas of applied mathematics, including machine learning and data analysis, where it's used to model and solve complex problems.
- Essence of linear algebra
- Linear algebra course khan academy
- Prerequisites recap
- Basics of linear algebra
- Vectors - Vectors are mathematical objects representing quantities that have both magnitude and direction. They can be thought of as arrows pointing from one point to another in space. Vectors are fundamental in physics, engineering, and mathematics, particularly in fields like mechanics and vector calculus. In the context of linear algebra, vectors are often represented as an array of numbers which specify their coordinates in a space, and they are key to understanding concepts like vector spaces, linear transformations, and matrix operations. Vectors are also crucial in machine learning for representing data and model parameters.
- Linear combination and span
- Linear dependence and independence
- Linear subspaces - Linear subspaces, also known simply as subspaces, are subsets of a vector space that are themselves vector spaces. They are defined by a few key properties: they contain the zero vector, they are closed under vector addition (meaning the sum of any two vectors in the subspace is also in the subspace), and they are closed under scalar multiplication (any scalar multiplied by a vector in the subspace results in another vector in the subspace). Subspaces are fundamental in linear algebra and are used to understand more complex structures within vector spaces, like the column space and null space of a matrix.
- Basis of a subspace
- Unit vectors
- Vector dot and cross products - The dot product (or scalar product) of two vectors is a way of multiplying them to get a scalar (a single number). The cross product (or vector product) of two vectors is another way of multiplying them, but unlike the dot product, the result is a vector, not a scalar. The cross product vector is perpendicular to the plane formed by the two original vectors (both are demonstrated in the snippet after this list).
- Cauchy-Schwarz inequality
- Vector triangle inequality
- Angle between vectors
- Null space, Column space and dimension of a column space
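A short sketch of the dot and cross products referenced above, using NumPy; the vectors are toy examples.

```python
# Dot and cross products of toy 3D vectors with NumPy.

import numpy as np

a = np.array([1.0, 0.0, 0.0])
b = np.array([0.0, 1.0, 0.0])

print(np.dot(a, b))    # 0.0 -> a and b are orthogonal
print(np.cross(a, b))  # [0. 0. 1.] -> perpendicular to both a and b

# The dot product also encodes the angle between vectors: a.b = |a||b|cos(theta)
c = np.array([1.0, 1.0, 0.0])
cos_theta = np.dot(a, c) / (np.linalg.norm(a) * np.linalg.norm(c))
print(np.degrees(np.arccos(cos_theta)))  # ~45.0 degrees between a and c
```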
- More linear algebra concepts
- Different understanding of functions
- Vector transformations - Functions that take vectors as inputs and produce vectors as outputs, moving points around in space.
- Linear transformations - Linear transformations are mappings between vector spaces that preserve the operations of vector addition and scalar multiplication. Essentially, they transform vectors in one space to vectors in another space while maintaining the structure of the vector space.
- Projections
- Matrix multiplication as composition
- Intro to inverse of a function
- Matrix inverse
- Determinants - Determinants are a property of square matrices, providing a scalar value that encapsulates certain characteristics of the matrix. The determinant can be interpreted as a scaling factor for the transformation described by the matrix.
- Orthogonal complement
- Rank of a matrix
- Dimension of a matrix
- Change of basis
- The Gram-Schmidt process
- Eigenvalues and eigenvectors (demonstrated in the snippet after this list)
- Diagonalization of a matrix
- Tensors explained
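To tie the determinant, eigenvalue, and diagonalization items above together, here is a small sketch using NumPy's linear algebra routines; the matrix is a toy example.

```python
# Determinant, eigenvalues/eigenvectors, and diagonalization with NumPy.

import numpy as np

A = np.array([[2.0, 0.0],
              [1.0, 3.0]])

# The determinant is the factor by which the transformation A scales areas
print(np.linalg.det(A))  # ~6.0

# Eigenvectors are directions A merely stretches; eigenvalues are the factors
vals, vecs = np.linalg.eig(A)
print(vals)  # eigenvalues 2 and 3 (order may vary)

# Diagonalization: A = P D P^-1, with eigenvectors as the columns of P
P, D = vecs, np.diag(vals)
print(np.allclose(A, P @ D @ np.linalg.inv(P)))  # True
```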
- Optimization theory course - Optimization theory involves finding the best solution from all feasible solutions for a given problem. It's used in various fields like economics, engineering, and computer science to maximize efficiency and minimize costs in different scenarios.
- Information theory course - Information theory deals with the quantification, storage, and communication of information. It's foundational in areas like data compression, cryptography, and telecommunications, focusing on signal processing and data transmission efficiency.
- Econometrics course - Econometrics applies statistical and mathematical methods to economic data to test hypotheses and forecast future trends. It's essential in economics and finance for analyzing economic relationships and informing economic policy decisions.