Welcome to the Statistics for Data Science repository! It collects resources, Jupyter notebook (.ipynb) tutorials, and examples covering statistical techniques and their application in data science. Whether you are a beginner or an experienced data scientist, the material here is meant to strengthen your understanding of statistics and its role in data-driven decision making.
- Descriptive Statistics
  - Mean
  - Median
  - Mode
  - Standard Deviation
  - Variance
  - Covariance
  - Correlation
  - Quartiles
  - Interquartile Range (IQR)
  - Box Plots
- Inferential Statistics
- Probability Distributions
  - Binomial
  - Poisson
  - Normal
- Sampling
  - Confidence Intervals
  - Sample Size Selection
  - Statistical Significance
- Hypothesis Testing
  - Z-tests
  - T-tests
  - ANOVA
  - Chi-Square tests
- Regression
  - Simple Linear Regression
  - Multiple Linear Regression
  - Polynomial Regression
  - Logistic Regression
- Time Series
  - Trend
  - Seasonality
  - Autocorrelation
  - ARIMA Models
- Statistical Visualization
  - Histograms
  - Density Plots
  - Q-Q Plots
  - Pair Plots
  - Correlation Heatmaps
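
As a small taste of the descriptive statistics topics listed above, here is a minimal sketch using only Python's standard `statistics` module. The helper name `describe` and the sample data are illustrative assumptions, not code taken from this repository's notebooks, and the variance and standard deviation shown are the population versions.

```python
import statistics


def describe(values):
    """Return a dict of basic descriptive statistics for a list of numbers.

    `describe` is a hypothetical helper written for this example only.
    """
    # statistics.quantiles with n=4 returns the three quartile cut points.
    # Its default method is "exclusive", so other tools may report slightly
    # different quartiles on small samples.
    q1, _median, q3 = statistics.quantiles(values, n=4)
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        # Population variance and standard deviation (divide by N, not N - 1).
        "variance": statistics.pvariance(values),
        "std_dev": statistics.pstdev(values),
        "q1": q1,
        "q3": q3,
        "iqr": q3 - q1,
    }


if __name__ == "__main__":
    sample = [2, 4, 4, 4, 5, 5, 7, 9]  # illustrative data only
    for name, value in describe(sample).items():
        print(f"{name}: {value}")
```

For sample (rather than population) estimates, `statistics.variance` and `statistics.stdev` could be swapped in; which one is appropriate depends on whether the data is the whole population or a sample drawn from it.
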
Some useful resources for statistics in data science:
- Kaggle Learn Statistics Course
- Khan Academy Statistics and Probability
- Introduction to Statistical Thought by Michael Lavine
- Think Stats by Allen B. Downey
- Krish Naik (YouTube channel)
- Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow by Aurélien Géron
- StatQuest with Josh Starmer
This repository is open source, and contributions are welcome. If you have ideas for tips or tricks, or if you find an error, please open an issue or submit a pull request.
This repository is licensed under the MIT License.