Data Science with Python
https://drawdata.xyz/
drawdata module
https://files.zillowstatic.com/research/public_csvs/zhvi/Metro_zhvi_uc_sfrcondo_tier_0.33_0.67_sm_sa_month.csv
https://github.com/CodeSolid/CodeSolid.github.io/raw/main/booksource/data/AnalyticsSnapshot.xlsx
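As a minimal sketch (assuming pandas is installed), a CSV dataset like the Zillow ZHVI file above can be loaded straight from its URL; `load_dataset` is a hypothetical helper name, and the inline sample data below is made up to keep the example self-contained:

```python
import io

import pandas as pd

def load_dataset(path_or_url):
    """Load a CSV dataset from a local path, URL, or file-like object."""
    return pd.read_csv(path_or_url)

# Small inline sample shaped like the Zillow ZHVI file (RegionName plus
# monthly value columns); in practice, pass the real URL above instead.
sample = io.StringIO("RegionName,2023-01-31,2023-02-28\nUnited States,350000,351000\n")
df = load_dataset(sample)
print(df.shape)  # (1, 3)
```

The .xlsx snapshot would need `pd.read_excel` (with openpyxl installed) rather than `pd.read_csv`.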
- Python Fundamentals & Jupyter Notebook
- Data Science Libraries
- Data Mining
- Scrapy
- BeautifulSoup
- Data Processing & Modelling
- NumPy
- SciPy
- Pandas
- Keras
- Scikit-learn
- PyTorch
- TensorFlow
- XGBoost
- NLTK
- Gensim
- Data Visualization
- Matplotlib
- Seaborn
- Bokeh
- Plotly
- pydot
- Machine Learning
a. Supervised Learning
- Classification is used to predict the outcome of a given sample when the output variable is in the form of categories. A classification model might look at the input data and try to predict labels like “sick” or “healthy.”
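A tiny classification sketch with scikit-learn's K-Nearest Neighbors, using made-up health measurements and the "sick"/"healthy" labels from the example above:

```python
from sklearn.neighbors import KNeighborsClassifier

# Toy data: [temperature_C, resting_heart_rate]; values and labels are invented.
X = [[36.6, 60], [36.7, 65], [39.1, 95], [38.8, 100]]
y = ["healthy", "healthy", "sick", "sick"]

clf = KNeighborsClassifier(n_neighbors=3)
clf.fit(X, y)

# A new sample with a fever-like reading lands nearest the "sick" examples.
print(clf.predict([[39.0, 92]])[0])  # sick
```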
- Regression is used to predict the outcome of a given sample when the output variable is in the form of real values. For example, a regression model might process input data to predict the amount of rainfall, the height of a person, etc. Linear Regression, Logistic Regression, CART, Naïve Bayes, and K-Nearest Neighbors (KNN) are examples of supervised learning algorithms.
- Ensembling is another type of supervised learning. It combines the predictions of multiple machine learning models that are individually weak to produce a more accurate prediction on a new sample. Bagging with Random Forests and Boosting with XGBoost are examples of ensemble techniques.
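A bagging sketch with scikit-learn's Random Forest: each tree trains on a bootstrap sample of the data, and their votes are combined. The dataset is synthetic so the example is self-contained:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic two-class dataset.
X, y = make_classification(n_samples=200, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Bagging: 100 trees, each fit on a bootstrap resample; predictions are voted.
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X_train, y_train)
score = forest.score(X_test, y_test)
print(score)
```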
b. Unsupervised Learning
- Association is used to discover the probability of the co-occurrence of items in a collection. It is extensively used in market-basket analysis. For example, an association model might discover that a customer who purchases bread is 80% likely to also purchase eggs.
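The bread/eggs figure above is just the rule's confidence, which can be computed directly; the basket data here is made up so that bread → eggs comes out at exactly 80%:

```python
# Toy market-basket data: 5 baskets contain bread, 4 of those also contain eggs.
baskets = [
    {"bread", "eggs", "milk"},
    {"bread", "eggs"},
    {"bread", "eggs", "butter"},
    {"bread", "eggs", "jam"},
    {"bread", "milk"},
]

def confidence(baskets, antecedent, consequent):
    """Estimate P(consequent in basket | antecedent in basket)."""
    with_antecedent = [b for b in baskets if antecedent in b]
    with_both = [b for b in with_antecedent if consequent in b]
    return len(with_both) / len(with_antecedent)

print(confidence(baskets, "bread", "eggs"))  # 0.8
```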
- Clustering is used to group samples such that objects within the same cluster are more similar to each other than to the objects from another cluster.
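A clustering sketch with scikit-learn's KMeans on two obvious blobs of invented 2-D points:

```python
import numpy as np
from sklearn.cluster import KMeans

# Two well-separated groups of points.
points = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
                   [8.0, 8.0], [8.2, 7.9], [7.8, 8.1]])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(points)
print(labels)  # first three points share one label, last three the other
```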
- Dimensionality Reduction is used to reduce the number of variables in a data set while ensuring that important information is still conveyed. It can be done with Feature Selection methods, which select a subset of the original variables, or Feature Extraction methods, which transform the data from a high-dimensional space to a low-dimensional one. For example, Principal Component Analysis (PCA) is a Feature Extraction approach.
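A PCA sketch with scikit-learn: the synthetic 3-D data below is built to lie near a 1-D line, so a single principal component captures nearly all of the variance:

```python
import numpy as np
from sklearn.decomposition import PCA

# 3-D points that really live on a 1-D line, plus a little noise.
rng = np.random.default_rng(0)
t = rng.normal(size=(100, 1))
X = np.hstack([t, 2 * t, -t]) + rng.normal(scale=0.01, size=(100, 3))

# Project from 3 features down to 1 extracted feature.
pca = PCA(n_components=1)
X_reduced = pca.fit_transform(X)  # shape (100, 1)
print(pca.explained_variance_ratio_[0])  # close to 1.0
```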
c. Reinforcement Learning
- Reinforcement Learning trains an agent through trial and error: the agent takes actions in an environment, receives rewards or penalties, and gradually learns a policy that maximizes cumulative reward.
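A minimal sketch of the reward-feedback loop, using an epsilon-greedy multi-armed bandit in plain Python; the arm reward probabilities are made up for illustration:

```python
import random

random.seed(0)

# Two slot machines (actions) with hidden reward probabilities.
TRUE_REWARD_PROB = [0.2, 0.8]

def pull(arm):
    """Environment: return reward 1 with the arm's hidden probability."""
    return 1 if random.random() < TRUE_REWARD_PROB[arm] else 0

estimates = [0.0, 0.0]  # agent's running estimate of each arm's value
counts = [0, 0]
epsilon = 0.1           # exploration rate

for _ in range(2000):
    if random.random() < epsilon:
        arm = random.randrange(2)              # explore a random arm
    else:
        arm = estimates.index(max(estimates))  # exploit the best-known arm
    reward = pull(arm)
    counts[arm] += 1
    # Incremental mean update of the chosen arm's value estimate.
    estimates[arm] += (reward - estimates[arm]) / counts[arm]

print(estimates.index(max(estimates)))  # the agent learns arm 1 pays more
```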
Mini-Projects
a. Data Cleaning Project
b. Data Visualization Project
c. Machine Learning Project
csv -> load pandas df -> data cleansing -> transformations -> golden record in db
-> data visualization
- Python -> Matplotlib/Seaborn/Plotly
- JavaScript -> Flask/Django/FastAPI -> D3.js/Highcharts
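The CSV → pandas → cleansing → transformations → golden-record pipeline above might be sketched like this; the column names, tax factor, and table name are invented, and an inline string stands in for the CSV file:

```python
import io
import sqlite3

import pandas as pd

# Stand-in for the source CSV; in practice this would be a file path or URL.
raw_csv = io.StringIO("name, price\n alice ,10\nbob,\n alice ,10\n")

df = pd.read_csv(raw_csv)                      # load into a pandas df
df.columns = [c.strip() for c in df.columns]   # cleanse: tidy headers
df["name"] = df["name"].str.strip()            # cleanse: trim whitespace
df = df.dropna().drop_duplicates()             # cleanse: drop bad/repeated rows
df["price_with_tax"] = df["price"] * 1.2       # transformation (invented rule)

# Golden record: persist the cleaned table to a database.
conn = sqlite3.connect(":memory:")
df.to_sql("golden_prices", conn, index=False)
rows = conn.execute("SELECT COUNT(*) FROM golden_prices").fetchone()[0]
print(rows)  # 1 clean row survives
```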
Kaggle
classification preprocessing
Learning Data Science
ML version control system: DVC (https://dvc.org/)