Prerequisites:
- Python language
- Modular Programming
- Machine learning
- Deep Learning
- GitHub and Environment setup
- Project Structure, logging and exception
- Problem Statements, EDA and Modeling
- Data Ingestion
- Data Transformation
- Model Training and Evaluation Component
- Model Hyperparameter Tuning
- Predictive Pipeline using Flask Web app
Set up the Project with GitHub
- Data Ingestion
- Data Transformation
- Model Trainer
- Model Evaluation
- Model Deployment
CI/CD Pipelines - GitHub Actions Deployment
- Set up the GitHub repository: a) new environment, b) setup.py, c) requirements.txt
- Create the src folder and build the package
- Create a GitHub repository
  1.1 Keep it public (as per project requirement)
  1.2 Name the project 'ml project'
- Create a local repository 'mlproject' and open it in a text editor like VS Code
- Create and activate an environment
  3.1 In the terminal, create the environment:
  ```
  conda create -p venv python==3.8 -y
  ```
  3.2 In the terminal, activate the environment:
  ```
  conda activate venv/
  ```
- Create a 'README.md' file in the 'ml project' folder
- Sync the GitHub and local repositories and push the code to the GitHub repo
  5.1 Run `git init` in the activated environment.
  5.2 Check the Git configuration; if it is not configured, configure it:
  ```
  git config --global user.name "<Name>"
  git config --global user.email <email associated with your GitHub account>
  ```
  Check the Git configuration:
  ```
  git config --global user.name
  git config --global user.email
  ```
  5.3 Set the remote repository:
  ```
  git branch -M main
  git remote add origin <url>
  ```
  Check the remote repo:
  ```
  git remote -v
  ```
  5.4 Push README.md to GitHub:
  ```
  git status
  git add README.md
  git commit -m 'README.md added'
  git status
  git remote -v
  git push -u origin main
  ```
- Create a '.gitignore' file for Python directly in GitHub
  6.1 When prompted 'Want to use a .gitignore template?', choose Python.
- Run `git pull` to make sure the code is in sync between the GitHub repo and the local repo
- Create a 'setup.py' file in the 'ml project' folder
  8.1 With setup.py we can build the entire machine learning application as a package and even publish it to PyPI.
  8.2 Create a function 'get_requirements'.
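A minimal sketch of what `get_requirements` might look like; the `-e .` handling follows the tutorial's pattern, while the exact formatting and constant name are illustrative assumptions:

```python
from pathlib import Path

# '-e .' in requirements.txt tells pip to install this project itself in
# editable mode; it is not a dependency name, so setup() must not see it.
HYPHEN_E_DOT = "-e ."

def get_requirements(file_path: str) -> list:
    """Read requirements.txt and return the package list for install_requires."""
    requirements = [
        line.strip()
        for line in Path(file_path).read_text().splitlines()
        if line.strip()
    ]
    if HYPHEN_E_DOT in requirements:
        requirements.remove(HYPHEN_E_DOT)
    return requirements

# In setup.py this feeds install_requires, e.g.:
# setup(
#     name="mlproject",
#     version="0.0.1",
#     packages=find_packages(),
#     install_requires=get_requirements("requirements.txt"),
# )
```

With `-e .` in requirements.txt, `pip install -r requirements.txt` also builds and installs the local package itself, which is what produces the `mlproject.egg-info` folder mentioned below.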
- Create a 'requirements.txt' file
  9.1 Contents:
  ```
  pandas
  numpy
  seaborn
  -e .
  ```
- Create an 'src' folder in the 'ml project' folder
  10.1 Inside the src folder, create an '__init__.py' file.
- In the terminal:
  ```
  pip install -r requirements.txt
  ```
- The 'mlproject.egg-info' folder indicates that your package is being installed.
- Push all the files to GitHub.
- mlproject.egg-info
- src
- venv
- .gitignore
- README.md
- requirements.txt
- setup.py
- Project Structure
- Logging
- Exception Handling
Follow me for upcoming posts on: LinkedIn: https://lnkd.in/dfgMwFGT Code: https://lnkd.in/dG9YDXru
The entire project implementation will happen inside the 'src' folder.
- Create the "component" folder inside the 'src' folder. Inside the "component" folder, create:
  1.1 '__init__.py'
  1.2 data_ingestion.py
  1.3 data_transformation.py
  1.4 model_trainer.py
  Here we can also add "data_validation.py" and "model_evaluation.py".
- Create the "pipeline" folder inside the "src" folder. Inside the "pipeline" folder, create:
  2.1 __init__.py
  2.2 train_pipeline.py
  2.3 predict_pipeline.py
- Create three important files inside the "src" folder:
  3.1 logger.py
  3.2 exception.py: create a custom exception.
  3.3 utils.py
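One common pattern for the custom exception in exception.py is to capture the script name and line number where the error happened; the exact message format below is an illustrative assumption:

```python
import sys

def error_message_detail(error, error_detail) -> str:
    """Build a message with the script name and line number of the failure."""
    # error_detail is expected to be the sys module, so exc_info() returns
    # the traceback of the exception currently being handled.
    _, _, exc_tb = error_detail.exc_info()
    file_name = exc_tb.tb_frame.f_code.co_filename
    return (
        f"Error occurred in script [{file_name}] "
        f"at line [{exc_tb.tb_lineno}]: {error}"
    )

class CustomException(Exception):
    """Exception that records where in the code the error happened."""

    def __init__(self, error_message, error_detail):
        super().__init__(error_message)
        self.error_message = error_message_detail(error_message, error_detail)

    def __str__(self) -> str:
        return self.error_message
```

Typical usage: wrap risky code in `try/except Exception as e:` and re-raise with `raise CustomException(e, sys)` so the log carries the original location.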
- Prepare the logger.py file to log the actions.
- The utils.py file will be prepared in the next part (PART 3).
- Commit the changes to GitHub.
Files in the "ml project" folder:
- mlproject.egg-info
- src
- venv
- .gitignore
- README.md
- requirements.txt
- setup.py
- Project Structure
- Exploratory Data Analysis
- Model Training
Upcoming Tutorial 4: Data Ingestion Implementation
Learned from : Krish Naik sir
Steps followed in Tutorial 3:
- Create the "notebook" folder in "mlproject".
  Note: Make sure you are in the created environment "venv".
- Inside the "notebook" folder create two .ipynb files:
  - EDA
  - MODEL TRAINING
  Note: Jupyter Notebook is best for performing EDA. In upcoming tutorials, modular programming will be used for model training.
- Understand the problem statement and collect the data accordingly.
  Note: Here we have used a Kaggle dataset for practice. In real life, we need to collect data by scraping or load it from databases like MongoDB.
- Performed exploratory data analysis on the dataset.
  Note: Focus on the problem statement and perform the required checks on the dataset in order to understand the data.
- We need to install ipykernel; this package provides the IPython kernel for Jupyter.
- Performed model training.
  Note: Use the .ipynb file for ease; in later tutorials we will convert the code into modular coding.
- Make sure you have written the required libraries in requirements.txt and installed them in the created environment.
  Note: Comment out '-e .' in requirements.txt for now; we will create our package at the end, then un-comment '-e .'.
- Commit all the changes to GitHub.
- Data Ingestion Implementation
  Note:
  - Create a class DataIngestionConfig (holding the paths where the data files are saved).
  - Create a class DataIngestion (inheriting DataIngestionConfig, with a method initiate_data_ingestion).
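The two classes above can be sketched as follows. The tutorial describes DataIngestion inheriting the config; this sketch uses composition instead (the config held as an attribute), and the artifact paths, split ratio, and `source_csv` parameter are illustrative assumptions:

```python
import os
from dataclasses import dataclass

import pandas as pd
from sklearn.model_selection import train_test_split

@dataclass
class DataIngestionConfig:
    # Paths where the ingested data files are saved (illustrative layout).
    train_data_path: str = os.path.join("artifacts", "train.csv")
    test_data_path: str = os.path.join("artifacts", "test.csv")
    raw_data_path: str = os.path.join("artifacts", "data.csv")

class DataIngestion:
    def __init__(self):
        self.ingestion_config = DataIngestionConfig()

    def initiate_data_ingestion(self, source_csv: str):
        """Read the raw data, save a copy, and write train/test splits."""
        df = pd.read_csv(source_csv)
        os.makedirs(os.path.dirname(self.ingestion_config.train_data_path),
                    exist_ok=True)
        df.to_csv(self.ingestion_config.raw_data_path, index=False)
        train_set, test_set = train_test_split(df, test_size=0.2,
                                               random_state=42)
        train_set.to_csv(self.ingestion_config.train_data_path, index=False)
        test_set.to_csv(self.ingestion_config.test_data_path, index=False)
        return (self.ingestion_config.train_data_path,
                self.ingestion_config.test_data_path)
```

Returning the two paths lets the next component (data transformation) pick up the train and test files without hard-coding their locations.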
- Data Transformation Implementation
  - In data_transformation.py:
    - create a class DataTransformationConfig to set the path for saving the pickle file, and
    - create a class DataTransformation to initiate the transformation, inheriting DataTransformationConfig, with the functions get_data_transformer_object and initiate_data_transformation.
  - In utils.py, create a function save_object to dump an object to a file.
  - Combine data_transformation.py with data_ingestion.py.
  - Add the dill library to requirements.txt and run pip install -r requirements.txt.
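A minimal save_object helper for utils.py (with a matching load_object added for round-tripping). The tutorial adds dill for this; the stdlib pickle module is used here so the sketch runs without extra installs, and dill is a drop-in replacement for objects pickle cannot handle:

```python
import os
import pickle  # the tutorial uses dill; pickle keeps this sketch stdlib-only

def save_object(file_path: str, obj) -> None:
    """Serialize obj (e.g. a fitted preprocessor) to file_path."""
    os.makedirs(os.path.dirname(file_path), exist_ok=True)
    with open(file_path, "wb") as file_obj:
        pickle.dump(obj, file_obj)

def load_object(file_path: str):
    """Load a previously saved object back from disk."""
    with open(file_path, "rb") as file_obj:
        return pickle.load(file_obj)
```

This is the function DataTransformation calls to persist the fitted preprocessor as a pickle file at the path held in DataTransformationConfig.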
- Model Training and Evaluation Component
  - Prepare the "model_trainer.py" file:
    - Import all the required modules and libraries.
    - Create a ModelTrainerConfig class to set the path for saving the model file.
    - Create a ModelTrainer class for model training:
      - inheriting the path of the model file from ModelTrainerConfig,
      - with a method to initiate training, evaluate the models, and save the best one to the model file using the save_object and evaluate_models functions from the utils file,
      - returning the performance metric of the best model along with its name.
  - Create the functions in the utils file to dump an object to a file and to evaluate the models' performance.
  - Import the model trainer in the data ingestion file and run the data ingestion file.
  - Commit the changes to the GitHub repository.
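The evaluate_models helper described above can be sketched as follows; the choice of R² as the metric and the candidate-model dict in the usage note are illustrative assumptions:

```python
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.tree import DecisionTreeRegressor

def evaluate_models(X_train, y_train, X_test, y_test, models: dict) -> dict:
    """Fit each candidate model and report its test-set R^2 score by name."""
    report = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        report[name] = r2_score(y_test, model.predict(X_test))
    return report
```

ModelTrainer then picks the winner with `best_name = max(report, key=report.get)` and passes the corresponding fitted model to save_object, returning the best score together with the model's name.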
- Model Hyperparameter Tuning
- Predictive Pipeline using Flask Web app