- Introduction
- Folder Structure
- Getting Started
- Makefile Usage
- CI/CD Pipeline
- Configuration Files
- Contributing
- License
- Contact
DataTalksClub-Projects is a Python repository that automates the analysis of projects from DataTalksClub courses. It covers ML Zoomcamp, MLOps Zoomcamp and DE Zoomcamp for 2021-2023. The repository includes Python scripts for web scraping, data handling, and API interactions. The Data/ folder contains all the datasets I have generated for the courses. The project also aims to add comprehensive tests and data visualizations.
Note: Titles for projects are generated using OpenAI and may require refinement. Future course iterations should include project titles for easier processing.
.
├── Data/ # Data files
├── src/ # Python source files
├── tests/ # Test files (TBD)
├── utils/ # Utility files
├── .env # Environment variables
├── .gitignore # Git ignore rules
├── LICENSE # License file
├── Makefile # Makefile for automation
├── README.md # This file
├── app.py # Streamlit app
├── help.log # Unknown titles
├── pyproject.toml # Build settings
└── requirements.txt # Dependency list
git clone https://github.com/yourusername/DataTalksClub-Projects.git
cd DataTalksClub-Projects
pip install --upgrade pip
pip install -r requirements.txt
To run this project, you'll need to add a .env file in your project root. Replace your_openai_api_key_here and your_github_access_token_here with your actual OpenAI API key and GitHub access token, respectively.
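As a minimal sketch of how the values in `.env` could be read and checked before running anything, the snippet below parses the file with the standard library only. The variable names `OPENAI_API_KEY` and `GITHUB_ACCESS_TOKEN` are assumptions for illustration; the names the scripts actually expect may differ, and the project may use a different loader.

```python
from pathlib import Path


def load_env_file(path=".env"):
    """Parse KEY=VALUE lines from a .env file into a dict.

    Blank lines, comment lines starting with '#', and lines
    without '=' are skipped. A missing file yields an empty dict.
    """
    values = {}
    env_path = Path(path)
    if not env_path.exists():
        return values
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def require_settings(values, names):
    """Return the requested settings.

    Raises RuntimeError if a value is missing, empty, or still a
    'your_..._here' style placeholder.
    """
    missing = [
        name for name in names
        if not values.get(name) or values[name].startswith("your_")
    ]
    if missing:
        raise RuntimeError("Missing values in .env: " + ", ".join(missing))
    return {name: values[name] for name in names}


if __name__ == "__main__":
    # Hypothetical variable names; adjust to match your .env file.
    settings = require_settings(
        load_env_file(), ["OPENAI_API_KEY", "GITHUB_ACCESS_TOKEN"]
    )
    print("Loaded settings:", ", ".join(settings))
```

Failing early on an unreplaced placeholder avoids confusing authentication errors later in the scraping or title-generation steps.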
The Makefile included in this repository provides a convenient way to run various tasks. Below are the commands you can use:
This command will run all the unit tests and integration tests for the project.
make test
Run this command to perform code quality checks. It runs isort, black and pylint.
make quality_checks
Use this command to scrape data from specified sources. The data will be saved in the appropriate format and location.
make scrape
This command will generate titles for the projects using OpenAI's API.
make titles
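The title-generation step can be pictured roughly as follows. This is a sketch under stated assumptions, not the repository's implementation: the prompt wording, the `MAX_README_CHARS` cap, and the "Unknown" fallback are all hypothetical, and `complete` is a stand-in for whatever function sends a prompt to OpenAI's API and returns the reply text.

```python
from typing import Callable

MAX_README_CHARS = 2000  # hypothetical cap to keep the prompt short


def build_title_prompt(project_text: str) -> str:
    """Build a prompt asking for a short project title."""
    excerpt = project_text.strip()[:MAX_README_CHARS]
    return (
        "Suggest a concise title (at most 10 words) for the project "
        "described below. Reply with the title only.\n\n" + excerpt
    )


def generate_title(project_text: str, complete: Callable[[str], str]) -> str:
    """Return a cleaned title, or 'Unknown' if the reply is empty.

    `complete` is any callable that takes a prompt string and returns
    the model's reply; it is a placeholder for the real API call.
    """
    reply = complete(build_title_prompt(project_text))
    # Remove surrounding whitespace and quotation marks from the reply.
    title = reply.strip().strip('"').strip()
    return title or "Unknown"
```

Passing the API call in as a function keeps the cleaning logic testable without network access, and generated titles can still be reviewed by hand afterwards.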
Run this command to check the deployment status of project services such as web, batch or streaming.
make deploy
This command is a shortcut to run all of the above tasks in sequence. It's a quick way to ensure that everything is set up correctly.
make all
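For readers who want to see what running the tasks in sequence amounts to, here is a small sketch that invokes the targets one after another and stops at the first failure. The target names are taken from the commands above; the order in which `make all` actually runs them is an assumption, and the `run_targets` helper is hypothetical.

```python
import subprocess

# Target names come from the commands listed above; the order used
# by the real `make all` target is an assumption.
TARGETS = ["test", "quality_checks", "scrape", "titles", "deploy"]


def run_targets(targets, run=subprocess.run):
    """Run each make target in order and stop at the first failure.

    Returns the list of targets that completed successfully.
    The `run` parameter exists so a fake runner can be used in tests.
    """
    completed = []
    for target in targets:
        result = run(["make", target])
        if result.returncode != 0:
            break
        completed.append(target)
    return completed


if __name__ == "__main__":
    done = run_targets(TARGETS)
    print("Completed targets:", ", ".join(done) or "none")
```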
Run the Streamlit app using the Makefile:
make streamlit
This repository includes a Continuous Integration (CI) workflow that automatically builds and tests the Python project upon each push or pull request. This ensures that the codebase remains stable and free of errors as new changes are integrated.
The CI workflow is configured to perform the following tasks:
- Code quality checks
- Unit tests
- Integration tests
- .gitignore: Specifies files and folders to ignore in Git.
- LICENSE: Contains the license information.
- Pipfile & Pipfile.lock: Manage project dependencies.
- pyproject.toml: Contains build-related settings.
- Fork the repository.
- Create a new feature branch.
- Make changes.
- Run tests (TBD).
- Submit a pull request.
This project is licensed under the MIT License. See the LICENSE file for details.
For inquiries, connect with me on LinkedIn.