End-to-End Data Engineering Project

This is the repository for the LinkedIn Learning course End-to-End Data Engineering Project. The full course is available from LinkedIn Learning.

The world of data engineering is ever-changing, with new tools and technologies emerging on a regular basis. Building an effective analytics platform can be a daunting task, especially if you’re not familiar with all the tools available. How do you turn scattered, complex data into a model that drives insights and decision-making? In this course, Thalia Barrera teaches data professionals how to implement an end-to-end data engineering project using open tools from the modern data stack. She touches on best practices such as data modeling, testing, documentation, and version control, and shows you how to efficiently extract, load, and transform data into a unified, analytics-ready format. Through practical examples, Thalia takes you through the construction of a robust data pipeline for a fictional ecommerce company, showing you how to confidently select and use tools and how to implement best practices in data engineering.

Instructions

This repository has two branches: main holds the initial state of the project, and finished holds the final state. You can use the branch pop-up menu in GitHub to switch to a specific branch and take a look at the course at that stage, or you can add /tree/BRANCH_NAME to the repository URL to go to the branch you want to access.
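
As a rough illustration of the URL pattern described above, the following sketch appends /tree/BRANCH_NAME to a repository address. The helper name `branch_url` is hypothetical, and the repository address is assumed to be the clone URL from the Installing section (without the `.git` suffix), with YOUR_USERNAME left as a placeholder.

```python
def branch_url(repo_url: str, branch: str) -> str:
    """Return the GitHub web URL for a branch of the given repository."""
    # Drop a trailing slash or ".git" suffix so the path joins cleanly.
    base = repo_url.rstrip("/")
    if base.endswith(".git"):
        base = base[: -len(".git")]
    return f"{base}/tree/{branch}"


# Assumed repository address; replace YOUR_USERNAME with your GitHub username.
REPO = "https://github.com/YOUR_USERNAME/end-to-end-data-engineering-project-4413618"

for name in ("main", "finished"):
    print(branch_url(REPO, name))
```

Opening either printed address in a browser should show the project at that stage, provided the fork contains both branches.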

Branches

You will be working in the main branch throughout the course. At any time, you can check out the finished branch to see what the finished project looks like.

Prerequisites

Ensure you have Python 3 installed. If not, you can download and install it from Python's official website.
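
One way to confirm the prerequisite is to ask the interpreter for its own version. The sketch below is only a minimal check: the function name `is_python3` is hypothetical, and it tests the major version alone, since this README asks for "Python 3" without naming a minor version.

```python
import sys


def is_python3(version_info=sys.version_info) -> bool:
    """Return True when the given version tuple belongs to Python 3."""
    # Only the major version is compared; no minor version is assumed.
    return version_info[0] == 3


if is_python3():
    print("Python 3 detected:", sys.version.split()[0])
else:
    print("Python 3 not detected; install it before continuing.")
```

Save it to a file and run it with the same command you plan to use for the course (python3 on Mac, python on Windows) to check that the command points at a Python 3 interpreter.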

Installing

  1. Fork the repository:
    • Click the "Fork" button on the top right corner of this repository.
  2. Clone the repository:
    • git clone https://github.com/YOUR_USERNAME/end-to-end-data-engineering-project-4413618.git
    • Note: Replace YOUR_USERNAME with your GitHub username.
  3. Navigate to the directory:
    • cd end-to-end-data-engineering-project-4413618
  4. Set up a virtual environment:
    • For Mac:
      • python3 -m venv venv
      • source venv/bin/activate
    • For Windows:
      • python -m venv venv
      • .\venv\Scripts\activate
  5. Install dependencies:
    • pip install -e ".[dev]"
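
After working through the steps above, it can help to confirm that the virtual environment is actually active, since running pip outside it would install the dependencies into the base Python installation. The following is a minimal sketch, not part of the course material; the function name `in_virtual_environment` is hypothetical, and the check assumes an environment created with the standard venv module as in step 4.

```python
import sys


def in_virtual_environment() -> bool:
    """Return True when the interpreter is running inside a venv."""
    # Inside a venv, sys.prefix points at the environment directory,
    # while sys.base_prefix still points at the base installation.
    return sys.prefix != sys.base_prefix


if in_virtual_environment():
    print("Virtual environment active:", sys.prefix)
else:
    print("No virtual environment detected; activate venv first.")
```

If the second message appears, rerun the activation command from step 4 in the same terminal before installing dependencies.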

Instructor

Thalia Barrera

Check out my other courses on LinkedIn Learning.