Data Engineering Workshop

A one-day workshop on Docker, web scraping, regular expressions, PostgreSQL, and Git.

Prerequisites

Any Linux machine or VM with the following packages installed:

  • Python 3.6 or above
  • docker
  • docker-compose
  • pip3
  • git (any recent version)

GitHub account

  • Create a GitHub account (only if you do not already have one)
  • Fork this repository and then clone it to your machine
  • You can refer to this guide to understand how to fork and clone

Docker

  • To install Docker, go to your cloned repository and run the following command:
    $ sudo prerequisites/install_docker.sh

Workshop environment setup

  • Check that Git, Docker, and Docker Compose are installed on the system. Open a terminal and run the following commands (the version numbers on your machine may differ):
    Command: $ git --version
    git version 2.25.1
    
    Command: $ docker --version
    Docker version 20.10.17, build 100c701
    
    Command: $ docker-compose --version
    docker-compose version 1.25.0, build 0a186604
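The checks above can also be scripted. The following is a minimal sketch in Python (already a prerequisite). The file name check_prereqs.py, its function names, and the version-matching regular expression are illustrative assumptions, not part of this repository. The script only confirms that each tool is on PATH and prints the version it reports. It does not check that the Docker daemon is running.

```python
# check_prereqs.py: hypothetical helper, not shipped with this repository.
# Looks for each workshop tool on PATH and prints the version it reports.
import re
import shutil
import subprocess
import sys

# The same version commands used in the setup steps.
TOOLS = {
    "git": ["git", "--version"],
    "docker": ["docker", "--version"],
    "docker-compose": ["docker-compose", "--version"],
}

# Matches the first "X.Y" or "X.Y.Z" number in a version string.
VERSION_RE = re.compile(r"(\d+)\.(\d+)(?:\.(\d+))?")


def parse_version(output):
    """Return the first version number in `output` as a tuple of ints, or None."""
    match = VERSION_RE.search(output)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups() if part is not None)


def check_tool(command):
    """Run a version command. Return the parsed version, or None if the tool
    is missing or its output contains no recognizable version."""
    if shutil.which(command[0]) is None:
        return None
    # universal_newlines=True (rather than text=True) keeps this Python 3.6 compatible.
    result = subprocess.run(command, stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT, universal_newlines=True)
    return parse_version(result.stdout)


def main():
    """Print one status line per prerequisite; return True if all were found."""
    all_ok = sys.version_info >= (3, 6)
    print("python {}.{}.{}".format(*sys.version_info[:3]),
          "OK" if all_ok else "(3.6 or above required)")
    for name, command in TOOLS.items():
        version = check_tool(command)
        if version is None:
            all_ok = False
            print(name, "NOT FOUND (or version not recognized)")
        else:
            print(name, ".".join(str(part) for part in version))
    print("All prerequisites found." if all_ok else "Some prerequisites are missing.")
    return all_ok


if __name__ == "__main__":
    main()
```

Run it with `python3 check_prereqs.py`. Parsing the version strings, rather than just checking that the commands exist, makes it easy to add minimum-version checks later.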
    
    

What will you learn by the end of this workshop?

  • By the end of this workshop, you will learn how to build a Docker image and how to use it.
  • You will learn how to scrape a website using urllib/requests and BeautifulSoup.
  • You will learn regular expressions and how to work with them.
  • You will learn key features of PostgreSQL.
  • You will learn how to dockerize your project.
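To give a flavour of how scraping and regular expressions fit together, here is a minimal, self-contained sketch. It is an illustration under stated assumptions rather than workshop material. To run without third-party packages, it uses the standard library's html.parser in place of BeautifulSoup. It also parses a hypothetical inline HTML snippet instead of fetching a page with urllib or requests. The class, function, and pattern names are made up for the example.

```python
# Scrape-then-extract sketch using only the standard library.
# html.parser stands in for BeautifulSoup; the HTML below is hypothetical.
import re
from html.parser import HTMLParser

# In practice this would come from urllib.request.urlopen(url).read().decode()
# or requests.get(url).text.
SAMPLE_HTML = """
<html><body>
  <ul class="books">
    <li><a href="/books/1">Learning SQL</a> <span class="price">$29.99</span></li>
    <li><a href="/books/2">Docker Deep Dive</a> <span class="price">$35.50</span></li>
  </ul>
</body></html>
"""

# A dollar sign, then digits, then an optional two-digit cents part (captured).
PRICE_RE = re.compile(r"\$(\d+(?:\.\d{2})?)")


class LinkAndPriceParser(HTMLParser):
    """Collects (href, text) pairs for <a> tags and the text of <span class="price">."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.prices = []
        self._current_href = None
        self._in_price = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a":
            self._current_href = attrs.get("href")
        elif tag == "span" and attrs.get("class") == "price":
            self._in_price = True

    def handle_endtag(self, tag):
        if tag == "a":
            self._current_href = None
        elif tag == "span":
            self._in_price = False

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return  # skip whitespace between tags
        if self._current_href is not None:
            self.links.append((self._current_href, text))
        elif self._in_price:
            self.prices.append(text)


def parse_page(html):
    """Return the (href, title) links and the prices as floats found in `html`."""
    parser = LinkAndPriceParser()
    parser.feed(html)
    amounts = [float(m) for text in parser.prices for m in PRICE_RE.findall(text)]
    return parser.links, amounts


if __name__ == "__main__":
    links, amounts = parse_page(SAMPLE_HTML)
    for (href, title), price in zip(links, amounts):
        print(title, href, price)
```

With BeautifulSoup installed, the parser class could be replaced by something like `BeautifulSoup(html, "html.parser").find_all("a")`. The regular expression step would stay the same.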

Schedule

Time            Topic
09:00 - 11:00   Introduction to Docker
11:00 - 13:00   Introduction to Web Scraping
13:00 - 14:00   Break
14:00 - 15:00   Introduction to PostgreSQL
15:00 - 16:00   Dockerizing a project
16:00 - 16:30   Introduction to GitHub
16:30 - 16:45   Q & A
16:45 - 17:00   Wrapping Up