This is the supporting material for my talk "Data Engineering on GCP", presented at Oxford University as part of the course Artificial Intelligence: Cloud and Edge Implementations.
Available here.
This repo has been tested on Ubuntu Linux and macOS.
If you are using Windows 10, you can run Ubuntu through the Windows Subsystem for Linux.
The following steps assume you have Python 3 installed.
Follow the instructions at https://cloud.google.com/sdk/docs/downloads-interactive.
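Once the SDK is installed, a quick sanity check that the gcloud CLI is on your PATH looks like this (a hedged sketch — the guard only matters if the install did not finish or used a custom location):

```shell
# Check that the Google Cloud SDK install put gcloud on the PATH
if command -v gcloud >/dev/null 2>&1; then
  gcloud --version   # lists the installed SDK components and their versions
  # gcloud init      # run once, interactively, to log in and pick a default project
else
  echo "gcloud not found -- complete the SDK install first"
fi
```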
Install Java 8 (required for running PySpark locally). On Ubuntu:
sudo apt install openjdk-8-jdk
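You can confirm the JDK is visible before trying PySpark. This is a hedged check — the guard is only there in case the package did not install cleanly:

```shell
# Verify Java is on the PATH; OpenJDK 8 reports a version string starting with 1.8
if command -v java >/dev/null 2>&1; then
  java -version 2>&1
else
  echo "java not found -- install openjdk-8-jdk first"
fi
```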
Create a Python 3 virtual environment before running any of the sample code:
python3 -m venv venv
If the module python3-venv is not available, you may need to install it:
sudo apt-get install python3-venv
TBD
To activate the environment, use:
source venv/bin/activate
With the virtual env activated, install the requirements file:
pip install -r requirements.txt
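Putting the environment steps together, a typical first-time setup from the repo root looks like this (a sketch assuming requirements.txt sits in the directory where you run it):

```shell
# One-time setup for the sample code in this repo
python3 -m venv venv              # create the virtual environment
source venv/bin/activate          # activate it; the prompt gains a "(venv)" prefix
pip install -r requirements.txt   # install the dependencies into the venv
# ...run the samples...
deactivate                        # leave the environment when done
```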
To deactivate the environment after finishing your work, run deactivate.