
PySpark Stock Exchange Data Analysis

📈 Stock Exchange data analysis using PySpark.

(Screenshot of the notebook)

Context

This project is my "hello world" with PySpark. It originated from a practice activity of IGTI's Data Science Bootcamp and consists of applying PySpark to perform a simple analysis of a Stock Exchange dataset, which was extracted from this Kaggle repository.

Reproducing the analysis

Running on Colab

The easiest way to reproduce this analysis is with Google Colab. Just upload the quiz_colab.ipynb and all_stocks_5yr.csv files to a new Colab session and run the notebook.

Running locally

I chose to create an environment with Jupyter and Spark on my local machine using a Docker Compose file, which is based on the Jupyter PySpark Notebook image. Details on installing Docker Compose can be found in its official documentation.
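For reference, a minimal Compose file along these lines would do the job. This is a sketch, not the repository's actual file: the port and volume mappings are assumptions, though `jupyter/pyspark-notebook` is the official image name and `/home/jovyan/work` is its standard work directory.

```yaml
services:
  pyspark-notebook:
    image: jupyter/pyspark-notebook
    ports:
      - "8888:8888"            # Jupyter web UI
    volumes:
      - ./:/home/jovyan/work   # mount the project folder into the container
```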

Once you have it installed on your machine, all you need to do is run the following commands in a terminal window:

  1. Clone the repository:
$ git clone https://github.com/lucasfusinato/pyspark-stock-exchange-analysis
  2. Enter the project's folder:
$ cd pyspark-stock-exchange-analysis
  3. Start the containers:
$ docker-compose up -d

And that's all! You should now be able to access the notebook (and run it yourself) by clicking on this link.

Built with

  • Docker Compose: tool for defining and running multi-container Docker applications;
  • Jupyter: notebook execution environment;
  • Spark: engine for large-scale data analytics;
  • PySpark: interface for Apache Spark in Python.