Spark

This repository contains two Spark exercises that run on Databricks.

Primary language: Jupyter Notebook

Requirements:

  • Databricks (the free Community Edition is sufficient)

Databricks

Databricks builds on top of Spark and adds:
  • highly reliable and performant data pipelines
  • productive data science at scale

For more information about Spark, click here. For more information about Databricks, click here.

Installation (Databricks Community Edition)

  1. Click this link to sign up for the free Databricks Community Edition.
  2. Clone the repository to get the notebooks:
$ git clone https://github.com/chaoyingc/Spark.git

  3. Import the notebooks pyspark.ipynb and dataframe.ipynb into your Databricks workspace and upload the data file cours1.parq, so you can work with them.
The notebooks form a simple PySpark tutorial for working with data in Python. Spark also offers Scala and Java APIs if you are more familiar with those languages.