This repository contains two Spark exercises that run on Databricks.
Databricks
Databricks builds on top of Spark and adds:
- Highly reliable and performant data pipelines
- Productive data science at scale
For more information about Spark, click here. For more information about Databricks, click here.
- Click this link to sign up for the free Databricks Community Edition.
- Download the repository so you can work with the notebooks:
$ git clone https://github.com/chaoyingc/Spark.git
- Import pyspark.ipynb, dataframe.ipynb and cours1.parq into Databricks so you can work on them.
This is a simple PySpark tutorial that lets you work with data in Python; you can also use Scala or Java if you are more familiar with them.