This repository contains sample Databricks notebooks found within the Databricks Selected Notebooks Jump Start and other miscellaneous locations.
The notebooks were created using Databricks in Python, Scala, SQL, and R; the vast majority of them can be run on Databricks Community Edition (sign up for free access via the link).
### On-Time Flight Performance
- On-Time Flight Performance with GraphFrames for Apache Spark: Provides a jump start into graph analysis using GraphFrames for Apache Spark on flight departure performance data.
- ADAM Genomic Analysis using K-Means Clustering: Applies k-means clustering to predict population sample location based on genomic sequences using ADAM.
- Streaming Meetup RSVPs: A series of notebooks showcasing streaming on Databricks, including the use of DataFrames and mapWithState.
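
For readers new to the keyed-state idea behind mapWithState, the following is a minimal plain-Python sketch of it. It is not Spark code and is not taken from the notebooks: the function name `update_state`, the event ids, and the guest counts are all hypothetical. It only illustrates the general pattern of folding each incoming (key, value) record into per-key state and emitting one mapped value per record.

```python
from typing import Dict, Iterable, List, Tuple


def update_state(
    state: Dict[str, int], batch: Iterable[Tuple[str, int]]
) -> List[Tuple[str, int]]:
    """Fold one micro-batch of (key, value) records into the running state.

    Emits one (key, running_total) pair per input record, loosely mirroring
    how a keyed state-update function maps each record while updating state.
    """
    emitted = []
    for key, value in batch:
        # Read the previous total for this key (0 if unseen), then update it.
        state[key] = state.get(key, 0) + value
        emitted.append((key, state[key]))
    return emitted


# Hypothetical RSVP micro-batches of (event_id, guests) pairs.
batches = [
    [("event-a", 1), ("event-b", 2)],
    [("event-a", 3)],
]

state: Dict[str, int] = {}
outputs = [update_state(state, batch) for batch in batches]
```

In a real streaming job the state would be managed and checkpointed by the engine rather than held in a local dictionary; the dictionary here only stands in for that per-key store.
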
- adam: Genomic Sequencing using Apache Spark and ADAM
- blog books: Notebooks to support the Databricks blog ebooks.
- content: Various notebooks including
  - Data Exploration on Databricks
  - Salesforce Leads with Machine Learning, Spark SQL, and UDFs
  - Streaming Meetup RSVPs
- demo: Various notebooks including
  - OR Block Scheduling using Linear Regression
  - Mobile Sample SQL Notebook
  - Population vs. Price Linear Regression and SQL notebooks
  - Spark 1.6 Notebooks (describing the various enhancements for Spark 1.6)
- dogfood: Various notebooks including
  - AdTech Sample Notebook
  - Quick Start using Python | Scala
- examples: Example notebooks in various stages of completion including
  - Iris dataset k-means vs. bisecting k-means
- flights: Various notebooks working with on-time flight performance
- reporting: Example reporting notebooks including dashboard views