Azure Databricks: A Brief Introduction
https://www.youtube.com/watch?v=cxyUy1bZ9mk
Intro to Machine Learning for Developers on Azure Databricks
https://databricks.com/intro-to-machine-learning-for-developers-on-azure-databricks
This repo is dedicated to Microsoft Azure Databricks sample code.
https://github.com/caiomsouza/microsoft-azure-databricks-playground/blob/master/videos.md
https://notebooks.azure.com/caiomsouza/libraries/Azure-MachineLearningNotebooks/tree/databricks
I started in the big data world some years ago using pure Apache software, then moved on to Hortonworks and Cloudera. In recent years I have really enjoyed working with HDFS, MapReduce I and II, Storm, Pig, Hive, Cloudera Impala, Spark, and others.
Since joining the Microsoft world in April 2018, I have been looking with my open source eyes at what Microsoft Azure offers for delivering big data science projects, and every day I like Azure Databricks more.
I am very happy to see Microsoft moving further every day into the Azure Databricks world (Apache Spark, Python, R, Scala, and other open source technologies). The combination of Microsoft and Databricks is incredible: a great product, with support from both Microsoft and Databricks.
Basically, Azure Databricks gives you, in a single product, the power to run big data jobs and implement machine learning (in Python or R) from a notebook.
You can run an Azure Databricks notebook directly as a job in Azure Databricks, or schedule it from Azure Data Factory.
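To make the "notebook as a job" idea a bit more concrete, here is a minimal sketch in Python of the kind of request body the Databricks Jobs API (`jobs/create`, version 2.1 as I understand it) accepts for a scheduled notebook run. The helper `build_notebook_job`, the job name, the notebook path, and the cluster ID are all hypothetical placeholders, not values from this repo. The sketch only builds and prints the JSON; actually sending it to a workspace (with a token and the workspace URL) is left out on purpose.

```python
import json


def build_notebook_job(name, notebook_path, cluster_id,
                       cron=None, timezone="UTC", parameters=None):
    """Build a request body for a notebook job (hypothetical helper).

    Field names follow the Jobs API 2.1 `jobs/create` shape as I understand it;
    check the official API reference before relying on them.
    """
    task = {
        "task_key": "run_notebook",
        # Assumption: the job reuses an existing cluster instead of creating one.
        "existing_cluster_id": cluster_id,
        "notebook_task": {
            "notebook_path": notebook_path,
            # Parameters are passed to the notebook as key/value strings.
            "base_parameters": parameters or {},
        },
    }
    job = {"name": name, "tasks": [task]}
    if cron is not None:
        # Quartz cron syntax: seconds minutes hours day-of-month month day-of-week
        job["schedule"] = {
            "quartz_cron_expression": cron,
            "timezone_id": timezone,
            "pause_status": "UNPAUSED",
        }
    return job


# All values below are placeholders for illustration only.
payload = build_notebook_job(
    name="nightly-training",
    notebook_path="/Users/someone@example.com/train_model",
    cluster_id="0000-000000-example",
    cron="0 0 2 * * ?",  # every day at 02:00
)
print(json.dumps(payload, indent=2))
```

Leaving `cron` unset gives a job with no schedule, which fits the other option mentioned above: letting Azure Data Factory trigger the notebook from a pipeline instead.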
Azure Databricks is an Apache Spark-based analytics platform optimized for the Microsoft Azure cloud services platform. Designed with the founders of Apache Spark, Databricks is integrated with Azure to provide one-click setup, streamlined workflows, and an interactive workspace that enables collaboration between data scientists, data engineers, and business analysts.
Introduction to Azure Databricks
https://www.slideshare.net/jamserra/introduction-to-azure-databricks-83448539
Benchmarking Big Data SQL Platforms in the Cloud
TPC-DS benchmarks demonstrate Databricks Runtime 3.0’s superior performance
https://databricks.com/blog/2017/07/12/benchmarking-big-data-sql-platforms-in-the-cloud.html
Spark: Cluster Computing with Working Sets
http://static.usenix.org/legacy/events/hotcloud10/tech/full_papers/Zaharia.pdf
Improving MapReduce Performance in Heterogeneous Environments
http://static.usenix.org/event/osdi08/tech/full_papers/zaharia/zaharia.pdf
MLlib: Machine Learning in Apache Spark
http://www.jmlr.org/papers/volume17/15-237/15-237.pdf
https://databricks.com/sparkaisummit/europe/spark-summit-2018-keynotes
https://github.com/hipic/biz_data_LA
https://docs.azuredatabricks.net/spark/latest/training/index.html