- Databricks is an American enterprise software company founded by the creators of Apache Spark.
- Databricks develops a web-based platform for working with Spark that provides automated cluster management and IPython-style notebooks.
- Databricks is a managed, Spark-based service for working with data on a cluster.
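- Databricks supports multiple languages (Scala, Java, SQL, Python, and R). Scala and Java run natively on the JVM, and SQL is compiled by Spark's optimizer, so these generally perform best. Python and R can incur extra overhead, mainly in RDD operations and user-defined functions, while built-in DataFrame operations perform similarly across languages.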
- Databricks, through Spark, exposes three in-memory data APIs: RDDs, DataFrames, and Datasets.
- A Spark DataFrame is not the same as a pandas or R data frame: Spark DataFrames are specifically designed to use distributed memory to perform operations across a cluster, whereas pandas and R data frames run on a single machine.
- Databricks was founded to provide an alternative to the MapReduce system and offers a just-in-time, cloud-based platform for clients processing big data.
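
The distinction between a distributed DataFrame and a single-machine one can be illustrated with a small conceptual sketch. It is plain Python and does not use the Spark API; the function names (`split_into_partitions`, `partial_sum_count`, `distributed_mean`) are hypothetical and chosen only for illustration. The assumed idea is that each partition of the data is processed independently, as it might be on a separate worker, and only the small partial results are combined at the end.

```python
from typing import List, Tuple


def split_into_partitions(values: List[float], num_partitions: int) -> List[List[float]]:
    """Split a list into roughly equal chunks, one per (hypothetical) worker."""
    size = max(1, -(-len(values) // num_partitions))  # ceiling division
    return [values[i:i + size] for i in range(0, len(values), size)]


def partial_sum_count(partition: List[float]) -> Tuple[float, int]:
    """Work done independently on one partition: a local sum and a local count."""
    return sum(partition), len(partition)


def distributed_mean(values: List[float], num_partitions: int = 4) -> float:
    """Compute a mean by combining per-partition results.

    In this sketch the partitions are processed in a simple loop; on a real
    cluster each one would be handled by a different worker in parallel.
    """
    if not values:
        raise ValueError("cannot compute the mean of an empty list")
    partitions = split_into_partitions(values, num_partitions)
    partials = [partial_sum_count(p) for p in partitions]
    total = sum(s for s, _ in partials)
    count = sum(n for _, n in partials)
    return total / count
```

A single-machine data frame would hold all of `values` in one process's memory. In the partitioned version, no single step needs to see more than one partition plus the list of partial results, which is what lets the same computation scale out across machines.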