Databricks

Facts

  • Databricks is an American enterprise software company founded by the creators of Apache Spark.
  • Databricks develops a web-based platform for working with Spark that provides automated cluster management and IPython-style notebooks.
  • Databricks is a managed Spark-based service for working with data in a cluster.
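  • Databricks supports multiple languages (Scala, Java, Python, R, and SQL). SQL and DataFrame operations are compiled to the same optimized execution plan whichever language issues them, so the advantage of the JVM-based languages (Scala/Java) over R/Python shows up mainly in RDD code and user-defined functions.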
  • Spark, and therefore Databricks, has 3 in-memory data object APIs: RDDs, DataFrames, and Datasets.
  • A Spark DataFrame is not the same as a pandas/R DataFrame: Spark DataFrames are specifically designed to use distributed memory to perform operations across a cluster, whereas pandas/R DataFrames can only run on one computer.
  • Databricks was founded to provide an alternative to the MapReduce system and provides a just-in-time, cloud-based platform for big data processing clients.
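
The difference between a single-machine DataFrame and a distributed one can be illustrated with a small sketch. The code below is plain Python, not the Spark API: it only mimics the idea of splitting data into partitions, doing part of the work on each partition, and combining the partial results. The function names (split_into_partitions, partial_sum_count, distributed_mean) are hypothetical, and threads stand in for cluster nodes purely for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from statistics import mean


def split_into_partitions(rows, num_partitions):
    """Deal rows round-robin into num_partitions lists (stand-ins for cluster nodes)."""
    return [rows[i::num_partitions] for i in range(num_partitions)]


def partial_sum_count(partition):
    """Work done locally on one partition: return its (sum, count)."""
    return sum(partition), len(partition)


def distributed_mean(rows, num_partitions=4):
    """Compute a mean by combining per-partition partial results."""
    partitions = split_into_partitions(rows, num_partitions)
    # Threads are an assumption for this sketch; a real cluster uses separate machines.
    with ThreadPoolExecutor(max_workers=num_partitions) as pool:
        partials = list(pool.map(partial_sum_count, partitions))
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count


values = list(range(1, 101))             # the integers 1..100
single_machine = mean(values)            # whole dataset held in one process
partitioned = distributed_mean(values)   # same result built from partial results
```

Both approaches give the same mean, but only the partitioned version never needs the whole dataset in one place at once, which is roughly why a distributed DataFrame can scale past the memory of a single computer.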

References