databricks
There are 1007 repositories under the databricks topic.
getredash/redash
Make Your Company Data Driven. Connect to any data source, easily visualize, dashboard and share your data.
cube-js/cube
📊 Cube’s universal semantic layer platform is the next evolution of OLAP technology for AI, BI, spreadsheets, and embedded analytics
Tencent/APIJSON
🏆 Real-time, coding-free, full-featured, and secure ORM library 🚀 The backend provides APIs and docs without writing code, and frontend (client) users can customize the data and structure of the returned JSON.
databrickslabs/dolly
Databricks’ Dolly, a large language model trained on the Databricks Machine Learning Platform
tobymao/sqlglot
Python SQL Parser and Transpiler
microsoft/SynapseML
Simple and Distributed Machine Learning
delta-io/delta-rs
A native Rust library for Delta Lake, with bindings into Python
databricks/dbrx
Code examples and resources for DBRX, a large language model developed by Databricks
dotnet/spark
.NET for Apache® Spark™ makes Apache Spark™ easily accessible to .NET developers.
Multiwoven/multiwoven
🔥🔥🔥 Open-source reverse ETL, an alternative to Hightouch and Census.
Azure-Samples/modern-data-warehouse-dataops
DataOps for Microsoft Data Platform technologies. https://aka.ms/dataops-repo
synmetrix/synmetrix
Synmetrix – production-ready open source semantic layer on Cube
databricks/terraform-provider-databricks
Databricks Terraform Provider
databricks/mlops-stacks
This repo provides a customizable stack for starting new ML projects on Databricks that follow production best practices out of the box.
databricks/databricks-sdk-py
Databricks SDK for Python (Beta)
databrickslabs/dbx
🧱 Databricks CLI eXtensions (aka dbx), a CLI tool for development and advanced Databricks workflow management.
databrickslabs/dbldatagen
Generate relevant synthetic data quickly for your projects. The Databricks Labs synthetic data generator (aka `dbldatagen`) can generate large simulated/synthetic data sets for testing, POCs, and other uses in Databricks environments, including Delta Live Tables pipelines.
thoughtworks/mlops-platforms
Compare MLOps platforms. Breakdowns of SageMaker, Vertex AI, Azure ML, Dataiku, Databricks, H2O, Kubeflow, MLflow...
databrickslabs/dqx
Databricks framework to validate the data quality of PySpark DataFrames
dataflint/spark
Drop-in replacement for Apache Spark UI
microsoft/nutter
Testing framework for Databricks notebooks
databricks/dbt-databricks
A dbt adapter for Databricks.
databricks/terraform-databricks-examples
Examples of using Terraform to deploy Databricks resources
databrickslabs/ucx
Automated migrations to Unity Catalog
adidas/lakehouse-engine
The Lakehouse Engine is a configuration driven Spark framework, written in Python, serving as a scalable and distributed engine for several lakehouse algorithms, data flows and utilities for Data Products.
Azure/azure-event-hubs-spark
Enabling Continuous Data Processing with Apache Spark and Azure Event Hubs
databrickslabs/overwatch
Capture deep metrics on one or all assets within a Databricks workspace
databrickslabs/dlt-meta
Metadata-driven Databricks Lakeflow Declarative Pipelines framework for bronze/silver pipelines
databricks/databricks-sql-python
Databricks SQL Connector for Python
databrickslabs/cicd-templates
Manage your Databricks deployments and CI with code.
Azure/azure-cosmosdb-spark
Apache Spark Connector for Azure Cosmos DB
CartoDB/analytics-toolbox-core
A set of UDFs and Procedures to extend BigQuery, Snowflake, Redshift, Postgres and Databricks with Spatial Analytics capabilities
databricks/cli
Databricks CLI
aloneguid/stowage
Bloat-free, no-BS cloud storage SDK.
buremba/universql
The bridge to effortless multi-engine data applications, currently supports Snowflake ❄️ and DuckDB 🦆
lamastex/scalable-data-science
Scalable Data Science: course sets in big data using Apache Spark over Databricks, and their mathematical, statistical, and computational foundations using SageMath.