databricks
There are 743 repositories under the databricks topic.
getredash/redash
Make Your Company Data Driven. Connect to any data source, easily visualize, dashboard and share your data.
Tencent/APIJSON
🏆 A zero-code, full-featured, security-hardened ORM library 🚀 A JSON Transmission Protocol and ORM library that provides back-end APIs and docs with zero code, while the front end (client) customizes the data and structure of the returned JSON.
databrickslabs/dolly
Databricks’ Dolly, a large language model trained on the Databricks Machine Learning Platform
tobymao/sqlglot
Python SQL Parser and Transpiler
microsoft/SynapseML
Simple and Distributed Machine Learning
databricks/dbrx
Code examples and resources for DBRX, a large language model developed by Databricks
dotnet/spark
.NET for Apache® Spark™ makes Apache Spark™ easily accessible to .NET developers.
delta-io/delta-rs
A native Rust library for Delta Lake, with bindings into Python
hystax/optscale
FinOps and MLOps platform to run ML/AI and regular cloud workloads with optimal performance and cost.
Multiwoven/multiwoven
🔥🔥🔥 Open Source Alternative to Hightouch, Census, and RudderStack.
Azure-Samples/modern-data-warehouse-dataops
DataOps for the Modern Data Warehouse on Microsoft Azure. https://aka.ms/mdw-dataops.
mlcraft-io/mlcraft
Synmetrix – an open source semantic layer to boost your LLM precision
databrickslabs/dbx
🧱 Databricks CLI eXtensions – aka dbx, a CLI tool for development and advanced Databricks workflow management.
databricks/terraform-provider-databricks
Databricks Terraform Provider
thoughtworks/mlops-platforms
Compare MLOps Platforms. Breakdowns of SageMaker, VertexAI, AzureML, Dataiku, Databricks, h2o, kubeflow, mlflow...
databricks/databricks-sdk-py
Databricks SDK for Python (Beta)
databricks/mlops-stacks
This repo provides a customizable stack for starting new ML projects on Databricks that follows production best practices out of the box.
databrickslabs/dbldatagen
Generate relevant synthetic data quickly for your projects. The Databricks Labs synthetic data generator (aka `dbldatagen`) can generate large simulated / synthetic data sets for testing, POCs, and other uses in Databricks environments, including Delta Live Tables pipelines.
microsoft/nutter
Testing framework for Databricks notebooks
Azure/azure-event-hubs-spark
Enabling Continuous Data Processing with Apache Spark and Azure Event Hubs
databrickslabs/overwatch
Capture deep metrics on one or all assets within a Databricks workspace
databrickslabs/cicd-templates
Manage your Databricks deployments and CI with code.
Azure/azure-cosmosdb-spark
Apache Spark Connector for Azure Cosmos DB
adidas/lakehouse-engine
The Lakehouse Engine is a configuration driven Spark framework, written in Python, serving as a scalable and distributed engine for several lakehouse algorithms, data flows and utilities for Data Products.
databricks/dbt-databricks
A dbt adapter for Databricks.
CartoDB/analytics-toolbox-core
A set of UDFs and Procedures to extend BigQuery, Snowflake, Redshift, Postgres and Databricks with Spatial Analytics capabilities
databricks/terraform-databricks-examples
Examples of using Terraform to deploy Databricks resources
lamastex/scalable-data-science
Scalable Data Science: course sets in big data using Apache Spark on Databricks, and their mathematical, statistical and computational foundations using SageMath.
databrickslabs/ucx
Your best companion for upgrading to Unity Catalog. UCX will guide you, the Databricks customer, through the process of upgrading your account, groups, workspaces, jobs etc. to Unity Catalog.
aloneguid/stowage
Bloat-free, no BS cloud storage SDK.
aehrc/VariantSpark
Machine learning for genomic variants
databricks/databricks-sql-python
Databricks SQL Connector for Python
dataflint/spark
Performance Observability for Apache Spark
databricks/cli
Databricks CLI
DataThirstLtd/azure.databricks.cicd.tools
Tools for Deploying Databricks Solutions in Azure
martandsingh/ApacheSpark
This repository will help you learn Databricks concepts through examples. It covers the important topics a data engineer needs in real-life work, using PySpark and Spark SQL for development. The course ends with a few case studies.