This library contains the source code for Azure Data Explorer Data Source and Data Sink Connector for Apache Spark.
Azure Data Explorer (A.K.A. Kusto) is a lightning-fast indexing and querying service.
Spark is a unified analytics engine for large-scale data processing.
Making Azure Data Explorer and Spark work together enables building fast and scalable applications, targeting a variety of Machine Learning, Extract-Transform-Load, Log Analytics and other data-driven scenarios.
This is a beta release of Azure Data Explorer connector for Spark. It exposes Azure Data Explorer as a valid Data Store for standard Spark source and sink operations such as write, read and writeStream.
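As a sketch of these source and sink operations, a batch write and read in Scala might look as follows. This is a minimal sketch, not the authoritative API: the format identifier and option keys (`kustoCluster`, `kustoAadAppId`, and so on) follow the connector's documentation and may differ between releases, and all cluster, database, table, and AAD application values are placeholders — consult the detailed documentation for the exact option names supported by your connector version.

```scala
import org.apache.spark.sql.SparkSession

object KustoConnectorSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("KustoConnectorSketch")
      .getOrCreate()

    // Placeholder values — replace with your own cluster and AAD application details
    val cluster   = "MyCluster"
    val database  = "MyDatabase"
    val appId     = "<AAD application id>"
    val appKey    = "<AAD application key>"
    val authority = "<AAD authority id>"

    // Sink: write a DataFrame to an Azure Data Explorer table
    val df = spark.range(10).toDF("value")
    df.write
      .format("com.microsoft.kusto.spark.datasource")
      .option("kustoCluster", cluster)
      .option("kustoDatabase", database)
      .option("kustoTable", "MyTable")
      .option("kustoAadAppId", appId)
      .option("kustoAadAppSecret", appKey)
      .option("kustoAadAuthorityID", authority)
      .save()

    // Source: read the result of a Kusto query into a DataFrame
    val readDf = spark.read
      .format("com.microsoft.kusto.spark.datasource")
      .option("kustoCluster", cluster)
      .option("kustoDatabase", database)
      .option("kustoQuery", "MyTable | where value > 5")
      .option("kustoAadAppId", appId)
      .option("kustoAadAppSecret", appKey)
      .option("kustoAadAuthorityID", authority)
      .load()

    readDf.show()
  }
}
```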
For the main changes from previous releases and known issues, please refer to the CHANGELIST.
For Scala/Java applications using Maven project definitions, link your application with the artifact below to use the Azure Data Explorer connector for Spark:
groupId = com.microsoft.azure
artifactId = spark-kusto-connector
version = 1.0.0-Beta-03
In Maven:
Note that the jar is in beta and is not yet available in the public Maven repository. Clone this repository and build it locally to add it to your local Maven repository, or use the corresponding released package.
<dependency>
<groupId>com.microsoft.azure</groupId>
<artifactId>spark-kusto-connector</artifactId>
<version>1.0.0-Beta-03</version>
</dependency>
Samples are packaged as a separate module with the following artifact:
<artifactId>connector-samples</artifactId>
To build the whole project, comprising the connector module and the samples module, use the following artifact:
<artifactId>azure-kusto-spark</artifactId>
To use the connector, you need to have:
- Java 1.8 SDK installed
- Maven 3.x installed
- Spark version 2.4.0 or higher
Note: when working with Spark version 2.3 or lower, please refer to the Building for legacy Spark versions section of the CHANGELIST document.
// Builds jar and runs all tests
mvn clean package
// Builds jar, runs all tests, and installs jar to your local maven repository
mvn clean install
To facilitate ramp-up on platforms such as Azure Databricks, pre-compiled libraries are published under GitHub Releases. These libraries include:
- Azure Data Explorer connector library
- May also include Kusto Java data and ingestion client libraries (kusto-data and kusto-ingest)
The Spark Azure Data Explorer connector depends on the Azure Data Explorer Data Client Library and the Azure Data Explorer Ingest Client Library, both available in the Maven repository. When Key Vault-based authentication is used, there is an additional dependency on the Microsoft Azure SDK For Key Vault.
Note: When working with Databricks, the Azure Data Explorer connector requires the Azure Data Explorer Java client libraries (and the Azure Key Vault library, if used) to be installed. This can be done by accessing Databricks Create Library -> Maven and specifying the following coordinates:
- com.microsoft.azure.kusto:kusto-data:1.0.0-BETA-04
- com.microsoft.azure.kusto:kusto-ingest:1.0.0-BETA-04
Detailed documentation can be found here.
Usage examples can be found here.
Here is a list of currently available client libraries for Azure Data Explorer:
- Have a feature request for SDKs? Please post it on User Voice to help us prioritize.
- Have a technical question? Ask on Stack Overflow with tag "azure-data-explorer"
- Need support? Every customer with an active Azure subscription has access to support with a guaranteed response time. Consider submitting a ticket to get assistance from the Microsoft support team.
- Found a bug? Please help us fix it by thoroughly documenting it and filing an issue.
This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.microsoft.com.
When you submit a pull request, a CLA-bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., label, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.
This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.