This repo contains a sample CosmosDB SQL API application implemented in Java.
It demonstrates the following:
- Use of a Spring Boot console application
- Use of the Spring Data framework with CosmosDB
- Use of the CosmosDB Async SDK
- Use of the CosmosDB Bulk Executor SDK
- Use of the Gradle build tool rather than Maven
- Use of a public-domain telemetry dataset: EPA Air Quality
- Use of a Spring Boot web application (future enhancement)
- Spring Boot
- Spring Data
- Spring Data CosmosDB
- CosmosDB Java SDK v4
- CosmosDB Java SDK v4 JavaDocs
- CosmosDB Java SDK v4 Samples
- CosmosDB Java SDK v4 Samples src directory
- CosmosDB Java SDK v4 Best Practices
- CosmosDB Java SDK v4 Performance Tips
- CosmosDB Java SDK @ GitHub
- CosmosDB Java Bulk Executor
- Gradle
- azure-cosmos @ MvnRepository
- Create a CosmosDB SQL API account. There are several ways to do this, including:
- the Azure Portal UI, az CLI, ARM/Bicep, Azure DevOps, Terraform, etc.
- Create a database within the account; I use the name dev
- Create a container within the database; I use the name telemetry
- use the partition key /pk
- I provisioned Autoscale throughput with a maximum of 10,000 Request Units per second (RU/s) for my container
Set the following environment variables so that the Java code can connect to your account and database.
AZURE_COSMOSDB_SQL_URI
AZURE_COSMOSDB_SQL_RW_KEY1
AZURE_COSMOSDB_SQL_DB
AZURE_COSMOSDB_SQL_REGIONS
AZURE_COSMOSDB_SQL_MAX_DEG_PAR
AZURE_COSMOSDB_SQL_URI is the URI of your account; it can be found in the Keys view of your account in the Azure Portal.
AZURE_COSMOSDB_SQL_RW_KEY1 is the primary read-write key, also found in the Keys view of your account in the Azure Portal.
AZURE_COSMOSDB_SQL_DB is the database name you created above.
AZURE_COSMOSDB_SQL_REGIONS is a comma-separated list of preferred regions.
AZURE_COSMOSDB_SQL_MAX_DEG_PAR can be set to -1. If it is set to less than 0, the system automatically decides the number of concurrent operations to run. See https://docs.microsoft.com/en-us/dotnet/api/microsoft.azure.documents.client.feedoptions.maxdegreeofparallelism?view=azure-dotnet
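The sketch below shows one way a Java program could read and validate these five variables before connecting. It is not code from this repo: the class name `CosmosEnvSettings` and its helpers are hypothetical, and the defaults (no preferred regions when the list is unset, -1 when the parallelism value is unset) are assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

/** Hypothetical helper, not part of this repo: reads the five variables described above. */
final class CosmosEnvSettings {
    final String uri;
    final String key;
    final String database;
    final List<String> preferredRegions;
    final int maxDegreeOfParallelism;

    CosmosEnvSettings(Map<String, String> env) {
        this.uri = required(env, "AZURE_COSMOSDB_SQL_URI");
        this.key = required(env, "AZURE_COSMOSDB_SQL_RW_KEY1");
        this.database = required(env, "AZURE_COSMOSDB_SQL_DB");
        // Assumption: an unset regions variable means "no preferred regions".
        this.preferredRegions = parseRegions(env.getOrDefault("AZURE_COSMOSDB_SQL_REGIONS", ""));
        this.maxDegreeOfParallelism = parseMaxDegree(env.get("AZURE_COSMOSDB_SQL_MAX_DEG_PAR"));
    }

    /** Returns the trimmed value, or fails fast with the name of the missing variable. */
    static String required(Map<String, String> env, String name) {
        String value = env.get(name);
        if (value == null || value.isBlank()) {
            throw new IllegalStateException("Missing environment variable: " + name);
        }
        return value.trim();
    }

    /** Splits a comma-separated list, trimming whitespace and dropping empty entries. */
    static List<String> parseRegions(String csv) {
        return Arrays.stream(csv.split(","))
                .map(String::trim)
                .filter(s -> !s.isEmpty())
                .collect(Collectors.toList());
    }

    /** Assumption: fall back to -1 (let the system decide) when the variable is unset. */
    static int parseMaxDegree(String raw) {
        if (raw == null || raw.isBlank()) {
            return -1;
        }
        return Integer.parseInt(raw.trim());
    }

    public static void main(String[] args) {
        try {
            CosmosEnvSettings s = new CosmosEnvSettings(System.getenv());
            // The key is deliberately not printed.
            System.out.println("uri=" + s.uri + " db=" + s.database
                    + " regions=" + s.preferredRegions
                    + " maxDegPar=" + s.maxDegreeOfParallelism);
        } catch (IllegalStateException e) {
            System.err.println(e.getMessage());
        }
    }
}
```

Failing fast on a missing variable tends to give a clearer error than a connection failure later on.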
> git clone https://github.com/cjoakim/azure-cosmosdb-java.git
> cd azure-cosmosdb-java
> gradle build
See build.gradle where you declare your project dependencies:
dependencies {
implementation 'com.azure.spring:spring-cloud-azure-starter-data-cosmos' // <--- includes azure-cosmos
implementation 'org.apache.commons:commons-csv:1.9.0'
compileOnly 'org.projectlombok:lombok'
developmentOnly 'org.springframework.boot:spring-boot-devtools'
annotationProcessor 'org.projectlombok:lombok'
testImplementation 'org.springframework.boot:spring-boot-starter-test'
}
The CosmosDB SDK v4 library is azure-cosmos and is included in com.azure.spring:spring-cloud-azure-starter-data-cosmos.
To list the JAR files on the Gradle project CLASSPATH, run this command:
> gradle dependencies --configuration runtimeClasspath > data/classpath/runtimeClasspath.txt
Search that file for azure-cosmos and you'll find:
| +--- com.azure:azure-cosmos:4.32.0 -> 4.31.0
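In Gradle's dependency report, `A -> B` means that version A was requested but version B was selected during dependency resolution, so the line above indicates that azure-cosmos 4.31.0 is the version on the runtime classpath. The sketch below extracts the requested and resolved versions from such a line. The class `GradleDependencyLine` is hypothetical and not part of this repo, and it only handles the simple `group:name:version` and `group:name:version -> version` forms.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper, not part of this repo: parses one line of `gradle dependencies` output. */
final class GradleDependencyLine {
    // group:name:requested, optionally followed by "-> resolved"
    private static final Pattern COORDS = Pattern.compile(
            "([\\w.\\-]+):([\\w.\\-]+):([\\w.\\-]+)(?:\\s+->\\s+([\\w.\\-]+))?");

    final String group;
    final String name;
    final String requested;
    final String resolved;

    private GradleDependencyLine(String group, String name, String requested, String resolved) {
        this.group = group;
        this.name = name;
        this.requested = requested;
        this.resolved = resolved;
    }

    /** Returns the parsed coordinates, or empty if the line has no group:name:version triple. */
    static Optional<GradleDependencyLine> parse(String line) {
        Matcher m = COORDS.matcher(line);
        if (!m.find()) {
            return Optional.empty();
        }
        String requested = m.group(3);
        // Without an arrow, the requested version is the one that was selected.
        String resolved = m.group(4) != null ? m.group(4) : requested;
        return Optional.of(new GradleDependencyLine(m.group(1), m.group(2), requested, resolved));
    }

    public static void main(String[] args) {
        parse("|    +--- com.azure:azure-cosmos:4.32.0 -> 4.31.0").ifPresent(d ->
                System.out.println(d.group + ":" + d.name
                        + " requested=" + d.requested + " resolved=" + d.resolved));
    }
}
```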
This dataset is air quality data published by the United States Environmental Protection Agency (EPA). It is in the public domain and serves as the telemetry data in this repo.
See file console_app/data/epa/readme.md for instructions on downloading this data, which is too large to store in GitHub. It contains approximately 6.5 million rows.
After you download and unzip the file, you should have this file, relative to the directory where you cloned this GitHub repository:
console_app/data/epa/8hour_44201_2021/8hour_44201_2021.csv
Note: this file is ignored by git; see the .gitignore file.
See file console_app/src/main/resources/application.properties and make any necessary configuration edits.
Notice how this file sets properties based on the environment variables you set above.
# Spring Data CosmosDB
spring.cloud.azure.cosmos.endpoint=${AZURE_COSMOSDB_SQL_URI}
spring.cloud.azure.cosmos.key=${AZURE_COSMOSDB_SQL_RW_KEY1}
spring.cloud.azure.cosmos.database=${AZURE_COSMOSDB_SQL_DB}
azure.cosmos.maxDegreeOfParallelism=${AZURE_COSMOSDB_SQL_MAX_DEG_PAR}
spring.cloud.azure.cosmos.populate-query-metrics=false
azure.cosmos.queryMetricsEnabled=false
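Spring resolves each `${NAME}` placeholder from its environment at startup, and also accepts a `${NAME:default}` form with a fallback value. The sketch below imitates that substitution with the standard library only, to make the mechanism concrete. It is a simplified illustration, not Spring's implementation; the class `PlaceholderSketch` is hypothetical, and nested placeholders are not handled.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Map;
import java.util.Properties;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical illustration of ${NAME} substitution; Spring does this for you. */
final class PlaceholderSketch {
    // Matches ${NAME} or ${NAME:default}
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{([^}:]+)(?::([^}]*))?\\}");

    /** Replaces each placeholder with the environment value, or its default if one is given. */
    static String resolve(String value, Map<String, String> env) {
        Matcher m = PLACEHOLDER.matcher(value);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String name = m.group(1);
            String replacement = env.get(name);
            if (replacement == null) {
                replacement = m.group(2); // null when no default was written
            }
            if (replacement == null) {
                throw new IllegalArgumentException("Unresolved placeholder: " + name);
            }
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) throws IOException {
        Properties props = new Properties();
        props.load(new StringReader(
                "spring.cloud.azure.cosmos.database=${AZURE_COSMOSDB_SQL_DB}\n"));
        // Stand-in for System.getenv() so the example runs anywhere.
        Map<String, String> env = Map.of("AZURE_COSMOSDB_SQL_DB", "dev");
        for (String name : props.stringPropertyNames()) {
            System.out.println(name + "=" + resolve(props.getProperty(name), env));
        }
    }
}
```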
Start in the root directory of the repository on your computer.
> cd .\console_app\
> mkdir tmp
> gradle build
See file build.gradle which defines "tasks" that Gradle can execute. Note how the tasks can pass command-line arguments (args) to the Java program, like this:
task transformRawEpaOzoneData(type: JavaExec) {
classpath = sourceSets.main.runtimeClasspath
mainClass = 'org.cjoakim.cosmos.spring.App'
args 'transform_raw_epa_ozone_data', '0', '50000', 'latLng', '--verbose'
}
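On the Java side, the values in `args` arrive as the `String[]` passed to `main`. The sketch below shows one plausible way to split them into a function name, positional values, and `--` flags. It is not the repo's `App` class: `TaskArgs` is a hypothetical name, and the meaning of the positional values ('0', '50000', 'latLng') is not stated here, so the sketch only collects them.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical argument holder, not part of this repo. */
final class TaskArgs {
    final String function;          // first argument, e.g. transform_raw_epa_ozone_data
    final List<String> positional;  // remaining arguments that are not flags
    final Set<String> flags;        // arguments starting with "--"

    private TaskArgs(String function, List<String> positional, Set<String> flags) {
        this.function = function;
        this.positional = positional;
        this.flags = flags;
    }

    static TaskArgs parse(String[] args) {
        if (args.length == 0) {
            throw new IllegalArgumentException("Expected a function name as the first argument");
        }
        List<String> positional = new ArrayList<>();
        Set<String> flags = new LinkedHashSet<>();
        for (int i = 1; i < args.length; i++) {
            if (args[i].startsWith("--")) {
                flags.add(args[i]);
            } else {
                positional.add(args[i]);
            }
        }
        return new TaskArgs(args[0], positional, flags);
    }

    boolean hasFlag(String flag) {
        return flags.contains(flag);
    }

    public static void main(String[] args) {
        if (args.length == 0) {
            System.err.println("usage: <function> [values...] [--flags]");
            return;
        }
        TaskArgs parsed = parse(args);
        switch (parsed.function) {
            case "transform_raw_epa_ozone_data":
                // A real program would call its transform logic here.
                System.out.println("transform requested with " + parsed.positional
                        + ", verbose=" + parsed.hasFlag("--verbose"));
                break;
            default:
                System.err.println("Unknown function: " + parsed.function);
        }
    }
}
```

Dispatching on the first argument lets a single main class serve every Gradle task, with each task differing only in its `args` line.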
> gradle transformRawEpaOzoneData
> gradle loadTelemetryDataWithSpringData
> gradle queryTelemetryWithSpringData
> gradle queryTelemetryWithSynchSdk
> gradle queryTelemetryWithAsynchSdk
> gradle deleteAllDocumentsWithSpringData
> gradle loadEpaOzoneDataWithSdkBulkLoad
As the task names indicate, some of these tasks use Spring Data while others use the CosmosDB SDK. This is intentional, to demonstrate each approach.