Azure Spring Apps File Processing Sample

1. Scenario

1.1. Log Files Explanation

A system generates log files in folders named by date: /var/log/system-a/${yyyy-MM-dd}.
The log files are txt files named by hour and minute: ${hh-mm}.txt.
Each line of the log files will have format like this: ${name,favorite_color,favorite_number}.

Here is a picture about folder structure and log file:

1.2. File Processing Requirements

1.2.1. Functional Requirements

Before app starts, the number of files to be processed in unknown.
- All files in specific folder should be processed.
- All files in sub folder should be processed, too.

Each line should be converted into an avro object. Here is the format of avro:

{
  "namespace": "com.azure.spring.example.file.processing.avro.generated",
  "type": "record",
  "name": "User",
  "fields": [
    {
      "name": "name",
      "type": "string"
    },
    {
      "name": "favorite_number",
      "type": [
        "int",
        "null"
      ]
    },
    {
      "name": "favorite_color",
      "type": [
        "string",
        "null"
      ]
    }
  ]
}

Send the avro object to Azure Event Hubs.
After file processed, move the file to another folder to avoid duplicate processing. Here is an example of move target folder: /var/log/system-a-processed/${yyyy-MM-dd}

Here is a picture about moving log files after processed:

1.2.2. Non-Functional Requirements

The system must be robust.
- 1.1. Handle invalid file. Current application only handle txt files. For other file types like csv, it will be filtered out.
- 1.2. Handle invalid line. When there is a invalid data line in a file, output a warning log then continue processing.
Easy to track.
- 2.1. When there is invalid line, the log should contain these information:
  - Which file?
  - Which line?
  - Why this line is invalid?
  - What is the string value of this line?
- 2.2. Track each step of a specific file.
  - Does this file be added in to processing candidate?
  - If the file is filtered out, why it's filtered out?
  - How many line does this file have?
  - Does each line of this file been converted to avro object and send to Azure Event Hubs successfully?

1.3. System Diagram

Azure Spring Apps: Current application will run on Azure Spring Apps.
Azure Storage Files: Log files stored in Azure Storage files.
Azure Event Hubs: In log files, each valid line will be converted into avro format then send to Azure Event Hubs.
Log Analytics: When current application run in Azure Spring Apps, the logs can be viewed by Log Analytics.

1.4. The Application

The scenario is a classic Enterprise Integration Pattern, so we use the Spring Boot + Spring Integration in this application.

2. Run Current Sample on Azure Spring Apps Consumption Plan

2.1. Provision Required Azure Resources

Provision an Azure Spring Apps instance. Refs: Quickstart: Provision an Azure Spring Apps service instance.
Create An app in created Azure Spring Apps.
Create an Azure Event Hub. Refs: Create an event hub using Azure portal.
Create Azure Storage Account. Refs: Create a storage account.
Create a File Share in created Storage account.
Mount Azure Storage into Azure Spring Apps to /var/log/. Refs: How to enable your own persistent storage in Azure Spring Apps.

2.2. Deploy Current Sample

Set these environment variables for the app.

logs-directory=/var/log/system-a
processed-logs-directory=/var/log/system-a-processed
spring.cloud.azure.eventhubs.connection-string=
spring.cloud.azure.eventhubs.event-hub-name=

Upload some sample log files into Azure Storage Files. You can use files in ./test-files/var/log/system-a.
Build package.
```
./mvnw clean package
```
Set necessary environment variables according to the created resources.
```
RESOURCE_GROUP=
AZURE_SPRING_APPS_INSTANCE=
APP_NAME=
```

Deploy app

az spring app deploy \
  --resource-group $RESOURCE_GROUP \
  --service $AZURE_SPRING_APPS_INSTANCE \
  --name $APP_NAME \
  --artifact-path target/azure-spring-apps-file-processing-sample-0.0.1-SNAPSHOT.jar

After deployed successfully, you can see logs like this:

2.3. Check Details About File Processing

Check logs by Azure Toolkit for IntelliJ.

Screenshot:
Get log of specific error.

Input Convert txt string to User failed in search box: Error details can be found in the log:
Get all logs about a specific file.

Input /var/log/system-a/00-00.txt in search box:

Input /var/log/system-a/00-03.csv in search box:
Check events in Azure Event Hubs.

Check events by Azure Toolkit for IntelliJ:

Another way to check events is using ServiceBusExplorer. It can give more information about message properties:

3. Next Steps

3.1. Store Secrets in Azure Key Vault Secrets

Secret can be stored in Azure Key Vault secrets and used in this application. spring-cloud-azure-starter-keyvault is a useful library to get secrets from Azure KeyVault in Spring Boot applications. And spring-cloud-azure-starter-keyvault supports refresh the secrets in a fixed interval.

The following are some examples of using secrets in current application:

Azure Event Hubs connection string.
Passwords to access file. All passwords can be stored in a key-value map. Here is example of such map:
```
[
  {"00-00.txt": "password-0"},
  {"00-01.txt": "password-1"},
  {"00-02.txt": "password-2"}
]
```

3.2. Auto Scaling

3.2.1. Scale 0 - 1

Design
1. Scale to 0 instance when:
  - There is no file need to be handled for more than 1 hour.
2. Scale to 1 instance when one of these requirements satisfied:
  - File exists for more than 1 hour.
  - File count > 100.
  - File total size > 1 GB.
Implement: Use Azure Blob Storage instead of Azure File Share. So related KEDA Scaler can be used.

3.2.2. Scale 1 - n

Design: Scale instance number according to file count and total file size.
Implement: To avoid competition between instances, use some proven technology like Master/slave module.

chenrujun/azure-spring-apps-file-processing-sample