janmg/logstash-input-azure_blob_storage

Facing issue in reading SignIn Logs from Azure Blob Storage

neeleshg opened this issue · 3 comments

Hi
I am trying to push Azure SignIn Logs to Graylog using Logstash.
I have exported Azure SignIn Logs to Azure Blob Storage.
I have also configured the azure_blob_storage input plugin in Logstash.
Logstash is running properly, however it is not sending some of the SignIn Logs, even though they are present in Blob Storage.
When I checked the Logstash logs, I noticed continuous errors regarding partial_read.
Error:

[2023-07-07T03:50:07,756][ERROR][logstash.javapipeline    ][main][39346dc9201d4b47a873dce433876729e693e1b862a5cab7a602272273e1c31c] A plugin had an unrecoverable error. Will restart this plugin.
  Pipeline_id:main
  Plugin: <LogStash::Inputs::AzureBlobStorage container=>"insights-logs-managedidentitysigninlogs", registry_local_path=>"/usr/share/logstash/pipeline/reg_managedidentitysigninlogs", codec=><LogStash::Codecs::JSON id=>"json_f03b1fcf-b19e-4148-8873-c1be979e0c1b", enable_metric=>true, charset=>"UTF-8">, path_filters=>["**/*.json"], prefix=>"tenantId=<TENANT_ID>", registry_create_policy=>"resume", interval=>20, skip_learning=>true, dns_suffix=>"core.usgovcloudapi.net", id=>"<ID>", connection_string=><password>, enable_metric=>true, logtype=>"raw", registry_path=>"data/registry.dat", addfilename=>false, addall=>false, debug_until=>0, debug_timer=>false, file_head=>"{\"records\":[", file_tail=>"]}">
  Error: InvalidBlobType (409): The blob type is invalid for this operation.
RequestId:XXXXXXXXXXXXXXXX
Time:2023-07-07T03:50:07.7508606Z
  Exception: Azure::Core::Http::HTTPError
  Stack: /usr/share/logstash/vendor/bundle/jruby/2.6.0/gems/azure-storage-common-2.0.4/lib/azure/core/http/http_request.rb:154:in `call'
org/jruby/RubyMethod.java:116:in `call'
/usr/share/logstash/vendor/bundle/jruby/2.6.0/gems/azure-storage-common-2.0.4/lib/azure/core/http/signer_filter.rb:28:in `call'
/usr/share/logstash/vendor/bundle/jruby/2.6.0/gems/azure-storage-common-2.0.4/lib/azure/core/http/http_request.rb:111:in `block in with_filter'
/usr/share/logstash/vendor/bundle/jruby/2.6.0/gems/azure-storage-common-2.0.4/lib/azure/core/service.rb:36:in `call'
/usr/share/logstash/vendor/bundle/jruby/2.6.0/gems/azure-storage-common-2.0.4/lib/azure/core/filtered_service.rb:34:in `call'
/usr/share/logstash/vendor/bundle/jruby/2.6.0/gems/azure-storage-common-2.0.4/lib/azure/core/signed_service.rb:41:in `call'
/usr/share/logstash/vendor/bundle/jruby/2.6.0/gems/azure-storage-common-2.0.4/lib/azure/storage/common/service/storage_service.rb:60:in `call'
/usr/share/logstash/vendor/bundle/jruby/2.6.0/gems/azure-storage-blob-2.0.3/lib/azure/storage/blob/blob_service.rb:179:in `call'
/usr/share/logstash/vendor/bundle/jruby/2.6.0/gems/azure-storage-blob-2.0.3/lib/azure/storage/blob/block.rb:276:in `list_blob_blocks'
/usr/share/logstash/vendor/bundle/jruby/2.6.0/gems/logstash-input-azure_blob_storage-0.12.7/lib/logstash/inputs/azure_blob_storage.rb:413:in `partial_read'
/usr/share/logstash/vendor/bundle/jruby/2.6.0/gems/logstash-input-azure_blob_storage-0.12.7/lib/logstash/inputs/azure_blob_storage.rb:271:in `block in run'
org/jruby/RubyHash.java:1519:in `each'
/usr/share/logstash/vendor/bundle/jruby/2.6.0/gems/logstash-input-azure_blob_storage-0.12.7/lib/logstash/inputs/azure_blob_storage.rb:246:in `run'
/usr/share/logstash/logstash-core/lib/logstash/java_pipeline.rb:414:in `inputworker'
/usr/share/logstash/logstash-core/lib/logstash/java_pipeline.rb:405:in `block in start_input'

Here is my logstash configuration:

input {
     azure_blob_storage {
         connection_string => "DefaultEndpointsProtocol=https;AccountName=<BLOB_STORAGE_ACCOUNT>;AccountKey=<ACCOUNT_KEY>;EndpointSuffix=core.usgovcloudapi.net"
         dns_suffix => "core.usgovcloudapi.net"
         container => "insights-logs-managedidentitysigninlogs"
         registry_create_policy => "resume"
         registry_local_path => "/usr/share/logstash/pipeline/reg_managedidentitysigninlogs"
         codec => "json"
         interval => 20
         skip_learning => true
         prefix => "<TENANT_ID>"
         path_filters => ['**/*.json']
     }
     azure_blob_storage {
         connection_string => "DefaultEndpointsProtocol=https;AccountName=<BLOB_STORAGE_ACCOUNT>;AccountKey=<ACCOUNT_KEY>;EndpointSuffix=core.usgovcloudapi.net"
         dns_suffix => "core.usgovcloudapi.net"
         container => "insights-logs-noninteractiveusersigninlogs"
         registry_create_policy => "resume"
         registry_local_path => "/usr/share/logstash/pipeline/reg_noninteractiveusersigninlogs"
         codec => "json"
         interval => 20
         skip_learning => true
         prefix => "<TENANT_ID>"
         path_filters => ['**/*.json']
     }
     azure_blob_storage {
         connection_string => "DefaultEndpointsProtocol=https;AccountName=<BLOB_STORAGE_ACCOUNT>;AccountKey=<ACCOUNT_KEY>;EndpointSuffix=core.usgovcloudapi.net"
         dns_suffix => "core.usgovcloudapi.net"
         container => "insights-logs-serviceprincipalsigninlogs"
         registry_create_policy => "resume"
         registry_local_path => "/usr/share/logstash/pipeline/reg_serviceprincipalsigninlogs"
         codec => "json"
         interval => 20
         skip_learning => true
         prefix => "<TENANT_ID>"
         path_filters => ['**/*.json']
     }
     azure_blob_storage {
         connection_string => "DefaultEndpointsProtocol=https;AccountName=<BLOB_STORAGE_ACCOUNT>;AccountKey=<ACCOUNT_KEY>;EndpointSuffix=core.usgovcloudapi.net"
         dns_suffix => "core.usgovcloudapi.net"
         container => "insights-logs-signinlogs"
         registry_create_policy => "resume"
         registry_local_path => "/usr/share/logstash/pipeline/reg_signinlogs"
         codec => "json"
         interval => 20
         skip_learning => true
         prefix => "<TENANT_ID>"
         path_filters => ['**/*.json']
     }
}

filter {

    mutate {
        add_field => { "short_message" => "Azure Signin" }
        add_field => { "host" => "logstash-signin" }
    }
    date {
        match => ["unixtimestamp", "UNIX"]
    }
}

output {
    gelf {
        host => "<GRAYLOG IP>"
        port => 12201
        protocol => "TCP"
    }
}
janmg commented

This issue is the same as #36. This plugin had problems dealing with appendblobs because it was originally created for blockblobs. For a blockblob a list_blob_blocks call can be done, but an appendblob grows in a different way; it can still be read, but only through offsets. I already prepared a fix for this, which requires setting a new config value, "append => true":
https://github.com/janmg/logstash-input-azure_blob_storage/blob/master/lib/logstash/inputs/azure_blob_storage.rb#L453
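
For reference, reading an appendblob through offsets with the azure-storage-blob gem looks roughly like the sketch below. This is only an illustration, not the plugin's actual code; connection_string, container, name and processed_offset are placeholders for state the plugin would keep in its registry.

    require 'azure/storage/blob'

    # connection_string, container, name, processed_offset: placeholder state,
    # in the real plugin this lives in the registry
    blobs = Azure::Storage::Blob::BlobService.create_from_connection_string(connection_string)

    # list_blob_blocks only works on blockblobs and raises InvalidBlobType (409)
    # on an appendblob, so growth is tracked via the content length instead
    new_length = blobs.get_blob_properties(container, name).properties[:content_length]

    if new_length > processed_offset
      # fetch only the bytes appended since the last pass
      _blob, content = blobs.get_blob(container, name,
                                      start_range: processed_offset,
                                      end_range:   new_length - 1)
      # hand content to the codec, then remember how far we got
      processed_offset = new_length
    end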

But I haven't released the fix yet, because I also want to support json_lines, and that led to conversion issues: I'm struggling to get the byte array split into multiple events. I'm probably better off refactoring that logic, but I don't have time to focus on this plugin as my day job takes precedence.
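
To illustrate the splitting problem: with json_lines a fetched byte range has to be cut on newlines, and the last line of a chunk may be incomplete and has to be carried over into the next read. A minimal sketch of one way to do that (an assumption, not the plugin's code):

    require 'json'

    # hypothetical helper: split a freshly read chunk into JSON events;
    # carry is the partial line left over from the previous chunk
    def split_json_lines(chunk, carry = '')
      data = carry + chunk
      return [[], data] unless data.include?("\n")
      *lines, remainder = data.split("\n", -1)
      events = lines.reject(&:empty?).map { |line| JSON.parse(line) }
      [events, remainder]   # prepend remainder to the next chunk
    end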

neeleshg commented

Thanks @janmg
With AWS S3 we did not see this issue, because S3 objects are not appended to.

We can close this as a duplicate then.

janmg commented

I finally pushed 0.12.9 with support for appendblobs. In the config you can set "append => true", but if you don't, the plugin will switch to append handling by itself when it hits the InvalidBlobType exception. I haven't done extensive testing, so feedback is welcome.
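
Applied to the configuration from this issue, that would look like the following (only the signinlogs input shown; the other inputs change the same way):

input {
     azure_blob_storage {
         connection_string => "DefaultEndpointsProtocol=https;AccountName=<BLOB_STORAGE_ACCOUNT>;AccountKey=<ACCOUNT_KEY>;EndpointSuffix=core.usgovcloudapi.net"
         dns_suffix => "core.usgovcloudapi.net"
         container => "insights-logs-signinlogs"
         registry_create_policy => "resume"
         registry_local_path => "/usr/share/logstash/pipeline/reg_signinlogs"
         codec => "json"
         interval => 20
         skip_learning => true
         prefix => "<TENANT_ID>"
         path_filters => ['**/*.json']
         append => true
     }
}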