Facing issue in reading SignIn Logs from Azure Blob Storage
neeleshg opened this issue · 3 comments
Hi
I am trying to push Azure SignIn Logs to Graylog using Logstash.
I have exported Azure SignIn Logs to Azure Blob Storage.
I have also configured azure_blob_storage input plugin in Logstash.
Logstash is running properly; however, it is not sending all of the SignIn Logs to Graylog — some are missing, even though they are present in Blob Storage.
When I checked the Logstash logs, I noticed continuous errors related to partial_read.
Error:
[2023-07-07T03:50:07,756][ERROR][logstash.javapipeline ][main][39346dc9201d4b47a873dce433876729e693e1b862a5cab7a602272273e1c31c] A plugin had an unrecoverable error. Will restart this plugin.
Pipeline_id:main
Plugin: <LogStash::Inputs::AzureBlobStorage container=>"insights-logs-managedidentitysigninlogs", registry_local_path=>"/usr/share/logstash/pipeline/reg_managedidentitysigninlogs", codec=><LogStash::Codecs::JSON id=>"json_f03b1fcf-b19e-4148-8873-c1be979e0c1b", enable_metric=>true, charset=>"UTF-8">, path_filters=>["**/*.json"], prefix=>"tenantId=<TENANT_ID>", registry_create_policy=>"resume", interval=>20, skip_learning=>true, dns_suffix=>"core.usgovcloudapi.net", id=>"<ID>", connection_string=><password>, enable_metric=>true, logtype=>"raw", registry_path=>"data/registry.dat", addfilename=>false, addall=>false, debug_until=>0, debug_timer=>false, file_head=>"{\"records\":[", file_tail=>"]}">
Error: InvalidBlobType (409): The blob type is invalid for this operation.
RequestId:XXXXXXXXXXXXXXXX
Time:2023-07-07T03:50:07.7508606Z
Exception: Azure::Core::Http::HTTPError
Stack: /usr/share/logstash/vendor/bundle/jruby/2.6.0/gems/azure-storage-common-2.0.4/lib/azure/core/http/http_request.rb:154:in `call'
org/jruby/RubyMethod.java:116:in `call'
/usr/share/logstash/vendor/bundle/jruby/2.6.0/gems/azure-storage-common-2.0.4/lib/azure/core/http/signer_filter.rb:28:in `call'
/usr/share/logstash/vendor/bundle/jruby/2.6.0/gems/azure-storage-common-2.0.4/lib/azure/core/http/http_request.rb:111:in `block in with_filter'
/usr/share/logstash/vendor/bundle/jruby/2.6.0/gems/azure-storage-common-2.0.4/lib/azure/core/service.rb:36:in `call'
/usr/share/logstash/vendor/bundle/jruby/2.6.0/gems/azure-storage-common-2.0.4/lib/azure/core/filtered_service.rb:34:in `call'
/usr/share/logstash/vendor/bundle/jruby/2.6.0/gems/azure-storage-common-2.0.4/lib/azure/core/signed_service.rb:41:in `call'
/usr/share/logstash/vendor/bundle/jruby/2.6.0/gems/azure-storage-common-2.0.4/lib/azure/storage/common/service/storage_service.rb:60:in `call'
/usr/share/logstash/vendor/bundle/jruby/2.6.0/gems/azure-storage-blob-2.0.3/lib/azure/storage/blob/blob_service.rb:179:in `call'
/usr/share/logstash/vendor/bundle/jruby/2.6.0/gems/azure-storage-blob-2.0.3/lib/azure/storage/blob/block.rb:276:in `list_blob_blocks'
/usr/share/logstash/vendor/bundle/jruby/2.6.0/gems/logstash-input-azure_blob_storage-0.12.7/lib/logstash/inputs/azure_blob_storage.rb:413:in `partial_read'
/usr/share/logstash/vendor/bundle/jruby/2.6.0/gems/logstash-input-azure_blob_storage-0.12.7/lib/logstash/inputs/azure_blob_storage.rb:271:in `block in run'
org/jruby/RubyHash.java:1519:in `each'
/usr/share/logstash/vendor/bundle/jruby/2.6.0/gems/logstash-input-azure_blob_storage-0.12.7/lib/logstash/inputs/azure_blob_storage.rb:246:in `run'
/usr/share/logstash/logstash-core/lib/logstash/java_pipeline.rb:414:in `inputworker'
/usr/share/logstash/logstash-core/lib/logstash/java_pipeline.rb:405:in `block in start_input'
Here is my logstash configuration:
input {
  azure_blob_storage {
    connection_string => "DefaultEndpointsProtocol=https;AccountName=<BLOB_STORAGE_ACCOUNT>;AccountKey=<ACCOUNT_KEY>;EndpointSuffix=core.usgovcloudapi.net"
    dns_suffix => "core.usgovcloudapi.net"
    container => "insights-logs-managedidentitysigninlogs"
    registry_create_policy => "resume"
    registry_local_path => "/usr/share/logstash/pipeline/reg_managedidentitysigninlogs"
    codec => "json"
    interval => 20
    skip_learning => true
    prefix => "<TENANT_ID>"
    path_filters => ['**/*.json']
  }
  azure_blob_storage {
    connection_string => "DefaultEndpointsProtocol=https;AccountName=<BLOB_STORAGE_ACCOUNT>;AccountKey=<ACCOUNT_KEY>;EndpointSuffix=core.usgovcloudapi.net"
    dns_suffix => "core.usgovcloudapi.net"
    container => "insights-logs-noninteractiveusersigninlogs"
    registry_create_policy => "resume"
    registry_local_path => "/usr/share/logstash/pipeline/reg_noninteractiveusersigninlogs"
    codec => "json"
    interval => 20
    skip_learning => true
    prefix => "<TENANT_ID>"
    path_filters => ['**/*.json']
  }
  azure_blob_storage {
    connection_string => "DefaultEndpointsProtocol=https;AccountName=<BLOB_STORAGE_ACCOUNT>;AccountKey=<ACCOUNT_KEY>;EndpointSuffix=core.usgovcloudapi.net"
    dns_suffix => "core.usgovcloudapi.net"
    container => "insights-logs-serviceprincipalsigninlogs"
    registry_create_policy => "resume"
    registry_local_path => "/usr/share/logstash/pipeline/reg_serviceprincipalsigninlogs"
    codec => "json"
    interval => 20
    skip_learning => true
    prefix => "<TENANT_ID>"
    path_filters => ['**/*.json']
  }
  azure_blob_storage {
    connection_string => "DefaultEndpointsProtocol=https;AccountName=<BLOB_STORAGE_ACCOUNT>;AccountKey=<ACCOUNT_KEY>;EndpointSuffix=core.usgovcloudapi.net"
    dns_suffix => "core.usgovcloudapi.net"
    container => "insights-logs-signinlogs"
    registry_create_policy => "resume"
    registry_local_path => "/usr/share/logstash/pipeline/reg_signinlogs"
    codec => "json"
    interval => 20
    skip_learning => true
    prefix => "<TENANT_ID>"
    path_filters => ['**/*.json']
  }
}
filter {
  mutate {
    add_field => { "short_message" => ["Azure Signin"] }
    add_field => { "host" => "logstash-signin" }
  }
  date {
    match => ["unixtimestamp", "UNIX"]
  }
}
output {
  gelf {
    host => "<GRAYLOG IP>"
    port => 12201
    protocol => "TCP"
  }
}
This issue is the same as #36. This plugin had problems dealing with append blobs because it was originally created for block blobs. For block blobs, a list_blocks call can be done, but an append blob grows differently; it can still be read, but through offsets. I have already prepared a fix for this, which requires setting a new config value "append => true":
https://github.com/janmg/logstash-input-azure_blob_storage/blob/master/lib/logstash/inputs/azure_blob_storage.rb#L453
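Assuming the fix ships with the option named as above, the relevant change would be a single extra line per input block; everything else in the config stays the same (a hedged sketch, not released plugin documentation):

```
azure_blob_storage {
  container => "insights-logs-signinlogs"
  # new option from the pending fix: read this container's blobs
  # as append blobs (by offset) instead of listing blocks
  append => true
  # ... remaining options as in the configuration above ...
}
```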
But I haven't released the fix yet, because I also want to support json_lines, and that led to conversion issues: I'm struggling to get the byte array split into multiple events. I'm probably better off refactoring that logic, but I don't have time to focus on this plugin, as my day job takes precedence.
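One common way to split a partially read byte chunk into json_lines events is to split on newlines and carry the trailing incomplete line over to the next chunk. A minimal Ruby sketch of that idea (this is an illustration, not the plugin's code; `split_json_lines` is a hypothetical helper):

```ruby
require 'json'

# Split a chunk of bytes into complete JSON-lines events.
# `carry` holds the unfinished trailing line from the previous chunk.
# Returns [events, new_carry].
def split_json_lines(chunk, carry = "")
  buffer = carry + chunk
  lines = buffer.split("\n", -1)
  carry = lines.pop                     # last piece may be an incomplete record
  events = lines.reject(&:empty?).map { |l| JSON.parse(l) }
  [events, carry]
end
```

Feeding the carry-over into the next call stitches a record that straddles two reads back together, which is exactly the case a partial_read has to handle.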
Thanks @janmg
With AWS S3 we did not see this issue, because S3 objects are not appended to.
We can close this as a duplicate, then.
I finally pushed 0.12.9 with support for append blobs. In the config you can set "append => true", but if you don't, the plugin will switch to append mode by itself when it hits the InvalidBlobType exception. I haven't done extensive testing, so feedback is welcome.
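As a rough illustration of the auto-detection behavior described here — try the block-blob path first, and on an InvalidBlobType error switch to offset-based reads and remember the choice — here is an assumed Ruby sketch; the class, method names, and blob representation are hypothetical, not the plugin's actual implementation:

```ruby
# Hypothetical error type standing in for the storage service's
# 409 InvalidBlobType response.
class InvalidBlobTypeError < StandardError; end

class BlobReader
  def initialize(append: false)
    @append = append           # set via config, or learned at runtime
  end

  def read(blob)
    return read_by_offset(blob) if @append
    begin
      read_blocks(blob)
    rescue InvalidBlobTypeError
      @append = true           # auto-detected an append blob; stick with offsets
      read_by_offset(blob)
    end
  end

  private

  # Block blobs expose a committed block list; append blobs do not,
  # so the service rejects the operation for them.
  def read_blocks(blob)
    raise InvalidBlobTypeError if blob[:type] == :append
    blob[:data]
  end

  # Append blobs are read with ranged requests from a tracked offset.
  def read_by_offset(blob)
    blob[:data]
  end
end
```

Once the rescue fires, subsequent polls skip the failing list-blocks call entirely, which is why the error should appear at most once per pipeline restart under this scheme.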