janmg/logstash-input-azure_blob_storage

Error: BlobArchived (409): This operation is not permitted on an archived blob.

laurentiubanica opened this issue · 10 comments

Plugin: <LogStash::Inputs::AzureBlobStorage container=>"insights-logs-networksecuritygroupflowevent", logtype=>"nsgflowlog", interval=>60, id=>"f24d31350d3f11c4bf5b63755eee399615ae21f22556ac6850fd3f4e23676b83", connection_string=>, prefix=>"resourceId=/", enable_metric=>true, codec=><LogStash::Codecs::JSON id=>"json_fa68ed3e-bcd3-49fe-8216-b6b515bf96a9", enable_metric=>true, charset=>"UTF-8">, dns_suffix=>"core.windows.net", registry_path=>"data/registry.dat", registry_create_policy=>"resume", file_head=>"{"records":[", file_tail=>"]}">
Error: BlobArchived (409): This operation is not permitted on an archived blob.
RequestId:947b3fff-201e-006c-60f7-4a780a000000
Time:2021-05-17T08:34:40.5724187Z
Exception: Azure::Core::Http::HTTPError
Stack:
/usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/azure-core-0.1.15/lib/azure/core/http/http_request.rb:153:in `call'
org/jruby/RubyMethod.java:116:in `call'
/usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/azure-core-0.1.15/lib/azure/core/http/signer_filter.rb:28:in `call'
/usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/azure-core-0.1.15/lib/azure/core/http/http_request.rb:110:in `block in with_filter'
/usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/azure-core-0.1.15/lib/azure/core/service.rb:36:in `call'
/usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/azure-core-0.1.15/lib/azure/core/filtered_service.rb:34:in `call'
/usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/azure-core-0.1.15/lib/azure/core/signed_service.rb:41:in `call'
/usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/azure-storage-common-1.1.0/lib/azure/storage/common/service/storage_service.rb:60:in `call'
/usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/azure-storage-blob-1.1.0/lib/azure/storage/blob/blob_service.rb:179:in `call'
/usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/azure-storage-blob-1.1.0/lib/azure/storage/blob/blob.rb:106:in `get_blob'
/usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/logstash-input-azure_blob_storage-0.11.1/lib/logstash/inputs/azure_blob_storage.rb:256:in `full_read'
/usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/logstash-input-azure_blob_storage-0.11.1/lib/logstash/inputs/azure_blob_storage.rb:196:in `block in run'
org/jruby/RubyHash.java:1417:in `each'
/usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/logstash-input-azure_blob_storage-0.11.1/lib/logstash/inputs/azure_blob_storage.rb:191:in `run'
/usr/share/logstash/logstash-core/lib/logstash/java_pipeline.rb:321:in `inputworker'
/usr/share/logstash/logstash-core/lib/logstash/java_pipeline.rb:313:in `block in start_input'

janmg commented

I can catch the error and then ignore it, so that processing can continue. I had never considered archived blobs; they are not instantly accessible, so the read fails. According to the Azure documentation, "Data in the archive tier can take several hours to retrieve."
Catching the archived-blob error can be included in a new version, 0.11.7.
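Roughly, the guard could look like this around the get_blob call (a sketch, not the released code; @blob_client and the surrounding per-blob loop are placeholders inferred from the stack trace above, while status_code and type are the attributes visible in the HTTPError dump):

    # Hypothetical sketch: skip archived blobs instead of letting the
    # exception kill the input worker.
    begin
      chunk = @blob_client.get_blob(container, name)
    rescue Azure::Core::Http::HTTPError => e
      if e.status_code == 409 && e.type == 'BlobArchived'
        @logger.warn("ignoring archived blob #{name}")
        next  # move on to the next blob in the worklist
      end
      raise
    end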

laurentiubanica commented

Hi Jan,
However, the pipeline doesn't start:

How can we force it to start?

Also, we noticed that for the past few weeks only a few blobs from the containers were processed. Each NSG has an associated folder in the container, but Logstash only reads the first two directories.

[2021-05-17T11:22:04,444][ERROR][logstash.javapipeline ][nsg-cut] Pipeline aborted due to error {:pipeline_id=>"nsg-cut", :exception=>#<Azure::Core::Http::HTTPError:2060 @status_code: 409, @http_response: #<Azure::Core::Http::HttpResponse:0x4fbd6692 @http_response=#<Faraday::Response:0x152a4420 @on_complete_callbacks=[], @env=#<Faraday::Env @method=:get @body="\xEF\xBB\xBF<Error><Code>BlobArchived</Code><Message>This operation is not permitted on an archived blob.\nRequestId:2905562f-701e-007f-060e-4b4deb000000\nTime:2021-05-17T11:22:03.3906341Z</Message></Error>" @url=#<URI::HTTPS https://...> @request=#<Faraday::RequestOptions open_timeout=60> @request_headers={"User-Agent"=>"Azure-Storage/1.1.0-1.1.0 (Ruby 2.5.3-p0; Linux linux)", "x-ms-date"=>"Mon, 17 May 2021 11:22:02 GMT", "x-ms-version"=>"2017-11-09", "DataServiceVersion"=>"1.0;NetFx", "MaxDataServiceVersion"=>"3.0;NetFx", "Content-Type"=>"application/atom+xml; charset=utf-8", "Content-Length"=>"0", "Authorization"=>"SharedKey nsgflowstorage:..."} @ssl=#<Faraday::SSLOptions verify=true> @response=#<Faraday::Response:0x152a4420 ...> @response_headers={"content-length"=>"233", "content-type"=>"application/xml", "server"=>"Windows-Azure-Blob/1.0 Microsoft-HTTPAPI/2.0", "x-ms-request-id"=>"2905562f-701e-007f-060e-4b4deb000000", "x-ms-version"=>"2017-11-09", "x-ms-error-code"=>"BlobArchived", "date"=>"Mon, 17 May 2021 11:22:03 GMT", "connection"=>"close"} @status=409 @reason_phrase="This operation is not permitted on an archived blob.">>, @uri=#<URI::HTTPS https://...>, @description: "This operation is not permitted on an archived blob.\nRequestId:2905562f-701e-007f-060e-4b4deb000000\nTime:2021-05-17T11:22:03.3906341Z", @type: "BlobArchived">, :backtrace=>[
"/usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/azure-core-0.1.15/lib/azure/core/http/http_request.rb:153:in `call'",
"org/jruby/RubyMethod.java:116:in `call'",
"/usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/azure-core-0.1.15/lib/azure/core/http/signer_filter.rb:28:in `call'",
"/usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/azure-core-0.1.15/lib/azure/core/http/http_request.rb:110:in `block in with_filter'",
"/usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/azure-core-0.1.15/lib/azure/core/service.rb:36:in `call'",
"/usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/azure-core-0.1.15/lib/azure/core/filtered_service.rb:34:in `call'",
"/usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/azure-core-0.1.15/lib/azure/core/signed_service.rb:41:in `call'",
"/usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/azure-storage-common-1.1.0/lib/azure/storage/common/service/storage_service.rb:60:in `call'",
"/usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/azure-storage-blob-1.1.0/lib/azure/storage/blob/blob_service.rb:179:in `call'",
"/usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/azure-storage-blob-1.1.0/lib/azure/storage/blob/block.rb:276:in `list_blob_blocks'",
"/usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/logstash-input-azure_blob_storage-0.11.1/lib/logstash/inputs/azure_blob_storage.rb:378:in `learn_encapsulation'",
"/usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/logstash-input-azure_blob_storage-0.11.1/lib/logstash/inputs/azure_blob_storage.rb:149:in `register'",
"/usr/share/logstash/logstash-core/lib/logstash/java_pipeline.rb:199:in `block in register_plugins'",
"org/jruby/RubyArray.java:1800:in `each'",
"/usr/share/logstash/logstash-core/lib/logstash/java_pipeline.rb:198:in `register_plugins'",
"/usr/share/logstash/logstash-core/lib/logstash/java_pipeline.rb:303:in `start_inputs'",
"/usr/share/logstash/logstash-core/lib/logstash/java_pipeline.rb:259:in `start_workers'",
"/usr/share/logstash/logstash-core/lib/logstash/java_pipeline.rb:153:in `run'",
"/usr/share/logstash/logstash-core/lib/logstash/java_pipeline.rb:108:in `block in start'"], "pipeline.sources"=>["/etc/logstash/conf.d/nsg-cut.conf"], :thread=>"#<Thread:0x10a51dfa run>"}
[2021-05-17T11:22:04,456][ERROR][logstash.agent ] Failed to execute action {:id=>:"nsg-cut", :action_type=>LogStash::ConvergeResult::FailedAction, :message=>"Could not execute action: PipelineAction::Create, action_result: false", :backtrace=>nil}

janmg commented

I never knew about BlobArchived, so I'll have to release a new version that marks these blobs as unreadable and continues.

This happens during startup: the first thing the plugin tries to do is learn the JSON head and tail. For NSG flow logs these are the defaults, and I thought I had added a skip_learning option, but apparently not yet. The learning picks a random JSON file and reads its first and last block to learn the JSON tags.
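Something like this in the Logstash plugin config DSL would do it (a sketch of the not-yet-released option, not actual plugin code):

    # Hypothetical: declare the option in the plugin class ...
    config :skip_learning, :validate => :boolean, :default => false, :required => false

    # ... and guard the call in register, so startup never touches a blob
    learn_encapsulation unless @skip_learning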

If this had happened during normal processing, it might have been possible to work around it with the "prefix" parameter, if the non-archived blobs sit in their own folder, or with the "path_filters" parameter, an array that defaults to ['**/*'], if your paths or filenames contain identifiers that the archived blobs don't share.
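For example, something along these lines (the prefix value comes from the config at the top of the issue; the PT1H.json glob is an illustration, since NSG flow log blobs are typically named PT1H.json, so adjust it to your layout):

    input {
      azure_blob_storage {
        connection_string => "..."
        container => "insights-logs-networksecuritygroupflowevent"
        # only list blobs under a path known to hold non-archived blobs
        prefix => "resourceId=/"
        # or filter on filename patterns; the default is ['**/*']
        path_filters => ['**/PT1H.json']
      }
    }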

If you can't identify the archived blobs by their filenames, you'll have to wait until I update the plugin to catch the exception they throw.

laurentiubanica commented

Ok, I'll try to find out whether the prefix differs between the archived blobs and the hot ones. If so, I'll add the parameter to the input configuration.

I also tried registry_create_policy => "start_fresh" to force Logstash to create a new registry.dat (I deleted the old one, but kept a downloaded copy). I thought this would avoid the archived blobs, but it doesn't create the file.
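For reference, the settings tried were roughly these (registry_path value from the plugin config at the top of the issue; note that, if I read the plugin right, the registry is kept inside the storage container at registry_path rather than on the local filesystem, which may be why the file appears not to be created):

    azure_blob_storage {
      # registry location, relative to the container (from the config above)
      registry_path => "data/registry.dat"
      # ignore any existing registry and build a new one instead of resuming
      registry_create_policy => "start_fresh"
    }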

We can close this issue. It's OK now. Thank you!

janmg commented

Archived-blob exceptions are now caught; the affected files are logged and ignored.