logstash-plugins/logstash-input-couchdb_changes

plugin wont run if empty seq file present

Opened this issue · 18 comments

teebu commented

ES 1.7, LS 1.5, plugin version 1.0

Simple test: create empty seq file, set seq file in config, get error.

A plugin had an unrecoverable error. Will restart this plugin.
  Plugin: <LogStash::Inputs::CouchDBChanges db=>"mydb", host=>"asdf.cloudant.com", sequence_path=>"/home/ec2-user/.myfile_couchdb_seq", type=>"mytype", port=>443, secure=>true, username=>"asdf", password=><password>, debug=>false, codec=><LogStash::Codecs::Plain charset=>"UTF-8">, heartbeat=>1000, keep_revision=>false, ignore_attachments=>true, always_reconnect=>true, reconnect_delay=>10>
  Error: undefined method `[]' for nil:NilClass {:level=>:error}

What caused the empty file?

Doing testing, hitting ctrl+c, ends up with empty sequence file most of the time (I must be stopping it the same time it truncates the file). If I set the default starting sequence, it always starts from 0, but no errors.

♥SIGINT received. Shutting down the pipeline. {:level=>:warn}
Logstash shutdown completed

file is now empty.

this code:

      def read
        ::File.exists?(@sequence_path) ? ::File.read(@sequence_path).chomp.strip : 0
      end

Checks for seq file, but doesn't check the contents.

Possible solution:
My idea is to write to beginning of file with padding, instead of truncating. This way the file is never empty.

::File.write(filename, "seqnumber                 ",0)

Thanks for the report, and the suggestion. It think it makes more sense to check for a value before returning an empty value than to write something to the file. If you're interested in submitting a PR for this, though, I'd be happy to review the changes.

teebu commented

I tracked down the source of the problem.

If you have a bad seq number, couchdb throws the error, and the plugin does't handle it.

Error 400

{
error: "bad_request",
reason: "Malformed sequence supplied in 'since' parameter."
}

All that needs to be done is to verify that @sequence is an integer value, or use a 0. It can be done here, or by checking above here.

teebu commented

my seq look like:

403-g1AAAAOTeJy100sKwjAQBuCggt5CF65rHk2oK72JTiYpWmpbRN3qTfQmehO9SU3Tgkur0M0EhvB_MMOkhJDRpm_I1GjM93ZpNAugKMCctmgp5QGm-dFAdggye0jd7x4QPS7LMtn0Qe1cYwgKYkTZMuM3TE9c1YvGC71nlApRiG68ZeWtGk96T0VMCEn_8vg3b11558bj3otjYAp5y5jfvGzgKrm4x5HXypx5M6LWaiM6NG-1ea9M5s25sjriYYfmozafn9kaqwC47dB81abfJ63vAyWjGLeMSt5jjSbV

That shouldn't be the case, if I recall correctly. Sequence id (identified as seq) is numeric:

http://docs.couchdb.org/en/1.6.1/api/database/changes.html

The result field of database changes
JSON Object:
changes (array) – List of document`s leafs with single field rev
id (string) – Document ID
seq (number) – Update sequence number

seq should always be a number.

teebu commented

maybe I should mention I am on cloudant.

Here's a sample of _changes:

{"results":[
{"seq":"10-g1AAAAKKeJzLYWBg4MhgTmFQSUlKzi9KdUhJMtZLLChITCnLTE41MDDSS87JL01JzCvRy0styQGqZkpkSJL___9_VgZzIksuUIDdzNg8NdHChEgzSLMsSQFIJtlD7WMG25dmkGKRamxMG_scQPbFg-xLZCBRawJIaz05WvNYgCRDA5AC6p5PpvYFEO37ydR-AKL9PpnaH0C0I0VTokmqkZGJIZFGZQEAPhfSVQ","id":"335480248_version_1310478308192","changes":[{"rev":"1-88256dfa9f4e1352248c17af011e13ab"}]},
{"seq":"11-g1AAAAK_eJzLYWBg4MhgTmFQSUlKzi9KdUhJMtZLLChITCnLTE41MDDSS87JL01JzCvRy0styQGqZkpkSJL___9_VgZzIksuUIDdzNg8NdHChEgzSLMsSQFIJtlD7WMG25dmkGKRamxMG_scQPbFg-xLZCBRawJIaz05WvNYgCRDA5AC6p4P8ikj2KfJiUmGSeaWNPEpxM4FEDv3k-nkAxDt98nU_gCiHSluE01SjYxMDIk0KgsA0kvi-A","id":"387591881_price_1310391667722","changes":[{"rev":"1-0388fe0636a6f56c8e0221f14d2541d2"}]},
{"seq":"12-g1AAAAL0eJzLYWBg4MhgTmFQSUlKzi9KdUhJMtZLLChITCnLTE41MDDSS87JL01JzCvRy0styQGqZkpkSJL___9_VgZzIksuUIDdzNg8NdHChEgzSLMsSQFIJtlD7WMG25dmkGKRamxMG_scQPbFg-xLZCBRawJIaz3UqYxgpyamWZgbG5rSxKl5LECSoQFIAa2cj7AzOTHJMMnckoZ2LoDYuZ_kIIJoPwDRfp9M7Q8g2pESRKJJqpGRiSGRRmUBADKO80E","id":"417489795_version_1310478400293","changes":[{"rev":"1-0bd92a1677a7ab9ed7223b8dbe359f78"}]},

That shouldn't matter. The couchdb_changes format is defined the same. Try it yourself, and see if the API returns the expected JSON. seq should be numeric.

teebu commented

I'm not sure if this is the right way to test: but doing:

/_changes?since=9

gives me a error: "bad_request",

Hmmm. What version is cloudant using? This is not what showed up in any API research I did when writing the plugin. Did they customize CouchDB somehow?

The plugin may still work with these unusual seq numbers, but it makes testing more problematic.

It could be that in those two code locations, you have to check for a non-nil value, instead of a number. That should cover the unusual seq numbers.

teebu commented

The plugin works fine, the seq structure was not a problem.

I added a check for default value, and also a bad response reset.

Also changed the write method to write from 0, because the file was empty during truncating was causing problems.

https://github.com/teebu/logstash-input-couchdb_changes/blob/v1.0.0-def-seq/lib/logstash/inputs/couchdb_changes.rb

https://github.com/teebu/logstash-input-couchdb_changes/blob/v1.0.0-def-seq/lib/logstash/inputs/couchdb_changes.rb#L191 is not a fix I could merge as-is, though I appreciate the fix for your case. You get an error, but in the present configuration, it should still retry in 10 seconds. If you want to overwrite @sequence with a 0 here, it would be better to parse out the reason from the error response. I could merge such a change with that provision.

I couldn't accept https://github.com/teebu/logstash-input-couchdb_changes/blob/v1.0.0-def-seq/lib/logstash/inputs/couchdb_changes.rb#L147 as that's an unfortunate mask for the problem. It has a bad code smell. Better to test and account for the error properly than to employ a hackish way to try to prevent it from happening.

The fix at https://github.com/teebu/logstash-input-couchdb_changes/blob/v1.0.0-def-seq/lib/logstash/inputs/couchdb_changes.rb#L117-L119 is more tenable. But I'd put a debug or info level log above those lines that shows what was actually read. This will help establish whether it was nil, empty, or something else, then the proper check could be employed.

teebu commented

I dont know why the seq writing + (truncating) too frequently, while I ctrl+c (exiting) logstash leaves the file empty. I would guess it doesn't have a clean way to exit.

My solution goes around the truncating problem.

You're welcome to continue to use your fork/adapted code. I just can't merge it as-is.

teebu commented

I understand.

teebu commented

this issue references the question you had about seq being an integer

#26

teebu commented

I think there is also no bad authentication checking. It just crashes the plugin.

teebu commented

I didn't want to open a new ticket, but, this is related to the sequence number. In the case of cloudant, this should be a string.

  config :initial_sequence, :validate => :number