logstash-plugins/logstash-input-jdbc

logstash-input-jdbc not closing connections to MongoDB with DbSchema driver

pstrimbu opened this issue · 9 comments

Scenario: Atlas MongoDB (v4.0.5) -> AWS EC2 Logstash (v6.5.4 on Amazon Linux 2.3 w/ DbSchema driver mongojdbc1.2.jar) -> Elastic.co (v6.5.4)

I'm trying to synchronize my MongoDB data to Elasticsearch with a Logstash pipeline using the DbSchema driver. The data push works perfectly. On the first run, it grabs everything, then writes the .lastrun file, and subsequent runs at 60-second intervals only pick up changes. Everything is perfect, except that Logstash opens 4 connections each minute and never releases them; they stay in the ESTABLISHED state and eventually consume 100% of the database's 350 available connections, effectively shutting it down.

I am connecting my Logstash EC2 instance to Atlas MongoDB via a VPC peering connection. I can't imagine this would affect socket release, but I've read that a proxy can sometimes prevent connections from being released. I'm not sure about this.

Here is my pipeline config:

input {
  jdbc {
    jdbc_driver_library => "/usr/share/logstash/DbSchema/drivers/MongoDb/mongojdbc1.2.jar"
    jdbc_driver_class => "com.dbschema.MongoJdbcDriver"
    jdbc_connection_string => "jdbc:mongodb://<user>:<password>@<mongo-shard1>:27017,<mongo-shard2>:27017,<mongo-shard3>:27017/<db-name>?ssl=true&replicaSet=<replica-set-name>&authSource=admin"
    jdbc_user => ""
    jdbc_password => ""
    sql_log_level => "fatal"
    # sql_log_level => "debug"

    last_run_metadata_path => "/usr/share/logstash/.logstash.<table>.lastrun"
    schedule => "0 * * * * *"
    statement => "
    var last = :sql_last_value;
    last = last.replace(' ','T');
    last = last.substr(0,22) + 'Z'; // convert :sql_last_value to a Mongo ISODate string
        db.media.find({'modified':{'$gte': ISODate(last)}},{'shortid':1,'title':1,'description':1}
        )
        .map(function(doc) {
            return {
                'id':doc._id.toString(),  // because DbSchema doesn't support Mongo ObjectId
                'shortid':doc.shortid,
                'title':doc.title,
                'description':doc.description
            }
        })
    "
  }
}

output {
    # stdout { codec => rubydebug }
    elasticsearch {
        hosts => ["https://<elastic.co-host>.us-east-1.aws.found.io:9243"]
        user => "<elastic-username>"
        password => "<elastic-password>"
        index => "<index-name>"
        doc_as_upsert => true
        document_id => "%{[map][shortid]}"
    }
}
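The date-string manipulation at the top of that statement can be sanity-checked outside Logstash. A minimal Node.js sketch, with a hypothetical sample value standing in for the interpolated :sql_last_value:

```javascript
// Hypothetical sample; Logstash interpolates :sql_last_value as a
// 'YYYY-MM-DD HH:MM:SS.nnnnnnnnn' timestamp string like this one.
var last = '2019-01-15 12:30:45.678901234';

// Same transformation as in the pipeline statement:
last = last.replace(' ', 'T');    // insert the ISO-8601 'T' separator
last = last.substr(0, 22) + 'Z';  // truncate the fraction, append 'Z'

console.log(last); // '2019-01-15T12:30:45.67Z'
```

Note that substr(0, 22) keeps only two fractional-second digits; if finer precision matters for the 'modified' comparison, the cutoff could be extended to 23 to keep full milliseconds.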

Here is the graph of connections in Atlas. You can see the incremental growth, then the drop when I hard-kill the Logstash process.
(image: Atlas connection-count graph)

Here is what netstat -p shows on the Logstash host:

[ec2-user@ip-172-17-48-41 ~]$ sudo netstat -p | grep 27017
tcp6       0      0 ip-172-17-48-41.e:33210 ip-192-168-254-51:27017 ESTABLISHED 8391/java
tcp6       0      0 ip-172-17-48-41.e:33966 ip-192-168-253-18:27017 ESTABLISHED 8391/java
tcp6       0      0 ip-172-17-48-41.e:59148 ip-192-168-248-21:27017 ESTABLISHED 8391/java
tcp6       0      0 ip-172-17-48-41.e:59156 ip-192-168-248-21:27017 ESTABLISHED 8391/java
tcp6       0      0 ip-172-17-48-41.e:33216 ip-192-168-254-51:27017 ESTABLISHED 8391/java
tcp6       0      0 ip-172-17-48-41.e:59140 ip-192-168-248-21:27017 ESTABLISHED 8391/java
tcp6       0      0 ip-172-17-48-41.e:33954 ip-192-168-253-18:27017 ESTABLISHED 8391/java
tcp6       0      0 ip-172-17-48-41.e:59152 ip-192-168-248-21:27017 ESTABLISHED 8391/java
[ec2-user@ip-172-17-48-41 ~]$ sudo netstat -p | grep 27017 | wc
     12      84    1212
[ec2-user@ip-172-17-48-41 ~]$ sudo netstat -p | grep 27017 | wc
     12      84    1212
[ec2-user@ip-172-17-48-41 ~]$ sudo netstat -p | grep 27017 | wc
     12      84    1212
[ec2-user@ip-172-17-48-41 ~]$ sudo netstat -p | grep 27017 | wc
     16     112    1616
[ec2-user@ip-172-17-48-41 ~]$ sudo netstat -p | grep 27017 | wc
     36     252    3636

You can see the sockets stay in the ESTABLISHED state and the count continues to grow.

Ideas?

Thanks for the assistance.

We're experiencing the very same difficulties.
Our goal is to get some specific data from MongoDB to Elasticsearch with a Logstash pipeline and the DbSchema driver.
From a functional standpoint it works just fine, except that Logstash opens a connection on each run and keeps it open (in the ESTABLISHED state), never releasing it.

Our setup:
mongo (v3.6.11) -> logstash (v6.6.1 + DbSchema driver mongojdbc1.2.jar) -> elasticsearch (v6.6.1) (all on docker)

Here is my pipeline config

input {
  jdbc {
    jdbc_driver_library => "/usr/share/logstash/pipeline/driver/mongojdbc1.2.jar"
    jdbc_driver_class => "com.dbschema.MongoJdbcDriver"
    jdbc_connection_string => "jdbc:mongodb://mongodb:27017/OBA?maxPoolSize=5"
    jdbc_validate_connection => true
    jdbc_user => ""
    clean_run => false
    record_last_run => true
    last_run_metadata_path => ".logstash_jdbc_last_run"
    schedule => "*/10 * * * * *"
    statement => "
      var lastValue = :sql_last_value;

      var extractedDate = lastValue.substring(0,10);
      var year = Number(extractedDate.substring(0,4));
      var month = Number(extractedDate.substring(5,7));
      var day = Number(extractedDate.substring(8,10));

      var extractedTime = lastValue.substring(11,23);
      var hour = Number(extractedTime.substring(0,2));
      var minute = Number(extractedTime.substring(3,5));
      var seconds = Number(extractedTime.substring(6,8));
      var milliseconds = Number(extractedTime.substring(9,12));

      var upperEpoch = new Date(year, month-1, day, hour, minute, seconds, milliseconds).getTime();

      var TEN_SEC_OFFSET = 10000;
      var bottomEpoch = upperEpoch - TEN_SEC_OFFSET;

      db.sam.find({ time_stamp: { $gte: bottomEpoch, $lte: upperEpoch } }, { '_id': false });
    "
  }
}
output {
  elasticsearch {
    hosts => [ "elasticsearch:9200" ]
    index => "oba"
    document_type => "audits"
  }
  stdout { codec => rubydebug }
}
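The window arithmetic in that statement can be exercised outside Logstash. A minimal Node.js sketch, with a hypothetical sample value standing in for the interpolated :sql_last_value (note that new Date(...) interprets the components in the host's local time zone, so the Logstash host needs to match the time zone of the stored time_stamp epochs):

```javascript
// Hypothetical sample; in the pipeline, Logstash interpolates
// :sql_last_value as a timestamp string like this.
var lastValue = '2019-03-01 10:20:30.456';

// Split out the date and time components, exactly as the statement does.
var extractedDate = lastValue.substring(0, 10);   // '2019-03-01'
var year = Number(extractedDate.substring(0, 4));
var month = Number(extractedDate.substring(5, 7));
var day = Number(extractedDate.substring(8, 10));

var extractedTime = lastValue.substring(11, 23);  // '10:20:30.456'
var hour = Number(extractedTime.substring(0, 2));
var minute = Number(extractedTime.substring(3, 5));
var seconds = Number(extractedTime.substring(6, 8));
var milliseconds = Number(extractedTime.substring(9, 12));

// Upper bound of the window is the last-run time (local time zone);
// the lower bound is ten seconds earlier.
var upperEpoch = new Date(year, month - 1, day, hour, minute, seconds, milliseconds).getTime();
var bottomEpoch = upperEpoch - 10000;

console.log(upperEpoch - bottomEpoch); // 10000
```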

Any ideas?
Maybe there is some workaround?

I have the same issue with Logstash 7.2 and the Mongo driver. Any update? Thanks!


I had no success resolving it. I ended up just scheduling a cron job to kill the process every 20 minutes. It's not a great solution, but it's working for me.

Thanks for your reply. I downloaded the Mongo JDBC driver source code and found that this issue is caused by the close function: the implementation never calls the Mongo client's close function. After changing that, close works correctly.

scne commented

@lonelywolf4git please share your implementation or code. Thanks

I have the same issue with Logstash 7.7.0 and 7.8.0, and I have tried mongo-java-driver 3.6.4, 3.12.0, 3.12.4, and 3.12.5. Nothing has worked for me; the number of connections keeps growing.

@lonelywolf4git Can you please share the code change, along with file names and paths, or the jar files you used? That would be helpful.

The problem is in the DbSchema driver, not in Logstash. I opened an issue on that project with my changes: https://bitbucket.org/dbschema/mongodb-jdbc-driver/issues/6/mongoclient-never-closed

It's not a logstash-input-jdbc issue. The issue is in the JDBC driver used with the logstash-input-jdbc plugin. The latest official code repo of the JDBC driver is here. The connection needs to be closed in the driver code, which is the change I made.
I created a new build, tested it, and it's working fine for me!