Recipe: elkstack::logstash
jnganga opened this issue · 14 comments
Hi,
I'm running into the error below when converging. Without fail, it converges on the second attempt. Is it that elasticsearch is not running yet and logstash depends on it, hence the failure? How do we resolve this?
Recipe: elkstack::elasticsearch
* service[elasticsearch] action start (up to date)
Recipe: elkstack::logstash
* logstash_service[server] action restart
* runit_service[logstash_server] action restart
================================================================================
Error executing action `restart` on resource 'runit_service[logstash_server]'
================================================================================
Mixlib::ShellOut::ShellCommandFailed
------------------------------------
Expected process to exit with [0], but received '1'
---- Begin output of /usr/bin/sv restart /etc/service/logstash_server ----
STDOUT: timeout: run: /etc/service/logstash_server: (pid 20646) 797s, got TERM
STDERR:
---- End output of /usr/bin/sv restart /etc/service/logstash_server ----
Ran /usr/bin/sv restart /etc/service/logstash_server returned 1
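For reference, runit's `sv restart` sends TERM, CONT and up to the service and then waits (7 seconds by default, adjustable with `sv -w` or the `SVWAIT` environment variable) for it to come back. A leading `timeout:` means that wait expired. The rest is the ordinary status line: state `run`, the pid, seconds in that state, and `got TERM`, meaning the signal was sent but the process was still up. The Ruby sketch below is a hypothetical helper, not part of elkstack or runit, that splits such a line into fields; it assumes the status format shown in this log.

```ruby
# Hypothetical helper (not part of elkstack or runit): split one line of `sv`
# output such as
#   "timeout: run: /etc/service/logstash_server: (pid 20646) 797s, got TERM"
# into its fields. Assumes the "result: state: service: (pid N) Ns, extras" layout.
SV_LINE = /
  \A
  (?:(?<result>ok|timeout|fail|warning|kill):\s+)?   # sv's verdict, if any
  (?<state>run|down|finish):\s+                      # service state
  (?<service>[^:]+):\s+                              # service directory
  (?:\(pid\s(?<pid>\d+)\)\s+)?                       # pid, only shown while running
  (?<seconds>\d+)s                                   # seconds in this state
  (?:,\s*(?<extras>.*))?                             # e.g. "got TERM", "normally up"
  \z
/x

def parse_sv_line(line)
  # Keep only the main service's part; sv appends "; run: log: ..." for the log service.
  main = line.strip.split(";").first.to_s
  m = SV_LINE.match(main)
  return nil unless m

  {
    result: m[:result],                  # nil when sv printed only a status line
    state: m[:state],                    # "run" means the process is still up
    service: m[:service],
    pid: m[:pid] && m[:pid].to_i,        # nil when the service is down
    seconds_in_state: m[:seconds].to_i,
    extras: m[:extras] ? m[:extras].split(/,\s*/) : []
  }
end

info = parse_sv_line("timeout: run: /etc/service/logstash_server: (pid 20646) 797s, got TERM")
puts info.inspect
```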
It is a bit hard to say from that output without some manual troubleshooting as well. From the output, the elasticsearch service was already running (its start action was up to date), and a restart was issued to the logstash_server runit service. That service was running (pid 20646, 797s in that state) and got TERM, but sv timed out before the restart completed.
After this failed run, were you able to log in to the instance and check the status of the elasticsearch service? It would also help to look at the runit logs under /var/log/logstash and see what the process logged there. If you can, hunt down this and any other information that may help us.
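With runit, the log service (svlogd) writes the active log to a file named `current` in its log directory, so that is the file to check. Below is a minimal Ruby sketch of a hypothetical helper that prints its last lines. The default directory is an assumption (this thread mentions both /var/log/logstash and /var/log/logstash_server), so pass the right one as the first argument.

```ruby
# Hypothetical helper: return the last `count` lines of runit's active log file
# (svlogd names it "current") in `log_dir`, or nil if that file does not exist.
def tail_runit_log(log_dir, count = 20)
  path = File.join(log_dir, "current")
  return nil unless File.file?(path)

  File.readlines(path).last(count).map(&:chomp)
end

# Assumption: adjust the directory to match your node.
log_dir = ARGV[0] || "/var/log/logstash_server"
lines = tail_runit_log(log_dir)
if lines.nil?
  puts "no current log file found in #{log_dir}"
else
  puts lines
end
```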
Thanks for responding.
I see elasticsearch running after the failed run, but I can't tell whether it was running at the moment of failure.
$ curl 'http://localhost:9200/?pretty'
{
  "status" : 200,
  "name" : "default-ubuntu-1404",
  "cluster_name" : "elasticsearch",
  "version" : {
    "number" : "1.4.4",
    "build_hash" : "c88f77ffc81301dfa9dfd81ca2232f09588bd512",
    "build_timestamp" : "2015-02-19T13:05:36Z",
    "build_snapshot" : false,
    "lucene_version" : "4.10.3"
  },
  "tagline" : "You Know, for Search"
}
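The curl check only shows that Elasticsearch was up after the failed run. If the hypothesis in the original question is right (logstash restarted before Elasticsearch accepted connections), one option is to wait for the port first. This is a minimal Ruby sketch, assuming Elasticsearch listens on localhost:9200 as in the curl output. `wait_for_port` is a hypothetical helper, not part of elkstack, that could for example be called from a Chef `ruby_block` placed before the logstash service. An open port does not guarantee the cluster is ready, and nothing in this thread confirms this is the cause.

```ruby
require "socket"

# Hypothetical helper (not part of elkstack): poll until something accepts TCP
# connections on host:port, or give up after `timeout` seconds.
# Returns true if a connection succeeded, false on timeout.
def wait_for_port(host, port, timeout: 60, interval: 1)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
  loop do
    begin
      # The block form closes the socket for us; returning from it ends the method.
      Socket.tcp(host, port, connect_timeout: interval) { return true }
    rescue SystemCallError, SocketError, IOError
      # Refused, unreachable or timed out: the service may still be starting.
    end
    return false if Process.clock_gettime(Process::CLOCK_MONOTONIC) >= deadline
    sleep interval
  end
end

# Assumption: Elasticsearch on localhost:9200, as in the curl output above.
if wait_for_port("localhost", 9200, timeout: 5)
  puts "port 9200 is accepting connections"
else
  puts "nothing listening on port 9200 yet"
end
```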
Also, in my log directory, this is what I have:
vagrant@default-ubuntu-1404:/var/log/logstash_server$ ll
total 8
drwxr-xr-x 2 root root 4096 Aug 9 21:29 ./
drwxrwxr-x 10 root syslog 4096 Aug 9 21:38 ../
lrwxrwxrwx 1 root root 34 Aug 9 21:29 config -> /etc/sv/logstash_server/log/config
Can you show us what settings you're applying or give us a reproducible example? I don't think we have enough information; I suspect this is an issue with configuration.
I'm actually cloning the entire repo into a new folder:
https://github.com/rackspace-cookbooks/elkstack.git
and then running 'kitchen converge' without any additional modifications.
Please see the logs below for the first and second runs. FYI, a colleague got the same issue on his machine.
Hi @jnganga, I just cloned elkstack and ran the same command, and it converged for me on both the first and second attempts.
These logs contain different run lists: cic_elkstack::packer is not part of elkstack, so I think something else is going on here.
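Comparing run lists across two chef-client logs can be done mechanically: the output prints a `Recipe: cookbook::recipe` header before each recipe's resources, as in the log at the top of this issue. This is a minimal Ruby sketch of hypothetical helpers that collect those names from two logs and report recipes present in only one. It assumes this log format and only sees recipes whose resources were actually logged.

```ruby
require "set"

# Hypothetical helper: collect recipe names printed as "Recipe: cookbook::recipe"
# header lines in chef-client output.
def recipes_in_log(text)
  text.each_line
      .map { |line| line[/^\s*Recipe:\s+(\S+::\S+)/, 1] }
      .compact
      .to_set
end

# Report recipes that appear in one log but not the other.
def run_list_diff(log_a, log_b)
  a = recipes_in_log(log_a)
  b = recipes_in_log(log_b)
  { only_in_a: (a - b).to_a.sort, only_in_b: (b - a).to_a.sort }
end

# Usage (file names are placeholders):
#   p run_list_diff(File.read("my_run.txt"), File.read("your_run.txt"))
```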
Could you share your Berksfile.lock and Gemfile.lock so I can get on the same versions you're using and re-test?
Sorry, I attached the log files from my earlier run where I was wrapping your cookbook. In both cases, with or without the wrapper, it errors out at the same place.
Please see the requested files below. I only generated these from the first run with the wrapper cookbook. I probably should have repeated it with the cloned elkstack; I'll do that tonight.
Yes, please let us know when you have something with elkstack itself so we can try to reproduce it. I'm specifically interested in both the logs and the lock files for elkstack, not your wrapper. Thanks.
Sure. Please find the logs and .lock files below.
My steps:
$ git clone https://github.com/rackspace-cookbooks/elkstack.git
$ cd elkstack/
$ berks install
$ bundle install
$ kitchen list
$ kitchen create default-ubuntu-1404
$ kitchen converge default-ubuntu-1404    # first run: see attached log "first_run_log_cloned_elkstack"
$ kitchen converge default-ubuntu-1404    # second run: see attached log "second_run_log_cloned_elkstack"
Thank you.
second_run_log_cloned_elkstack.txt
first_run_log_cloned_elkstack.txt
Cloned_elkstack_Berksfile.lock.txt
Cloned_elkstack_Gemfile.lock.txt
For what it's worth (I realize this may not add any value), I went through the same steps and compared the Gemfile.lock and Berksfile.lock. I was able to converge with no errors, and the only difference I found was that I had a slightly newer ohai gem locally.
In your commands above, I see you did not use bundle exec, so they were actually using your system/user gemset. That said, you are running the latest kitchen, same as I am, so I don't see an issue there.
We'll continue to dig into this, but so far I don't see a smoking gun.
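The lock-file comparison described above can be scripted. Both Gemfile.lock and Berksfile.lock list each resolved package as an indented `name (version)` line, while dependency constraints carry an operator such as `>=` or `~>`. The Ruby sketch below is a simplified, hypothetical parser built on that assumption (Bundler's own `Bundler::LockfileParser` is the more complete option for Gemfile.lock). It reports packages whose resolved versions differ between two lock files, such as a newer ohai gem.

```ruby
# Hypothetical, simplified lock-file reader: maps each resolved package to its version.
# Assumption: resolved entries look like "    name (1.2.3)", while dependency
# constraints include an operator, e.g. "(>= 0.0.0)" or "(~> 2.0)", and are skipped.
RESOLVED_ENTRY = /\A\s+([A-Za-z0-9_.\-]+) \(([0-9][0-9A-Za-z.\-]*)\)\s*\z/

def resolved_versions(lock_text)
  lock_text.each_line.with_object({}) do |line, versions|
    match = RESOLVED_ENTRY.match(line)
    versions[match[1]] = match[2] if match
  end
end

# Returns { "name" => [version_in_a, version_in_b] } for every package whose
# version differs (nil means the package is missing from that file).
def lock_diff(lock_a, lock_b)
  a = resolved_versions(lock_a)
  b = resolved_versions(lock_b)
  (a.keys | b.keys).sort.each_with_object({}) do |name, diff|
    diff[name] = [a[name], b[name]] unless a[name] == b[name]
  end
end

# Usage (paths are placeholders):
#   p lock_diff(File.read("mine/Gemfile.lock"), File.read("theirs/Gemfile.lock"))
```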
@dude051 what exact commands, and in what order, should I run with bundle exec?
The same order works; just prefix your kitchen commands with bundle exec so they run from the bundled gems. To answer your question directly:
$ git clone https://github.com/rackspace-cookbooks/elkstack.git
$ cd elkstack/
$ bundle install
$ bundle exec berks install
$ bundle exec kitchen list
$ bundle exec kitchen create default-ubuntu-1404
$ bundle exec kitchen converge default-ubuntu-1404
$ bundle exec kitchen converge default-ubuntu-1404
I get the same results, sir! Please see the attached logs.
second_run_log_bundle_elkstack.txt
first_run_log_bundle_elkstack.txt
@jnganga Do you get anything in the logstash logs when you run this? I'm wondering if the logstash service just isn't starting for some reason.
I guess this is related to my issue here. I faced the same problem with the latest logstash versions.