Explicit Java requirements
Closed this issue · 18 comments
Hi,
I've been able to update my Riemann setup in production as new versions came out: 0.3.1, 0.3.2, 0.3.3. All of a sudden, version 0.3.4 breaks on my server. It's just a hunch, but I suspect Java is too old. The Java version is 1.8.0_151. I tried to find a reference in the documentation stating the Java requirements, but to no avail. Have I missed it?
How does it break?
The dashboard isn't showing my regular views due to socket exceptions, and the Riemann log shows that the websocket subscriptions close immediately.
INFO [2019-10-12 21:11:13,830] worker-1 - riemann.transport.websockets - New websocket subscription to index : true
INFO [2019-10-12 21:11:14,026] worker-2 - riemann.transport.websockets - Closing websocket x.x.x.x index true
INFO [2019-10-12 21:11:20,034] worker-3 - riemann.transport.websockets - New websocket subscription to index : true
INFO [2019-10-12 21:11:20,229] worker-4 - riemann.transport.websockets - Closing websocket x.x.x.x index true
INFO [2019-10-12 21:11:26,474] worker-1 - riemann.transport.websockets - New websocket subscription to index : true
INFO [2019-10-12 21:11:26,659] worker-2 - riemann.transport.websockets - Closing websocket x.x.x.x index true
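To put a number on "close immediately", the log excerpt above can be measured mechanically. This is only a rough sketch, not anything shipped with Riemann: it assumes a single dashboard client, so that each "New websocket subscription" line is followed by its own "Closing websocket" line, and the function name `subscription_lifetimes` is made up for illustration.

```python
from datetime import datetime

# Log excerpt copied from the report above.
LOG = """\
INFO [2019-10-12 21:11:13,830] worker-1 - riemann.transport.websockets - New websocket subscription to index : true
INFO [2019-10-12 21:11:14,026] worker-2 - riemann.transport.websockets - Closing websocket x.x.x.x index true
INFO [2019-10-12 21:11:20,034] worker-3 - riemann.transport.websockets - New websocket subscription to index : true
INFO [2019-10-12 21:11:20,229] worker-4 - riemann.transport.websockets - Closing websocket x.x.x.x index true
INFO [2019-10-12 21:11:26,474] worker-1 - riemann.transport.websockets - New websocket subscription to index : true
INFO [2019-10-12 21:11:26,659] worker-2 - riemann.transport.websockets - Closing websocket x.x.x.x index true
"""


def subscription_lifetimes(log_text):
    """Return seconds between each subscription and the next close.

    Assumes opens and closes strictly alternate (one client at a time).
    """
    lifetimes = []
    opened_at = None
    for line in log_text.splitlines():
        if "[" not in line or "]" not in line:
            continue  # skip anything that is not a timestamped log line
        stamp = line[line.index("[") + 1 : line.index("]")]
        when = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S,%f")
        if "New websocket subscription" in line:
            opened_at = when
        elif "Closing websocket" in line and opened_at is not None:
            lifetimes.append((when - opened_at).total_seconds())
            opened_at = None
    return lifetimes


print(subscription_lifetimes(LOG))
```

For this excerpt every subscription lives for roughly 0.2 seconds before it is closed, and the dashboard retries about every six seconds.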
I'm using:
java version "1.8.0_191"
Java(TM) SE Runtime Environment (build 1.8.0_191-b12)
Java HotSpot(TM) 64-Bit Server VM (build 25.191-b12, mixed mode)
on macOS, and it runs fine for me.
Anything else weird?
Interesting... I've conducted a debugging session where I started with an empty configuration, and I have been able to determine that the problem occurs when indexing the events, but I haven't been able to find out more. In other words, the minimal configuration works (from the how-to):
(let [host "0.0.0.0"]
  (tcp-server {:host host})
  (udp-server {:host host})
  (ws-server {:host host}))
(streams
  prn)
but the following will start to display the errors mentioned previously:
(let [host "0.0.0.0"]
  (tcp-server {:host host})
  (udp-server {:host host})
  (ws-server {:host host}))
(streams
  index
  prn)
```
So it works if I wrap the index.
(let [index (index)]
  (streams
    index
    prn))
(Posted as a workaround here).
I'll keep trying to work out what the issue is... Anything you spot is much appreciated.
Wait. That minimal config:
(let [host "0.0.0.0"]
  (tcp-server {:host host})
  (udp-server {:host host})
  (ws-server {:host host}))
(streams
  index
  prn)
Did it work for you on a previous version??
Yes, that minimal config works on previous versions. I only experience trouble on 0.3.4. I've played with wrapping the index too, but it's the same thing.
(let [index (index)]
  ; Inbound events will be passed to these streams:
  (streams
    ...))
I appreciate your efforts, James. It's baffling to me too. The events that come in are a combination of those emitted by riemann.tools and my own.
{:service "foo" :state "ok" :tags ["heart-beat"]}
Do you have error logs on the websocket client (maybe in the browser console)?
Yes, the errors are coming up as soon as the indexing of events is taking place.
In the browser:
1 socket error; check the server field above.
In the console:
INFO [2019-10-14 11:58:31,292] worker-1 - riemann.transport.websockets - New websocket subscription to index : true
INFO [2019-10-14 11:58:31,481] worker-2 - riemann.transport.websockets - Closing websocket x.x.x.x index true
INFO [2019-10-14 11:58:34,918] worker-3 - riemann.transport.websockets - New websocket subscription to index : true
INFO [2019-10-14 11:58:35,120] worker-4 - riemann.transport.websockets - Closing websocket x.x.x.x index true
INFO [2019-10-14 11:58:42,114] worker-1 - riemann.transport.websockets - New websocket subscription to index : true
INFO [2019-10-14 11:58:42,290] worker-2 - riemann.transport.websockets - Closing websocket x.x.x.x index true
In the meantime, I have been able to rule out the Java version as the source of the problem, because I can observe the same behavior with Java 11.
I can reproduce the issue with the Riemann dash, but a Python websocket client like https://github.com/mcorbin/wsriemann seems to work as expected. So the issue may be in the dash and not in Riemann itself.
@danielsz Okay. I don't know a huge amount about the dash code but I will take a look. What Ruby version are you running?
James, that is very kind of you.
ruby 2.6.5p114 (2019-10-01 revision 67812) [x86_64-linux]
And
ruby 2.0.0p353 (2013-11-22 revision 43784) [x86_64-linux]
I'm afraid I didn't have time for more than a cursory look, and I can't see anything obvious. :(
No problem at all. I remain at version 0.3.3 for now. What we know so far:
- It's not related to the Java version
- It could be related to riemann-dash.
- At least two users have observed the problematic behavior.
I know that websockets can be finicky sometimes because of the underlying network. I've observed TCP RSTs on home networks because the ISP is using carrier-grade NAT. The weird thing here is that all is fine on versions < 0.3.4.
I'm seeing the same issue. Interestingly, when it happens, if I run https://github.com/mcorbin/wsriemann it prints:
websockets.exceptions.InvalidMessage: Malformed HTTP message
Looking at the traffic in Wireshark, I see the client send the websocket upgrade request on a new connection:
Host: my_server:5556
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Key: Bvj5r3lPX5Oc0PK/RMhHzA==
Sec-WebSocket-Version: 13
Sec-WebSocket-Extensions: permessage-deflate; client_max_window_bits
User-Agent: Python/3.7 websockets/7.0
but the response has index data before the expected headers
{"host":"host2","service":"serv2","state":"ok","description":"","metric":235,"tags":null,"time":"2019-11-21T10:13:19.000Z","ttl":120.0,"public_ip":"1.2.3.4","hash":"734F88FEF4B455B3750A25CE84E23930C77C088B7229504A99D2499C1D94A0B2"}.~. {"host":"host1","service":"serv6","state":"ok","description":"","metric":67,"tags":null,"time":"2019-11-21T10:13:19.000Z","ttl":120.0,"public_ip":"3.4.5.6","hash":"6227F9E08499027A38BBC10CB2005E14FDA70AF0A398BA4B2B4E9A5B6DC6D53D"}.~. {"host":"host1","service":"serv7","state":"ok","description":"","metric":208,"tags":null,"time":"2019-11-21T10:12:51.000Z","ttl":120.0,"public_ip":"3.4.5.6","hash":"CB8B919C3EE4A35FFAE2B47FA31917719A0D4AEC7B3569E0191C98EBD905460F"}.~..{"host":"host2","service":"serv7","state":"ok","description":"","metric":232,"tags":null,"time":"2019-11-21T10:13:14.000Z","ttl":120.0,"public_ip":"1.2.3.4","hash":"9C67FC59C551C5866D5EBB1779A151AB6F9F50ABDE4E934B7079F0735CDB1F51"}.~.J{"host":"host2","service":"serv8","state":"critical","description":"","metric":91,"tags":null,"time":"2019-11-21T10:12:42.000Z","ttl":120.0,"public_ip":"1.2.3.4","hash":"984E39FC19A2B896FCD4FFBF3F55F4BC69AF8E895F69AF2E58396DAB04919976"}.~.
{"host":"host1","service":"serv2","state":"ok","description":"","metric":128,"tags":null,"time":"2019-11-21T10:13:05.000Z","ttl":120.0,"public_ip":"3.4.5.6","hash":"66FEBABBF4A80EA9521A5670308978DDADD28F960273D816F56A9860F5E1FFF7"}.~..{"host":"host1","service":"serv2","state":"ok","description":"","metric":230,"tags":null,"time":"2019-11-21T10:12:56.000Z","ttl":120.0,"public_ip":"3.4.5.6","hash":"3A0C1B5D8A45C3C6468B21A4CF90F50AF08DCCC5B58949DB757912E8AC1F9AAE"}.~.
{"host":"host1","service":"serv3","state":"ok","description":"","metric":296,"tags":null,"time":"2019-11-21T10:12:46.000Z","ttl":120.0,"public_ip":"3.4.5.6","hash":"E43BED8243CC3855846F8586FF784657155E7819305216760A13814706B1C138"}.~..{"host":"host2","service":"serv4","state":"ok","description":"","metric":29,"tags":null,"time":"2019-11-21T10:12:50.000Z","ttl":120.0,"public_ip":"1.2.3.4","hash":"B59DC56223CC1459D7A0A9E8BF5FA9499703404F0DACEF093662534D8C131A00"}.~..
{"host":"host2","service":"serv5","state":"ok","description":"","metric":86,"tags":null,"time":"2019-11-21T10
:12:59.000Z","ttl":120.0,"public_ip":"1.2.3.ec-Websocket-Accept: +zu5dVw7LbMt5ki9ovXA6CdoI6Q=
Content-Length: 0
Server: http-kit
Date: Thu, 21 Nov 2019 10:13:27 GMT
```
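For anyone who wants to check a capture like this without reading it by eye: per RFC 6455, a successful handshake response has to start with an `HTTP/1.1 101` status line, and its `Sec-WebSocket-Accept` header is derived from the client's `Sec-WebSocket-Key`. Below is a minimal sketch of both checks using only the Python standard library. The helper names `expected_accept` and `starts_with_upgrade` are hypothetical, and this is not how http-kit or the `websockets` package implement it.

```python
import base64
import hashlib

# Fixed GUID that RFC 6455 appends to the client key before hashing.
WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def expected_accept(sec_websocket_key: str) -> str:
    """Compute the Sec-WebSocket-Accept value a server should return."""
    digest = hashlib.sha1((sec_websocket_key + WS_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")


def starts_with_upgrade(raw_response: bytes) -> bool:
    """True if the first bytes from the server are an HTTP 101 status line."""
    return raw_response.startswith(b"HTTP/1.1 101")


# Sample key/accept pair taken from the RFC 6455 text itself.
print(expected_accept("dGhlIHNhbXBsZSBub25jZQ=="))  # s3pPLMBiTxaQ9kYGzzhZRbK+xOo=

# A response that begins with index data, as in the capture above, fails the check.
print(starts_with_upgrade(b'{"host":"host2","service":"serv2"'))  # False
print(starts_with_upgrade(b"HTTP/1.1 101 Switching Protocols\r\n"))  # True
```

Running `expected_accept` on the `Sec-WebSocket-Key` from the request lets you compare the result against the `Sec-WebSocket-Accept` value visible at the end of the dump. A client that sees JSON before the status line has no way to parse the reply as HTTP, which would be consistent with the "Malformed HTTP message" error.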