riemann/riemann

Explicit Java requirements

Closed this issue · 18 comments

Hi,

I've been able to update my Riemann setup in production as new versions came in: 0.3.1, 0.3.2, 0.3.3. All of a sudden, version 0.3.4 breaks on my server. It's just a hunch, but I suspect Java is too old. The Java version is 1.8.0_151. I looked through the documentation for the Java requirements, but to no avail. Have I missed it?

How does it break?

The dashboard isn't showing my regular views due to socket exceptions, and the riemann log shows that the websocket subscriptions close immediately.

INFO [2019-10-12 21:11:13,830] worker-1 - riemann.transport.websockets - New websocket subscription to index : true
INFO [2019-10-12 21:11:14,026] worker-2 - riemann.transport.websockets - Closing websocket  x.x.x.x index true
INFO [2019-10-12 21:11:20,034] worker-3 - riemann.transport.websockets - New websocket subscription to index : true
INFO [2019-10-12 21:11:20,229] worker-4 - riemann.transport.websockets - Closing websocket  x.x.x.x index true
INFO [2019-10-12 21:11:26,474] worker-1 - riemann.transport.websockets - New websocket subscription to index : true
INFO [2019-10-12 21:11:26,659] worker-2 - riemann.transport.websockets - Closing websocket  x.x.x.x index true
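
To put a number on "close immediately", the log lines above can be paired up and timed. Below is a minimal Python sketch, assuming the log format shown here; the helper name `subscription_lifetimes` is made up for illustration and is not part of Riemann.

```python
import re
from datetime import datetime

# Matches the timestamp and message of websocket log lines like the ones quoted above.
LINE_RE = re.compile(
    r"^INFO \[(?P<ts>[^\]]+)\] \S+ - riemann\.transport\.websockets - (?P<msg>.*)$"
)


def subscription_lifetimes(lines):
    """Return the seconds between each 'New websocket subscription' line
    and the next 'Closing websocket' line (hypothetical helper)."""
    lifetimes = []
    opened_at = None
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # ignore anything that is not a websocket log line
        ts = datetime.strptime(m.group("ts"), "%Y-%m-%d %H:%M:%S,%f")
        msg = m.group("msg")
        if msg.startswith("New websocket subscription"):
            opened_at = ts
        elif msg.startswith("Closing websocket") and opened_at is not None:
            lifetimes.append((ts - opened_at).total_seconds())
            opened_at = None
    return lifetimes


log = [
    "INFO [2019-10-12 21:11:13,830] worker-1 - riemann.transport.websockets - New websocket subscription to index : true",
    "INFO [2019-10-12 21:11:14,026] worker-2 - riemann.transport.websockets - Closing websocket  x.x.x.x index true",
]
print(subscription_lifetimes(log))
```

On the lines quoted in this thread, every subscription lasts roughly 0.2 seconds before it is closed.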

I'm using:

java version "1.8.0_191"
Java(TM) SE Runtime Environment (build 1.8.0_191-b12)
Java HotSpot(TM) 64-Bit Server VM (build 25.191-b12, mixed mode)

on macOS, and it runs fine for me.

Anything else weird?

Interesting... I've conducted a debugging session where I started with an empty configuration, and I have been able to determine that the problem occurs when indexing the events, but I haven't been able to find out more. In other words, the minimal configuration works (from the how-to):

(let [host "0.0.0.0"]
  (tcp-server {:host host})
  (udp-server {:host host})
  (ws-server  {:host host}))

(streams
 prn)

but the following will start to display the errors mentioned previously:

(let [host "0.0.0.0"]
  (tcp-server {:host host})
  (udp-server {:host host})
  (ws-server  {:host host}))

(streams
 index
 prn)

So it works if I wrap the index.

(let [index (index)]
  (streams
    index
    prn))

(Posted as a workaround here).

I'll keep trying to work out what the issue is... Anything you spot much appreciated.

Wait. That minimal config:

(let [host "0.0.0.0"]
  (tcp-server {:host host})
  (udp-server {:host host})
  (ws-server  {:host host}))

(streams
 index
 prn)

Did it work for you on a previous version??

Yes, that minimal config works on previous versions. I only experience trouble on 0.3.4. I've played with wrapping the index too, but it's the same thing.

(let [index (index)]
  ; Inbound events will be passed to these streams:
  (streams
    ...))

I appreciate your efforts, James. It's baffling to me too. The events that come in are a combination of those emitted by riemann.tools and my own.

{:service "foo" :state "ok" :tags ["heart-beat"]}

Do you have error logs on the websocket client (maybe in the browser console) ?

Yes, the errors are coming up as soon as the indexing of events is taking place.
In the browser:

1 socket error; check the server field above.

In the console:

INFO [2019-10-14 11:58:31,292] worker-1 - riemann.transport.websockets - New websocket subscription to index : true
INFO [2019-10-14 11:58:31,481] worker-2 - riemann.transport.websockets - Closing websocket  x.x.x.x index true
INFO [2019-10-14 11:58:34,918] worker-3 - riemann.transport.websockets - New websocket subscription to index : true
INFO [2019-10-14 11:58:35,120] worker-4 - riemann.transport.websockets - Closing websocket  x.x.x.x index true
INFO [2019-10-14 11:58:42,114] worker-1 - riemann.transport.websockets - New websocket subscription to index : true
INFO [2019-10-14 11:58:42,290] worker-2 - riemann.transport.websockets - Closing websocket  x.x.x.x index true

In the meantime, I have been able to rule out the Java version as the source of the problem, because I observe the same behavior with Java 11.

I can reproduce the issue with the Riemann dash, but a Python websocket client like https://github.com/mcorbin/wsriemann seems to work as expected. So the issue is maybe in the dash and not in Riemann itself.

@danielsz What version of the dash are you using?

@mcorbin That makes sense, actually. Thank you.

@jamtur01 We're on the latest version, 0.2.14

@danielsz Okay. I don't know a huge amount about the dash code but I will take a look. What Ruby version are you running?

James, that is very kind of you.

ruby 2.6.5p114 (2019-10-01 revision 67812) [x86_64-linux]

And

ruby 2.0.0p353 (2013-11-22 revision 43784) [x86_64-linux]

I'm afraid I didn't get time for more than a cursory look, and I can't see anything obvious. :(

No problem at all. I remain at version 0.3.3 for now. What we know so far:

  • It's not related to the Java version
  • It could be related to riemann-dash.
  • At least two users have observed the problematic behavior.

I know that websockets can be finicky sometimes because of the underlying network. I've observed TCP RSTs on home networks because the ISP is using carrier-grade NAT. The weird thing here is that all is fine on versions < 0.3.4.

I'm seeing the same issue. Interestingly, when it happens, running https://github.com/mcorbin/wsriemann prints:

websockets.exceptions.InvalidMessage: Malformed HTTP message

Looking at the traffic in Wireshark, I see the client send the websocket upgrade request on a new connection:

Host: my_server:5556
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Key: Bvj5r3lPX5Oc0PK/RMhHzA==
Sec-WebSocket-Version: 13
Sec-WebSocket-Extensions: permessage-deflate; client_max_window_bits
User-Agent: Python/3.7 websockets/7.0

but the response has index data before the expected headers:

{"host":"host2","service":"serv2","state":"ok","description":"","metric":235,"tags":null,"time":"2019-11-21T10:13:19.000Z","ttl":120.0,"public_ip":"1.2.3.4","hash":"734F88FEF4B455B3750A25CE84E23930C77C088B7229504A99D2499C1D94A0B2"}.~.	{"host":"host1","service":"serv6","state":"ok","description":"","metric":67,"tags":null,"time":"2019-11-21T10:13:19.000Z","ttl":120.0,"public_ip":"3.4.5.6","hash":"6227F9E08499027A38BBC10CB2005E14FDA70AF0A398BA4B2B4E9A5B6DC6D53D"}.~.	{"host":"host1","service":"serv7","state":"ok","description":"","metric":208,"tags":null,"time":"2019-11-21T10:12:51.000Z","ttl":120.0,"public_ip":"3.4.5.6","hash":"CB8B919C3EE4A35FFAE2B47FA31917719A0D4AEC7B3569E0191C98EBD905460F"}.~..{"host":"host2","service":"serv7","state":"ok","description":"","metric":232,"tags":null,"time":"2019-11-21T10:13:14.000Z","ttl":120.0,"public_ip":"1.2.3.4","hash":"9C67FC59C551C5866D5EBB1779A151AB6F9F50ABDE4E934B7079F0735CDB1F51"}.~.J{"host":"host2","service":"serv8","state":"critical","description":"","metric":91,"tags":null,"time":"2019-11-21T10:12:42.000Z","ttl":120.0,"public_ip":"1.2.3.4","hash":"984E39FC19A2B896FCD4FFBF3F55F4BC69AF8E895F69AF2E58396DAB04919976"}.~.
{"host":"host1","service":"serv2","state":"ok","description":"","metric":128,"tags":null,"time":"2019-11-21T10:13:05.000Z","ttl":120.0,"public_ip":"3.4.5.6","hash":"66FEBABBF4A80EA9521A5670308978DDADD28F960273D816F56A9860F5E1FFF7"}.~..{"host":"host1","service":"serv2","state":"ok","description":"","metric":230,"tags":null,"time":"2019-11-21T10:12:56.000Z","ttl":120.0,"public_ip":"3.4.5.6","hash":"3A0C1B5D8A45C3C6468B21A4CF90F50AF08DCCC5B58949DB757912E8AC1F9AAE"}.~.
{"host":"host1","service":"serv3","state":"ok","description":"","metric":296,"tags":null,"time":"2019-11-21T10:12:46.000Z","ttl":120.0,"public_ip":"3.4.5.6","hash":"E43BED8243CC3855846F8586FF784657155E7819305216760A13814706B1C138"}.~..{"host":"host2","service":"serv4","state":"ok","description":"","metric":29,"tags":null,"time":"2019-11-21T10:12:50.000Z","ttl":120.0,"public_ip":"1.2.3.4","hash":"B59DC56223CC1459D7A0A9E8BF5FA9499703404F0DACEF093662534D8C131A00"}.~..
{"host":"host2","service":"serv5","state":"ok","description":"","metric":86,"tags":null,"time":"2019-11-21T10
:12:59.000Z","ttl":120.0,"public_ip":"1.2.3.ec-Websocket-Accept: +zu5dVw7LbMt5ki9ovXA6CdoI6Q=
Content-Length: 0
Server: http-kit
Date: Thu, 21 Nov 2019 10:13:27 GMT
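
The capture suggests why the client reports a malformed message: a websocket handshake response has to begin with an HTTP status line, and its `Sec-WebSocket-Accept` value is derived from the client's key as defined in RFC 6455. Below is a rough Python sketch of both checks, using only the standard library. The helper names are invented for illustration, and this is not how wsriemann or http-kit implement it.

```python
import base64
import hashlib

# Fixed GUID that RFC 6455 defines for computing Sec-WebSocket-Accept.
WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def expected_accept(key):
    """Compute the Sec-WebSocket-Accept value for a given Sec-WebSocket-Key."""
    digest = hashlib.sha1((key + WS_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")


def starts_with_status_line(raw):
    """True if the raw response bytes begin with an HTTP/1.1 101 status line."""
    first_line = raw.split(b"\r\n", 1)[0]
    return first_line.startswith(b"HTTP/1.1 101")


# A well-formed handshake response versus one that begins with index data.
good = b"HTTP/1.1 101 Switching Protocols\r\nUpgrade: websocket\r\n\r\n"
bad = b'{"host":"host2","service":"serv2","state":"ok"}'
print(starts_with_status_line(good), starts_with_status_line(bad))

# Sample key taken from RFC 6455, not from this capture.
print(expected_accept("dGhlIHNhbXBsZSBub25jZQ=="))
```

Passing the `Sec-WebSocket-Key` from the request above to `expected_accept` gives the value a well-behaved server would send back, which can then be compared with the `Accept` fragment visible in the capture.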

Fixed in #960.