getting this on one of the nodes now
Freaksed opened this issue · 17 comments
Starting clustercode ...
Starting clustercode ... done
Attaching to clustercode
clustercode | Checking if /usr/src/clustercode/config has contents...
clustercode | Checking if /profiles has contents...
clustercode | Invoking java -jar clustercode.jar
clustercode | 2017-09-01 06:09:32.478 INFO - Working dir: /usr/src/clustercode
clustercode | 2017-09-01 06:09:32.489 INFO - Reading configuration file /usr/src/clustercode/config/clustercode.properties...
clustercode | 2017-09-01 06:09:32.518 INFO - Booting clustercode 1.1.0...
clustercode | Sep 01, 2017 6:09:33 AM com.google.inject.internal.MessageProcessor visit
clustercode | INFO: An exception was caught and reported. Message: java.lang.IllegalArgumentException: No enum constant net.chrigel.clustercode.transcode.impl.Transcoders.
clustercode | java.lang.IllegalArgumentException: No enum constant net.chrigel.clustercode.transcode.impl.Transcoders.
clustercode | at java.lang.Enum.valueOf(Enum.java:238)
clustercode | at net.chrigel.clustercode.transcode.impl.Transcoders.valueOf(Transcoders.java:7)
clustercode | at net.chrigel.clustercode.transcode.impl.TranscodeModule.configure(TranscodeModule.java:37)
clustercode | at com.google.inject.AbstractModule.configure(AbstractModule.java:62)
clustercode | at com.google.inject.spi.Elements$RecordingBinder.install(Elements.java:340)
clustercode | at com.google.inject.spi.Elements.getElements(Elements.java:110)
clustercode | at com.google.inject.internal.InjectorShell$Builder.build(InjectorShell.java:138)
clustercode | at com.google.inject.internal.InternalInjectorCreator.build(InternalInjectorCreator.java:104)
clustercode | at com.google.inject.Guice.createInjector(Guice.java:99)
clustercode | at com.google.inject.Guice.createInjector(Guice.java:73)
clustercode | at net.chrigel.clustercode.Startup.main(Startup.java:80)
clustercode |
clustercode | 2017-09-01 06:09:33.814 ERROR - Application-wide uncaught exception:
clustercode | com.google.inject.CreationException: Unable to create injector, see the following errors:
clustercode |
clustercode | 1) No implementation for java.lang.String annotated with @com.google.inject.name.Named(value=CC_TRANSCODE_TYPE) was bound.
clustercode | while locating java.lang.String annotated with @com.google.inject.name.Named(value=CC_TRANSCODE_TYPE)
clustercode | for the 5th parameter of net.chrigel.clustercode.transcode.impl.TranscoderSettingsImpl.(TranscoderSettingsImpl.java:26)
clustercode | at net.chrigel.clustercode.transcode.impl.TranscodeModule.configure(TranscodeModule.java:34)
clustercode |
clustercode | 2) An exception was caught and reported. Message: No enum constant net.chrigel.clustercode.transcode.impl.Transcoders.
clustercode | at com.google.inject.internal.InjectorShell$Builder.build(InjectorShell.java:138)
clustercode |
clustercode | 3) No implementation for net.chrigel.clustercode.transcode.impl.ProgressCalculator was bound.
clustercode | at net.chrigel.clustercode.transcode.impl.TranscodeModule.configure(TranscodeModule.java:37)
clustercode |
clustercode | 4) Property CC_TRANSCODE_TYPE is not set.
clustercode | at net.chrigel.clustercode.util.di.AbstractPropertiesModule.getEnvironmentVariableOrProperty(AbstractPropertiesModule.java:136)
clustercode |
clustercode | 5) Property CC_REST_API_ENABLED is not set.
clustercode | at net.chrigel.clustercode.util.di.AbstractPropertiesModule.getEnvironmentVariableOrProperty(AbstractPropertiesModule.java:136)
clustercode |
clustercode | 6) Property CC_REST_API_PORT is not set.
clustercode | at net.chrigel.clustercode.util.di.AbstractPropertiesModule.getEnvironmentVariableOrProperty(AbstractPropertiesModule.java:136)
clustercode |
clustercode | 6 errors
clustercode | at com.google.inject.internal.Errors.throwCreationExceptionIfErrorsExist(Errors.java:470) ~[clustercode.jar:1.1.0]
clustercode | at com.google.inject.internal.InternalInjectorCreator.initializeStatically(InternalInjectorCreator.java:155) ~[clustercode.jar:1.1.0]
clustercode | at com.google.inject.internal.InternalInjectorCreator.build(InternalInjectorCreator.java:107) ~[clustercode.jar:1.1.0]
clustercode | at com.google.inject.Guice.createInjector(Guice.java:99) ~[clustercode.jar:1.1.0]
clustercode | at com.google.inject.Guice.createInjector(Guice.java:73) ~[clustercode.jar:1.1.0]
clustercode | at net.chrigel.clustercode.Startup.main(Startup.java:80) ~[clustercode.jar:1.1.0]
clustercode | Caused by: java.lang.IllegalArgumentException: No enum constant net.chrigel.clustercode.transcode.impl.Transcoders.
clustercode | at java.lang.Enum.valueOf(Enum.java:238) ~[?:1.8.0_131]
clustercode | at net.chrigel.clustercode.transcode.impl.Transcoders.valueOf(Transcoders.java:7) ~[clustercode.jar:1.1.0]
clustercode | at net.chrigel.clustercode.transcode.impl.TranscodeModule.configure(TranscodeModule.java:37) ~[clustercode.jar:1.1.0]
clustercode | at com.google.inject.AbstractModule.configure(AbstractModule.java:62) ~[clustercode.jar:1.1.0]
clustercode | at com.google.inject.spi.Elements$RecordingBinder.install(Elements.java:340) ~[clustercode.jar:1.1.0]
clustercode | at com.google.inject.spi.Elements.getElements(Elements.java:110) ~[clustercode.jar:1.1.0]
clustercode | at com.google.inject.internal.InjectorShell$Builder.build(InjectorShell.java:138) ~[clustercode.jar:1.1.0]
clustercode | at com.google.inject.internal.InternalInjectorCreator.build(InternalInjectorCreator.java:104) ~[clustercode.jar:1.1.0]
clustercode | ... 3 more
clustercode exited with code 1
The release notes describe new settings that break old installations. Please add them to clustercode.properties, or remove the config mount point to get the defaults.
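To see which of the new settings are still missing before restarting a node, a small check like the following may help. It is only a sketch: the three key names are taken from the "Property ... is not set" errors in the log above, the parser handles plain key=value lines only (not the full Java properties syntax), and treating an environment variable as equivalent to a property is an assumption based on the method name getEnvironmentVariableOrProperty in the stack trace. The value FFMPEG in the sample is likewise an assumption for illustration.

```python
import os

# Settings that the 1.1.0 startup log above reports as "not set".
REQUIRED_KEYS = ("CC_TRANSCODE_TYPE", "CC_REST_API_ENABLED", "CC_REST_API_PORT")


def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and # or ! comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#!":
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def missing_settings(props, env=None):
    """Return the required keys that are absent or empty in both env and props."""
    env = os.environ if env is None else env
    return [key for key in REQUIRED_KEYS if not env.get(key) and not props.get(key)]


# Hypothetical file contents; FFMPEG is an assumed value for illustration.
sample = "# transcoding\nCC_TRANSCODE_TYPE=FFMPEG\nCC_REST_API_ENABLED=true\n"
print(missing_settings(parse_properties(sample), env={}))  # ['CC_REST_API_PORT']
```

To run it against a real node, read the text of /usr/src/clustercode/config/clustercode.properties (the path shown in the log) and pass it to parse_properties.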
got a new one...
clustercode | 2017-09-02 04:09:12.536 WARN - catching
clustercode | org.jgroups.StateTransferException: state transfer failed
clustercode | at org.jgroups.JChannel.getState(JChannel.java:947) ~[clustercode.jar:1.1.0]
clustercode | at org.jgroups.JChannel.getState(JChannel.java:579) ~[clustercode.jar:1.1.0]
clustercode | at org.jgroups.JChannel.getState(JChannel.java:572) ~[clustercode.jar:1.1.0]
clustercode | at org.jgroups.blocks.ReplicatedHashMap.start(ReplicatedHashMap.java:165) ~[clustercode.jar:1.1.0]
clustercode | at net.chrigel.clustercode.cluster.impl.JgroupsClusterImpl.joinCluster(JgroupsClusterImpl.java:73) [clustercode.jar:1.1.0]
clustercode | at net.chrigel.clustercode.statemachine.actions.InitializeAction.doExecute(InitializeAction.java:26) [clustercode.jar:1.1.0]
clustercode | at net.chrigel.clustercode.statemachine.Action.execute(Action.java:23) [clustercode.jar:1.1.0]
clustercode | at net.chrigel.clustercode.statemachine.Action.execute(Action.java:13) [clustercode.jar:1.1.0]
clustercode | at org.squirrelframework.foundation.fsm.impl.AbstractExecutionService$ActionContext.run(AbstractExecutionService.java:307) [clustercode.jar:1.1.0]
clustercode | at org.squirrelframework.foundation.fsm.impl.AbstractExecutionService.doExecute(AbstractExecutionService.java:81) [clustercode.jar:1.1.0]
clustercode | at org.squirrelframework.foundation.fsm.impl.AbstractExecutionService.executeActions(AbstractExecutionService.java:132) [clustercode.jar:1.1.0]
clustercode | at org.squirrelframework.foundation.fsm.impl.AbstractExecutionService.execute(AbstractExecutionService.java:140) [clustercode.jar:1.1.0]
clustercode | at org.squirrelframework.foundation.fsm.impl.AbstractStateMachine.internalStart(AbstractStateMachine.java:552) [clustercode.jar:1.1.0]
clustercode | at org.squirrelframework.foundation.fsm.impl.AbstractStateMachine.start(AbstractStateMachine.java:539) [clustercode.jar:1.1.0]
clustercode | at net.chrigel.clustercode.statemachine.states.StateController.initialize(StateController.java:61) [clustercode.jar:1.1.0]
clustercode | at net.chrigel.clustercode.Startup.main(Startup.java:83) [clustercode.jar:1.1.0]
clustercode | Caused by: java.lang.RuntimeException: java.lang.ClassNotFoundException: net.chrigel.clustercode.cluster.impl.ClusterItem
clustercode | at org.jgroups.blocks.MessageDispatcher$ProtocolAdapter.up(MessageDispatcher.java:574) ~[clustercode.jar:1.1.0]
clustercode | at org.jgroups.JChannel.up(JChannel.java:721) ~[clustercode.jar:1.1.0]
clustercode | at org.jgroups.stack.ProtocolStack.up(ProtocolStack.java:891) ~[clustercode.jar:1.1.0]
clustercode | at org.jgroups.protocols.pbcast.STATE_TRANSFER.handleStateRsp(STATE_TRANSFER.java:370) ~[clustercode.jar:1.1.0]
clustercode | at org.jgroups.protocols.pbcast.STATE_TRANSFER.up(STATE_TRANSFER.java:135) ~[clustercode.jar:1.1.0]
clustercode | at org.jgroups.stack.Protocol.up(Protocol.java:336) ~[clustercode.jar:1.1.0]
clustercode | at org.jgroups.protocols.FRAG2.up(FRAG2.java:196) ~[clustercode.jar:1.1.0]
clustercode | at org.jgroups.protocols.FlowControl.up(FlowControl.java:416) ~[clustercode.jar:1.1.0]
clustercode | at org.jgroups.stack.Protocol.up(Protocol.java:344) ~[clustercode.jar:1.1.0]
clustercode | at org.jgroups.protocols.pbcast.STABLE.up(STABLE.java:293) ~[clustercode.jar:1.1.0]
clustercode | at org.jgroups.protocols.UNICAST3.deliverBatch(UNICAST3.java:1023) ~[clustercode.jar:1.1.0]
clustercode | at org.jgroups.protocols.UNICAST3.removeAndDeliver(UNICAST3.java:832) ~[clustercode.jar:1.1.0]
clustercode | at org.jgroups.protocols.UNICAST3.handleBatchReceived(UNICAST3.java:798) ~[clustercode.jar:1.1.0]
clustercode | at org.jgroups.protocols.UNICAST3.up(UNICAST3.java:469) ~[clustercode.jar:1.1.0]
clustercode | at org.jgroups.protocols.pbcast.NAKACK2.up(NAKACK2.java:697) ~[clustercode.jar:1.1.0]
clustercode | at org.jgroups.protocols.BARRIER.up(BARRIER.java:195) ~[clustercode.jar:1.1.0]
clustercode | at org.jgroups.stack.Protocol.up(Protocol.java:344) ~[clustercode.jar:1.1.0]
clustercode | at org.jgroups.protocols.FD_ALL.up(FD_ALL.java:212) ~[clustercode.jar:1.1.0]
clustercode | at org.jgroups.stack.Protocol.up(Protocol.java:344) ~[clustercode.jar:1.1.0]
clustercode | at org.jgroups.stack.Protocol.up(Protocol.java:344) ~[clustercode.jar:1.1.0]
clustercode | at org.jgroups.stack.Protocol.up(Protocol.java:344) ~[clustercode.jar:1.1.0]
clustercode | at org.jgroups.protocols.TP.passBatchUp(TP.java:1255) ~[clustercode.jar:1.1.0]
clustercode | at org.jgroups.util.MaxOneThreadPerSender$BatchHandlerLoop.passBatchUp(MaxOneThreadPerSender.java:284) ~[clustercode.jar:1.1.0]
clustercode | at org.jgroups.util.SubmitToThreadPool$BatchHandler.run(SubmitToThreadPool.java:136) ~[clustercode.jar:1.1.0]
clustercode | at org.jgroups.util.MaxOneThreadPerSender$BatchHandlerLoop.run(MaxOneThreadPerSender.java:273) ~[clustercode.jar:1.1.0]
clustercode | at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) ~[?:1.8.0_131]
clustercode | at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) ~[?:1.8.0_131]
clustercode | at java.lang.Thread.run(Thread.java:748) ~[?:1.8.0_131]
clustercode | Caused by: java.lang.ClassNotFoundException: net.chrigel.clustercode.cluster.impl.ClusterItem
clustercode | at java.net.URLClassLoader.findClass(URLClassLoader.java:381) ~[?:1.8.0_131]
clustercode | at java.lang.ClassLoader.loadClass(ClassLoader.java:424) ~[?:1.8.0_131]
clustercode | at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:335) ~[?:1.8.0_131]
clustercode | at java.lang.ClassLoader.loadClass(ClassLoader.java:357) ~[?:1.8.0_131]
clustercode | at java.lang.Class.forName0(Native Method) ~[?:1.8.0_131]
clustercode | at java.lang.Class.forName(Class.java:348) ~[?:1.8.0_131]
clustercode | at java.io.ObjectInputStream.resolveClass(ObjectInputStream.java:677) ~[?:1.8.0_131]
clustercode | at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1826) ~[?:1.8.0_131]
clustercode | at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1713) ~[?:1.8.0_131]
clustercode | at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2000) ~[?:1.8.0_131]
clustercode | at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1535) ~[?:1.8.0_131]
clustercode | at java.io.ObjectInputStream.readObject(ObjectInputStream.java:422) ~[?:1.8.0_131]
clustercode | at java.util.HashMap.readObject(HashMap.java:1404) ~[?:1.8.0_131]
clustercode | at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_131]
clustercode | at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_131]
clustercode | at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_131]
clustercode | at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_131]
clustercode | at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1058) ~[?:1.8.0_131]
clustercode | at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2136) ~[?:1.8.0_131]
clustercode | at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2027) ~[?:1.8.0_131]
clustercode | at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1535) ~[?:1.8.0_131]
clustercode | at java.io.ObjectInputStream.readObject(ObjectInputStream.java:422) ~[?:1.8.0_131]
clustercode | at org.jgroups.blocks.ReplicatedHashMap.setState(ReplicatedHashMap.java:455) ~[clustercode.jar:1.1.0]
clustercode | at org.jgroups.blocks.MessageDispatcher.handleUpEvent(MessageDispatcher.java:489) ~[clustercode.jar:1.1.0]
clustercode | at org.jgroups.blocks.MessageDispatcher$ProtocolAdapter.up(MessageDispatcher.java:571) ~[clustercode.jar:1.1.0]
clustercode | at org.jgroups.JChannel.up(JChannel.java:721) ~[clustercode.jar:1.1.0]
clustercode | at org.jgroups.stack.ProtocolStack.up(ProtocolStack.java:891) ~[clustercode.jar:1.1.0]
clustercode | at org.jgroups.protocols.pbcast.STATE_TRANSFER.handleStateRsp(STATE_TRANSFER.java:370) ~[clustercode.jar:1.1.0]
clustercode | at org.jgroups.protocols.pbcast.STATE_TRANSFER.up(STATE_TRANSFER.java:135) ~[clustercode.jar:1.1.0]
clustercode | at org.jgroups.stack.Protocol.up(Protocol.java:336) ~[clustercode.jar:1.1.0]
clustercode | at org.jgroups.protocols.FRAG2.up(FRAG2.java:196) ~[clustercode.jar:1.1.0]
clustercode | at org.jgroups.protocols.FlowControl.up(FlowControl.java:416) ~[clustercode.jar:1.1.0]
clustercode | at org.jgroups.stack.Protocol.up(Protocol.java:344) ~[clustercode.jar:1.1.0]
clustercode | at org.jgroups.protocols.pbcast.STABLE.up(STABLE.java:293) ~[clustercode.jar:1.1.0]
clustercode | at org.jgroups.protocols.UNICAST3.deliverBatch(UNICAST3.java:1023) ~[clustercode.jar:1.1.0]
clustercode | at org.jgroups.protocols.UNICAST3.removeAndDeliver(UNICAST3.java:832) ~[clustercode.jar:1.1.0]
clustercode | at org.jgroups.protocols.UNICAST3.handleBatchReceived(UNICAST3.java:798) ~[clustercode.jar:1.1.0]
clustercode | at org.jgroups.protocols.UNICAST3.up(UNICAST3.java:469) ~[clustercode.jar:1.1.0]
clustercode | at org.jgroups.protocols.pbcast.NAKACK2.up(NAKACK2.java:697) ~[clustercode.jar:1.1.0]
clustercode | at org.jgroups.protocols.BARRIER.up(BARRIER.java:195) ~[clustercode.jar:1.1.0]
clustercode | at org.jgroups.stack.Protocol.up(Protocol.java:344) ~[clustercode.jar:1.1.0]
clustercode | at org.jgroups.protocols.FD_ALL.up(FD_ALL.java:212) ~[clustercode.jar:1.1.0]
clustercode | at org.jgroups.stack.Protocol.up(Protocol.java:344) ~[clustercode.jar:1.1.0]
clustercode | at org.jgroups.stack.Protocol.up(Protocol.java:344) ~[clustercode.jar:1.1.0]
clustercode | at org.jgroups.stack.Protocol.up(Protocol.java:344) ~[clustercode.jar:1.1.0]
clustercode | at org.jgroups.protocols.TP.passBatchUp(TP.java:1255) ~[clustercode.jar:1.1.0]
clustercode | at org.jgroups.util.MaxOneThreadPerSender$BatchHandlerLoop.passBatchUp(MaxOneThreadPerSender.java:284) ~[clustercode.jar:1.1.0]
clustercode | at org.jgroups.util.SubmitToThreadPool$BatchHandler.run(SubmitToThreadPool.java:136) ~[clustercode.jar:1.1.0]
clustercode | at org.jgroups.util.MaxOneThreadPerSender$BatchHandlerLoop.run(MaxOneThreadPerSender.java:273) ~[clustercode.jar:1.1.0]
clustercode | at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) ~[?:1.8.0_131]
clustercode | at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) ~[?:1.8.0_131]
clustercode | at java.lang.Thread.run(Thread.java:748) ~[?:1.8.0_131]
clustercode | 2017-09-02 04:09:12.556 INFO - Could not create or join cluster. Will work as single node.
Hm, sadly I cannot reproduce this. Are you running all nodes on the same, latest version? If not, it won't work. Stop all containers, switch to latest (1.1.0) and restart the cluster. I'm sorry for the inconvenience :/
they should all be the same yes...
also im trying out the netdata stuff, but i can only get the one running netdata to show itself, not any of the others...
Are you really sure about this? Even the arbiter nodes should get upgraded with docker pull braindoctor/clustercode:latest.
Based on your log, the line "Caused by: java.lang.ClassNotFoundException: net.chrigel.clustercode.cluster.impl.ClusterItem" indicates that at least one node is still running 1.0.x. In 1.1 this class was renamed. This is the only explanation that makes sense to me...
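One way to double-check the versions is to read them out of each node's own log. The sketch below relies only on two patterns visible in the logs in this thread: the "Booting clustercode 1.1.0..." line and the "[clustercode.jar:1.1.0]" tag on stack frames. The node names and the log excerpts in the example are hypothetical, and how the log text is collected from each host (for example via docker logs) is left out.

```python
import re

# Patterns as they appear in the logs posted in this thread.
VERSION_PATTERNS = (
    re.compile(r"Booting clustercode (\d+(?:\.\d+)*)"),
    re.compile(r"clustercode\.jar:(\d+(?:\.\d+)*)"),
)


def versions_in_log(text):
    """Return the set of clustercode versions mentioned in one node's log."""
    found = set()
    for pattern in VERSION_PATTERNS:
        found.update(pattern.findall(text))
    return found


def versions_by_node(logs):
    """Map each node name to a sorted list of versions seen in its log."""
    return {node: sorted(versions_in_log(text)) for node, text in logs.items()}


def is_mixed(versions):
    """True if the nodes do not all report one and the same version."""
    distinct = {v for per_node in versions.values() for v in per_node}
    return len(distinct) > 1


# Hypothetical node names and log excerpts, for illustration only.
logs = {
    "docker-1": "2017-09-01 06:09:32.518 INFO - Booting clustercode 1.1.0...",
    "docker-2": "at org.jgroups.JChannel.up(JChannel.java:721) ~[clustercode.jar:1.1.0]",
}
report = versions_by_node(logs)
print(report, "mixed versions:", is_mixed(report))
```

A node that never logs its version (an empty list in the report) would need to be checked by hand.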
What do you mean by "but i can only get the one running netdata to show itself, not any of the others"? netdata itself is not clustered, but a single netdata server should be able to collect metrics from multiple clustercode nodes using the config, e.g.:
{
  "enable_autodetect": false,
  "update_every": 2,
  "nodes": [
    {
      "name": "docker-1",
      "hostname": "docker-1.intern:7700",
      "update_every": 2,
      "progress_api": "/api/v1/progress/ffmpeg"
    },
    {
      "name": "docker-2",
      "hostname": "docker-2.intern:7700",
      "update_every": 2,
      "progress_api": "/api/v1/progress/ffmpeg"
    }
  ]
}
hmmm my netdata config is using the ip address under hostname e.g. "192.168.0.101:7700" "192.168.0.102:7700" but otherwise that is how it's set up..
I just remembered that I had to install node on my netdata server. Duh, adding it to the installation instructions ^^
Your package manager might complain of "legacy node something something". In that case, try installing nodejs.
Is the version incompatibility resolved now?
looks like all the nodes are running now, but netdata still can't see any of the remotes
Can you post the contents of the netdata error log? It's usually in /var/log/netdata/error.log.
ran it through a grep for cluster...
2017-09-04 02:28:20: node.d.plugin: ERROR: clustercode: node-0: Failed to make request, message: read ECONNRESET
2017-09-04 02:49:35: node.d.plugin: ERROR: clustercode: node-5: Failed to make request, message: connect ECONNREFUSED
2017-09-04 02:49:35: node.d.plugin: ERROR: clustercode: node-4: Failed to make request, message: connect ECONNREFUSED
2017-09-04 02:49:35: node.d.plugin: ERROR: clustercode: node-3: Failed to make request, message: connect ECONNREFUSED
2017-09-04 02:49:35: node.d.plugin: ERROR: clustercode: node-2: Failed to make request, message: connect ECONNREFUSED
2017-09-04 02:49:35: node.d.plugin: ERROR: clustercode: node-1: Failed to make request, message: connect ECONNREFUSED
2017-09-04 02:49:37: python.d ERROR: elasticsearch_local Url: http://127.0.0.1:9200/_cluster/stats. Error: HTTPConnectionPool(host='127.0.0.1', port=9200): Max retries exceeded with url: /_cluster/stats (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f8f5ad23510>: Failed to establish a new connection: [Errno 111] Connection refused',))
2017-09-04 02:49:37: python.d ERROR: elasticsearch_local Url: http://127.0.0.1:9200/_cluster/health. Error: HTTPConnectionPool(host='127.0.0.1', port=9200): Max retries exceeded with url: /_cluster/health (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f8f5aa05890>: Failed to establish a new connection: [Errno 111] Connection refused',))
Hmm, this looks like a networking issue. If you open your browser at node-1:7700/api/v1/status (adjust accordingly), do you see any output? Will it connect at all? Did you open the ports for the containers in your compose file?
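The same browser check can be scripted so that every node is probed in one go. This is a rough sketch using only the Python standard library; the /api/v1/status path and port 7700 come from this thread, while the host list passed to check_nodes is whatever your own setup uses. An HTTP error response still counts as reachable here, since it proves the port is open.

```python
import urllib.error
import urllib.request


def status_url(host, port=7700):
    """Build the status URL mentioned above for one node."""
    return f"http://{host}:{port}/api/v1/status"


def probe(url, timeout=3.0):
    """Return (reachable, detail) for a single HTTP GET."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return True, f"HTTP {response.status}"
    except urllib.error.HTTPError as err:
        # The server answered, so the port is open even if the path is wrong.
        return True, f"HTTP {err.code}"
    except (urllib.error.URLError, OSError) as err:
        # Connection refused, reset, timed out, or name not resolvable.
        return False, str(err)


def check_nodes(hosts, port=7700):
    """Probe every host and return {host: (reachable, detail)}."""
    return {host: probe(status_url(host, port)) for host in hosts}
```

Calling check_nodes(["192.168.0.101", "192.168.0.102"]) from the netdata machine should show which nodes refuse the connection; a refused connection on a node whose container is running points at a port that is not published in the compose file.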
hmmm definitely getting closer. it looks like my compose did get reverted to not including the extra port. then I got it to show one of the remote nodes, however the rest are throwing errors during launch again and won't join the cluster...
clustercode | 2017-09-05 00:39:43.533 WARN - catching
clustercode | org.jgroups.StateTransferException: state transfer failed
clustercode | at org.jgroups.JChannel.getState(JChannel.java:947) ~[clustercode.jar:1.2.0]
clustercode | at org.jgroups.JChannel.getState(JChannel.java:579) ~[clustercode.jar:1.2.0]
clustercode | at org.jgroups.JChannel.getState(JChannel.java:572) ~[clustercode.jar:1.2.0]
clustercode | at org.jgroups.blocks.ReplicatedHashMap.start(ReplicatedHashMap.java:165) ~[clustercode.jar:1.2.0]
clustercode | at net.chrigel.clustercode.cluster.impl.JgroupsClusterImpl.joinCluster(JgroupsClusterImpl.java:75) [clustercode.jar:1.2.0]
clustercode | at net.chrigel.clustercode.statemachine.actions.InitializeAction.doExecute(InitializeAction.java:26) [clustercode.jar:1.2.0]
clustercode | at net.chrigel.clustercode.statemachine.Action.execute(Action.java:23) [clustercode.jar:1.2.0]
clustercode | at net.chrigel.clustercode.statemachine.Action.execute(Action.java:13) [clustercode.jar:1.2.0]
clustercode | at org.squirrelframework.foundation.fsm.impl.AbstractExecutionService$ActionContext.run(AbstractExecutionService.java:307) [clustercode.jar:1.2.0]
clustercode | at org.squirrelframework.foundation.fsm.impl.AbstractExecutionService.doExecute(AbstractExecutionService.java:81) [clustercode.jar:1.2.0]
clustercode | at org.squirrelframework.foundation.fsm.impl.AbstractExecutionService.executeActions(AbstractExecutionService.java:132) [clustercode.jar:1.2.0]
clustercode | at org.squirrelframework.foundation.fsm.impl.AbstractExecutionService.execute(AbstractExecutionService.java:140) [clustercode.jar:1.2.0]
clustercode | at org.squirrelframework.foundation.fsm.impl.AbstractStateMachine.internalStart(AbstractStateMachine.java:552) [clustercode.jar:1.2.0]
clustercode | at org.squirrelframework.foundation.fsm.impl.AbstractStateMachine.start(AbstractStateMachine.java:539) [clustercode.jar:1.2.0]
clustercode | at net.chrigel.clustercode.statemachine.states.StateController.initialize(StateController.java:61) [clustercode.jar:1.2.0]
clustercode | at net.chrigel.clustercode.Startup.main(Startup.java:83) [clustercode.jar:1.2.0]
clustercode | Caused by: java.lang.RuntimeException: java.io.InvalidClassException: net.chrigel.clustercode.cluster.ClusterTask; local class incompatible: stream classdesc serialVersionUID = -7580195094646305370, local class serialVersionUID = 3003505588645294047
clustercode | at org.jgroups.blocks.MessageDispatcher$ProtocolAdapter.up(MessageDispatcher.java:574) ~[clustercode.jar:1.2.0]
clustercode | at org.jgroups.JChannel.up(JChannel.java:721) ~[clustercode.jar:1.2.0]
clustercode | at org.jgroups.stack.ProtocolStack.up(ProtocolStack.java:891) ~[clustercode.jar:1.2.0]
clustercode | at org.jgroups.protocols.pbcast.STATE_TRANSFER.handleStateRsp(STATE_TRANSFER.java:370) ~[clustercode.jar:1.2.0]
clustercode | at org.jgroups.protocols.pbcast.STATE_TRANSFER.up(STATE_TRANSFER.java:135) ~[clustercode.jar:1.2.0]
clustercode | at org.jgroups.stack.Protocol.up(Protocol.java:336) ~[clustercode.jar:1.2.0]
clustercode | at org.jgroups.protocols.FRAG2.up(FRAG2.java:196) ~[clustercode.jar:1.2.0]
clustercode | at org.jgroups.protocols.FlowControl.up(FlowControl.java:416) ~[clustercode.jar:1.2.0]
clustercode | at org.jgroups.stack.Protocol.up(Protocol.java:344) ~[clustercode.jar:1.2.0]
clustercode | at org.jgroups.protocols.pbcast.STABLE.up(STABLE.java:293) ~[clustercode.jar:1.2.0]
clustercode | at org.jgroups.protocols.UNICAST3.deliverBatch(UNICAST3.java:1023) ~[clustercode.jar:1.2.0]
clustercode | at org.jgroups.protocols.UNICAST3.removeAndDeliver(UNICAST3.java:832) ~[clustercode.jar:1.2.0]
clustercode | at org.jgroups.protocols.UNICAST3.handleBatchReceived(UNICAST3.java:798) ~[clustercode.jar:1.2.0]
clustercode | at org.jgroups.protocols.UNICAST3.up(UNICAST3.java:469) ~[clustercode.jar:1.2.0]
clustercode | at org.jgroups.protocols.pbcast.NAKACK2.up(NAKACK2.java:697) ~[clustercode.jar:1.2.0]
clustercode | at org.jgroups.protocols.BARRIER.up(BARRIER.java:195) ~[clustercode.jar:1.2.0]
clustercode | at org.jgroups.stack.Protocol.up(Protocol.java:344) ~[clustercode.jar:1.2.0]
clustercode | at org.jgroups.protocols.FD_ALL.up(FD_ALL.java:212) ~[clustercode.jar:1.2.0]
clustercode | at org.jgroups.stack.Protocol.up(Protocol.java:344) ~[clustercode.jar:1.2.0]
clustercode | at org.jgroups.stack.Protocol.up(Protocol.java:344) ~[clustercode.jar:1.2.0]
clustercode | at org.jgroups.stack.Protocol.up(Protocol.java:344) ~[clustercode.jar:1.2.0]
clustercode | at org.jgroups.protocols.TP.passBatchUp(TP.java:1255) ~[clustercode.jar:1.2.0]
clustercode | at org.jgroups.util.MaxOneThreadPerSender$BatchHandlerLoop.passBatchUp(MaxOneThreadPerSender.java:284) ~[clustercode.jar:1.2.0]
clustercode | at org.jgroups.util.SubmitToThreadPool$BatchHandler.run(SubmitToThreadPool.java:136) ~[clustercode.jar:1.2.0]
clustercode | at org.jgroups.util.MaxOneThreadPerSender$BatchHandlerLoop.run(MaxOneThreadPerSender.java:273) ~[clustercode.jar:1.2.0]
clustercode | at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) ~[?:1.8.0_131]
clustercode | at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) ~[?:1.8.0_131]
clustercode | at java.lang.Thread.run(Thread.java:748) ~[?:1.8.0_131]
clustercode | Caused by: java.io.InvalidClassException: net.chrigel.clustercode.cluster.ClusterTask; local class incompatible: stream classdesc serialVersionUID = -7580195094646305370, local class serialVersionUID = 3003505588645294047
clustercode | at java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:616) ~[?:1.8.0_131]
clustercode | at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1843) ~[?:1.8.0_131]
clustercode | at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1713) ~[?:1.8.0_131]
clustercode | at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2000) ~[?:1.8.0_131]
clustercode | at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1535) ~[?:1.8.0_131]
clustercode | at java.io.ObjectInputStream.readObject(ObjectInputStream.java:422) ~[?:1.8.0_131]
clustercode | at java.util.HashMap.readObject(HashMap.java:1404) ~[?:1.8.0_131]
clustercode | at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_131]
clustercode | at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_131]
clustercode | at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_131]
clustercode | at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_131]
clustercode | at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1058) ~[?:1.8.0_131]
clustercode | at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2136) ~[?:1.8.0_131]
clustercode | at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2027) ~[?:1.8.0_131]
clustercode | at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1535) ~[?:1.8.0_131]
clustercode | at java.io.ObjectInputStream.readObject(ObjectInputStream.java:422) ~[?:1.8.0_131]
clustercode | at org.jgroups.blocks.ReplicatedHashMap.setState(ReplicatedHashMap.java:455) ~[clustercode.jar:1.2.0]
clustercode | at org.jgroups.blocks.MessageDispatcher.handleUpEvent(MessageDispatcher.java:489) ~[clustercode.jar:1.2.0]
clustercode | at org.jgroups.blocks.MessageDispatcher$ProtocolAdapter.up(MessageDispatcher.java:571) ~[clustercode.jar:1.2.0]
clustercode | at org.jgroups.JChannel.up(JChannel.java:721) ~[clustercode.jar:1.2.0]
clustercode | at org.jgroups.stack.ProtocolStack.up(ProtocolStack.java:891) ~[clustercode.jar:1.2.0]
clustercode | at org.jgroups.protocols.pbcast.STATE_TRANSFER.handleStateRsp(STATE_TRANSFER.java:370) ~[clustercode.jar:1.2.0]
clustercode | at org.jgroups.protocols.pbcast.STATE_TRANSFER.up(STATE_TRANSFER.java:135) ~[clustercode.jar:1.2.0]
clustercode | at org.jgroups.stack.Protocol.up(Protocol.java:336) ~[clustercode.jar:1.2.0]
clustercode | at org.jgroups.protocols.FRAG2.up(FRAG2.java:196) ~[clustercode.jar:1.2.0]
clustercode | at org.jgroups.protocols.FlowControl.up(FlowControl.java:416) ~[clustercode.jar:1.2.0]
clustercode | at org.jgroups.stack.Protocol.up(Protocol.java:344) ~[clustercode.jar:1.2.0]
clustercode | at org.jgroups.protocols.pbcast.STABLE.up(STABLE.java:293) ~[clustercode.jar:1.2.0]
clustercode | at org.jgroups.protocols.UNICAST3.deliverBatch(UNICAST3.java:1023) ~[clustercode.jar:1.2.0]
clustercode | at org.jgroups.protocols.UNICAST3.removeAndDeliver(UNICAST3.java:832) ~[clustercode.jar:1.2.0]
clustercode | at org.jgroups.protocols.UNICAST3.handleBatchReceived(UNICAST3.java:798) ~[clustercode.jar:1.2.0]
clustercode | at org.jgroups.protocols.UNICAST3.up(UNICAST3.java:469) ~[clustercode.jar:1.2.0]
clustercode | at org.jgroups.protocols.pbcast.NAKACK2.up(NAKACK2.java:697) ~[clustercode.jar:1.2.0]
clustercode | at org.jgroups.protocols.BARRIER.up(BARRIER.java:195) ~[clustercode.jar:1.2.0]
clustercode | at org.jgroups.stack.Protocol.up(Protocol.java:344) ~[clustercode.jar:1.2.0]
clustercode | at org.jgroups.protocols.FD_ALL.up(FD_ALL.java:212) ~[clustercode.jar:1.2.0]
clustercode | at org.jgroups.stack.Protocol.up(Protocol.java:344) ~[clustercode.jar:1.2.0]
clustercode | at org.jgroups.stack.Protocol.up(Protocol.java:344) ~[clustercode.jar:1.2.0]
clustercode | at org.jgroups.stack.Protocol.up(Protocol.java:344) ~[clustercode.jar:1.2.0]
clustercode | at org.jgroups.protocols.TP.passBatchUp(TP.java:1255) ~[clustercode.jar:1.2.0]
clustercode | at org.jgroups.util.MaxOneThreadPerSender$BatchHandlerLoop.passBatchUp(MaxOneThreadPerSender.java:284) ~[clustercode.jar:1.2.0]
clustercode | at org.jgroups.util.SubmitToThreadPool$BatchHandler.run(SubmitToThreadPool.java:136) ~[clustercode.jar:1.2.0]
clustercode | at org.jgroups.util.MaxOneThreadPerSender$BatchHandlerLoop.run(MaxOneThreadPerSender.java:273) ~[clustercode.jar:1.2.0]
clustercode | at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) ~[?:1.8.0_131]
clustercode | at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) ~[?:1.8.0_131]
clustercode | at java.lang.Thread.run(Thread.java:748) ~[?:1.8.0_131]
clustercode | 2017-09-05 00:39:43.551 INFO - Could not create or join cluster. Will work as single node.
It's version incompatibility again :) I released version 1.2.0 yesterday, which comes with a web UI (very basic for now; I'm working on it). Now you can either upgrade all nodes to 1.2.0 or use the 1.1.0 image tag. Be aware that 1.2 changed the port from 7700 to 8080 ;) I had to do that because of technical limitations/difficulties.
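If the netdata config still points at the old port after upgrading, the hostnames need the new one. A minimal sketch, assuming the config has the same shape as the JSON example earlier in this thread (a "nodes" list whose entries carry a "hostname" of the form host:port); the function name is made up for illustration and entries on any other port are left untouched.

```python
import json


def switch_port(config, old_port=7700, new_port=8080):
    """Return a copy of the config with node hostnames moved to new_port."""
    updated = json.loads(json.dumps(config))  # simple deep copy for JSON data
    suffix = f":{old_port}"
    for node in updated.get("nodes", []):
        hostname = node.get("hostname", "")
        if hostname.endswith(suffix):
            node["hostname"] = hostname[: -len(suffix)] + f":{new_port}"
    return updated


config = {
    "enable_autodetect": False,
    "update_every": 2,
    "nodes": [
        {"name": "docker-1", "hostname": "docker-1.intern:7700"},
        {"name": "docker-2", "hostname": "docker-2.intern:7700"},
    ],
}
print(json.dumps(switch_port(config), indent=2))
```

Remember that the published port in the compose file has to change as well, otherwise the requests will be refused again.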
aha, ok things are close to working pretty smooth from the looks of it.
couple last things, the netdata only seems to be showing 3 of the nodes
and they randomly spit out stuff like
clustercode_node-1.fps: data download failed for url: http://192.168.0.13:19999/api/v1/data?chart=clustercode_node_1.fps&format=json&points=398&group=average&options=ms|flip|jsonwrap&after=-420
instead of the graph, and then they disappear until i restart the service
and it looks like the arbiter is spitting out the error
2017/09/06 03:20:40 [error] 16#16: *7 open() "/usr/src/clustercode/dist/manager/status" failed (2: No such file or directory), client: 172.17.0.1, server: clustercode, request: "GET /manager/status?XML=true HTTP/1.1", host: "localhost:8080"
Alright, at least some progress :)
The first looks like a netdata issue; there isn't much I can do. The second is strange though. Somebody (172.17.0.1) is requesting /manager/status?XML=true, which doesn't exist on clustercode. /api/v1/status does exist, though its output is JSON, not XML. Unless you reconfigured the api_path in netdata/node.d/clustercode.conf, I doubt it's netdata's fault. I fear you have to figure it out on your end, and I would start by looking for strange things in the netdata error log. Anyone can make bad requests; the question is who and why.
restarted the netdata machine, and everything seems to be working pretty well now. though one of the nodes doesn't seem to be making progress and the progress graph is spiky (0 - .02) over and over. but I suspect that's an encoding issue on that machine.
it would be really nice if the netdata could say what file each node was working on as well :)
anyway thanks for all the help and the good work on this system. it seems to be coming along nicely
no problem and many thanks :)
It could be. It could also be that the node fails at encoding at some point and then retries over and over. I think I might add a counter of consecutive failures and shut down once a limit is reached.
The problem with netdata is that it only collects numbers and nothing else. This is why I made the web UI. If you point your browser at your-node:8080 you should see a list of the tasks and their progress :)
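The consecutive-failure counter mentioned above could look roughly like this. It is purely an illustrative sketch, not clustercode's implementation: the class name, the default limit of 3 and the idea of resetting the counter on any success are all assumptions.

```python
class FailureGuard:
    """Track consecutive failed attempts and signal when a limit is hit."""

    def __init__(self, max_consecutive_failures=3):
        self.limit = max_consecutive_failures
        self.failures = 0

    def record(self, success):
        """Record one attempt; return True when the node should shut down."""
        if success:
            self.failures = 0  # any success breaks the streak
        else:
            self.failures += 1
        return self.failures >= self.limit


guard = FailureGuard(max_consecutive_failures=3)
for outcome in (False, False, True, False, False, False):
    if guard.record(outcome):
        print("too many consecutive failures, shutting down")
```

Resetting on success means an occasional bad file does not stop a node, while a node that fails every job in a row stops retrying.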