[Bug] JMX Stop on Harvester-managed recordings causes confusing server exception
andrewazores opened this issue · 0 comments
Agent-side logs:
2023-09-19 13:29:59,648 INFO [io.cry.age.Harvester] (RMI TCP Connection(12)-10.0.2.100) cryostat-agent(3) STOPPED
2023-09-19 13:29:59,649 INFO [io.cry.age.Harvester] (RMI TCP Connection(12)-10.0.2.100) cryostat-agent(3) CLOSED
2023-09-19 13:29:59,838 INFO [io.cry.age.CryostatClient] (cryostat-agent-worker-0) POST 200 (quarkus-test-agent-1_default_20230919T132959Z.jfr -> https://localhost:8181/api/beta/recordings/KnY4JVq_p8hVVoh-GsmOAarLxC6bwdSX9P2qDlU0520=): 392 KB/PT0.188978532S
2023-09-19 13:29:59,851 INFO [io.cry.age.Harvester] (cryostat-agent-harvester) cryostat-agent(4) RUNNING
2023-09-19 13:30:03,031 INFO [io.cry.age.CryostatClient] (cryostat-agent-registration) GET https://localhost:8181/api/v2.2/discovery/d5535a73-4774-40f4-be79-4721ef5787f2?token=eyJjdHkiOiJKV1QiLCJlbmMiOiJBMjU2R0NNIiwiYWxnIjoiZGlyIn0..ZupKK7gWmaotvvF_.EuQ2znPibxbWpN2a9NQjTIR3De8PPlglQ0aDz9A97MJnFDKWAM1-UQ2ohLPZxiiE5we7DOloGk9rTIsSdGgQkn_RHSkQQzLDCjk1ZJaHQa566LoNA4ZlpmRZspYq6WFvaY3BFobeytzgmAiFTupahelVap-yM0g3CWvKQNTNgKWOS5g-bUMZVNZHXBFnlOLgBb8ZV_HKO57IyhdwIR_nl8QSUJ2LykBtGOiQvZVat0sHO00NiIB__D-w1v1IXo4BpLHPPrAj14o3E_9B-ZV6AOv35HNmUg_ftYnZmhtFO0oBh3UxAov1gAx2K_Xt5mIzdL2Pf6NmXsNNno6BC_YF17oRdOb-vZke3_zFhqmZWIiBWRQqGGRXFyYBVXW45YL7tdVAMh-kprt5V91_9Ok0OBf66pdsTye9NBn8lu72klKc6seAoVOyAmL_wRlftOr_9e3ei_QYA4XL8aXsMU2UWdwn2KWojwkx6dWGJES4BpV55EqHonok4b7oiqmYonD43weHv3qKzWoCzyn0dBSJS40p4FkP7NZWDJS1d-EOUZvH5wYa1pIJ-U6FYkQ4AB37n6Uqrb4d154u.oPDrIfTyxrzX_jenTxZkrQ HTTP/1.1
2023-09-19 13:30:03,054 INFO [io.cry.age.CryostatClient] (cryostat-agent-worker-2) GET https://localhost:8181/api/v2.2/discovery/d5535a73-4774-40f4-be79-4721ef5787f2?token=eyJjdHkiOiJKV1QiLCJlbmMiOiJBMjU2R0NNIiwiYWxnIjoiZGlyIn0..ZupKK7gWmaotvvF_.EuQ2znPibxbWpN2a9NQjTIR3De8PPlglQ0aDz9A97MJnFDKWAM1-UQ2ohLPZxiiE5we7DOloGk9rTIsSdGgQkn_RHSkQQzLDCjk1ZJaHQa566LoNA4ZlpmRZspYq6WFvaY3BFobeytzgmAiFTupahelVap-yM0g3CWvKQNTNgKWOS5g-bUMZVNZHXBFnlOLgBb8ZV_HKO57IyhdwIR_nl8QSUJ2LykBtGOiQvZVat0sHO00NiIB__D-w1v1IXo4BpLHPPrAj14o3E_9B-ZV6AOv35HNmUg_ftYnZmhtFO0oBh3UxAov1gAx2K_Xt5mIzdL2Pf6NmXsNNno6BC_YF17oRdOb-vZke3_zFhqmZWIiBWRQqGGRXFyYBVXW45YL7tdVAMh-kprt5V91_9Ok0OBf66pdsTye9NBn8lu72klKc6seAoVOyAmL_wRlftOr_9e3ei_QYA4XL8aXsMU2UWdwn2KWojwkx6dWGJES4BpV55EqHonok4b7oiqmYonD43weHv3qKzWoCzyn0dBSJS40p4FkP7NZWDJS1d-EOUZvH5wYa1pIJ-U6FYkQ4AB37n6Uqrb4d154u.oPDrIfTyxrzX_jenTxZkrQ : 200
It looks like what is happening is that the remote JMX connection invokes the stopRecording
operation, which the target JVM accepts and performs. Immediately, the Agent Harvester code executing within the target JVM sees this updated condition and reacts:
The Agent dumps the stopped recording contents to a file, closes the recording, uploads the file dump to the server, and then starts a new replacement recording with the same name and other settings according to the Harvester configuration. The server meanwhile, after sending the JMX stopRecording command:
Optional<IRecordingDescriptor> descriptor =
getDescriptorByName(connection, recordingName);
if (descriptor.isPresent()) {
IRecordingDescriptor d = descriptor.get();
if (d.getState().equals(RecordingState.STOPPED) && quiet) {
return d;
}
connection.getService().stop(d);
this.cancelScheduledTasksIfExists(targetId, recordingName);
HyperlinkedSerializableRecordingDescriptor linkedDesc =
new HyperlinkedSerializableRecordingDescriptor(
d,
webServer.get().getDownloadURL(connection, d.getName()),
webServer.get().getReportURL(connection, d.getName()),
RecordingState.STOPPED);
this.issueNotification(targetId, linkedDesc, STOP_NOTIFICATION_CATEGORY);
return getDescriptorByName(connection, recordingName).get();
} else {
throw new RecordingNotFoundException(targetId, recordingName);
}
does a little other work of its own, then tries to query the target for the updated state of the recording by the same name so that it can inform the original client of the latest state. The expectation should be that only the running -> stopped
state has changed, but this re-query is done to ensure that we really report the latest state to the client. I believe the NoSuchElementException
arises because this follow-up query happens to occur in the timing interval where the Agent has deleted the original recording and has not yet re-started its replacement, so there is actually no recording with the expected name in the target JVM anymore.
tl;dr this is a conflict between trying to use manual recording stop operations on the Agent Harvester-managed recording. I'm not sure if it's really a bug in that case but maybe there is something to be done to make this result a little more clear and explicit to the user. In the Agent HTTP connection case that is probably easier to do since we have more control. In this JMX case, I don't know what can be done really.
Originally posted by @andrewazores in https://github.com/cryostatio/cryostat/issues/1608#issuecomment-1725576070