HubSpot/Singularity

UI problems

ediril opened this issue · 20 comments

While running shaded jar version 0.22 (I downloaded it from here), I ran into the following problems with the UI:

  1. On the Requests page, the DeployId links don't work because the links don't have the request-id in them. They look like this: http://localhost:7099/singularity/request/undefined/deploy/1. Notice it says undefined instead of the actual request id.

  2. After I navigate to the Deployment page for a specific Request id using the correct link from above, I can see the history of tasks that ran for that deployment. Each task in the history has a link to its logs. However Singularity can't seem to find/read the logs on the slave. Clicking on any of them says:

stdout does not exist in this directory.
It may have been moved to stdout
Back to Task Detail Page

I can actually view the logs if I look at them from the Mesos UI by clicking on the specific Sandbox link. In fact I can see both stdout and stderr. For some reason, Singularity framework is unable to get to the logs.

It would be great if these two UI issues could be fixed.
Thank you

First one will be addressed by #1978

The second one I'm not able to replicate. When you click the logs link:

  1. Is it this one you are talking about?

[Screenshot: Screen Shot 2019-07-17 at 4 43 21 PM]

  2. What URL does it put you on? I'm curious if it's a UI issue or possibly something with Singularity's access/config. Singularity tries to fetch the files from the mesos-slave API, so it will hit the mesos slave on {mesos slave hostname}:5051. If it can't access that (due to security groups or something), then that could also be the cause.

Regarding the second one, yes that's the link I'm talking about. It's unable to tail the slave logs for any task. Clicking that link takes me to http://localhost:7099/singularity/task/sleep-ondemand-C-1563394569546-1-mesos_slave-DEFAULT/tail/stdout where I see the message I posted above

When I look at Singularity logs, I can see that it tried to access the slave but it failed:

ERROR [2019-07-17 21:11:07,600] com.hubspot.singularity.mesos.SingularityMesosExecutorInfoSupport: While fetching directory and container id for task: sleep-ondemand-C-1563394569546-1-mesos_slave-DEFAULT
! java.io.IOException: Remotely Closed
! Causing: com.hubspot.horizon.HttpRuntimeException: java.io.IOException: Remotely Closed
! at com.hubspot.horizon.ning.NingHttpClient.execute(NingHttpClient.java:43)
! at com.hubspot.mesos.client.SingularityMesosClient.getFromMesos(SingularityMesosClient.java:68)
! ... 11 common frames omitted
! Causing: com.hubspot.mesos.client.MesosClient$MesosClientException: Exception fetching http://mesos-slave:5051/slave(1)/state after 00:14.445

The ip:port for the slave is correct, and I'm running the hubspot/singularityexecutorslave:0.21.0 docker image as my slave. The slave is able to execute Run-Once and On-Demand tasks. I just can't tail the logs for some reason from the web UI.

Any ideas?

Hmm, best guess is that the ip:port isn't accessible from singularity. Are you able to curl that same endpoint from the container that singularity is running in?
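
If curl isn't handy, the same reachability check can be sketched in plain Java and run from the machine Singularity runs on. This is only a rough sketch: the class and method names (`SlaveStateProbe`, `stateUrl`, `probe`) are made up for illustration and are not part of Singularity, and the URL shape is simply copied from the exception message above.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

// Hypothetical helper, not part of Singularity.
final class SlaveStateProbe {

  // Builds a URL in the same shape as the one in the stack trace:
  // http://{host}:{port}/slave(1)/state
  static String stateUrl(String host, int port) {
    return "http://" + host + ":" + port + "/slave(1)/state";
  }

  // Returns the HTTP status code of a GET against the state endpoint,
  // or -1 if the request fails (connection refused, timeout, remote close, ...).
  static int probe(String host, int port, int timeoutMillis) {
    try {
      HttpURLConnection conn =
          (HttpURLConnection) new URL(stateUrl(host, port)).openConnection();
      conn.setConnectTimeout(timeoutMillis);
      conn.setReadTimeout(timeoutMillis);
      conn.setRequestMethod("GET");
      try {
        return conn.getResponseCode();
      } finally {
        conn.disconnect();
      }
    } catch (IOException e) {
      return -1;
    }
  }
}
```

Calling `SlaveStateProbe.probe("mesos-slave", 5051, 5000)` should return 200 if the endpoint is reachable and answering; a return value of -1 would point at a network or slave-side problem rather than at the UI.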

So you are saying when I navigate to http://mesos-slave:5051/slave(1)/state in the browser or via curl, I should get something back? This doesn't work for me, I get: mesos-slave didn’t send any data

What does it do for you? What data do you get back?

(For reference, my mesos-master is the mesosphere/mesos-master:1.7.1 docker image)

As another data point, if I switch to using mesosphere/mesos-slave:1.7.1 image, then http://mesos-slave:5051/slave(1)/state endpoint does return json data (and the exception goes away). However, the Logs link still doesn't work.

Does this functionality require using your hubspot/singularityexecutorslave:0.21.0 slave image?

Ah, it's missing #1949 , which we added as part of the upgrade to 1.8. We're overdue to release some new stuff anyways. I'll try and get a release put together today or tomorrow morning and get 0.23.0 out there

Awesome thank you! Could you also please make sure #1978 gets included in there as well?

https://github.com/HubSpot/Singularity/releases/tag/Singularity-0.23.0

just released in sonatype, jars should show up in a little while on maven central

I downloaded 0.23 (shaded) from here. #1978 should be in this release correct? The link still has undefined instead of request-id. Just wanted to let you know.

My environment at work is a bit locked down so I've been relying on these release jars to evaluate Singularity, but I'll try to build it locally.

It should be in there. May need to hard refresh. We have one open issue on the cache headers being too aggressive and not updating nicely between releases

Ah that was it, thank you very much!

Unfortunately, the Logs link still doesn't work for me, I get the same message. I'm not sure how to debug this on my end, I'm not getting any errors from SingularityService

I have the following images running on docker:

hubspot/singularityexecutorslave:0.23.0
mesosphere/mesos-master:1.7.1
mesoscloud/zookeeper:3.4.8-ubuntu-14.04

and I run SingularityService on the command line on the host machine (Windows 10) directly. Everything appears to be running properly and I can see the logs if I look at them in the mesos web UI.

Ok. Places to look for stack traces:

  • console logs in your browser
  • SingularityService logs

I'd expect the first in this case. Also, I've never actually tested this all out on a windows machine, only mac/linux. So there could possibly be some weirdness there

Oh I think I found the problem:

When I click on the Logs link, the page it takes me to makes a call to this URL: http://localhost:7099/singularity/api/sandbox/sleep-once-6-1563564937641-1-mesos_slave-DEFAULT/read?path=stdout&length=0 which returns 404 with this message:

File \var\lib\mesos\slaves\c333d435-438b-4c7d-98e4-ef02f9a842cf-S0\frameworks\Singularity\executors\sleep-once-6-1563564937641-1-mesos_slave-DEFAULT\runs\a2d927d4-4c41-4b89-b995-9eb2a7d315a5\stdout does not exist for task ID sleep-once-6-1563564937641-1-mesos_slave-DEFAULT

That file actually exists on the slave, but notice the \. If those are replaced with /, then it works. So this must be an issue with Java paths defaulting to \ on Windows.

Is this something easy to fix? It would be very convenient to get this working so we don't have to dig into the mesos web UI to look at logs.

Btw, when I look at http://127.0.0.1:5051/slave(1)/state, I see this:

"name":"Command Executor (Task: sleep-once-6-1563564937641-1-mesos_slave-DEFAULT) (Command: [/home/sleep_...])",
"source":"sleep-once-6-1563564937641-1-mesos_slave-DEFAULT",
"container":"a2d927d4-4c41-4b89-b995-9eb2a7d315a5",
"directory":"/var/lib/mesos/slaves/c333d435-438b-4c7d-98e4-ef02f9a842cf-S0/frameworks/Singularity/executors/sleep-once-6-1563564937641-1-mesos_slave-DEFAULT/runs/a2d927d4-4c41-4b89-b995-9eb2a7d315a5",
...

So I imagine it's Java using the \ because it's running on Windows.
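
To illustrate the suspicion, here is a minimal sketch of how this kind of mismatch can arise. It assumes the sandbox path is built with `java.nio.file.Paths` somewhere; I have not confirmed which call in Singularity actually does it, and the class and method names below are hypothetical.

```java
import java.nio.file.Paths;

// Hypothetical illustration, not Singularity code.
final class SandboxPathDemo {

  // Uses the JVM's default filesystem, so the separator in the result
  // depends on the OS the JVM runs on, not the OS the slave runs on.
  static String joinWithPlatformPaths(String directory, String fileName) {
    return Paths.get(directory, fileName).toString();
  }

  // Plain string join that always uses '/', independent of the host OS.
  static String joinWithForwardSlash(String directory, String fileName) {
    return directory.endsWith("/")
        ? directory + fileName
        : directory + "/" + fileName;
  }
}
```

On a Windows JVM the first method turns a directory like the one in the state output into a backslash-separated path, which matches the 404 message above, while the second keeps the `/` separators the Linux slave reported.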

Ah, didn't realize that. Unfortunately there aren't a lot of devs running Windows here at HubSpot. I can take a quick look and would be happy to review/merge any PRs if you find the issue as well

I'm going to be away next week, but going to cc @baconmania @pschoenfelder @sjeropkipruto who might be able to finish this up as well and release 0.23.1

FYI, this is not forgotten, just trickier than expected. Mesos does in fact have support for windows, meaning that if I do a blanket replace \ -> /, then if this ever runs against windows mesos it would be wrong. Right now the assumption is that Singularity is running on the same os it is managing

Another option for you could be to use some of our published docker images and run Singularity there

@ssalinas Thank you for the update. Speaking of Singularity docker image, it turns out Docker for Windows currently does not support HOST networking mode either. You should consider adding Windows to the note on Try it out page.

Yes I agree this is a tricky situation. In my setup, Singularity is running on Windows while Mesos slave is running inside a docker container running Linux. Could you simply use / in path names regardless of OS? Not ideal but could be a practical solution
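
One possible middle ground, sketched below, would be to pick the separator from the directory string the slave itself reports instead of from the OS Singularity runs on. This is only a heuristic and an assumption on my part, not something Singularity does today; `AgentPaths`, `separatorFor` and `resolve` are hypothetical names.

```java
// Hypothetical helper, not part of Singularity.
final class AgentPaths {

  // Guesses the slave's path separator from the directory it reported.
  // Assumption: a directory starting with '/' is POSIX-style; otherwise
  // any backslash in it means a Windows-style path.
  static char separatorFor(String agentDirectory) {
    if (agentDirectory.startsWith("/")) {
      return '/';
    }
    return agentDirectory.indexOf('\\') >= 0 ? '\\' : '/';
  }

  // Joins the reported directory and a relative path using the slave's
  // own separator, without consulting the local JVM's filesystem.
  static String resolve(String agentDirectory, String relativePath) {
    char sep = separatorFor(agentDirectory);
    if (!agentDirectory.isEmpty()
        && agentDirectory.charAt(agentDirectory.length() - 1) == sep) {
      return agentDirectory + relativePath;
    }
    return agentDirectory + sep + relativePath;
  }
}
```

With this, a directory such as the `/var/lib/mesos/...` one above would keep `/` even when Singularity runs on Windows, and a Windows-style directory reported by a Windows slave would keep `\`. It would still need checking against how real Windows Mesos agents report their sandbox directories.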