real-stanford/cow

Error when running startx.py

Closed this issue · 6 comments

```
(base) oppo@car1-ipc:~/yuwei/cow/scripts$ sudo python startx.py
['Xorg', '-noreset', '+extension', 'GLX', '+extension', 'RANDR', '+extension', 'RENDER', '-config', '/tmp/tmpT3GoGb', ':0']
_XSERVTransSocketUNIXCreateListener: ...SocketCreateListener() failed
_XSERVTransMakeAllCOTSServerListeners: server already running
(EE)
Fatal server error:
(EE) Cannot establish any listening sockets - Make sure an X server isn't already running(EE)
(EE)
Please consult the The X.Org Foundation support
 at http://wiki.x.org
for help.
(EE) Please also check the log file at "/var/log/Xorg.0.log" for additional information.
(EE)
(EE) Server terminated with error (1). Closing log file.
```

Does this mean that I have already started the Xorg processes? Is this normal or abnormal? Looking forward to your reply.

What does it look like when you run nvidia-smi? Do you see Xorg processes running on each GPU?

Yes, there are two Xorg processes, so I assume the error is expected because the servers are already running. Should I specify a display number for startx.py? For example: Xorg -noreset +extension GLX +extension RANDR +extension RENDER -config /tmp/tmpT3GoGb :2
In fact, I have already tried this. I got a black screen with nothing on it that filled the whole display, and I had to switch TTYs to get back to my original GNOME desktop. Is the black screen normal or abnormal?
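The "server already running" error above means display :0 is already taken. As a rough sketch (not part of the cow repository), the following lists which display numbers currently have a socket under /tmp/.X11-unix and picks the lowest unused one. The function names are made up for illustration, and a leftover socket from a crashed server would still count as "in use" here.

```python
import glob
import os
import re


def displays_in_use(socket_dir="/tmp/.X11-unix"):
    """Return the sorted display numbers that have an X socket in socket_dir."""
    numbers = []
    for path in glob.glob(os.path.join(socket_dir, "X*")):
        # Socket names look like X0, X1, ...; ignore anything else.
        match = re.fullmatch(r"X(\d+)", os.path.basename(path))
        if match:
            numbers.append(int(match.group(1)))
    return sorted(numbers)


def first_free_display(socket_dir="/tmp/.X11-unix"):
    """Return the lowest display number with no socket in socket_dir."""
    used = set(displays_in_use(socket_dir))
    candidate = 0
    while candidate in used:
        candidate += 1
    return candidate
```

Calling `displays_in_use()` before running startx.py shows whether :0 is occupied, and `first_free_display()` gives a number that could be tried instead.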


Xorg should be running on each GPU. When I run nvidia-smi I see the following:
[Screenshot: nvidia-smi output, 2023-06-05]

If this is the case, you should be good to move on to the next steps.
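The same check can be scripted. This is only a sketch: it assumes the process table printed by plain `nvidia-smi` starts each row with `|` followed by the GPU index, which may differ between driver versions. The function names are illustrative.

```python
import re
import subprocess


def gpus_with_xorg(nvidia_smi_text):
    """Return the sorted GPU indices whose process rows mention Xorg."""
    gpu_indices = set()
    for line in nvidia_smi_text.splitlines():
        if "Xorg" not in line:
            continue
        # Assumption: a process row begins with "|", whitespace, then the GPU index.
        match = re.match(r"\|\s*(\d+)\s", line)
        if match:
            gpu_indices.add(int(match.group(1)))
    return sorted(gpu_indices)


def report_xorg_gpus():
    """Run nvidia-smi and print which GPUs show an Xorg process."""
    result = subprocess.run(
        ["nvidia-smi"], capture_output=True, text=True, check=True
    )
    print("GPUs with an Xorg process:", gpus_with_xorg(result.stdout))
```

On a machine with the NVIDIA driver installed, `report_xorg_gpus()` should list every GPU index if startx.py (or an existing desktop session) has brought up Xorg on each card.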

Thank you! But I found another problem: when running python pasture_runner.py -a src.models.agent_fbe_owl -n 8 --arch B32 --center, I still get a black graphical window with nothing in it.
I modified pasture_runner.py and it now works. I followed the minimal example from the ai2thor library (https://github.com/allenai/ai2thor), in which the port is None. Maybe something is wrong with the display setting.

Hi @zyw1515414231, I usually run on a headless server, which is why the code targets x_display. If you are seeing issues with that line, it is also possible that your displays are named differently from the displays on my server.

You can try running the following:

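```python
import glob
import os

import Xlib.display  # provided by the python-xlib package

displays = []

# Each running X server exposes a socket named X<display number> here.
open_display_strs = [
    os.path.basename(s)[1:] for s in glob.glob("/tmp/.X11-unix/X*")
]

for open_display_str in sorted(open_display_strs):
    # Skip entries whose suffix is not a plain display number.
    try:
        open_display_str = str(int(open_display_str))
    except ValueError:
        continue

    try:
        display = Xlib.display.Display(":{}".format(open_display_str))
    except Exception:
        print(f"cannot open display: {open_display_str}")
        continue

    # Record one "<display>.<screen>" entry per screen on this display.
    displays.extend(
        f"{open_display_str}.{i}" for i in range(display.screen_count())
    )
    display.close()

print(displays)
```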

On my 8 GPU server I see:

['0.0', '0.1', '0.2', '0.3', '0.4', '0.5', '0.6', '0.7']

Of course if you are trying to get the THOR display to pop up on a local machine, your solution seems good too!
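To make the two setups concrete, here is a hedged sketch of how the display choice could be passed to ai2thor. It is not the code from pasture_runner.py: it assumes the local workaround amounts to not naming a display at all, and the helper names `controller_kwargs` and `make_controller` are hypothetical. `x_display` is the ai2thor `Controller` argument mentioned above.

```python
def controller_kwargs(x_display=None):
    """Build keyword arguments for ai2thor's Controller.

    x_display=None leaves the display choice to ai2thor (local desktop case).
    A string such as "0.3" targets that X display and screen (headless case).
    """
    kwargs = {}
    if x_display is not None:
        kwargs["x_display"] = str(x_display)
    return kwargs


def make_controller(x_display=None):
    """Create a Controller, optionally pinned to one X display."""
    # Imported lazily so the helper above can be used without ai2thor installed.
    from ai2thor.controller import Controller

    return Controller(**controller_kwargs(x_display))
```

On a headless server, one of the strings printed by the display-listing script (for example "0.0") would be passed in; on a local machine, calling `make_controller()` with no argument matches the behavior described in the workaround.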

Thank you Samir!