OpenDriveLab/CLOVER

About the evaluation on CALVIN with checkpoint provided

Closed this issue · 6 comments

Thanks for providing such great work! Given the large size of the CALVIN dataset, I ran the evaluation only on the debug dataset. Is this result reliable?

[Screenshot: evaluation results on the debug dataset]

The performance may vary given different hardware setups and environments. We've tested with multiple evaluation seeds, and the variation should be within ±0.1 Avg. Len. Can you provide further details about your testbed?
[Screenshot: 2024-10-31 14:22:19]


Sure. I ran the evaluation only on the debug dataset; the full output is as follows.
[2024-10-30 19:35:50,798] torch.distributed.run: [WARNING]
[2024-10-30 19:35:50,798] torch.distributed.run: [WARNING] *****************************************
[2024-10-30 19:35:50,798] torch.distributed.run: [WARNING] Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
[2024-10-30 19:35:50,798] torch.distributed.run: [WARNING] *****************************************
pybullet build time: Nov 28 2023 23:52:03
pybullet build time: Nov 28 2023 23:52:03
pybullet build time: Nov 28 2023 23:52:03
pybullet build time: Nov 28 2023 23:52:03
[W Utils.hpp:164] Warning: Environment variable NCCL_BLOCKING_WAIT is deprecated; use TORCH_NCCL_BLOCKING_WAIT instead (function getCvarBool)
is_using_torchrun
[W Utils.hpp:164] Warning: Environment variable NCCL_BLOCKING_WAIT is deprecated; use TORCH_NCCL_BLOCKING_WAIT instead (function getCvarBool)
is_using_torchrun
[W Utils.hpp:164] Warning: Environment variable NCCL_BLOCKING_WAIT is deprecated; use TORCH_NCCL_BLOCKING_WAIT instead (function getCvarBool)
is_using_torchrun
[W Utils.hpp:164] Warning: Environment variable NCCL_BLOCKING_WAIT is deprecated; use TORCH_NCCL_BLOCKING_WAIT instead (function getCvarBool)
is_using_torchrun
device_id: cuda:0
world_size: 4
device_id: cuda:2
world_size: 4
device_id: cuda:3
world_size: 4
device_id: cuda:1
world_size: 4
Loading checkpoint from /data1/gyh/models/CLOVER/visual_planner.pt
Loading checkpoint from /data1/gyh/models/CLOVER/visual_planner.pt
Loading checkpoint from /data1/gyh/models/CLOVER/visual_planner.pt
Loading checkpoint from /data1/gyh/models/CLOVER/visual_planner.pt
[rank0]:[W Utils.hpp:106] Warning: Environment variable NCCL_BLOCKING_WAIT is deprecated; use TORCH_NCCL_BLOCKING_WAIT instead (function getCvarString)
[rank3]:[W Utils.hpp:106] Warning: Environment variable NCCL_BLOCKING_WAIT is deprecated; use TORCH_NCCL_BLOCKING_WAIT instead (function getCvarString)
[rank1]:[W Utils.hpp:106] Warning: Environment variable NCCL_BLOCKING_WAIT is deprecated; use TORCH_NCCL_BLOCKING_WAIT instead (function getCvarString)
[rank2]:[W Utils.hpp:106] Warning: Environment variable NCCL_BLOCKING_WAIT is deprecated; use TORCH_NCCL_BLOCKING_WAIT instead (function getCvarString)
[rank3]:[W Utils.hpp:106] Warning: Environment variable NCCL_BLOCKING_WAIT is deprecated; use TORCH_NCCL_BLOCKING_WAIT instead (function getCvarString)
[rank1]:[W Utils.hpp:106] Warning: Environment variable NCCL_BLOCKING_WAIT is deprecated; use TORCH_NCCL_BLOCKING_WAIT instead (function getCvarString)
[rank2]:[W Utils.hpp:106] Warning: Environment variable NCCL_BLOCKING_WAIT is deprecated; use TORCH_NCCL_BLOCKING_WAIT instead (function getCvarString)
Loading policy checkpoint from /data1/gyh/models/CLOVER/feedback_policy.pth
[rank0]:[W Utils.hpp:106] Warning: Environment variable NCCL_BLOCKING_WAIT is deprecated; use TORCH_NCCL_BLOCKING_WAIT instead (function getCvarString)
Loading policy checkpoint from /data1/gyh/models/CLOVER/feedback_policy.pth
Loading policy checkpoint from /data1/gyh/models/CLOVER/feedback_policy.pth
Loading policy checkpoint from /data1/gyh/models/CLOVER/feedback_policy.pth
argv[0]=
startThreads creating 1 threads.
starting thread 0
started thread 0
argc=3
argv[0] = --unused
argv[1] =
argv[2] = --start_demo_name=Physics Server
ExampleBrowserThreadFunc started
X11 functions dynamically loaded using dlopen/dlsym OK!
X11 functions dynamically loaded using dlopen/dlsym OK!
argv[0]=
startThreads creating 1 threads.
starting thread 0
started thread 0
argc=3
argv[0] = --unused
argv[1] =
argv[2] = --start_demo_name=Physics Server
ExampleBrowserThreadFunc started
X11 functions dynamically loaded using dlopen/dlsym OK!
X11 functions dynamically loaded using dlopen/dlsym OK!
argv[0]=
startThreads creating 1 threads.
starting thread 0
started thread 0
argc=3
argv[0] = --unused
argv[1] =
argv[2] = --start_demo_name=Physics Server
ExampleBrowserThreadFunc started
X11 functions dynamically loaded using dlopen/dlsym OK!
argv[0]=
startThreads creating 1 threads.
starting thread 0
started thread 0
argc=3
argv[0] = --unused
argv[1] =
argv[2] = --start_demo_name=Physics Server
ExampleBrowserThreadFunc started
X11 functions dynamically loaded using dlopen/dlsym OK!
X11 functions dynamically loaded using dlopen/dlsym OK!
X11 functions dynamically loaded using dlopen/dlsym OK!
Creating context
Creating context
Created GL 3.3 context
Direct GLX rendering context obtained
Making context current
GL_VENDOR=Mesa/X.org
GL_RENDERER=llvmpipe (LLVM 12.0.0, 256 bits)
GL_VERSION=4.1 (Core Profile) Mesa 21.2.6
GL_SHADING_LANGUAGE_VERSION=4.50
pthread_getconcurrency()=0
Version = 4.1 (Core Profile) Mesa 21.2.6
Vendor = Mesa/X.org
Renderer = llvmpipe (LLVM 12.0.0, 256 bits)
Created GL 3.3 context
Direct GLX rendering context obtained
Making context current
Creating context
GL_VENDOR=Mesa/X.org
GL_RENDERER=llvmpipe (LLVM 12.0.0, 256 bits)
GL_VERSION=4.1 (Core Profile) Mesa 21.2.6
GL_SHADING_LANGUAGE_VERSION=4.50
pthread_getconcurrency()=0
Version = 4.1 (Core Profile) Mesa 21.2.6
Vendor = Mesa/X.org
Renderer = llvmpipe (LLVM 12.0.0, 256 bits)
Created GL 3.3 context
Creating context
Direct GLX rendering context obtained
Making context current
GL_VENDOR=Mesa/X.org
GL_RENDERER=llvmpipe (LLVM 12.0.0, 256 bits)
GL_VERSION=4.1 (Core Profile) Mesa 21.2.6
GL_SHADING_LANGUAGE_VERSION=4.50
pthread_getconcurrency()=0
Version = 4.1 (Core Profile) Mesa 21.2.6
Vendor = Mesa/X.org
Renderer = llvmpipe (LLVM 12.0.0, 256 bits)
Created GL 3.3 context
Direct GLX rendering context obtained
Making context current
GL_VENDOR=Mesa/X.org
GL_RENDERER=llvmpipe (LLVM 12.0.0, 256 bits)
GL_VERSION=4.1 (Core Profile) Mesa 21.2.6
GL_SHADING_LANGUAGE_VERSION=4.50
pthread_getconcurrency()=0
Version = 4.1 (Core Profile) Mesa 21.2.6
Vendor = Mesa/X.org
Renderer = llvmpipe (LLVM 12.0.0, 256 bits)
b3Printf: Selected demo: Physics Server
startThreads creating 1 threads.
starting thread 0
started thread 0
MotionThreadFunc thread started
b3Printf: Selected demo: Physics Server
startThreads creating 1 threads.
starting thread 0
started thread 0
MotionThreadFunc thread started
b3Printf: Selected demo: Physics Server
startThreads creating 1 threads.
starting thread 0
started thread 0
MotionThreadFunc thread started
b3Printf: Selected demo: Physics Server
startThreads creating 1 threads.
starting thread 0
started thread 0
MotionThreadFunc thread started
ven = Mesa/X.org
ven = Mesa/X.org
ven = Mesa/X.org
ven = Mesa/X.org
ven = Mesa/X.org
ven = Mesa/X.org
ven = Mesa/X.org
ven = Mesa/X.org
received depth0
received depth0
received depth0
received depth0
successfully load env
successfully load env
successfully wrapped model
logging to /tmp/evaluation
0%| | 0/250 [00:00<?, ?it/s]successfully wrapped model
logging to /tmp/evaluation
0%| | 0/250 [00:00<?, ?it/s]successfully load env
successfully load env
successfully wrapped model
successfully wrapped model
logging to /tmp/evaluation
0%| | 0/250 [00:00<?, ?it/s]logging to /tmp/evaluation
1/5 : 94.4% | 2/5 : 79.6% | 3/5 : 65.6% | 4/5 : 53.2% | 5/5 : 38.4% ||: 100%|████████████████████████████████████████████| 250/250 [11:39:47<00:00, 167.95s/it]
1/5 : 94.4% | 2/5 : 79.6% | 3/5 : 62.4% | 4/5 : 49.6% | 5/5 : 36.8% ||: 100%|████████████████████████████████████████████| 250/250 [12:06:51<00:00, 174.45s/it]
1/5 : 95.2% | 2/5 : 78.4% | 3/5 : 62.4% | 4/5 : 44.8% | 5/5 : 32.8% ||: 100%|████████████████████████████████████████████| 250/250 [12:07:11<00:00, 174.53s/it]
1/5 : 95.2% | 2/5 : 78.8% | 3/5 : 56.8% | 4/5 : 43.2% | 5/5 : 33.2% ||: 100%|████████████████████████████████████████████| 250/250 [12:09:16<00:00, 175.03s/it]
Results for Epoch 0:
Average successful sequence length: 3.187
Success rates for i instructions in a row:
1: 94.8%
2: 79.1%
3: 61.8%
4: 47.7%
5: 35.3%
disconnecting id 0 from server
rotate_blue_block_right: 50 / 69 | SR: 72.5%
move_slider_right: 230 / 236 | SR: 97.5%
lift_red_block_slider: 81 / 110 | SR: 73.6%
place_in_slider: 162 / 295 | SR: 54.9%
turn_off_lightbulb: 116 / 120 | SR: 96.7%
turn_off_led: 139 / 141 | SR: 98.6%
push_into_drawer: 68 / 97 | SR: 70.1%
lift_blue_block_drawer: 10 / 15 | SR: 66.7%
lift_pink_block_slider: 92 / 112 | SR: 82.1%
open_drawer: 303 / 306 | SR: 99.0%
rotate_red_block_right: 52 / 69 | SR: 75.4%
lift_pink_block_table: 139 / 157 | SR: 88.5%
push_blue_block_left: 57 / 63 | SR: 90.5%
close_drawer: 163 / 165 | SR: 98.8%
push_pink_block_right: 39 / 61 | SR: 63.9%
push_red_block_right: 39 / 66 | SR: 59.1%
push_red_block_left: 59 / 69 | SR: 85.5%
lift_blue_block_table: 147 / 162 | SR: 90.7%
rotate_blue_block_left: 47 / 62 | SR: 75.8%
place_in_drawer: 102 / 141 | SR: 72.3%
turn_on_lightbulb: 140 / 144 | SR: 97.2%
move_slider_left: 212 / 227 | SR: 93.4%
rotate_red_block_left: 51 / 55 | SR: 92.7%
turn_on_led: 146 / 151 | SR: 96.7%
stack_block: 84 / 155 | SR: 54.2%
push_pink_block_left: 56 / 71 | SR: 78.9%
lift_red_block_table: 138 / 157 | SR: 87.9%
lift_pink_block_drawer: 4 / 7 | SR: 57.1%
rotate_pink_block_right: 58 / 66 | SR: 87.9%
lift_blue_block_slider: 84 / 116 | SR: 72.4%
push_blue_block_right: 29 / 65 | SR: 44.6%
rotate_pink_block_left: 44 / 53 | SR: 83.0%
unstack_block: 37 / 40 | SR: 92.5%
lift_red_block_drawer: 9 / 11 | SR: 81.8%

disconnecting id 0 from server
Best model: epoch 0 with average sequences length of 3.187
disconnecting id 0 from server
disconnecting id 0 from server
numActiveThreads = 0
stopping threads
destroy semaphore
semaphore destroyed
Thread with taskId 0 exiting
Thread TERMINATED
destroy main semaphore
main semaphore destroyed
numActiveThreads = 0
stopping threads
Thread with taskId 0 exiting
Thread TERMINATED
destroy semaphore
semaphore destroyed
destroy main semaphore
main semaphore destroyed
numActiveThreads = 0
stopping threads
destroy semaphore
semaphore destroyed
Thread with taskId 0 exiting
Thread TERMINATED
destroy main semaphore
main semaphore destroyed
numActiveThreads = 0
stopping threads
destroy semaphore
semaphore destroyed
Thread with taskId 0 exiting
Thread TERMINATED
destroy main semaphore
main semaphore destroyed
finished
numActiveThreads = 0
btShutDownExampleBrowser stopping threads
Thread with taskId 0 exiting
Thread TERMINATED
destroy semaphore
semaphore destroyed
destroy main semaphore
main semaphore destroyed
finished
numActiveThreads = 0
btShutDownExampleBrowser stopping threads
destroy semaphore
semaphore destroyed
Thread with taskId 0 exiting
Thread TERMINATED
destroy main semaphore
main semaphore destroyed
finished
numActiveThreads = 0
btShutDownExampleBrowser stopping threads
Thread with taskId 0 exiting
Thread TERMINATED
destroy semaphore
semaphore destroyed
destroy main semaphore
main semaphore destroyed
finished
numActiveThreads = 0
btShutDownExampleBrowser stopping threads
destroy semaphore
semaphore destroyed
Thread with taskId 0 exiting
Thread TERMINATED
destroy main semaphore
main semaphore destroyed
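As a sanity check on the numbers above: the reported per-chain success rates are the mean of the four per-rank progress bars, and the average successful sequence length is the sum of those chain success probabilities. A minimal sketch, with the rates transcribed from the log above:

```python
# Per-rank success rates (%) for completing 1..5 sub-tasks in a row,
# transcribed from the four progress bars in the log above.
ranks = [
    [94.4, 79.6, 65.6, 53.2, 38.4],
    [94.4, 79.6, 62.4, 49.6, 36.8],
    [95.2, 78.4, 62.4, 44.8, 32.8],
    [95.2, 78.8, 56.8, 43.2, 33.2],
]

# Average across ranks for each chain length.
avg_rates = [sum(col) / len(col) for col in zip(*ranks)]

# Average successful sequence length = expected number of consecutive
# sub-tasks completed = sum of the chain success probabilities.
avg_len = sum(r / 100 for r in avg_rates)

print([round(r, 1) for r in avg_rates])  # [94.8, 79.1, 61.8, 47.7, 35.3]
print(round(avg_len, 3))                 # 3.187
```

This reproduces the "Average successful sequence length: 3.187" and the per-chain rates reported for Epoch 0.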

Environment:

  • Ubuntu 20.04.3 LTS
  • PyTorch 2.2
  • 4× A6000
  • conda list:
    _libgcc_mutex=0.1=conda_forge
    _openmp_mutex=4.5=2_gnu
    absl-py=2.1.0=pypi_0
    accelerate=1.0.1=pypi_0
    addict=2.4.0=pypi_0
    aiohappyeyeballs=2.4.3=pypi_0
    aiohttp=3.10.10=pypi_0
    aiosignal=1.3.1=pypi_0
    antlr4-python3-runtime=4.8=pypi_0
    asttokens=2.4.1=pypi_0
    async-timeout=4.0.3=pypi_0
    attrs=24.2.0=pypi_0
    beautifulsoup4=4.12.3=pypi_0
    blas=1.0=mkl
    blinker=1.8.2=pypi_0
    blosc=1.11.2=pypi_0
    brotli-python=1.0.9=py39h6a678d5_8
    bzip2=1.0.8=h5eee18b_6
    ca-certificates=2024.8.30=hbcca054_0
    calvin=0.0.1=dev_0
    calvin-env=0.0.1=pypi_0
    certifi=2024.8.30=pyhd8ed1ab_0
    cffi=1.14.2=pypi_0
    charset-normalizer=3.3.2=pyhd3eb1b0_0
    click=8.1.7=pypi_0
    clip=1.0=pypi_0
    cloudpickle=3.1.0=pypi_0
    cmake=3.18.4=pypi_0
    colorama=0.4.6=pyhd8ed1ab_0
    colorlog=6.8.2=pypi_0
    comm=0.2.2=pypi_0
    configargparse=1.7=pypi_0
    contourpy=1.3.0=pypi_0
    cuda-cudart=12.1.105=0
    cuda-cupti=12.1.105=0
    cuda-libraries=12.1.0=0
    cuda-nvrtc=12.1.105=0
    cuda-nvtx=12.1.105=0
    cuda-opencl=12.6.68=0
    cuda-runtime=12.1.0=0
    cuda-version=12.6=3
    cycler=0.12.1=pypi_0
    dash=2.18.1=pypi_0
    dash-core-components=2.0.0=pypi_0
    dash-html-components=2.0.0=pypi_0
    dash-table=5.0.0=pypi_0
    decorator=4.4.2=pypi_0
    diffusers=0.30.3=pypi_0
    docker-pycreds=0.4.0=pypi_0
    docstring-parser=0.16=pypi_0
    einops=0.8.0=pypi_0
    einops-exts=0.0.4=pypi_0
    ema-pytorch=0.7.3=pypi_0
    exceptiongroup=1.2.2=pypi_0
    executing=2.1.0=pypi_0
    fastjsonschema=2.20.0=pypi_0
    ffmpeg=4.3=hf484d3e_0
    filelock=3.13.1=py39h06a4308_0
    flask=3.0.3=pypi_0
    fonttools=4.54.1=pypi_0
    freetype=2.12.1=h4a9f257_0
    freetype-py=2.5.1=pypi_0
    frozenlist=1.4.1=pypi_0
    fsspec=2024.9.0=pypi_0
    ftfy=6.2.3=pypi_0
    gdown=5.2.0=pypi_0
    gitdb=4.0.11=pypi_0
    gitpython=3.1.43=pypi_0
    gmp=6.2.1=h295c915_3
    gmpy2=2.1.2=py39heeb90bb_0
    gnutls=3.6.15=he1e5248_0
    grpcio=1.66.1=pypi_0
    gym=0.26.2=pypi_0
    gym-notices=0.0.8=pypi_0
    h5py=3.12.1=pypi_0
    html-testrunner=1.2.1=pypi_0
    huggingface-hub=0.25.1=pypi_0
    hydra-colorlog=1.2.0=pypi_0
    hydra-core=1.1.1=pypi_0
    idna=3.7=py39h06a4308_0
    imageio=2.36.0=pypi_0
    imageio-ffmpeg=0.5.1=pypi_0
    importlib-metadata=8.5.0=pypi_0
    importlib-resources=6.4.5=pypi_0
    intel-openmp=2023.1.0=hdb19cb5_46306
    ipython=8.18.1=pypi_0
    ipywidgets=8.1.5=pypi_0
    itsdangerous=2.2.0=pypi_0
    jedi=0.19.1=pypi_0
    jinja2=3.1.4=py39h06a4308_0
    joblib=1.4.2=pypi_0
    jpeg=9e=h5eee18b_3
    jsonschema=4.23.0=pypi_0
    jsonschema-specifications=2023.12.1=pypi_0
    jupyter-core=5.7.2=pypi_0
    jupyterlab-widgets=3.0.13=pypi_0
    kiwisolver=1.4.7=pypi_0
    lame=3.100=h7b6447c_0
    lazy-loader=0.4=pypi_0
    lcms2=2.12=h3be6417_0
    ld_impl_linux-64=2.40=h12ee557_0
    lerc=3.0=h295c915_0
    libcublas=12.1.0.26=0
    libcufft=11.0.2.4=0
    libcufile=1.11.1.6=0
    libcurand=10.3.7.68=0
    libcusolver=11.4.4.55=0
    libcusparse=12.0.2.55=0
    libdeflate=1.17=h5eee18b_1
    libffi=3.4.4=h6a678d5_1
    libgcc=14.2.0=h77fa898_1
    libgcc-ng=14.2.0=h69a702a_1
    libgomp=14.2.0=h77fa898_1
    libiconv=1.16=h5eee18b_3
    libidn2=2.3.4=h5eee18b_0
    libjpeg-turbo=2.0.0=h9bf148f_0
    libnpp=12.0.2.50=0
    libnvjitlink=12.1.105=0
    libnvjpeg=12.1.1.14=0
    libpng=1.6.39=h5eee18b_0
    libstdcxx-ng=11.2.0=h1234567_1
    libtasn1=4.19.0=h5eee18b_0
    libtiff=4.5.1=h6a678d5_0
    libunistring=0.9.10=h27cfd23_0
    libwebp-base=1.3.2=h5eee18b_0
    lightning-lite=1.8.6=pypi_0
    lightning-utilities=0.11.8=pyhd8ed1ab_0
    llvm-openmp=14.0.6=h9e868ea_0
    llvmlite=0.43.0=pypi_0
    lxml=5.3.0=pypi_0
    lz4-c=1.9.4=h6a678d5_1
    markdown=3.7=pypi_0
    markdown-it-py=3.0.0=pypi_0
    markupsafe=2.1.3=py39h5eee18b_0
    matplotlib=3.9.2=pypi_0
    matplotlib-inline=0.1.7=pypi_0
    mdurl=0.1.2=pypi_0
    mkl=2023.1.0=h213fc3f_46344
    mkl-service=2.4.0=py39h5eee18b_1
    mkl_fft=1.3.10=py39h5eee18b_0
    mkl_random=1.2.7=py39h1128e8f_0
    moviepy=1.0.3=pypi_0
    mpc=1.1.0=h10f8cd9_1
    mpfr=4.0.2=hb69a4c5_1
    mpmath=1.3.0=py39h06a4308_0
    multicoretsne=0.1=pypi_0
    multidict=6.1.0=pypi_0
    mypy-extensions=1.0.0=pypi_0
    natsort=8.4.0=pypi_0
    nbformat=5.10.4=pypi_0
    ncurses=6.4=h6a678d5_0
    nest-asyncio=1.6.0=pypi_0
    nettle=3.7.3=hbbd107a_1
    networkx=3.2.1=pypi_0
    numba=0.60.0=pypi_0
    numpy=1.23.0=pypi_0
    numpy-quaternion=2023.0.4=pypi_0
    nvidia-ml-py=12.560.30=pypi_0
    omegaconf=2.1.2=pypi_0
    open-clip-torch=2.29.0=pypi_0
    open3d=0.18.0=pypi_0
    opencv-python=4.10.0.84=pypi_0
    openh264=2.1.1=h4ff587b_0
    openjpeg=2.5.2=he7f1fd0_0
    openssl=3.3.2=hb9d3cd8_0
    packaging=24.1=pyhd8ed1ab_0
    pandas=2.2.3=pypi_0
    parso=0.8.4=pypi_0
    pexpect=4.9.0=pypi_0
    pillow=10.4.0=py39h5eee18b_0
    pip=24.2=py39h06a4308_0
    platformdirs=4.3.6=pypi_0
    plotly=5.24.1=pypi_0
    proglog=0.1.10=pypi_0
    prompt-toolkit=3.0.48=pypi_0
    propcache=0.2.0=pypi_0
    protobuf=5.28.2=pypi_0
    psutil=6.0.0=pypi_0
    ptyprocess=0.7.0=pypi_0
    pure-eval=0.2.3=pypi_0
    pybullet=3.2.6=pypi_0
    pycollada=0.6=pypi_0
    pycparser=2.22=pypi_0
    pyglet=2.0.18=pypi_0
    pygments=2.18.0=pypi_0
    pyhash=0.9.3=pypi_0
    pyopengl=3.1.0=pypi_0
    pyparsing=3.1.4=pypi_0
    pyquaternion=0.9.9=pypi_0
    pyrender=0.1.45=pypi_0
    pyrep=4.1.0.2=dev_0
    pysocks=1.7.1=py39h06a4308_0
    python=3.9.19=h955ad1f_1
    python-dateutil=2.9.0.post0=pypi_0
    pytorch=2.2.0=py3.9_cuda12.1_cudnn8.9.2_0
    pytorch-cuda=12.1=ha16c6d3_5
    pytorch-fid=0.3.0=pypi_0
    pytorch-lightning=1.8.6=pypi_0
    pytorch-mutex=1.0=cuda
    pytz=2024.2=pypi_0
    pyyaml=6.0.1=py39h5eee18b_0
    readline=8.2=h5eee18b_0
    referencing=0.35.1=pypi_0
    regex=2024.9.11=pypi_0
    requests=2.32.3=py39h06a4308_0
    retrying=1.3.4=pypi_0
    rich=13.9.3=pypi_0
    rlbench=1.2.0=dev_0
    rotary-embedding-torch=0.8.4=pypi_0
    rpds-py=0.20.0=pypi_0
    safetensors=0.4.5=pypi_0
    scikit-image=0.24.0=pypi_0
    scikit-learn=1.5.2=pypi_0
    scikit-video=1.1.11=pypi_0
    scipy=1.13.1=pypi_0
    sentence-transformers=3.2.1=pypi_0
    sentry-sdk=2.14.0=pypi_0
    setproctitle=1.3.3=pypi_0
    setuptools=57.5.0=pypi_0
    six=1.16.0=pypi_0
    smmap=5.0.1=pypi_0
    soupsieve=2.6=pypi_0
    sqlite=3.45.3=h5eee18b_0
    stack-data=0.6.3=pypi_0
    sympy=1.13.2=py39h06a4308_0
    tacto=0.0.3=dev_0
    tbb=2021.8.0=hdb19cb5_0
    tenacity=9.0.0=pypi_0
    tensorboard=2.18.0=pypi_0
    tensorboard-data-server=0.7.2=pypi_0
    tensorboardx=2.6.2.2=pypi_0
    termcolor=2.5.0=pypi_0
    threadpoolctl=3.5.0=pypi_0
    tifffile=2024.8.30=pypi_0
    timm=0.6.11=pypi_0
    tk=8.6.14=h39e8969_0
    tokenizers=0.20.0=pypi_0
    torchmetrics=1.5.1=pypi_0
    torchtriton=2.2.0=py39
    torchvideotransforms=0.1.2=dev_0
    torchvision=0.17.0=py39_cu121
    tqdm=4.66.5=pyhd8ed1ab_0
    traitlets=5.14.3=pypi_0
    transformers=4.45.1=pypi_0
    trimesh=4.4.9=pypi_0
    typed-argument-parser=1.10.1=pypi_0
    typing-extensions=4.11.0=py39h06a4308_0
    typing-inspect=0.9.0=pypi_0
    typing_extensions=4.11.0=py39h06a4308_0
    tzdata=2024.2=pypi_0
    urdfpy=0.0.22=pypi_0
    urllib3=2.2.2=py39h06a4308_0
    vc-models=0.1=dev_0
    wandb=0.18.1=pypi_0
    wcwidth=0.2.13=pypi_0
    werkzeug=3.0.4=pypi_0
    wheel=0.44.0=py39h06a4308_0
    widgetsnbextension=4.0.13=pypi_0
    xz=5.4.6=h5eee18b_1
    yaml=0.2.5=h7b6447c_0
    yarl=1.16.0=pypi_0
    zipp=3.20.2=pypi_0
    zlib=1.2.13=h5eee18b_1
    zstd=1.5.5=hc292b87_2

Conducting evaluations with the debug dataset is fine. The test environments depend on an independent configuration file in CALVIN, regardless of the dataset used. However, I've noticed you are not rendering with GPU acceleration:

GL_VENDOR=Mesa/X.org
GL_RENDERER=llvmpipe (LLVM 12.0.0, 256 bits)
GL_VERSION=4.1 (Core Profile) Mesa 21.2.6
GL_SHADING_LANGUAGE_VERSION=4.50

In my hardware setup:

GL_VENDOR=NVIDIA Corporation
GL_RENDERER=NVIDIA GeForce RTX 3090/PCIe/SSE2
GL_VERSION=3.3.0 NVIDIA 535.161.07
GL_SHADING_LANGUAGE_VERSION=3.30 NVIDIA via Cg compiler

If the appropriate GPU drivers are not installed or are malfunctioning, the system may fall back to LLVMpipe software rendering.
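One way to spot this from the logs is to inspect the `GL_RENDERER` line: software rasterizers report names like `llvmpipe`, `softpipe`, or `swrast`, whereas a GPU driver reports the hardware name. A small, hypothetical helper (not part of CALVIN or CLOVER) that checks a log snippet:

```python
def is_software_rendering(log_text: str) -> bool:
    """Return True if the GL_RENDERER line in the given log text
    indicates a software rasterizer rather than a GPU driver."""
    software_markers = ("llvmpipe", "softpipe", "swrast")
    for line in log_text.splitlines():
        if line.startswith("GL_RENDERER="):
            renderer = line.split("=", 1)[1].lower()
            return any(m in renderer for m in software_markers)
    return False  # no renderer line found; cannot tell

cpu_log = "GL_RENDERER=llvmpipe (LLVM 12.0.0, 256 bits)"
gpu_log = "GL_RENDERER=NVIDIA GeForce RTX 3090/PCIe/SSE2"

print(is_software_rendering(cpu_log))  # True
print(is_software_rendering(gpu_log))  # False
```

If this returns True on your logs, check that the NVIDIA drivers are installed and that the X server (or EGL context) is actually backed by the GPU.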

Thanks for your reply! After switching to GPU rendering, the results improved, as shown below. Thanks again for this great work! One question, though: why didn't you train and test on other, more up-to-date and challenging benchmarks?
[Screenshots: improved evaluation results with GPU rendering]

CALVIN became popular mostly after last year's ICLR (thanks to RoboFlamingo and GR-1, I think), and it's still up-to-date in some ways, as you can see from the many papers submitted to this year's ICLR that use it.
It continues to present challenges as well: as a benchmark for long-horizon manipulation tasks, the success rate for completing a sequence of five consecutive sub-tasks remains low.
The 'long-horizon' characteristic is also a key reason we selected CALVIN, as it aligns with what we aim to develop in CLOVER.

Thanks for your reply, and thanks again for this great work!