About the evaluation on CALVIN with the provided checkpoint
Closed this issue · 6 comments
The performance may vary across different hardware setups and environments. We've tested with multiple evaluation seeds, and the variation should be within ±0.1 Avg. Len. Can you provide further details about your test bed?
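For reference, Avg. Len. here is the expected number of consecutive sub-tasks completed per 5-task chain, which is simply the sum of the per-step success rates. A minimal illustration (not code from this repo), using the rates reported later in this thread:

```python
# Illustration only (not repo code): Avg. Len. is the expected number of
# consecutive sub-tasks completed, i.e. the sum of P(at least i sub-tasks succeed).
success_rates = [0.948, 0.791, 0.618, 0.477, 0.353]  # 1/5 ... 5/5 rates from the log below
avg_len = sum(success_rates)
print(f"Avg. Len. = {avg_len:.3f}")  # -> Avg. Len. = 3.187
```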
Sure, I conducted evaluations only on the debug dataset.
The full output is as follows.
```
[2024-10-30 19:35:50,798] torch.distributed.run: [WARNING]
[2024-10-30 19:35:50,798] torch.distributed.run: [WARNING] *****************************************
[2024-10-30 19:35:50,798] torch.distributed.run: [WARNING] Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
[2024-10-30 19:35:50,798] torch.distributed.run: [WARNING] *****************************************
pybullet build time: Nov 28 2023 23:52:03
pybullet build time: Nov 28 2023 23:52:03
pybullet build time: Nov 28 2023 23:52:03
pybullet build time: Nov 28 2023 23:52:03
[W Utils.hpp:164] Warning: Environment variable NCCL_BLOCKING_WAIT is deprecated; use TORCH_NCCL_BLOCKING_WAIT instead (function getCvarBool)
is_using_torchrun
[W Utils.hpp:164] Warning: Environment variable NCCL_BLOCKING_WAIT is deprecated; use TORCH_NCCL_BLOCKING_WAIT instead (function getCvarBool)
is_using_torchrun
[W Utils.hpp:164] Warning: Environment variable NCCL_BLOCKING_WAIT is deprecated; use TORCH_NCCL_BLOCKING_WAIT instead (function getCvarBool)
is_using_torchrun
[W Utils.hpp:164] Warning: Environment variable NCCL_BLOCKING_WAIT is deprecated; use TORCH_NCCL_BLOCKING_WAIT instead (function getCvarBool)
is_using_torchrun
device_id: cuda:0
world_size: 4
device_id: cuda:2
world_size: 4
device_id: cuda:3
world_size: 4
device_id: cuda:1
world_size: 4
Loading checkpoint from /data1/gyh/models/CLOVER/visual_planner.pt
Loading checkpoint from /data1/gyh/models/CLOVER/visual_planner.pt
Loading checkpoint from /data1/gyh/models/CLOVER/visual_planner.pt
Loading checkpoint from /data1/gyh/models/CLOVER/visual_planner.pt
[rank0]:[W Utils.hpp:106] Warning: Environment variable NCCL_BLOCKING_WAIT is deprecated; use TORCH_NCCL_BLOCKING_WAIT instead (function getCvarString)
[rank3]:[W Utils.hpp:106] Warning: Environment variable NCCL_BLOCKING_WAIT is deprecated; use TORCH_NCCL_BLOCKING_WAIT instead (function getCvarString)
[rank1]:[W Utils.hpp:106] Warning: Environment variable NCCL_BLOCKING_WAIT is deprecated; use TORCH_NCCL_BLOCKING_WAIT instead (function getCvarString)
[rank2]:[W Utils.hpp:106] Warning: Environment variable NCCL_BLOCKING_WAIT is deprecated; use TORCH_NCCL_BLOCKING_WAIT instead (function getCvarString)
[rank3]:[W Utils.hpp:106] Warning: Environment variable NCCL_BLOCKING_WAIT is deprecated; use TORCH_NCCL_BLOCKING_WAIT instead (function getCvarString)
[rank1]:[W Utils.hpp:106] Warning: Environment variable NCCL_BLOCKING_WAIT is deprecated; use TORCH_NCCL_BLOCKING_WAIT instead (function getCvarString)
[rank2]:[W Utils.hpp:106] Warning: Environment variable NCCL_BLOCKING_WAIT is deprecated; use TORCH_NCCL_BLOCKING_WAIT instead (function getCvarString)
Loading policy checkpoint from /data1/gyh/models/CLOVER/feedback_policy.pth
[rank0]:[W Utils.hpp:106] Warning: Environment variable NCCL_BLOCKING_WAIT is deprecated; use TORCH_NCCL_BLOCKING_WAIT instead (function getCvarString)
Loading policy checkpoint from /data1/gyh/models/CLOVER/feedback_policy.pth
Loading policy checkpoint from /data1/gyh/models/CLOVER/feedback_policy.pth
Loading policy checkpoint from /data1/gyh/models/CLOVER/feedback_policy.pth
argv[0]=
startThreads creating 1 threads.
starting thread 0
started thread 0
argc=3
argv[0] = --unused
argv[1] =
argv[2] = --start_demo_name=Physics Server
ExampleBrowserThreadFunc started
X11 functions dynamically loaded using dlopen/dlsym OK!
X11 functions dynamically loaded using dlopen/dlsym OK!
argv[0]=
startThreads creating 1 threads.
starting thread 0
started thread 0
argc=3
argv[0] = --unused
argv[1] =
argv[2] = --start_demo_name=Physics Server
ExampleBrowserThreadFunc started
X11 functions dynamically loaded using dlopen/dlsym OK!
X11 functions dynamically loaded using dlopen/dlsym OK!
argv[0]=
startThreads creating 1 threads.
starting thread 0
started thread 0
argc=3
argv[0] = --unused
argv[1] =
argv[2] = --start_demo_name=Physics Server
ExampleBrowserThreadFunc started
X11 functions dynamically loaded using dlopen/dlsym OK!
argv[0]=
startThreads creating 1 threads.
starting thread 0
started thread 0
argc=3
argv[0] = --unused
argv[1] =
argv[2] = --start_demo_name=Physics Server
ExampleBrowserThreadFunc started
X11 functions dynamically loaded using dlopen/dlsym OK!
X11 functions dynamically loaded using dlopen/dlsym OK!
X11 functions dynamically loaded using dlopen/dlsym OK!
Creating context
Creating context
Created GL 3.3 context
Direct GLX rendering context obtained
Making context current
GL_VENDOR=Mesa/X.org
GL_RENDERER=llvmpipe (LLVM 12.0.0, 256 bits)
GL_VERSION=4.1 (Core Profile) Mesa 21.2.6
GL_SHADING_LANGUAGE_VERSION=4.50
pthread_getconcurrency()=0
Version = 4.1 (Core Profile) Mesa 21.2.6
Vendor = Mesa/X.org
Renderer = llvmpipe (LLVM 12.0.0, 256 bits)
Created GL 3.3 context
Direct GLX rendering context obtained
Making context current
Creating context
GL_VENDOR=Mesa/X.org
GL_RENDERER=llvmpipe (LLVM 12.0.0, 256 bits)
GL_VERSION=4.1 (Core Profile) Mesa 21.2.6
GL_SHADING_LANGUAGE_VERSION=4.50
pthread_getconcurrency()=0
Version = 4.1 (Core Profile) Mesa 21.2.6
Vendor = Mesa/X.org
Renderer = llvmpipe (LLVM 12.0.0, 256 bits)
Created GL 3.3 context
Creating context
Direct GLX rendering context obtained
Making context current
GL_VENDOR=Mesa/X.org
GL_RENDERER=llvmpipe (LLVM 12.0.0, 256 bits)
GL_VERSION=4.1 (Core Profile) Mesa 21.2.6
GL_SHADING_LANGUAGE_VERSION=4.50
pthread_getconcurrency()=0
Version = 4.1 (Core Profile) Mesa 21.2.6
Vendor = Mesa/X.org
Renderer = llvmpipe (LLVM 12.0.0, 256 bits)
Created GL 3.3 context
Direct GLX rendering context obtained
Making context current
GL_VENDOR=Mesa/X.org
GL_RENDERER=llvmpipe (LLVM 12.0.0, 256 bits)
GL_VERSION=4.1 (Core Profile) Mesa 21.2.6
GL_SHADING_LANGUAGE_VERSION=4.50
pthread_getconcurrency()=0
Version = 4.1 (Core Profile) Mesa 21.2.6
Vendor = Mesa/X.org
Renderer = llvmpipe (LLVM 12.0.0, 256 bits)
b3Printf: Selected demo: Physics Server
startThreads creating 1 threads.
starting thread 0
started thread 0
MotionThreadFunc thread started
b3Printf: Selected demo: Physics Server
startThreads creating 1 threads.
starting thread 0
started thread 0
MotionThreadFunc thread started
b3Printf: Selected demo: Physics Server
startThreads creating 1 threads.
starting thread 0
started thread 0
MotionThreadFunc thread started
b3Printf: Selected demo: Physics Server
startThreads creating 1 threads.
starting thread 0
started thread 0
MotionThreadFunc thread started
ven = Mesa/X.org
ven = Mesa/X.org
ven = Mesa/X.org
ven = Mesa/X.org
ven = Mesa/X.org
ven = Mesa/X.org
ven = Mesa/X.org
ven = Mesa/X.org
received depth0
received depth0
received depth0
received depth0
successfully load env
successfully load env
successfully wrapped model
logging to /tmp/evaluation
0%| | 0/250 [00:00<?, ?it/s]successfully wrapped model
logging to /tmp/evaluation
0%| | 0/250 [00:00<?, ?it/s]successfully load env
successfully load env
successfully wrapped model
successfully wrapped model
logging to /tmp/evaluation
0%| | 0/250 [00:00<?, ?it/s]logging to /tmp/evaluation
1/5 : 94.4% | 2/5 : 79.6% | 3/5 : 65.6% | 4/5 : 53.2% | 5/5 : 38.4% ||: 100%|████████████████████████████████████████████| 250/250 [11:39:47<00:00, 167.95s/it]
1/5 : 94.4% | 2/5 : 79.6% | 3/5 : 62.4% | 4/5 : 49.6% | 5/5 : 36.8% ||: 100%|████████████████████████████████████████████| 250/250 [12:06:51<00:00, 174.45s/it]
1/5 : 95.2% | 2/5 : 78.4% | 3/5 : 62.4% | 4/5 : 44.8% | 5/5 : 32.8% ||: 100%|████████████████████████████████████████████| 250/250 [12:07:11<00:00, 174.53s/it]
1/5 : 95.2% | 2/5 : 78.8% | 3/5 : 56.8% | 4/5 : 43.2% | 5/5 : 33.2% ||: 100%|████████████████████████████████████████████| 250/250 [12:09:16<00:00, 175.03s/it]
Results for Epoch 0:
Average successful sequence length: 3.187
Success rates for i instructions in a row:
1: 94.8%
2: 79.1%
3: 61.8%
4: 47.7%
5: 35.3%
disconnecting id 0 from server
rotate_blue_block_right: 50 / 69 | SR: 72.5%
move_slider_right: 230 / 236 | SR: 97.5%
lift_red_block_slider: 81 / 110 | SR: 73.6%
place_in_slider: 162 / 295 | SR: 54.9%
turn_off_lightbulb: 116 / 120 | SR: 96.7%
turn_off_led: 139 / 141 | SR: 98.6%
push_into_drawer: 68 / 97 | SR: 70.1%
lift_blue_block_drawer: 10 / 15 | SR: 66.7%
lift_pink_block_slider: 92 / 112 | SR: 82.1%
open_drawer: 303 / 306 | SR: 99.0%
rotate_red_block_right: 52 / 69 | SR: 75.4%
lift_pink_block_table: 139 / 157 | SR: 88.5%
push_blue_block_left: 57 / 63 | SR: 90.5%
close_drawer: 163 / 165 | SR: 98.8%
push_pink_block_right: 39 / 61 | SR: 63.9%
push_red_block_right: 39 / 66 | SR: 59.1%
push_red_block_left: 59 / 69 | SR: 85.5%
lift_blue_block_table: 147 / 162 | SR: 90.7%
rotate_blue_block_left: 47 / 62 | SR: 75.8%
place_in_drawer: 102 / 141 | SR: 72.3%
turn_on_lightbulb: 140 / 144 | SR: 97.2%
move_slider_left: 212 / 227 | SR: 93.4%
rotate_red_block_left: 51 / 55 | SR: 92.7%
turn_on_led: 146 / 151 | SR: 96.7%
stack_block: 84 / 155 | SR: 54.2%
push_pink_block_left: 56 / 71 | SR: 78.9%
lift_red_block_table: 138 / 157 | SR: 87.9%
lift_pink_block_drawer: 4 / 7 | SR: 57.1%
rotate_pink_block_right: 58 / 66 | SR: 87.9%
lift_blue_block_slider: 84 / 116 | SR: 72.4%
push_blue_block_right: 29 / 65 | SR: 44.6%
rotate_pink_block_left: 44 / 53 | SR: 83.0%
unstack_block: 37 / 40 | SR: 92.5%
lift_red_block_drawer: 9 / 11 | SR: 81.8%
disconnecting id 0 from server
Best model: epoch 0 with average sequences length of 3.187
disconnecting id 0 from server
disconnecting id 0 from server
numActiveThreads = 0
stopping threads
destroy semaphore
semaphore destroyed
Thread with taskId 0 exiting
Thread TERMINATED
destroy main semaphore
main semaphore destroyed
numActiveThreads = 0
stopping threads
Thread with taskId 0 exiting
Thread TERMINATED
destroy semaphore
semaphore destroyed
destroy main semaphore
main semaphore destroyed
numActiveThreads = 0
stopping threads
destroy semaphore
semaphore destroyed
Thread with taskId 0 exiting
Thread TERMINATED
destroy main semaphore
main semaphore destroyed
numActiveThreads = 0
stopping threads
destroy semaphore
semaphore destroyed
Thread with taskId 0 exiting
Thread TERMINATED
destroy main semaphore
main semaphore destroyed
finished
numActiveThreads = 0
btShutDownExampleBrowser stopping threads
Thread with taskId 0 exiting
Thread TERMINATED
destroy semaphore
semaphore destroyed
destroy main semaphore
main semaphore destroyed
finished
numActiveThreads = 0
btShutDownExampleBrowser stopping threads
destroy semaphore
semaphore destroyed
Thread with taskId 0 exiting
Thread TERMINATED
destroy main semaphore
main semaphore destroyed
finished
numActiveThreads = 0
btShutDownExampleBrowser stopping threads
Thread with taskId 0 exiting
Thread TERMINATED
destroy semaphore
semaphore destroyed
destroy main semaphore
main semaphore destroyed
finished
numActiveThreads = 0
btShutDownExampleBrowser stopping threads
destroy semaphore
semaphore destroyed
Thread with taskId 0 exiting
Thread TERMINATED
destroy main semaphore
main semaphore destroyed
```
Environment:
- Ubuntu 20.04.3 LTS
- PyTorch 2.2
- 4x A6000 GPUs
- conda list:
```
_libgcc_mutex=0.1=conda_forge
_openmp_mutex=4.5=2_gnu
absl-py=2.1.0=pypi_0
accelerate=1.0.1=pypi_0
addict=2.4.0=pypi_0
aiohappyeyeballs=2.4.3=pypi_0
aiohttp=3.10.10=pypi_0
aiosignal=1.3.1=pypi_0
antlr4-python3-runtime=4.8=pypi_0
asttokens=2.4.1=pypi_0
async-timeout=4.0.3=pypi_0
attrs=24.2.0=pypi_0
beautifulsoup4=4.12.3=pypi_0
blas=1.0=mkl
blinker=1.8.2=pypi_0
blosc=1.11.2=pypi_0
brotli-python=1.0.9=py39h6a678d5_8
bzip2=1.0.8=h5eee18b_6
ca-certificates=2024.8.30=hbcca054_0
calvin=0.0.1=dev_0
calvin-env=0.0.1=pypi_0
certifi=2024.8.30=pyhd8ed1ab_0
cffi=1.14.2=pypi_0
charset-normalizer=3.3.2=pyhd3eb1b0_0
click=8.1.7=pypi_0
clip=1.0=pypi_0
cloudpickle=3.1.0=pypi_0
cmake=3.18.4=pypi_0
colorama=0.4.6=pyhd8ed1ab_0
colorlog=6.8.2=pypi_0
comm=0.2.2=pypi_0
configargparse=1.7=pypi_0
contourpy=1.3.0=pypi_0
cuda-cudart=12.1.105=0
cuda-cupti=12.1.105=0
cuda-libraries=12.1.0=0
cuda-nvrtc=12.1.105=0
cuda-nvtx=12.1.105=0
cuda-opencl=12.6.68=0
cuda-runtime=12.1.0=0
cuda-version=12.6=3
cycler=0.12.1=pypi_0
dash=2.18.1=pypi_0
dash-core-components=2.0.0=pypi_0
dash-html-components=2.0.0=pypi_0
dash-table=5.0.0=pypi_0
decorator=4.4.2=pypi_0
diffusers=0.30.3=pypi_0
docker-pycreds=0.4.0=pypi_0
docstring-parser=0.16=pypi_0
einops=0.8.0=pypi_0
einops-exts=0.0.4=pypi_0
ema-pytorch=0.7.3=pypi_0
exceptiongroup=1.2.2=pypi_0
executing=2.1.0=pypi_0
fastjsonschema=2.20.0=pypi_0
ffmpeg=4.3=hf484d3e_0
filelock=3.13.1=py39h06a4308_0
flask=3.0.3=pypi_0
fonttools=4.54.1=pypi_0
freetype=2.12.1=h4a9f257_0
freetype-py=2.5.1=pypi_0
frozenlist=1.4.1=pypi_0
fsspec=2024.9.0=pypi_0
ftfy=6.2.3=pypi_0
gdown=5.2.0=pypi_0
gitdb=4.0.11=pypi_0
gitpython=3.1.43=pypi_0
gmp=6.2.1=h295c915_3
gmpy2=2.1.2=py39heeb90bb_0
gnutls=3.6.15=he1e5248_0
grpcio=1.66.1=pypi_0
gym=0.26.2=pypi_0
gym-notices=0.0.8=pypi_0
h5py=3.12.1=pypi_0
html-testrunner=1.2.1=pypi_0
huggingface-hub=0.25.1=pypi_0
hydra-colorlog=1.2.0=pypi_0
hydra-core=1.1.1=pypi_0
idna=3.7=py39h06a4308_0
imageio=2.36.0=pypi_0
imageio-ffmpeg=0.5.1=pypi_0
importlib-metadata=8.5.0=pypi_0
importlib-resources=6.4.5=pypi_0
intel-openmp=2023.1.0=hdb19cb5_46306
ipython=8.18.1=pypi_0
ipywidgets=8.1.5=pypi_0
itsdangerous=2.2.0=pypi_0
jedi=0.19.1=pypi_0
jinja2=3.1.4=py39h06a4308_0
joblib=1.4.2=pypi_0
jpeg=9e=h5eee18b_3
jsonschema=4.23.0=pypi_0
jsonschema-specifications=2023.12.1=pypi_0
jupyter-core=5.7.2=pypi_0
jupyterlab-widgets=3.0.13=pypi_0
kiwisolver=1.4.7=pypi_0
lame=3.100=h7b6447c_0
lazy-loader=0.4=pypi_0
lcms2=2.12=h3be6417_0
ld_impl_linux-64=2.40=h12ee557_0
lerc=3.0=h295c915_0
libcublas=12.1.0.26=0
libcufft=11.0.2.4=0
libcufile=1.11.1.6=0
libcurand=10.3.7.68=0
libcusolver=11.4.4.55=0
libcusparse=12.0.2.55=0
libdeflate=1.17=h5eee18b_1
libffi=3.4.4=h6a678d5_1
libgcc=14.2.0=h77fa898_1
libgcc-ng=14.2.0=h69a702a_1
libgomp=14.2.0=h77fa898_1
libiconv=1.16=h5eee18b_3
libidn2=2.3.4=h5eee18b_0
libjpeg-turbo=2.0.0=h9bf148f_0
libnpp=12.0.2.50=0
libnvjitlink=12.1.105=0
libnvjpeg=12.1.1.14=0
libpng=1.6.39=h5eee18b_0
libstdcxx-ng=11.2.0=h1234567_1
libtasn1=4.19.0=h5eee18b_0
libtiff=4.5.1=h6a678d5_0
libunistring=0.9.10=h27cfd23_0
libwebp-base=1.3.2=h5eee18b_0
lightning-lite=1.8.6=pypi_0
lightning-utilities=0.11.8=pyhd8ed1ab_0
llvm-openmp=14.0.6=h9e868ea_0
llvmlite=0.43.0=pypi_0
lxml=5.3.0=pypi_0
lz4-c=1.9.4=h6a678d5_1
markdown=3.7=pypi_0
markdown-it-py=3.0.0=pypi_0
markupsafe=2.1.3=py39h5eee18b_0
matplotlib=3.9.2=pypi_0
matplotlib-inline=0.1.7=pypi_0
mdurl=0.1.2=pypi_0
mkl=2023.1.0=h213fc3f_46344
mkl-service=2.4.0=py39h5eee18b_1
mkl_fft=1.3.10=py39h5eee18b_0
mkl_random=1.2.7=py39h1128e8f_0
moviepy=1.0.3=pypi_0
mpc=1.1.0=h10f8cd9_1
mpfr=4.0.2=hb69a4c5_1
mpmath=1.3.0=py39h06a4308_0
multicoretsne=0.1=pypi_0
multidict=6.1.0=pypi_0
mypy-extensions=1.0.0=pypi_0
natsort=8.4.0=pypi_0
nbformat=5.10.4=pypi_0
ncurses=6.4=h6a678d5_0
nest-asyncio=1.6.0=pypi_0
nettle=3.7.3=hbbd107a_1
networkx=3.2.1=pypi_0
numba=0.60.0=pypi_0
numpy=1.23.0=pypi_0
numpy-quaternion=2023.0.4=pypi_0
nvidia-ml-py=12.560.30=pypi_0
omegaconf=2.1.2=pypi_0
open-clip-torch=2.29.0=pypi_0
open3d=0.18.0=pypi_0
opencv-python=4.10.0.84=pypi_0
openh264=2.1.1=h4ff587b_0
openjpeg=2.5.2=he7f1fd0_0
openssl=3.3.2=hb9d3cd8_0
packaging=24.1=pyhd8ed1ab_0
pandas=2.2.3=pypi_0
parso=0.8.4=pypi_0
pexpect=4.9.0=pypi_0
pillow=10.4.0=py39h5eee18b_0
pip=24.2=py39h06a4308_0
platformdirs=4.3.6=pypi_0
plotly=5.24.1=pypi_0
proglog=0.1.10=pypi_0
prompt-toolkit=3.0.48=pypi_0
propcache=0.2.0=pypi_0
protobuf=5.28.2=pypi_0
psutil=6.0.0=pypi_0
ptyprocess=0.7.0=pypi_0
pure-eval=0.2.3=pypi_0
pybullet=3.2.6=pypi_0
pycollada=0.6=pypi_0
pycparser=2.22=pypi_0
pyglet=2.0.18=pypi_0
pygments=2.18.0=pypi_0
pyhash=0.9.3=pypi_0
pyopengl=3.1.0=pypi_0
pyparsing=3.1.4=pypi_0
pyquaternion=0.9.9=pypi_0
pyrender=0.1.45=pypi_0
pyrep=4.1.0.2=dev_0
pysocks=1.7.1=py39h06a4308_0
python=3.9.19=h955ad1f_1
python-dateutil=2.9.0.post0=pypi_0
pytorch=2.2.0=py3.9_cuda12.1_cudnn8.9.2_0
pytorch-cuda=12.1=ha16c6d3_5
pytorch-fid=0.3.0=pypi_0
pytorch-lightning=1.8.6=pypi_0
pytorch-mutex=1.0=cuda
pytz=2024.2=pypi_0
pyyaml=6.0.1=py39h5eee18b_0
readline=8.2=h5eee18b_0
referencing=0.35.1=pypi_0
regex=2024.9.11=pypi_0
requests=2.32.3=py39h06a4308_0
retrying=1.3.4=pypi_0
rich=13.9.3=pypi_0
rlbench=1.2.0=dev_0
rotary-embedding-torch=0.8.4=pypi_0
rpds-py=0.20.0=pypi_0
safetensors=0.4.5=pypi_0
scikit-image=0.24.0=pypi_0
scikit-learn=1.5.2=pypi_0
scikit-video=1.1.11=pypi_0
scipy=1.13.1=pypi_0
sentence-transformers=3.2.1=pypi_0
sentry-sdk=2.14.0=pypi_0
setproctitle=1.3.3=pypi_0
setuptools=57.5.0=pypi_0
six=1.16.0=pypi_0
smmap=5.0.1=pypi_0
soupsieve=2.6=pypi_0
sqlite=3.45.3=h5eee18b_0
stack-data=0.6.3=pypi_0
sympy=1.13.2=py39h06a4308_0
tacto=0.0.3=dev_0
tbb=2021.8.0=hdb19cb5_0
tenacity=9.0.0=pypi_0
tensorboard=2.18.0=pypi_0
tensorboard-data-server=0.7.2=pypi_0
tensorboardx=2.6.2.2=pypi_0
termcolor=2.5.0=pypi_0
threadpoolctl=3.5.0=pypi_0
tifffile=2024.8.30=pypi_0
timm=0.6.11=pypi_0
tk=8.6.14=h39e8969_0
tokenizers=0.20.0=pypi_0
torchmetrics=1.5.1=pypi_0
torchtriton=2.2.0=py39
torchvideotransforms=0.1.2=dev_0
torchvision=0.17.0=py39_cu121
tqdm=4.66.5=pyhd8ed1ab_0
traitlets=5.14.3=pypi_0
transformers=4.45.1=pypi_0
trimesh=4.4.9=pypi_0
typed-argument-parser=1.10.1=pypi_0
typing-extensions=4.11.0=py39h06a4308_0
typing-inspect=0.9.0=pypi_0
typing_extensions=4.11.0=py39h06a4308_0
tzdata=2024.2=pypi_0
urdfpy=0.0.22=pypi_0
urllib3=2.2.2=py39h06a4308_0
vc-models=0.1=dev_0
wandb=0.18.1=pypi_0
wcwidth=0.2.13=pypi_0
werkzeug=3.0.4=pypi_0
wheel=0.44.0=py39h06a4308_0
widgetsnbextension=4.0.13=pypi_0
xz=5.4.6=h5eee18b_1
yaml=0.2.5=h7b6447c_0
yarl=1.16.0=pypi_0
zipp=3.20.2=pypi_0
zlib=1.2.13=h5eee18b_1
zstd=1.5.5=hc292b87_2
```
Conducting evaluations with the debug dataset is fine. The testing environments in CALVIN depend on an independent configuration file, regardless of the dataset used. However, I've noticed you are not rendering with GPU acceleration:
```
GL_VENDOR=Mesa/X.org
GL_RENDERER=llvmpipe (LLVM 12.0.0, 256 bits)
GL_VERSION=4.1 (Core Profile) Mesa 21.2.6
GL_SHADING_LANGUAGE_VERSION=4.50
```
In my hardware setup:
```
GL_VENDOR=NVIDIA Corporation
GL_RENDERER=NVIDIA GeForce RTX 3090/PCIe/SSE2
GL_VERSION=3.3.0 NVIDIA 535.161.07
GL_SHADING_LANGUAGE_VERSION=3.30 NVIDIA via Cg compiler
```
If the appropriate GPU drivers are not installed or are malfunctioning, the system may revert to using LLVMpipe for rendering.
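As a quick sanity check, here is a minimal sketch (my own, not part of the CLOVER or CALVIN code) that queries the renderer your X display actually exposes; it assumes `glxinfo` from the `mesa-utils` package is installed and that `DISPLAY` points at the server used for evaluation:

```python
# Hypothetical helper (not from the repo): print the active OpenGL vendor/renderer.
# Assumes `glxinfo` (mesa-utils) is installed and $DISPLAY is set to the target display.
import subprocess

out = subprocess.run(["glxinfo", "-B"], capture_output=True, text=True, check=True).stdout
for line in out.splitlines():
    if "OpenGL vendor string" in line or "OpenGL renderer string" in line:
        print(line.strip())
# A renderer string containing "llvmpipe" means software rasterization;
# an NVIDIA string means the GPU driver is being used.
```

On headless machines, a virtual framebuffer such as Xvfb will typically report llvmpipe; rendering through the NVIDIA driver generally requires an X server backed by the GPU (or an EGL-based offscreen context).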
Thanks for your reply! After switching to GPU rendering, the results improved; they are shown below. Thanks for providing such great work! I have one question, though: why didn't you train and test on other, more up-to-date and challenging benchmarks?
CALVIN became popular mostly after last year's ICLR (thanks to RoboFlamingo and GR-1, I think), and it is still up-to-date in some sense, as you can see from the many papers submitted to this year's ICLR that use it.
It also continues to be challenging: as a benchmark for long-horizon manipulation tasks, the success rate for completing a sequence of five consecutive sub-tasks remains low.
This 'long-horizon' characteristic is also a key reason we selected CALVIN, as it aligns with what we aim to develop in CLOVER.
Thanks for your reply, and thanks for providing such great work!