To add a dataset to the hub, you need to log in using a write-access token, which can be generated from the Hugging Face settings:
```bash
huggingface-cli login --token ${HUGGINGFACE_TOKEN} --add-to-git-credential
```
Also install ffmpeg with the libx264 encoder. Do not install it with `apt install`; use `conda install -c conda-forge ffmpeg` instead.
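To check that your ffmpeg build actually includes the libx264 encoder, you can run:
```bash
# Should print a line for libx264 if the encoder is available
ffmpeg -encoders 2>/dev/null | grep libx264
```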
Then point to your raw dataset folder (e.g. `data/aloha_static_pingpong_test_raw`), and push your dataset to the hub with:
```bash
python lerobot/scripts/push_dataset_to_hub.py \
--raw-dir /path/to/original/dataset \
--raw-format <format name> \
--repo-id <user_id/dataset_name> \
--local-dir <a/local/path/like/user_id/dataset_name> \
--push-to-hub <1 to upload the dataset to the hub, 0 otherwise> \
--force-override <1 to overwrite a previously converted dataset, 0 otherwise> \
--num-workers <choose a smaller number if your computer crashes>
```
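For example, for the `data/aloha_static_pingpong_test_raw` folder mentioned above, the call could look like this (the repo id and local dir below are illustrative placeholders, not real repositories):
```bash
# Example invocation; replace your_user_id with your own Hugging Face user id.
python lerobot/scripts/push_dataset_to_hub.py \
--raw-dir data/aloha_static_pingpong_test_raw \
--raw-format aloha_hdf5 \
--repo-id your_user_id/aloha_static_pingpong_test \
--local-dir data/your_user_id/aloha_static_pingpong_test \
--push-to-hub 1 \
--force-override 1 \
--num-workers 4
```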
See `python lerobot/scripts/push_dataset_to_hub.py --help` for more instructions.
If your dataset format is not supported, implement your own in `lerobot/common/datasets/push_dataset_to_hub/${raw_format}_format.py` by copying examples like `pusht_zarr`, `umi_zarr`, `aloha_hdf5`, or `xarm_pkl`, and then add the format name to `lerobot/scripts/push_dataset_to_hub.py` (see the sketch below).
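For orientation, here is a minimal sketch of what such a format module might contain. The function names, signatures, and return values below are assumptions modeled on the existing modules; copy the exact interface from one of them (e.g. `pusht_zarr_format.py`) rather than from this sketch.
```python
# Hypothetical skeleton for a custom ${raw_format}_format.py module.
# Mirror an existing module in lerobot/common/datasets/push_dataset_to_hub/
# for the exact interface that push_dataset_to_hub.py expects.
from pathlib import Path


def check_format(raw_dir: Path):
    """Fail early if the raw folder does not look like this format (assumed helper)."""
    assert any(raw_dir.iterdir()), f"{raw_dir} is empty"


def from_raw_to_lerobot_format(raw_dir: Path, videos_dir: Path, fps=None, video=True):
    """Convert raw episodes into the structures the push script consumes (assumed entry point)."""
    check_format(raw_dir)
    # 1. Load observations and actions for each episode from raw_dir.
    # 2. If video=True, encode camera frames into mp4 files under videos_dir (libx264).
    # 3. Build and return the converted dataset structures expected by the push script.
    raise NotImplementedError
```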
First, install lmdb with `pip install lmdb`.
Then download the lmdb version of the CALVIN dataset. I use HF-Mirror to download it; you can set the environment variable `export HF_ENDPOINT=https://hf-mirror.com` to avoid connection problems in some regions:
```bash
apt install git-lfs aria2 curl
wget https://hf-mirror.com/hfd/hfd.sh
chmod a+x hfd.sh
./hfd.sh StarCycle/calvin_lmdb --dataset --tool aria2c -x 9
```
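To sanity-check the download, you can open the lmdb environment and count its entries. A minimal sketch: the path below assumes the dataset landed in a `calvin_lmdb/` folder, so adjust it to wherever `hfd.sh` put the files.
```python
import lmdb

# Open the downloaded lmdb environment read-only (adjust the path as needed).
env = lmdb.open("calvin_lmdb", readonly=True, lock=False)
with env.begin() as txn:
    print("number of entries:", txn.stat()["entries"])
env.close()
```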
Now move the modified `lmdb_format.py` and `push_dataset_to_hub.py` to their locations in the codebase (`lerobot/common/datasets/push_dataset_to_hub/` and `lerobot/scripts/`, respectively; I have already done this), and run:
```bash
python lerobot/scripts/push_dataset_to_hub.py \
--raw-dir path/to/lmdb/folder \
--raw-format lmdb \
--repo-id StarCycle/test \
--local-dir StarCycle/test \
--push-to-hub 0 \
--force-override 1
```
Hugging Face does not accept repositories with too many files (roughly 10k per folder), but CALVIN has more episodes than that. To satisfy this requirement, I pack 100 episodes into each mp4 file.