/lerobot-upload-dataset

🤗 LeRobot: End-to-end Learning for Real-World Robotics in Pytorch

Add a new dataset

To add a dataset to the hub, you need to log in using a write-access token, which can be generated from the Hugging Face settings:

huggingface-cli login --token ${HUGGINGFACE_TOKEN} --add-to-git-credential

Also install the ffmpeg encoder with the libx264 codec. Do not install it with apt install; use conda install -c conda-forge ffmpeg instead.

Then point to your raw dataset folder (e.g. data/aloha_static_pingpong_test_raw), and push your dataset to the hub with:

python lerobot/scripts/push_dataset_to_hub.py \
--raw-dir /path/to/original/dataset \
--raw-format <format name> \
--repo-id <user_id/dataset_name> \
--local-dir </a/path/user_id/dataset_name> \
--push-to-hub <0 to skip uploading the dataset, 1 to upload it> \
--force-override <1 to overwrite a previously converted dataset, 0 otherwise> \
--num-workers <choose a smaller number if the computer crashes>

See python lerobot/scripts/push_dataset_to_hub.py --help for more instructions.
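When the conversion has to be launched from another Python script, the same command can be assembled as an argument list. The sketch below is only an illustration: the helper name build_push_command is hypothetical, the flag names are copied from the command above, and the integer values are passed through unchanged without interpreting them.

```python
import shlex


def build_push_command(raw_dir, raw_format, repo_id, local_dir,
                       push_to_hub=0, force_override=0, num_workers=None):
    """Assemble the argv list for push_dataset_to_hub.py.

    Hypothetical helper: it only builds the command, it does not run it.
    Flag names are taken from the command shown above.
    """
    cmd = [
        "python", "lerobot/scripts/push_dataset_to_hub.py",
        "--raw-dir", str(raw_dir),
        "--raw-format", raw_format,
        "--repo-id", repo_id,
        "--local-dir", str(local_dir),
        "--push-to-hub", str(push_to_hub),
        "--force-override", str(force_override),
    ]
    # --num-workers is only added when the caller asks for a specific value.
    if num_workers is not None:
        cmd += ["--num-workers", str(num_workers)]
    return cmd


cmd = build_push_command(
    "path/to/lmdb/folder", "lmdb", "StarCycle/test", "StarCycle/test",
    push_to_hub=0, force_override=1,
)
print(shlex.join(cmd))
```

The resulting list can be handed to subprocess.run(cmd, check=True) from the repository root.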

If your dataset format is not supported, implement your own in lerobot/common/datasets/push_dataset_to_hub/${raw_format}_format.py by copying an existing example such as pusht_zarr, umi_zarr, aloha_hdf5, or xarm_pkl, then add the format name to push_dataset_to_hub.py.
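The naming convention above (one ${raw_format}_format.py module per format, plus a list of known format names) can be sketched as a small lookup. This is an assumption about how such a dispatch could look, not the actual code in push_dataset_to_hub.py; the names FORMATS_PACKAGE, SUPPORTED_FORMATS, format_module_name, and load_format_module are hypothetical.

```python
import importlib

# Package path derived from the directory named in the text above.
FORMATS_PACKAGE = "lerobot.common.datasets.push_dataset_to_hub"

# Format names mentioned in the text; a new format has to be added here
# (hypothetical registry, the real script may track formats differently).
SUPPORTED_FORMATS = {"pusht_zarr", "umi_zarr", "aloha_hdf5", "xarm_pkl"}


def format_module_name(raw_format, supported=SUPPORTED_FORMATS):
    """Return the dotted module path for a registered raw format."""
    if raw_format not in supported:
        raise ValueError(
            f"Unknown raw format {raw_format!r}; expected one of {sorted(supported)}"
        )
    return f"{FORMATS_PACKAGE}.{raw_format}_format"


def load_format_module(raw_format, supported=SUPPORTED_FORMATS):
    """Import the converter module (requires LeRobot to be installed)."""
    return importlib.import_module(format_module_name(raw_format, supported))


# Registering a new format is then a one-line change:
formats_with_lmdb = SUPPORTED_FORMATS | {"lmdb"}
print(format_module_name("lmdb", formats_with_lmdb))
```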

An example that converts our LMDB dataset to the LeRobot format

First, install lmdb with pip install lmdb.

Then download the LMDB dataset of CALVIN. I use HF-Mirror to download it. You can set the environment variable with export HF_ENDPOINT=https://hf-mirror.com to avoid connection problems in some regions.

apt install git-lfs aria2 curl
wget https://hf-mirror.com/hfd/hfd.sh
chmod a+x hfd.sh
./hfd.sh StarCycle/calvin_lmdb --dataset --tool aria2c -x 9

Now place the modified lmdb_format.py in lerobot/common/datasets/push_dataset_to_hub/ and the modified push_dataset_to_hub.py in lerobot/scripts/ (this repository already has them in place), and run:

python lerobot/scripts/push_dataset_to_hub.py --raw-dir path/to/lmdb/folder --raw-format lmdb --repo-id StarCycle/test --local-dir StarCycle/test --push-to-hub 0 --force-override 1

The Hugging Face Hub recommends keeping each folder under about 10,000 files, but CALVIN has more episodes than that. To stay within the limit, I save 100 episodes in each mp4 file.
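The grouping of 100 episodes per mp4 file comes down to simple index arithmetic. The sketch below shows one way to map an episode index to a video file and a position inside it; the function names are hypothetical and the real lmdb_format.py may organize this differently.

```python
EPISODES_PER_VIDEO = 100  # grouping size stated in the text above


def video_slot(episode_index, episodes_per_video=EPISODES_PER_VIDEO):
    """Return (video_file_index, position_within_file) for an episode."""
    if episode_index < 0:
        raise ValueError("episode_index must be non-negative")
    return divmod(episode_index, episodes_per_video)


def num_video_files(num_episodes, episodes_per_video=EPISODES_PER_VIDEO):
    """Number of mp4 files needed to hold num_episodes (ceiling division)."""
    return -(-num_episodes // episodes_per_video)


print(video_slot(250))        # episode 250 is the 51st episode of file 2
print(num_video_files(1001))  # one extra file for the last episode
```

With this grouping the file count shrinks by a factor of 100 compared to writing one video per episode.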