Data storage format
COST-97 opened this issue · 11 comments
Hello:
We want to collect more data based on this dataset.
Can you elaborate on the role of each file in the training and validation folders, and is there a tutorial on data collection?
Thank you so much!
Hi @COST-97 , I assume you have had a look at this README and know the structure of the data for each frame. If something is unclear, don't hesitate to ask.
Apart from the frames, there is:

- `ep_start_end_ids.npy`: When we recorded the data with teleoperation, every continuous recording without an environment reset is an "episode". The lengths of these episodes can vary a lot (some are short, others might be 5 or 10 minutes of continuous recording). This is only important for the sampling of the sliding windows during training, since we don't want to slide the window over frames where the change from one episode to another happens. The file is a two-dimensional array indicating the start and end indices of the episodes in the training/validation folder (see the sketch after this list).
- `ep_lens.npy`: Currently not used and not needed.
- `scene_info.npy`: A dictionary indicating which frames correspond to which of the environments. In the debug dataset, it is `{'calvin_scene_D': [358482, 361252]}`, which means that all the frames were recorded in environment D. For the multi-env splits, you would see the other environments too. The indices are the first and last frame that belong to an environment. Note that outside of the debug dataset, this range corresponds to multiple episodes.
- `statistics.yaml`: Not strictly necessary; once you have created your dataset, you can run `calvin_models/calvin_agent/utils/compute_proprioception_statistics.py` to compute the mean and std for normalization during training.
- A folder with language annotations (we used automatic labeling with `calvin_models/calvin_agent/utils/automatic_lang_annotator_mp.py`, but this could also be done manually).
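If you want to build on this, here is a minimal sketch of how these metadata files can be read and used to sample windows that stay inside a single episode. The paths, the window size, and the assumption that the stored end indices are inclusive are mine, not taken from the training code:

```python
import numpy as np

# Load the per-split metadata (paths are examples).
root = "dataset/training"
ep_start_end_ids = np.load(f"{root}/ep_start_end_ids.npy")  # shape: (n_episodes, 2)
scene_info = np.load(f"{root}/scene_info.npy", allow_pickle=True).item()
print("scenes:", scene_info)

# Collect all window start indices that never cross an episode boundary,
# assuming the stored end indices are inclusive.
window_size = 32
valid_starts = []
for start, end in ep_start_end_ids:
    valid_starts.extend(range(start, end - window_size + 2))
print(f"{len(valid_starts)} candidate windows of length {window_size}")
```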
Are you planning to record data with teleoperation using a VR headset?
Glad to hear from you!
Your further explanation makes the data format clearer.
We are planning to collect data from other robots in simulations in the near future, and we are not considering using VR yet.
In principle, it's not too difficult to replace the robot manipulator with a different one in our simulation (it would be more straightforward if it also had 7 DOF). Due to the different kinematics, you might have to play around with the placement of the robot in the environment to ensure all positions are reachable without driving the robot into singularities.
Recording data without VR is a bit more challenging; another alternative we have tested is using a 3D mouse. Scripting the policy would move away from the idea of play data, and it is also quite hard to script a policy for complex contact-rich manipulations.
Hello:
If I'm not mistaken, in the training phase the episode length is 64 for all tasks.
But intuitively there should be some differences between the different tasks, right?
How do you deal with this problem?
Thank you!
You are probably referring to the language annotations, for which we use our automatic labeling tool with a sequence length of 64. That means we sample random sequences of length 64 and check if any task was solved in that interval, in which case that sequence gets a language label. Since most tasks need fewer than 64 frames to be solved, there are usually some frames at the beginning or at the end of the sequence that are not strictly task-related (for example, the motion of the arm towards the handles or switches).
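In pseudocode, the labeling idea looks roughly like this; `detect_task` stands in for the real success detection inside `automatic_lang_annotator_mp.py` and is purely hypothetical:

```python
import numpy as np

LABEL_LEN = 64  # length of the annotated sequences

def annotate(num_frames, num_samples, detect_task, rng):
    """Sample random 64-frame windows; label those where a task was solved."""
    labels = {}
    for _ in range(num_samples):
        start = int(rng.integers(0, num_frames - LABEL_LEN + 1))
        task = detect_task(start, start + LABEL_LEN)  # hypothetical oracle
        if task is not None:
            labels[(start, start + LABEL_LEN)] = task
    return labels
```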
During training, we sample sequences of length 16 to 32 (always padding to 32). For the vision part of the dataset, we use a sliding-window approach over the whole play data (which can be seen as a couple of very long demonstrations).
For the language part, we still use a sliding window, but restrict it to those 64-frame sequences that were labeled with a language annotation. If you crop 16 frames from a sequence of 64 frames, it is of course possible that the completion of the task is not shown, but this is not a problem.
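A rough sketch of that sampling scheme (padding by repeating the last frame is my assumption; the actual implementation may pad differently):

```python
import numpy as np

MIN_LEN, MAX_LEN = 16, 32

def sample_padded_window(frames, rng):
    """Sample a sub-sequence of length 16..32 and pad it to 32 frames."""
    seq_len = int(rng.integers(MIN_LEN, MAX_LEN + 1))
    start = int(rng.integers(0, len(frames) - seq_len + 1))
    window = frames[start:start + seq_len]
    pad = np.repeat(window[-1:], MAX_LEN - seq_len, axis=0)  # repeat last frame
    return np.concatenate([window, pad], axis=0)

rng = np.random.default_rng(0)
labeled_seq = np.zeros((64, 15))                     # e.g. one annotated 64-frame sequence
print(sample_padded_window(labeled_seq, rng).shape)  # (32, 15)
```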
Hello:
The range of the rel_actions is -1 to 1.
How is the normalization done here?
Do you select the maximum and minimum values in the data for normalization?
Thanks!
For the conversion from absolute metric space to relative normalized actions, the position component (x,y,z) is clipped at the interval [-0.02, 0.02] and the orientation component (euler angles) at [-0.05, 0.05]. After the clipping, we normalize to the range [-1, 1]. This happens in the calvin_env at the time of rendering.
To convert them back to metric space, the position component (x,y,z) is multiplied by 0.02 and the orientation component (euler angles) by 0.05. This happens in calvin_env here, when we do a rollout and want to control the robot with relative actions.
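Put as code, the two directions look like this (a direct transcription of the description above, not the calvin_env source):

```python
import numpy as np

POS_BOUND = 0.02  # clipping bound for the (x, y, z) deltas
ORN_BOUND = 0.05  # clipping bound for the euler-angle deltas

def metric_to_rel(delta_pos, delta_orn):
    """Absolute metric deltas -> normalized relative actions in [-1, 1]."""
    rel_pos = np.clip(delta_pos, -POS_BOUND, POS_BOUND) / POS_BOUND
    rel_orn = np.clip(delta_orn, -ORN_BOUND, ORN_BOUND) / ORN_BOUND
    return rel_pos, rel_orn

def rel_to_metric(rel_pos, rel_orn):
    """Normalized relative actions -> metric deltas (used during rollouts)."""
    return np.asarray(rel_pos) * POS_BOUND, np.asarray(rel_orn) * ORN_BOUND
```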
Hello:
What if I want to normalize all observations to -1 to 1?
Where can I see the maximum and minimum values of observations for the dataset?
Thanks!
I don't know the min and max values for the state observation; you would have to go through the dataset and check.
What we do is normalize the observations to mean 0 and std 1 with this script. We save the results in the statistics.yaml file in the training and validation folders of the dataset, respectively:
```yaml
robot_obs:
  - _target_: calvin_agent.utils.transforms.NormalizeVector
    mean: [0.039233, -0.118554, 0.507826, 1.079174, -0.083069, 1.579753,
           0.054622, -0.736859, 1.017769, 1.792879, -2.099604, -0.993738,
           1.790842, 0.586534, 0.095367]
    std: [0.150769, 0.1104, 0.06253, 2.883517, 0.126405, 0.377196,
          0.030152, 0.334392, 0.172714, 0.240513, 0.3842, 0.198596,
          0.158712, 0.346865, 0.995442]
scene_obs:
  - _target_: calvin_agent.utils.transforms.NormalizeVector
    mean: [0.150934, 0.119917, 0.000239, 0.042049, 0.487755, 0.47448,
           0.057482, -0.088074, 0.431237, 0.046034, 0.030599, 0.027333,
           0.062103, -0.092833, 0.430236, -0.054962, 0.019381, 0.096546,
           0.064944, -0.093058, 0.428381, 0.024941, 0.002746, -0.031589]
    std: [0.125757, 0.09654, 0.002148, 0.041916, 0.49985, 0.499348,
          0.146225, 0.119266, 0.050408, 1.430807, 0.676023, 2.017468,
          0.142979, 0.113236, 0.049651, 1.545888, 0.3906, 1.763569,
          0.143077, 0.11546, 0.050363, 1.514873, 0.431664, 1.860245]
```
If you want to normalize to the range [-1, 1], just modify the script I linked; note that it will take some time to run through the whole dataset.
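For reference, a min-max variant could look like the sketch below. The `robot_obs` key matches the per-frame format used above, but the `episode_*.npz` file glob is an assumption; double-check it against your copy of the dataset:

```python
import numpy as np
from pathlib import Path

# First pass: elementwise min/max of the robot state over the whole split
# (the episode_*.npz glob is an assumption; adjust to your dataset layout).
root = Path("dataset/training")
lo, hi = None, None
for f in sorted(root.glob("episode_*.npz")):
    obs = np.load(f)["robot_obs"]
    lo = obs if lo is None else np.minimum(lo, obs)
    hi = obs if hi is None else np.maximum(hi, obs)

def to_unit_range(x, lo, hi):
    """Map observations linearly to [-1, 1] with dataset-wide min/max."""
    return 2.0 * (x - lo) / (hi - lo) - 1.0
```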