- `visualize_dataset.py`: script from the calvin developers. Try `pip install opencv-python` to make `import cv2` work.
- `calvin_extract_numbers.py`, `calvin_extract_numbers.jl`: extract the numeric fields of the calvin data files in the current directory and print them to stdout in tab separated format. The fields are given below. Only file-id (1), actions (7), rel_actions (7), robot_obs (15), scene_obs (24) are printed (54 columns). https://github.com/mees/calvin/blob/main/dataset/README.md has more details about the splits and the fields. We also provide descriptions and statistics in the next section. A minimal sketch of the extraction appears after this list.
- `debug-training.tsv`, `debug-validation.tsv`, etc.: the output of the calvin_extract_numbers scripts, available on the release page at https://github.com/denizyuret/calvin-scripts/releases/tag/v0.0.1.
- `calvin_scene_info.py`: reads and prints the info in the scene_info.npy, ep_lens.npy, and ep_start_end_ids.npy files in each calvin data directory.
- `calvin_extract_unique_annotations.py`: prints all the unique task_ids and language annotations in the dataset. There are 34 unique task ids and 389 unique annotations.
- `zcat D-validation.tsv.gz | python calvin_diffs.py`: prints the differences from the previous line for all but the first line.
- `zcat D-validation.tsv.gz | python calvin_episodes.py`: tries to guess the episode boundaries: a boundary is flagged when the xyz coordinates of successive frames differ by more than 8.5 standard deviations.
- `zcat D-validation.tsv.gz | python calvin_intervals.py`: prints out intervals based on idnum discontinuities.
- `zcat D-validation.tsv.gz | python calvin_summarize_numbers.py`: prints statistics for each column.
- `python visualize_calvin_npz.py /datasets/calvin/D/validation/episode_0000000.npz`: visualizes a single frame from a given npz file.
- `python calvin_visualize.py`: visualizes a series of frames in the current directory; a more detailed version of the original visualize_dataset.py.
- `calvin_controller_coordinates.py`, `calvin_controller_regression.py`, `calvin_controller_xyz.py`, `calvin_controller_check.py`: scripts I used to discover, output, and check the controller (button etc.) coordinates.
- `calvin_extract_tactile.py`: extracts the mean pixels of the `depth_tactile` (2) and `rgb_tactile` images. Saved as e.g. `D-validation-tactile.tsv.gz`.
- `calvin_fix_tactile.py`: the initial version of extract_tactile did not normalize the values; depth pixels were too small and rgb pixels were too large. I fixed these using this script and named the normalized files e.g. `D-validation-tactile2.tsv.gz`. The extract script is also fixed now and normalizes.
- `python calvin_extract_language.py auto_lang_ann.npy`: extracts (start, end, task, instruction) triples in tab separated format.
- `python calvin_scene_diff.py D/validation`: extracts and normalizes scene coordinate differences, saved in sdiff files.
- `zcat data/ABC(D)-training.tsv.gz | perl calvin_scene_fix_abc(d).pl | gzip > data/ABC(D)fixed-training.tsv.gz`: fixes the red-pink and red-blue swaps in scenes A and C respectively.
- `python calvin_convert_npz.py ABCD training`: creates an npz file merging all features into one array, with attached meta information.
- `calvindataset.py`: CalvinDataset class; reads from the npz file and subclasses Dataset, deprecating loaddata.py.
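
For orientation, here is a minimal sketch of what the numeric extraction amounts to (my reconstruction, not the actual calvin_extract_numbers.py; the npz field names appear in the summary further below):

```python
import glob, re
import numpy as np

# Print file-id plus the four numeric fields of every frame in the current
# directory as tab separated values (1 + 7 + 7 + 15 + 24 = 54 columns).
for path in sorted(glob.glob("episode_*.npz")):
    idnum = re.search(r"\d+", path).group(0)  # file-id from the filename
    data = np.load(path)
    row = np.concatenate([data[f] for f in
                          ("actions", "rel_actions", "robot_obs", "scene_obs")])
    print("\t".join([idnum] + [str(v) for v in row]))
```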
- episode_XXXXXXX.npz: Each frame is represented in a file named episode_idnum.npz; consecutive idnums indicate consecutive frames (except, presumably, at episode transitions). Other files describing the contents:
- scene_info.npy indicates the first and last frame numbers for a particular directory as well as which scene (A,B,C,D) they come from (although there seems to be some confusion about this). It only exists in */training but describes the union of training and validation.
- ep_start_end_ids.npy indicates the start and end idnums of segments in that particular directory.
- ep_lens.npy indicates the lengths of segments given by ep_start_end_ids.npy.
- statistics.yaml gives basic stats for numeric variables.
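
A quick way to inspect these metadata files with numpy (a sketch; scene_info.npy stores a pickled dict, hence allow_pickle, and the start/end idnums appear to be inclusive, so length = end - start + 1):

```python
import numpy as np

scene_info = np.load("scene_info.npy", allow_pickle=True).item()
print(scene_info)                               # e.g. {'calvin_scene_D': [0, 611098]}
se = np.load("ep_start_end_ids.npy")            # (n, 2) start/end idnums per segment
lens = np.load("ep_lens.npy")                   # segment lengths
print((se[:, 1] - se[:, 0] + 1 == lens).all())  # True if start/end ids are inclusive
```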
directory | frames | scene_info.npy |
---|---|---|
debug/training | 2771 | {'calvin_scene_D': [358482, 361252]} |
debug/validation | 1675 | {'calvin_scene_D': [553567, 555241]} |
D/training | 512077 | {'calvin_scene_D': [0, 611098]} |
D/validation | 99022 | (none) |
ABC/training | 1795045 | {'calvin_scene_B': [0, 598909], 'calvin_scene_C': [598910, 1191338], 'calvin_scene_A': [1191339, 1795044]} |
ABC/validation | 99022 | (none) |
ABCD/training | 2307126 | {'calvin_scene_D': [0, 611098], 'calvin_scene_B': [611099, 1210008], 'calvin_scene_C': [1210009, 1802437], 'calvin_scene_A': [1802438, 2406143]} |
ABCD/validation | 99022 | (none) |
The validation directories for ABCD, ABC, and D have the same content: calvin_scene_D: 0:53818, 219635:244284, 399946:420498 (99022 frames). However, note that the language annotations in the D split are different from the ones in the ABC/ABCD splits.
The training directory for D has the rest of the scene D data: calvin_scene_D: 53819:219634, 244285:399945, 420499:611098 (512077 frames). (scene_info.npy says scene_A but should be scene_D.)
The training directory for ABCD starts with the same content as D/training (except for several off-by-one errors) followed by B (598910), C (592429), and A (603706) sections which are identical to ABC/training but renumbered.
The scenes differ by desk color and drawer positioning. The objects look the same.
A summary of all data (including images) in episode_XXXXXXX.npz files (each represents a single 1/30 sec frame):
```
# julia> for f in data[:files]; println(f, "\t", summary(get(data, f))); end
# actions        7-element Vector{Float64}
# rel_actions    7-element Vector{Float64}
# robot_obs      15-element Vector{Float64}
# scene_obs      24-element Vector{Float64}
# rgb_static     200×200×3 Array{UInt8, 3}
# rgb_gripper    84×84×3 Array{UInt8, 3}
# rgb_tactile    160×120×6 Array{UInt8, 3}
# depth_static   200×200 Matrix{Float32}
# depth_gripper  84×84 Matrix{Float32}
# depth_tactile  160×120×2 Array{Float32, 3}
```
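The equivalent listing in Python:

```python
import numpy as np

data = np.load("episode_0000000.npz")
for f in data.files:
    print(f, data[f].shape, data[f].dtype)
```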
The fields in the output of calvin_extract_numbers.py (files like D-training.tsv.gz) are as follows. (The npz files keep idnum separate, so there actions/x is column 0; in general, subtract 1 from the positions below to get the npz column index.) The ranges given below are from D-validation.
- idnum
- actions/x [-0.42,0.37] (tcp (tool center point) position (3): x,y,z in absolute world coordinates in meters: where we want the tcp in the next frame, i.e. act[t] = tcp[t+1])
- actions/y [-0.46,0.10]
- actions/z [0.30,0.69]
- actions/a [-pi,pi] (tcp orientation (3): euler angles a,b,c in absolute world coordinates)
- actions/b [-0.46,0.32]
- actions/c [-pi,pi]
- actions/g [-1 or 1] (gripper_action (1): binary close=-1, open=1)
- rel_actions/x [-1,1] (tcp position (3): x,y,z in relative world coordinates normalized and clipped to (-1, 1) with scaling factor 50: rel[t]=normalize(act[t]-tcp[t]))
- rel_actions/y [-1,1] (normalize_dist=clip(act-obs,-0.02,0.02)/0.02, normalized_angle=clip(((act-obs) + pi) % (2*pi) - pi, -0.05, 0.05)/0.05; see the sketch after this list)
- rel_actions/z [-1,1]
- rel_actions/a [-1,1] (tcp orientation (3): euler angles a,b,c in relative world coordinates normalized and clipped to (-1, 1) with scaling factor 20)
- rel_actions/b [-1,1]
- rel_actions/c [-1,1]
- rel_actions/g [-1 or 1] (gripper_action (1): binary close=-1, open=1)
- robot_obs/x [-0.42,0.37] (tcp position (3): x,y,z in world coordinates: current position of tcp, i.e. tcp[t]=act[t-1])
- robot_obs/y [-0.46,0.10]
- robot_obs/z [0.30,0.69]
- robot_obs/a [-pi,pi] (tcp orientation (3): euler angles a,b,c in world coordinates)
- robot_obs/b [-0.46,0.32]
- robot_obs/c [-pi,pi]
- robot_obs/w [-0.02,0.08] (gripper opening width (1): in meters)
- robot_obs/j1 [-1.67,0.41] (arm_joint_states (7): in rad)
- robot_obs/j2 [-0.11,1.44]
- robot_obs/j3 [1.21,2.55]
- robot_obs/j4 [-2.80,-0.50]
- robot_obs/j5 [-1.65,0.17]
- robot_obs/j6 [1.27,2.56]
- robot_obs/j7 [-1.20,2.46]
- robot_obs/g [-1 or 1] (gripper_action (1): binary close = -1, open = 1)
- scene_obs/sliding_door [-0.002,0.28] (relative coordinates in meters)
- scene_obs/drawer [0.0,0.22] (relative coordinates in meters)
- scene_obs/button [0.0,0.034] (relative coordinates, scaled)
- scene_obs/switch [0.04,0.09] (relative coordinates, scaled)
- scene_obs/lightbulb [0 or 1] (1): on=1, off=0
- scene_obs/green light [0 or 1] (1): on=1, off=0
- scene_obs/redx (red block (6): x, y, z, euler_x, euler_y, euler_z)
- scene_obs/redy
- scene_obs/redz
- scene_obs/reda
- scene_obs/redb
- scene_obs/redc
- scene_obs/bluex (blue block (6): x, y, z, euler_x, euler_y, euler_z)
- scene_obs/bluey
- scene_obs/bluez
- scene_obs/bluea
- scene_obs/blueb
- scene_obs/bluec
- scene_obs/pinkx (pink block (6): x, y, z, euler_x, euler_y, euler_z)
- scene_obs/pinky
- scene_obs/pinkz
- scene_obs/pinka
- scene_obs/pinkb
- scene_obs/pinkc
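
Putting the rel_actions formulas above together, a minimal sketch of the normalization (my reconstruction from the constants quoted in this list; see the calvin_env utils.py linked in the sdiff section below for the authoritative version):

```python
import numpy as np

def rel_action(act, obs):
    """act, obs: length-7 absolute vectors (x, y, z, a, b, c, g)."""
    pos = np.clip(act[0:3] - obs[0:3], -0.02, 0.02) / 0.02   # scaling factor 50
    d = (act[3:6] - obs[3:6] + np.pi) % (2 * np.pi) - np.pi  # wrap angles to (-pi, pi]
    orn = np.clip(d, -0.05, 0.05) / 0.05                     # scaling factor 20
    return np.concatenate([pos, orn, act[6:7]])              # gripper passes through
```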
The fields in the output of calvin_controller_xyz.py (files like D-training-controllers.tsv.gz) are as follows; the coordinates give the location the tcp should be at in order to move each controller at that point in time. For details of the coordinate calculation see calvin_controller_xyz.md and calvin_controller_xyz.py. (Fields are listed as ff.cc name, where ff is the column index within this file and cc is the column index when all extracted files are combined.)
00.00 idnum 01.54 slider.x 02.55 slider.y 03.56 slider.z 04.57 drawer.x 05.58 drawer.y 06.59 drawer.z 07.60 button.x 08.61 button.y 09.62 button.z 10.63 switch.x 11.64 switch.y 12.65 switch.z
The fields in the output of the calvin_extract_tactile.py script (files like D-validation-tactile2.tsv.gz) are as follows. These give the average pixel values of the tactile images. The depth pixels were normalized by multiplying by 100; the rgb pixels were normalized by dividing by 255.0. A sketch of the computation follows the field list.
00.00 idnum 01.66 depth_tactile1 02.67 depth_tactile2 03.68 rgb_tactile1_r 04.69 rgb_tactile1_g 05.70 rgb_tactile1_b 06.71 rgb_tactile2_r 07.72 rgb_tactile2_g 08.73 rgb_tactile2_b
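
A minimal sketch of the per-frame computation (my reconstruction; I assume the rgb channels are ordered image1 r,g,b then image2 r,g,b, matching the column names above):

```python
import numpy as np

def tactile_row(path):
    data = np.load(path)
    depth = data["depth_tactile"].reshape(-1, 2).mean(axis=0) * 100.0  # normalize: x100
    rgb = data["rgb_tactile"].reshape(-1, 6).mean(axis=0) / 255.0      # normalize: /255.0
    return np.concatenate([depth, rgb])  # 2 depth means + 6 rgb means per frame
```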
The fields in the output of calvin_scene_diff.py (the sdiff files) are as follows. These are the normalized differences in scene coordinates: normalize(x[next]-x[curr]). The normalization is similar to rel_actions, based on https://github.com/mees/calvin_env/blob/797142c588c21e76717268b7b430958dbd13bf48/calvin_env/utils/utils.py#L160.
00.00 idnum 01.74 scene_obs/sliding_door 02.75 scene_obs/drawer 03.76 scene_obs/button 04.77 scene_obs/switch 05.78 scene_obs/lightbulb 06.79 scene_obs/green_light 07.80 scene_obs/redx 08.81 scene_obs/redy 09.82 scene_obs/redz 10.83 scene_obs/reda 11.84 scene_obs/redb 12.85 scene_obs/redc 13.86 scene_obs/bluex 14.87 scene_obs/bluey 15.88 scene_obs/bluez 16.89 scene_obs/bluea 17.90 scene_obs/blueb 18.91 scene_obs/bluec 19.92 scene_obs/pinkx 20.93 scene_obs/pinky 21.94 scene_obs/pinkz 22.95 scene_obs/pinka 23.96 scene_obs/pinkb 24.97 scene_obs/pinkc
Each language annotation file is class (task) balanced: there is a roughly equal number of annotations per task in each dataset. ABC-validation and ABCD-validation are buggy and should not be used: they padded the annotation counts by repeating the same annotations multiple times. Use D-validation instead.
dataset | annotated frames (fraction) | frames | annots | contig tasks | frames/annot | multiply annotated (fraction) |
---|---|---|---|---|---|---|
ABCD-training | .3734 | 2307126 | 22966 | 14176 | 100.46 | .3650 |
ABC-training | .3760 | 1795045 | 17870 | 11102 | 100.45 | .3608 |
D-training | .3813 | 512077 | 5124 | 3251 | 99.94 | .3528 |
debug-training | .1364 | 2771 | 9 | 7 | 307.89 | .3306 |
ABCD-validation | .0634 | 99022 | 1087 | 90 | 91.10 | .9154 |
ABC-validation | .0634 | 99022 | 1087 | 90 | 91.10 | .9154 |
D-validation | .3660 | 99022 | 1011 | 605 | 97.94 | .3755 |
debug-validation | .2036 | 1675 | 8 | 6 | 209.38 | .1994 |