Dataset Annotation file format
Dear authors,
Congratulations on this awesome work! This is superb and solid work.
Thanks for releasing the Dropbox versions of the dataset. I have some questions regarding the dataset format and annotations:
- Where can I find the object label or type for each sequence? Is it the first part of the sequence label? For example, if a video is labeled `S100014_0003_0002`, is the object used `S100014`? Also, what do `0003` and `0002` stand for? Are they intent labels or camera views?
- Where can I find the intent label? Is the intent labeled per frame or for certain segments?
- For OakInk-Image, I see hand annotations under `anno/hand_v` and `anno/hand_j`. What coordinate system are they in, world coordinates or camera coordinates?
- When there are two hands in a video (for example, when handing over and receiving objects), do you annotate hand poses for both hands or just a single hand?
- How are the hand pose files labeled? I see two pickle files for the same frame, for example `anno/hand_v/A01001__0003__0002__2021-09-26-20-02-08__0__6__1.pkl` and `anno/hand_v/A01001__0003__0002__2021-09-26-20-02-08__0__6__2.pkl`. What is the difference between them?
- Where can I find the camera extrinsics for each video?
Can you please clarify the above questions?
Also, are you planning to release a README file explaining the annotation and file format?
Thanks for your interest in our dataset.
We will update the README file to give a more detailed explanation of the OakInk dataset.
- The object models used in OakInk-Image are at `$OAKINK_DIR/image/obj/`. The first part of the sequence label represents the `object_id`. For example, a sequence labeled `S100014_xxxx_xxxx` means that the object used in this sequence is `S100014.obj`. The sequence label formats and their meanings (a parsing sketch follows at the end of this comment):
  - sequence `A_B_C`: A is the `object_id`, B is the `intent_id`, and C is the `subject_id`;
  - sequence `A_B_C_D`: A is the `object_id`, B is the `intent_id`, and C & D are the `subject_id`s (C is the giver, D is the receiver).
- The intent is labeled per sequence. We asked a subject to perform the intent, e.g. "use", and recorded his/her hands during the whole course of the interaction. The `intent_name` to `intent_id` mappings are:

  ```python
  ALL_INTENT = {
      "use": "0001",
      "hold": "0002",
      "liftup": "0003",
      "handover": "0004",  # handover = give + receive
  }
  ```
- You can use our toolkit, `oikit/oi_image/oi_image.py::get_joints_3d` and `get_verts_3d`, to access the joints/verts in the camera coordinate system.
- We annotate both hands; this also answers question 5.
- If a sequence has two annotated hands (giver and receiver), it will have two pickle files; each file stores one subject's hand annotation.
- For the camera extrinsics, check `oikit/oi_image/oi_image.py::get_mano_pose`; it shows how to use the `cam_extr`.

The oikit provides basic usage of our dataset; you can find the code for loading data and annotations in it.
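To make the conventions above concrete, here is a minimal sketch of parsing a sequence label and reading annotations through the toolkit. The `parse_seq_label` helper is mine (not part of oikit), and the exact loader constructor arguments may differ across oikit versions:

```python
from oikit.oi_image import OakInkImage  # loader defined in oikit/oi_image/oi_image.py

# Hypothetical helper (not part of oikit): decode a sequence label
# following the "A_B_C" / "A_B_C_D" scheme described above.
INTENT_NAMES = {"0001": "use", "0002": "hold", "0003": "liftup", "0004": "handover"}

def parse_seq_label(label):
    parts = label.split("_")
    info = {
        "object_id": parts[0],             # e.g. 'S100014' -> model obj/S100014.obj
        "intent": INTENT_NAMES[parts[1]],  # e.g. '0003' -> 'liftup'
    }
    if len(parts) == 3:
        info["subject_id"] = parts[2]      # single-subject sequence
    else:                                  # handover: giver + receiver
        info["giver_id"], info["receiver_id"] = parts[2], parts[3]
    return info

print(parse_seq_label("S100014_0003_0002"))
# {'object_id': 'S100014', 'intent': 'liftup', 'subject_id': '0002'}

# Toolkit access: get_joints_3d / get_verts_3d return annotations already
# transformed into the camera coordinate system.
dataset = OakInkImage()                 # constructor arguments may vary by version
joints_3d = dataset.get_joints_3d(0)    # hand joints, (21, 3)
verts_3d = dataset.get_verts_3d(0)      # MANO vertices, (778, 3)
```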
Best,
Lixin
Thank you for clarifying this.
I have a follow-up question regarding the OakInk-Shape annotation format. For example, I am looking at the folder `oakink_shape_v2/bottle/A16012/0cc013118e/S16105`.
- In this path, I think `A16012` corresponds to the object. What about `0cc013118e` and `S16105`? Do they have any specific meaning?
- Also, the file `oakink_shape_v2/bottle/A16012/0cc013118e/S16105/hand_param.pkl` has the fields `pose`, `shape`, and `tsl`. The `pose` has shape `16x4`. Can you please explain what this corresponds to? If I am not wrong, hand pose is in 3 dimensions, right?
Thank you.
Hi,
- `0cc013118e` is a hash code that we use to identify a certain object instance; it has no direct meaning.
- The poses are represented as quaternions, thus `16 x 4` in shape. You can easily transform them to axis-angle by using our oikit here.
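For reference, the conversion itself is only a few lines. A minimal numpy sketch, assuming unit quaternions in (w, x, y, z) order (verify the convention against your oikit version, which also ships a transform for this):

```python
import pickle
import numpy as np

def quat_to_axis_angle(quat):
    """(16, 4) per-joint unit quaternions -> (16, 3) axis-angle vectors.

    Assumes (w, x, y, z) ordering; check your oikit version for the
    actual convention before relying on this.
    """
    w = np.clip(quat[:, :1], -1.0, 1.0)          # scalar part, (16, 1)
    v = quat[:, 1:]                              # vector part, (16, 3)
    norm = np.linalg.norm(v, axis=1, keepdims=True)
    angle = 2.0 * np.arctan2(norm, w)            # rotation angle per joint
    axis = np.divide(v, norm, out=np.zeros_like(v), where=norm > 1e-8)
    return axis * angle                          # axis scaled by angle

with open("oakink_shape_v2/bottle/A16012/0cc013118e/S16105/hand_param.pkl", "rb") as f:
    hand_param = pickle.load(f)

pose_aa = quat_to_axis_angle(np.asarray(hand_param["pose"]))  # (16, 3) axis-angle
```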
Hope these help.
Lixin
Thank you Lixin!
Is my understanding of the OakInk-Shape data format correct?
The root directory `./oakink_shape_v2` contains objects of different categories, such as `apple`, `banana`, `binoculars`, `bottle`, etc. For each of these object categories, there can be various instances of that category. For example, `./oakink_shape_v2/bottle/` has instances such as `A16012`, `A16013`, `A16026`, etc. And for each instance, for example `./oakink_shape_v2/bottle/A16012/`, there are different possible hand grasp annotations, typically stored in `some_hexadecimal_folder/hand_param.pkl`. Is this right?
If yes, I am a little confused by the directory structure. Sometimes there is a single hexadecimal folder, such as `0eec013c90`, containing `hand_param.pkl`. For example, `./oakink_shape_v2/apple/C90001/0eec013c90/` has only one `hand_param.pkl`. But in other cases, such as `./oakink_shape_v2/bottle/A16012/0cc013118e/`, there are multiple subdirectories such as `S16101`, `S16102`, etc.
Can you please correct me if I am wrong or provide some clarification here?
Thanks!
Hi, thank you for your interest in our work.
Yes, we store the hand grasp parameters in `hand_param.pkl`.
As we introduced in our paper, we use Tink to transfer grasp knowledge from a real-world object to a virtual counterpart object of the same category. Take `bottle/A16012/0cc013118e/hand_param.pkl` as an example: the parameters here are a grasp of bottle_A16012, while `bottle/A16012/0cc013118e/s16101/hand_param.pkl` stores the grasp parameters of bottle_s16101, which were transferred from the former grasp.
In short, `category/real_object_id/hexadecimal/hand_param.pkl` is the original grasp of the real object, while `category/real_object_id/hexadecimal/virtual_object_id/hand_param.pkl` is the grasp transferred to the virtual counterpart object from the original grasp with Tink.
In some cases, because the grasp is not suitable for transferring or the category does not contain enough object CAD models, we only provide the grasps of real-world objects, as with `apple/C90001/0eec013c90`.
Also, as we ran a perceptual evaluation, some hand poses have been filtered out because they did not satisfy visual plausibility.
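To make that layout concrete, here is a small sketch (my own, written against the structure described above, not part of oikit) that walks the tree and separates original grasps from Tink-transferred ones by path depth:

```python
import os

def collect_grasps(root="oakink_shape_v2"):
    """Split hand_param.pkl files into original vs. Tink-transferred grasps.

    Layout assumed from the explanation above:
      category/real_object_id/hexadecimal/hand_param.pkl                     (original)
      category/real_object_id/hexadecimal/virtual_object_id/hand_param.pkl  (transferred)
    """
    originals, transferred = [], []
    for dirpath, _dirnames, filenames in os.walk(root):
        if "hand_param.pkl" not in filenames:
            continue
        depth = len(os.path.relpath(dirpath, root).split(os.sep))
        path = os.path.join(dirpath, "hand_param.pkl")
        if depth == 3:
            originals.append(path)
        elif depth == 4:
            transferred.append(path)
    return originals, transferred

orig, trans = collect_grasps()
print(f"{len(orig)} original grasps, {len(trans)} transferred grasps")
```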
Awesome, thanks!
One follow-up question. For the real-object case `./oakink_shape_v2/binoculars/C42001/`, there are multiple hexadecimal folders. Do these correspond to different types of hand grasps for the same object `C42001`?
Yes. We invited 12 subjects to grasp the objects with different intents. Usually, a real-world object will have about ten different grasps with four types of intents: use, hold, lift-up and handover.
Another follow-up question regarding the image data (sorry for all the questions)!
For sequences where the intent is handover, there are two actions, give and receive, which means that there are two hands. Do you provide annotations for both hands? For example, I am looking at the annotations in `./image/anno/hand_j/`. There are two types: `Y35037_0004_0007_0002__2021-10-09-14-23-47__0__107__0.pkl` and `Y35037_0004_0007_0002__2021-10-09-14-23-47__1__107__0.pkl`. I assume that the `0` before `107` always corresponds to the giver, while `1` always corresponds to the receiver. Is this right?
Yes, that is correct.
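For anyone else reading along, a small parsing sketch based on this confirmation. Only the giver/receiver flag is confirmed above; the readings of the other fields (timestamp, frame index, view index) are my assumptions from the filename pattern:

```python
def parse_anno_filename(name):
    """Decode handover annotation filenames like
    'Y35037_0004_0007_0002__2021-10-09-14-23-47__0__107__0.pkl'."""
    stem = name[:-len(".pkl")]
    seq, timestamp, hand_flag, frame_id, view_id = stem.split("__")
    return {
        "sequence": seq,                                      # object/intent/giver/receiver ids
        "timestamp": timestamp,                               # capture time (assumed)
        "hand": "giver" if hand_flag == "0" else "receiver",  # confirmed in this thread
        "frame_id": frame_id,                                 # frame index (assumed)
        "view_id": view_id,                                   # camera view (assumed)
    }

print(parse_anno_filename(
    "Y35037_0004_0007_0002__2021-10-09-14-23-47__0__107__0.pkl"))
```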
Dear authors,
I am trying to visualize some data based on different split modes, or even different data splits, using this.
But it looks like these splits are not updated in the code yet: the OakInk-Image loader.
Can you please update this? That is, can you please provide data splits based on subjects/objects, as well as train/val/test splits?
Thanks
Of course. We plan to release version_3 of the OakInk dataset around July 25th.
You only need to replace the current anno.zip file.
version_3 includes:
- subject and object split files for OakInk-Image
- a refined version of the annotations, in which we fix some artifacts in the handover sequences