oakink/OakInk

Dataset Annotation file format

Closed this issue · 12 comments

Dear authors,

Congratulations on this awesome work! It is superb and solid.

Thanks for releasing the Dropbox version of the dataset. I have some questions regarding the dataset format and annotations:

  1. Where can I find the object label or type for each sequence? Is it the first part of the sequence label? For example, if a video is labeled S100014_0003_0002, is the object used S100014? Also, what do 0003 and 0002 stand for? Are they intent labels or camera views?
  2. Where can I find the intent label? Is the intent labeled per frame or for certain segments?
  3. For OakInk-Image, I see hand annotations under anno/hand_v and anno/hand_j. What coordinate system are they in? World coordinates or camera coordinates?
  4. When there are two hands in a video (for example, when handing over and receiving objects), do you annotate hand poses for both hands or just a single hand?
  5. How are hand pose files labelled? I see two pickle files for the same frame. For example, I see anno/hand_v/A01001__0003__0002__2021-09-26-20-02-08__0__6__1.pkl and anno/hand_v/A01001__0003__0002__2021-09-26-20-02-08__0__6__2.pkl. What are the differences?
  6. Where can I find the camera extrinsics for each video?

Can you please clarify the above questions?

Also, are you planning to release a README file explaining the annotation and file format?

Thanks for your interest in our dataset.
We will update the README file to give a more detailed explanation of the OakInk dataset.

  1. The object models used in OakInk-Image are at $OAKINK_DIR/image/obj/.
    The first part of the sequence label represents the object_id.
    For example, a sequence labeled S100014_xxxx_xxxx means that the object used in this sequence is S100014.obj. The sequence label formats and their meanings:

    • sequence “A_B_C”: A is the object_id, B is the intent_id, and C is the subject_id;
    • sequence “A_B_C_D”: A is the object_id, B is the intent_id, and C and D are subject_ids (C is the giver, D is the receiver); see the parsing sketch after this list
  2. The intent is labeled per sequence. We asked a subject to perform an intent (e.g. “use”) and recorded his/her hand during the whole course of the interaction. The intent_name to intent_id mappings are:

    ALL_INTENT = {
        "use": "0001",
        "hold": "0002",
        "liftup": "0003",
        "handover": "0004",  # handover = give + receive
    }
    
  3. You can use our toolkit, oikit/oi_image/oi_image.py::get_joints_3d and get_verts_3d,
    to access joints/verts in the camera coordinate system (see the loader sketch below).

  4. We annotate both hands; this also answers question 5.

  5. If a sequence appears to have two annotated hands (giver and receiver), it will have two pickle files. Each file stores one subject’s hand annotation.

  6. For camera extrinsics, check oikit/oi_image/oi_image.py::get_mano_pose. It describes how the cam_extr is used (see the loader sketch below).
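
For item 1, here is a minimal plain-Python sketch of splitting a sequence label into its id fields (the helper name parse_seq_label is hypothetical, not part of oikit):

    def parse_seq_label(label):
        # Split an OakInk-Image sequence label into its id fields.
        parts = label.split("_")
        info = {"object_id": parts[0], "intent_id": parts[1]}
        if len(parts) == 3:   # "A_B_C": single subject
            info["subject_id"] = parts[2]
        else:                 # "A_B_C_D": handover (C gives, D receives)
            info["giver_id"], info["receiver_id"] = parts[2], parts[3]
        return info

    print(parse_seq_label("S100014_0003_0002"))
    # {'object_id': 'S100014', 'intent_id': '0003', 'subject_id': '0002'}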

The oikit provides basic usage of our dataset. You can find the code for loading data and annotations in it.
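
For example, a minimal loading sketch; the class name OakInkImage and its constructor arguments are assumptions based on the toolkit layout, so verify them against the oikit source:

    # Minimal sketch; verify the class name and arguments against oikit.
    from oikit.oi_image import OakInkImage

    dataset = OakInkImage(data_split="all")  # hypothetical constructor signature
    idx = 0
    joints_3d = dataset.get_joints_3d(idx)   # (21, 3) joints in camera coordinates
    verts_3d = dataset.get_verts_3d(idx)     # (778, 3) MANO verts in camera coordinates
    mano_pose = dataset.get_mano_pose(idx)   # MANO pose; shows how cam_extr is applied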

Best,
Lixin

Thank you for clarifying this.

I have a follow-up question regarding the OakInk-Shape annotation format. For example, I am looking at the folder oakink_shape_v2/bottle/A16012/0cc013118e/S16105.

  1. In this path, I think A16012 corresponds to the object. What about 0cc013118e and S16105? Do they have any specific meaning?
  2. Also, the file oakink_shape_v2/bottle/A16012/0cc013118e/S16105/hand_param.pkl has fields pose, shape, tsl. The pose has shape 16x4. Can you please explain what this corresponds to? If I am not wrong, hand pose is in 3 dimensions, right?

Thank you.

Hi,

  1. The 0cc013118e is a hash code that we use to identify a certain object instance. It has no direct meaning.
  2. The pose is represented as quaternions, one for each of MANO's 16 joints (root + 15 finger joints), hence the 16 x 4 shape. You can easily transform them to axis-angle using our oikit here; see the sketch below.
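
For reference, a minimal numpy sketch of the quaternion-to-axis-angle conversion (this assumes (w, x, y, z) quaternion ordering; verify the ordering against the oikit implementation):

    import numpy as np

    def quat_to_axis_angle(quat):
        # quat: (16, 4) unit quaternions, assumed (w, x, y, z) ordering.
        quat = quat / np.linalg.norm(quat, axis=-1, keepdims=True)  # safety renorm
        w = np.clip(quat[:, 0], -1.0, 1.0)
        angle = 2.0 * np.arccos(w)                           # rotation angle per joint
        sin_half = np.sqrt(np.maximum(1.0 - w ** 2, 1e-12))
        axis = quat[:, 1:] / sin_half[:, None]               # unit rotation axis
        return axis * angle[:, None]                         # (16, 3) axis-angle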

Hope these help.
Lixin

Thank you Lixin!

Is my understanding of OakInk-Shape data format correct?

The root directory ./oakink_shape_v2 contains objects of different categories such as apple, banana, binoculars, bottle, etc. For each of these object categories, there can be various instances of that category. For example, ./oakink_shape_v2/bottle/ has various instances such as A16012, A16013, A16026, etc. And for each instance, for example ./oakink_shape_v2/bottle/A16012/, there are different possible hand grasp annotations. These are typically stored in some_hexadecimal_folder/hand_param.pkl. Is this right?

If yes, I am a little confused by the directory structure. Sometimes there is a single hexadecimal folder such as 0eec013c90 containing hand_param.pkl. For example, ./oakink_shape_v2/apple/C90001/0eec013c90/ has only one hand_param.pkl. But in some cases, such as ./oakink_shape_v2/bottle/A16012/0cc013118e/, there are multiple subdirectories such as S16101, S16102, etc.

Can you please correct me if I am wrong or provide some clarifications here?

Thanks!

Hi, thank you for your interest in our work.

Yes, we store the hand grasp parameters in hand_param.pkl.
As we introduced in our paper, we use Tink to transfer the grasp knowledge from a real-world object to a virtual counterpart object of the same category. Take bottle/A16012/0cc013118e/hand_param.pkl as an example: the parameters here describe a grasp of bottle_A16012, while bottle/A16012/0cc013118e/S16101/hand_param.pkl stores the grasp parameters of bottle_S16101, transferred from the former grasp.
In short, category/real_object_id/hexadecimal/hand_param.pkl is the original grasp of the real object, while category/real_object_id/hexadecimal/virtual_object_id/hand_param.pkl is the grasp of the virtual counterpart object, transferred from the original grasp with Tink.

In some cases, since the grasp might not be suitable for transferring, or the category does not contain enough object CAD models, we only provide the grasps of real-world objects, like apple/C90001/0eec013c90.

Also, since we ran a perceptual evaluation, some hand poses have been filtered out because they did not satisfy visual plausibility.
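
To make the layout concrete, here is a minimal sketch that walks the tree under the scheme described above (the local root path ./oakink_shape_v2 is an assumption; adjust it to your download location):

    import os
    import pickle

    shape_root = "./oakink_shape_v2"  # assumed local path
    for category in sorted(os.listdir(shape_root)):            # e.g. "bottle"
        cat_dir = os.path.join(shape_root, category)
        for real_obj in sorted(os.listdir(cat_dir)):           # e.g. "A16012"
            obj_dir = os.path.join(cat_dir, real_obj)
            for grasp_hash in sorted(os.listdir(obj_dir)):     # e.g. "0cc013118e"
                grasp_dir = os.path.join(obj_dir, grasp_hash)
                real_pkl = os.path.join(grasp_dir, "hand_param.pkl")
                if os.path.isfile(real_pkl):
                    with open(real_pkl, "rb") as f:
                        params = pickle.load(f)  # fields: pose (16x4), shape, tsl
                    print("original grasp:", category, real_obj, grasp_hash)
                # subdirectories hold Tink-transferred grasps for virtual objects
                for virtual_obj in sorted(os.listdir(grasp_dir)):
                    v_pkl = os.path.join(grasp_dir, virtual_obj, "hand_param.pkl")
                    if os.path.isfile(v_pkl):
                        print("  transferred grasp:", virtual_obj)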

Awesome, thanks!

One follow-up question. For the real-object case ./oakink_shape_v2/binoculars/C42001/, there are multiple hexadecimal folders. Do these correspond to different types of hand grasps for the same object C42001?

Yes. We invited 12 subjects to grasp the objects with different intents. Usually, a real-world object will have about ten different grasps with four types of intents: use, hold, lift-up and handover.

Another follow-up question regarding the image data (sorry for the many questions)!

For sequences where the intent is handover, there are two actions, give and receive, which means that there are two hands. Do you provide annotations for both hands? For example, I am looking at the annotations in ./image/anno/hand_j/. There are two types: Y35037_0004_0007_0002__2021-10-09-14-23-47__0__107__0.pkl and Y35037_0004_0007_0002__2021-10-09-14-23-47__1__107__0.pkl. I assume that the 0 before 107 always corresponds to the giver, while 1 always corresponds to the receiver. Is this right?

Yes, that is correct.
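
For reference, a minimal sketch that splits such a filename into its double-underscore-separated fields and reads the giver/receiver flag confirmed above (the meanings of the remaining fields are not spelled out in this thread, so they are left uninterpreted):

    fname = "Y35037_0004_0007_0002__2021-10-09-14-23-47__0__107__0.pkl"
    fields = fname[:-len(".pkl")].split("__")
    seq_label, timestamp = fields[0], fields[1]
    giver_receiver = int(fields[2])  # 0 = giver, 1 = receiver (confirmed above)
    # fields[3] and fields[4] are not explained in this thread
    print(seq_label, timestamp, "receiver" if giver_receiver else "giver")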

Dear authors,

I am trying to visualize some data based on different split modes, or even different data splits, using this.

But it looks like these splits are not available in the code yet: OakInk-Image loader.

Can you please update this? That is, can you please provide different data splits based on subjects/objects, as well as train/val/test splits?

Thanks

Of course. We plan to update version_3 of the OakInk dataset around July 25th.
You only need to replace the current anno.zip file.

version_3 includes:

  • subject and object split files for OakInk-Image
  • a refined version of the annotations, in which we fix some artifacts in the handover sequences
