In this project, we present the DESK (Dexterous Surgical Skill) dataset. It comprises a set of surgical robotic skills collected during a surgical training task on three robotic platforms: the Taurus II robot, the simulated Taurus II robot, and the YuMi robot. This dataset was used to test the idea of transferring knowledge across domains (e.g., from the Taurus to the YuMi robot) for a surgical gesture classification task with seven gestures. We explored three scenarios: 1) no transfer, 2) transfer from the simulated Taurus to the real Taurus, and 3) transfer from the simulated Taurus to the YuMi robot. We conducted extensive experiments with three supervised learning models and provide baselines for each scenario. Results show that using simulation data during training enhances performance on the real robot when limited real data is available. In particular, we obtained an accuracy of 55% on the real Taurus data using a model trained only on simulator data. Furthermore, we achieved an accuracy improvement of 34% when 3% of the real data was added to the training process.
Link to the arXiv submission: paper.
Link to the dataset: data
This dataset consists of a peg transfer task performed by several participants on the following three robotic platforms: the Taurus II robot, the Taurus II simulator, and the YuMi robot. Each peg transfer consists of a set of seven surgemes, and each surgeme was marked as a success or failure. The surgemes were annotated using an annotation tool we developed. Furthermore, we also provide data for the segmentation of the objects on the peg board. Detailed information on the robots and the statistical design of the data collection can be found in the paper. The variations in this dataset include: three robots, ten subjects, various peg board positions and orientations, and the direction of peg transfer.
The dataset link points to the following folders, one per robotic platform.
The Taurus folder contains 'Readme.txt' and a 'data' folder, which holds all of the files.
For each subject x and task j there is a folder in 'data' named Sx,
where x = 1...8 and j = 1...6. Each data folder Sx contains the following.
Every file follows a CSV-like format, where the separator is a space or a comma.
There is one kinematics file for each gripper, named:
SxTj_kinematics_left.txt
SxTj_kinematics_right.txt
Important note 1: the data is recorded ONLY while the foot pedal is pressed,
that is, only when the robot is active or in movement. If the robot was inactive, no kinematic data was recorded.
Important note 2: the camera is placed opposite the robot.
Thus the perceived camera "Left" and "Right" are inverted with respect to the robot's left and right.
The robot's left and right are the ones used as the reference in this setup.
These are the elements in each line or data frame:
* ut: unix timestamp
* Tip pose matrix elements (in inches and radians):
rot_00 rot_01 rot_02 tr_x
rot_10 rot_11 rot_12 tr_y
rot_20 rot_21 rot_22 tr_z
0 0 0 1
* Absolute position of the tip in microns (with respect to the robot's origin):
x: absolute x position of the tip
y: absolute y position of the tip
z: absolute z position of the tip
* pitch: pitch of the robot's tip (radians * 1e6)
* yaw: yaw of the robot's tip (radians * 1e6)
* roll: roll of the robot's tip (radians * 1e6)
* gs: state of the gripper. A floating-point value that ranges from 30 (completely closed) to 100 (completely open)
Each line in this file has the following format:
ut rot_00 rot_01 rot_02 rot_10 rot_11 rot_12 rot_20 rot_21 rot_22 tr_x tr_y tr_z x y z gs
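As a rough sketch of how these files can be read, the following Python snippet (ours, not part of the dataset tooling) parses a kinematics file per the line format above and rebuilds the 4x4 tip pose matrix; the micron-to-meter conversion is an optional convenience.

    import numpy as np

    def parse_taurus_kinematics(path):
        """Read SxTj_kinematics_left/right.txt into per-frame records."""
        frames = []
        with open(path) as f:
            for line in f:
                # The separator may be a space or a comma, per the note above.
                vals = [float(v) for v in line.replace(",", " ").split()]
                if not vals:
                    continue
                ut = vals[0]                                   # unix timestamp
                pose = np.eye(4)                               # 4x4 tip pose matrix
                pose[:3, :3] = np.reshape(vals[1:10], (3, 3))  # rot_00..rot_22
                pose[:3, 3] = vals[10:13]                      # tr_x, tr_y, tr_z (inches)
                xyz = np.array(vals[13:16]) * 1e-6             # microns -> meters
                gs = vals[16]                                  # gripper state, 30..100
                frames.append((ut, pose, xyz, gs))
        return frames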
For every color video in the dataset there is a file named SxTj_peg_segmentation.txt, which contains
all the information needed to draw the bounding boxes of the three pegs on each frame of the video. Each line
of this file has the following format:
frameNumber peg1_x peg1_y peg1_w peg1_h peg2_x peg2_y peg2_w peg2_h peg3_x peg3_y peg3_w peg3_h acc1 acc2 acc3
The frameNumber indicates which frame of the video the line corresponds to;
it is followed by 4 numbers for each of the 3 pegs in the video. In each set of 4 numbers,
the first pair (pegN_x and pegN_y) gives the x and y position
of the upper-left corner of the bounding box, and the second pair (pegN_w and pegN_h) gives,
respectively, the width and height of the bounding box.
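A minimal usage sketch, assuming OpenCV is available; the function names are illustrative and not part of the dataset tooling. The trailing acc values are skipped, not interpreted here.

    import cv2

    def load_peg_boxes(path):
        """Map frame number -> list of (x, y, w, h) boxes for the three pegs."""
        boxes = {}
        with open(path) as f:
            for line in f:
                vals = line.split()
                if not vals:
                    continue
                nums = [int(float(v)) for v in vals[1:13]]  # 4 numbers per peg
                boxes[int(vals[0])] = [tuple(nums[i:i + 4]) for i in range(0, 12, 4)]
        return boxes

    def draw_peg_boxes(frame_img, frame_boxes):
        """Draw each box from its upper-left corner (x, y) and extent (w, h)."""
        for (x, y, w, h) in frame_boxes:
            cv2.rectangle(frame_img, (x, y), (x + w, y + h), (0, 255, 0), 2)
        return frame_img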
There is one color video for each subject and task, named:
* SxTj_color.avi
This file contains a color video of the entire task.
There is a corresponding file named:
* SxTj_color_ts.txt
This file contains the unix timestamp of each frame
(so it can be traced back to the kinematic file timestamp and the depth timestamp).
There is one depth video for each subject and task, named:
* SxTj_depth.avi
This file contains a depth video of the entire task.
There is a corresponding file named:
* SxTj_depth_ts.txt
This file contains the unix timestamp of each frame (so it can be traced back to the kinematic file timestamp and the color timestamp).
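Because the kinematic, color, and depth streams each carry their own timestamps, a common first step is aligning them. Below is a minimal sketch, assuming all streams share the same unix-time clock, that matches each video frame to its nearest kinematic sample; the function name is ours.

    import numpy as np

    def nearest_kinematic_index(frame_ts, kin_ts):
        """For each frame timestamp, return the index of the closest
        kinematic timestamp; kin_ts must be sorted in increasing order."""
        frame_ts = np.asarray(frame_ts)
        kin_ts = np.asarray(kin_ts)
        idx = np.clip(np.searchsorted(kin_ts, frame_ts), 1, len(kin_ts) - 1)
        left, right = kin_ts[idx - 1], kin_ts[idx]
        # Pick whichever neighbor is closer in time.
        return np.where(frame_ts - left <= right - frame_ts, idx - 1, idx)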
The Taurus simulator folder contains 'Readme.txt' and a 'data' folder, which holds all of the files.
For each subject x and task j there is a folder in 'data' named Sx, where x = 1...8 and j = 1...6. Each data folder Sx contains the following. Every file follows a CSV-like format, where the separator is a space.
There is one kinematics file for each gripper, named:
Sx_Tj_left_kinematics.txt
Sx_Tj_right_kinematics.txt
These are the elements in each line or data frame:
* ut: UTC timestamp
* Robot arm kinematics
* Absolute position of the tip in inches (with respect to the robot's origin):
x: absolute x position of the tip
y: absolute y position of the tip
z: absolute z position of the tip
* End effector orientation in radians:
rx: Rotation about robot x-axis
ry: Rotation about robot y-axis
rz: Rotation about robot z-axis
* Joint angles for each arm in radians: J0-J6
* gs: state of the gripper. A floating-point value that ranges from 30 (completely closed) to 100 (completely open)
Each line in this file has the following format:
ut x y z rx ry rz J0 J1 J2 J3 J4 J5 J6 gs
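As with the real Taurus, these files can be parsed straight from the line format above; the sketch below is illustrative and the function name is ours.

    import numpy as np

    def parse_sim_kinematics(path):
        """Yield (ut, xyz, rxyz, joints, gs) per line of Sx_Tj_*_kinematics.txt."""
        with open(path) as f:
            for line in f:
                vals = [float(v) for v in line.split()]
                if not vals:
                    continue
                ut = vals[0]                   # UTC timestamp
                xyz = np.array(vals[1:4])      # tip position (inches)
                rxyz = np.array(vals[4:7])     # end-effector orientation (radians)
                joints = np.array(vals[7:14])  # joint angles J0-J6 (radians)
                gs = vals[14]                  # gripper state, 30..100
                yield ut, xyz, rxyz, joints, gs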
There are three files for each subject and task, named:
* Sx_Tj_left_color.avi : Robot's Left Stereo Camera view
* Sx_Tj_right_color.avi : Robot's Right Stereo Camera view
* Sx_Tj_oscr.avi : Oculus screen view
There is a corresponding file named:
* Sx_Tj_color_ts.txt
This file contains the UTC timestamp of each frame
(so it can be traced back to the kinematic file timestamps).
The YuMi folder contains 'Readme.txt' and a 'data' folder, which holds all of the files.
For each subject x and task j there is a folder in 'data' named Sx, where x = 1...8 and j = 1...6. Each data folder Sx contains the following. Every file follows a CSV-like format, where the separator is a comma or a space.
Important note: the perceived camera (RGB and depth) "Left" and "Right" correspond to the robot's left and right.
There is one kinematics file for each gripper, named:
Sx_Tj_yumi_left.txt
Sx_Tj_yumi_right.txt
These are the elements in each line or data frame:
* ut: unix timestamp
* Robot state (in degrees): rs_i is the robot state of joint i for the corresponding arm
* Absolute position (pose of the arm tooltip with respect to the robot's base):
tr_x: absolute x position of the tip
tr_y: absolute y position of the tip
tr_z: absolute z position of the tip
* Rotation matrix, rounded to 6 digits:
rot_00 rot_01 rot_02
rot_10 rot_11 rot_12
rot_20 rot_21 rot_22
* gs: state of the gripper, a boolean variable:
closed: 1
open: 0
Each line in this file has the following format:
ut, rs_1, rs_2, rs_3, rs_4, rs_5, rs_6, rs_7, tr_x, tr_y, tr_z, rot_00, rot_01, rot_02, rot_10, rot_11, rot_12, rot_20, rot_21, rot_22, gs
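A parsing sketch for these comma-separated lines, assuming the unix timestamp leads each line as listed among the elements above; the degree-to-radian conversion of the joint angles is an optional convenience, not part of the file format.

    import numpy as np

    def parse_yumi_kinematics(path):
        """Yield (ut, joints_rad, tr, rot, gripper_closed) per line."""
        with open(path) as f:
            for line in f:
                if not line.strip():
                    continue
                vals = [float(v) for v in line.split(",")]
                ut = vals[0]                           # unix timestamp
                joints = np.radians(vals[1:8])         # rs_1..rs_7, degrees -> radians
                tr = np.array(vals[8:11])              # tooltip position
                rot = np.reshape(vals[11:20], (3, 3))  # rotation matrix
                closed = bool(vals[20])                # gripper: 1 = closed, 0 = open
                yield ut, joints, tr, rot, closed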
For every color video in the dataset there is a file named SxTj_peg_segmentation.txt, which contains
all the information needed to draw the bounding boxes of the three pegs on each frame of the video. Each line
of this file has the following format:
frameNumber peg1_x peg1_y peg1_w peg1_h peg2_x peg2_y peg2_w peg2_h peg3_x peg3_y peg3_w peg3_h acc1 acc2 acc3
The frameNumber indicates which frame of the video the line corresponds to;
it is followed by 4 numbers for each of the 3 pegs in the video. In each set of 4 numbers,
the first pair (pegN_x and pegN_y) gives the x and y position
of the upper-left corner of the bounding box, and the second pair (pegN_w and pegN_h) gives,
respectively, the width and height of the bounding box. The format is the same as for the Taurus peg
segmentation files, so the loading sketch shown earlier applies here as well.
There is one color video for each subject and task, named:
* Sx_Tj_color.avi
This file contains a color video of the entire task.
There is a corresponding file named:
* Sx_Tj_color_ts.txt
This file contains the unix timestamp of each frame
(so it can be traced back to the kinematic file timestamp and the depth timestamp).
There is one depth video for each subject and task, named:
* Sx_Tj_depth.avi
This file contains a depth video of the entire task.
There is a corresponding file named:
* Sx_Tj_depth_ts.txt
This file contains the unix timestamp of each frame (so it can be traced back to the kinematic file timestamp and the color timestamp).