bhaswara/list-of-surgical-tool-datasets

List of surgical tool datasets organised by task.

MIT

Description

List of surgical tool datasets organised by task. A list of data repositories is also displayed at the bottom. Please open an issue if you see a relevant open dataset that is missing or if you find inaccurate information.

Minimally invasive surgery

Tool classification

Dataset	Brief description	Images	Procedures	Paper
Cholec80	80 videos of cholecystectomy surgeries performed by 13 surgeons. The videos are captured at 25 fps. The dataset is labeled with the phase (at 25 fps) and tool presence annotations (at 1 fps). A tool is defined as present in an image if at least half of the tool tip is visible.	86K	80	Twinanda et al. 2016
CATARACTS	This dataset consists of 50 cataract surgeries. It was annotated for two main tasks: surgical tool presence detection and surgical activity recognition. It was divided into two sets (train, test) for the surgical tool presence detection task and 3 sets (train, dev, test) for the activity recognition task.	900K	50	Al Hajj et al. 2019
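
The Cholec80 row above notes that phase labels come at 25 fps while tool presence labels come at 1 fps. A minimal Python sketch of bringing the two to a common rate follows. It assumes (this is not stated above) that the 1 fps tool labels correspond to every 25th video frame starting at frame 0; the function name `align_phase_and_tools` and the label values are hypothetical.

```python
def align_phase_and_tools(phase_labels, tool_labels, video_fps=25):
    """Pair each 1 fps tool-presence label with the phase label of the same second.

    phase_labels: one phase label per video frame (e.g. 25 per second).
    tool_labels:  one tool-presence vector per second.
    Assumption: the tool label for second k belongs to frame k * video_fps.
    """
    # Keep one phase label per second by taking every `video_fps`-th frame.
    phases_per_second = phase_labels[::video_fps]
    # The two streams may differ in length by a few entries; truncate to the shorter.
    n = min(len(phases_per_second), len(tool_labels))
    return list(zip(phases_per_second[:n], tool_labels[:n]))


# Toy example: 3 seconds of video, two hypothetical phases, two hypothetical tools.
phases = ["phase_a"] * 50 + ["phase_b"] * 25
tools = [[1, 0], [0, 1], [1, 1]]
pairs = align_phase_and_tools(phases, tools)
```

If the actual annotation files carry explicit frame indices, matching on those indices is safer than relying on a fixed stride.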

Tool segmentation

Dataset	Brief description	Images	Procedures	Paper
RMIT	This dataset consists of three image sequences recorded during retinal microsurgery. For each image sequence, the instrument position and size have been hand-annotated.	1.5K	4	Sznitman et al. 2012
InstrumentCrowd	The training data was generated from a total of 6 surgical procedures, three from laparoscopic adrenalectomies and three from laparoscopic pancreatic resections. From each surgery, 20 images containing one or several medical instruments were extracted, yielding 120 images in total.	120	6	Maier-Hein et al. 2014
NeuroSurgicalTools	Consists of 2476 monocular images (1221 for training and 1255 for testing) coming from in vivo neurosurgeries. The resolution of the images varies from 612×460 to 1920×1080.	2.5K	14	Bouget et al. 2015
EndoVis2015	40 2D in-vivo images from 4 laparoscopic colorectal surgeries. Each pixel is labelled as either background, shaft or manipulator (~160 2D images and annotations in total). 4x 45-second 2D image sequences of at least one Large Needle Driver instrument in an ex-vivo setup. Each pixel is labelled as either background, shaft, head or clasper.	9K	8	N/A
EndoVis2017	8x 225-frame robotic surgical videos, captured at 2 Hz, with manually labelled different tool parts and types. The testing set contains 8x 75-frame videos and 2x 300-frame videos.	1.8K	8	Allan et al. 2019
EndoVis2018	Training dataset is made up of 16 robotic nephrectomy procedures recorded using da Vinci Xi systems in porcine labs (subsampled to 2fps). Sequences with little or no motion are manually removed to leave 149 frames per procedure. Video frames are 1280x1024 and we provide the left and right eye camera image as well as the stereo camera calibration parameters. Labels are only provided for the left image.	2.4K	16	Allan et al. 2020
ROBUST-MIS2019	Procedures in rectal resection and proctocolectomy. A training case encompasses a 10-second video snippet in the form of 250 endoscopic image frames and a reference annotation for the last frame. In the annotated frame, a “0” indicates the absence of a medical instrument and numbers “1”, “2”, ... represent different instances of medical instruments.	10K	30	Ross et al. 2020
Kvasir-Instrument	The Kvasir-Instrument dataset consists of 590 annotated frames comprising GI procedure tools such as snares, balloons, biopsy forceps, etc. The resolution of the images in the dataset varies from 720x576 to 1280x1024.	590	N/A	Jha et al. 2020
CholecSeg8k	This dataset contains 8080 laparoscopic cholecystectomy image frames extracted and annotated from 17 video clips in Cholec80.	8K	17	Hong et al. 2020
RoboTool	514 images extracted from the videos of 20 freely available robotic surgical procedures and annotated for binary tool-background segmentation.	514	20	Garcia-Peraza-Herrera et al. 2021
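
The ROBUST-MIS2019 row above describes an instance-style annotation in which 0 means no instrument and 1, 2, ... denote separate instrument instances. A minimal Python sketch of reading such a mask is given here. It assumes the mask has already been loaded as a 2D grid of integers (file loading is left out), and the function names are hypothetical.

```python
def instrument_instance_ids(mask):
    """Return the sorted instance ids present in the mask, ignoring 0 (no instrument)."""
    return sorted({value for row in mask for value in row if value != 0})


def to_binary_mask(mask):
    """Collapse all instances into a single tool-vs-background mask (1 = instrument)."""
    return [[1 if value != 0 else 0 for value in row] for row in mask]


# Toy 3x3 annotation with two instrument instances.
toy_mask = [
    [0, 0, 1],
    [2, 2, 1],
    [0, 0, 0],
]
ids = instrument_instance_ids(toy_mask)
binary = to_binary_mask(toy_mask)
```

The binary form matches the tool-background labelling used by datasets such as RoboTool, which can help when mixing datasets for binary segmentation.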

Tool-tissue action detection

Dataset	Brief description	Images	Procedures	Paper
CholecT50	Every frame is annotated with labels from the triplet: instrument, verb and target.	N/A	50	Nwoye et al. 2022
SARAS-MESAD2021	The dataset contains monocular digital recordings from the da Vinci Xi robotic system. It has two sub-datasets: MESAD-Real and MESAD-Phantom. MESAD-Real represents prostatectomy procedures recorded on human patients. It contains four sessions of complete prostatectomy procedures performed by expert surgeons on real patients. MESAD-Phantom is also designed for surgeon action detection during prostatectomy, but is composed of videos captured during procedures on phantoms used for the training of surgeons. MESAD-Real comprises 21 action classes and MESAD-Phantom covers a smaller list of 14 action classes. The two datasets share 11 action classes.	N/A	4	N/A
PSI-AVA	PSI-AVA is a dataset designed for holistic surgical scene understanding. It contains approximately 20.45 hours of the surgical procedure performed by three expert surgeons and annotations for both long-term (Phase and Step recognition) and short-term reasoning (Instrument detection and novel Atomic Action recognition) in robot-assisted radical prostatectomy videos.	N/A	8	Valderrama et al. 2022

Skill assessment and workflow recognition

Dataset	Brief description	Images	Procedures	Paper
JIGSAWS	The JIGSAWS dataset consists of three components: kinematic data (Cartesian positions, orientations, velocities, angular velocities and gripper angle describing the motion of the manipulators), video data (stereo video captured from the endoscopic camera), and manual annotations of gestures (atomic surgical activity segment labels) and skill (global rating score using modified objective structured assessments of technical skills).	N/A	N/A	Gao et al. 2014
Cataract-101	This dataset contains 101 videos of cataract surgeries annotated with two kinds of information: Anonymous ID and experience level of operating surgeon, and starting points of quasi-standardized operation phases in videos.	1.3M	101	Schoeffmann et al. 2018
HeiCo	The dataset contains data from the ROBUST-MIS 2019 challenge and the Surgical Workflow Challenges from EndoVis 2017 and 2018.	10K	30	Maier-Hein et al. 2020
MISAW	The data-set contains 27 micro-anastomosis training sequences and is composed of the following information: stereoscopic video, kinematic data, workflow annotation at 3 levels of granularity (phases, steps, and activities).	N/A	27	Huaulmé et al. 2021
PETRAW	Dataset for online automatic recognition of surgical workflow by using both kinematic and stereoscopic video information on a micro-anastomosis training task.	N/A	100	N/A

Image-to-image translation

Dataset	Brief description	Paper
Laparoscopic Image to Image Translation	Synthetic images in a 3D environment roughly resembling laparoscopic liver surgery scenes. A group of Generative Adversarial Networks (GAN) is trained to translate these images to look like real laparoscopic images. After the training process, the translated images along with their labels can be used as training data for a certain target task.	Pfeiffer et al. 2019

Multi-task datasets

Dataset	Brief description	Images/Videos	Procedures	Paper
ART-Net	This dataset consists of images of non-robotic tools with annotated tool presence, tool segmentation, and instrument geometric primitives (mid-line, edge-line, tooltip). The images come from laparoscopic hysterectomy videos. This dataset also contains tool presence annotated for another set of 3000 images, namely 1500 positive and 1500 negative images, respectively, for which some positive images contain multiple tools. 4270 images are labelled for tool detection. If the tool shaft is not visible at all, the image is marked as negative. When a small part of the tool shaft is visible, the image is marked as positive. For segmentation and geometric primitive extraction, 635 images are annotated.	Different for each task	29	Hasan et al. 2021
HeiSurF	Surgical Workflow Analysis and Full Scene Segmentation. All surgeries were annotated framewise for surgical phases by surgical experts. Surgical actions, instrument usage and surgical skill levels were annotated. The surgeries recorded are laparoscopic gallbladder removals (cholecystectomy). The data for segmentation consists of two parts. In the first part of the training dataset, frames at 2 minute intervals from 24 operations (the same operations as for the workflow challenge) are provided. The second part of the training dataset will consist of brief sequences taken from each video, where frames will be segmented at 1fps. To ensure anonymity, frames corresponding to extra-abdominal views are censored by entirely white (RGB 255 255 255) frames. The testing dataset of 9 videos will not be released.	24 videos	30	HeiSurf Presentation
AutoLaparo	AutoLaparo contains videos of laparoscopic hysterectomy. Three sub-datasets are designed for the following three tasks: surgical workflow recognition, laparoscope motion prediction, instrument and key anatomy segmentation. The videos are recorded at 25 fps with a standard resolution of 1920×1080 pixels. The duration of videos ranges from 27 to 112 minutes due to the varying difficulties of the surgeries. After pre-processing, the average duration is 66 minutes and the total duration is 1388 minutes. Annotations: Surgical workflow recognition: the hysterectomy procedure is divided into 7 phases and each frame is annotated with a phase label. Laparoscope motion prediction: 300 clips are carefully selected from Phase 2-4 of the 21 videos and each clip lasts for 10 seconds. Seven types of motion modes are defined, including one Static mode and six non-static modes: Up, Down, Left, Right, Zoom-in, and Zoom-out. Instrument and key anatomy segmentation: for each clip in the motion prediction task, six frames are sampled at 1fps, and annotated with pixel-wise segmentation. Four types of instruments and one key anatomy are annotated in the dataset: grasping forceps, LigaSure, dissecting and grasping forceps, electric hook, uterus.	Different for each task	21	Wang et al. 2022
SurgToolLoc	This dataset contains clips of surgical training exercises using the da Vinci robotic system. In them, trainees perform standard activities such as dissecting tissue and suturing. There are 24,695 video clips, each 30 seconds long and captured at 60 fps with a resolution of 1280x720 pixels. Training data: for each 30-second clip within the training set, just tool presence labels indicating which robotic tools are installed are provided. For the extent of each clip, the same three tools (out of 14 possible) are installed. However, some may be obscured or temporarily invisible, i.e. there is noise in the tool presence labels of the training set. Testing data: The test has tool presence labels and also bounding boxes around the robotic tools. The videos are sampled at 1Hz.	44M	N/A	N/A
SAR-RARP50	SAR-RARP50 is a multitask dataset that provides action recognition and surgical instrumentation segmentation labels for video segments recorded during 50 Robot-Assisted Radical Prostatectomies (RARP). The operations were performed by 8 surgeons with different surgical seniority (experienced consultant, senior registrar, and junior registrar). The selected segments focus on the suturing of the dorsal vascular complex (DVC), an array of veins and arteries that is sutured to keep bleeding under control after the connection of the prostate to bladder and urethra is cut. Surgical operations were performed using a DaVinci Si robot, recording at 60 frames per second in 1080i resolution stereo video format. After data acquisition, the stereo video channels were time-synchronized and de-interlaced. The 50 videos are grouped into 2 sets with balanced class proportions, one set for training (40 interventions) and one for testing (10 interventions). Class actions: (0, other), (1, picking up the needle), (2, positioning the needle tip), (3, pushing the needle through the tissue), (4, pulling the needle out of the tissue), (5, tying a knot), (6, cutting the suture), (7, returning/dropping the needle). Annotators: the action gesture classes were decided in collaboration with an expert surgeon and annotations were manually generated by an engineer with experience in surgical action recognition. During the data labelling process, the annotator was instructed to assign only one class per frame, choosing from a list of predefined actions. The action recognition labels may include imprecision in the gesture boundary or action ambiguities linked to non-standard surgical gestures and the particularities of each surgeon's technique. The tool segmentations provided are for the left camera view of a stereo endoscope for all 50 RARP procedures at a rate of 1Hz. Semantic information is provided in png format, with each pixel value corresponding to a different class. The association between pixel values and semantic classes is the following: (1, tool clasper), (2, tool wrist), (3, tool shaft), (4, suturing needle), (5, thread), (6, suction tool), (7, needle holder), (8, clamps), (9, catheter). The tool segmentation annotations were generated by non-medical, professional annotators and were validated independently by the organizers of the challenge. Segmentation annotations may include inaccuracies when: videos are not in focus, camera lenses are not clean, objects are moving fast (resulting in ghosting), there are video compression artifacts, surgical instrumentation is not fully visible, areas are not brightly lit.	10K	50	Psychogyios et al. 2023
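
The SAR-RARP50 row above lists which pixel value stands for which semantic class in the png segmentation masks. A minimal Python sketch of applying that mapping is shown here. It assumes a mask has already been decoded into a 2D grid of integers (png reading is left out), and the names `SEGMENTATION_CLASSES` and `class_pixel_counts` are hypothetical. Pixel values not listed in the row, such as 0, are reported as unlisted rather than given a guessed name.

```python
from collections import Counter

# Pixel value -> semantic class, copied from the SAR-RARP50 description.
SEGMENTATION_CLASSES = {
    1: "tool clasper",
    2: "tool wrist",
    3: "tool shaft",
    4: "suturing needle",
    5: "thread",
    6: "suction tool",
    7: "needle holder",
    8: "clamps",
    9: "catheter",
}


def class_pixel_counts(mask):
    """Count pixels per semantic class in a decoded segmentation mask."""
    counts = Counter(value for row in mask for value in row)
    return {
        # Values outside the documented mapping are kept, but flagged as unlisted.
        SEGMENTATION_CLASSES.get(value, f"unlisted ({value})"): n
        for value, n in counts.items()
    }


# Toy 2x3 mask for illustration only.
toy_mask = [
    [0, 3, 3],
    [1, 2, 3],
]
counts = class_pixel_counts(toy_mask)
```

Per-class pixel counts of this kind are a quick way to check class balance before training a segmentation model.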

Organ segmentation datasets

Dataset	Brief description	Images	Procedures	Paper
Dresden Surgical Anatomy Dataset	The Dresden Surgical Anatomy Dataset provides semantic segmentations of eight abdominal organs (colon, liver, pancreas, small intestine, spleen, stomach, ureter, vesicular glands), the abdominal wall and two vessel structures (inferior mesenteric artery, intestinal veins) in laparoscopic view. The majority of patients (26/32) were male, the overall average age was 63 years and the mean body mass index (BMI) was 26.75 kg/m2 (Table 1). All included patients had a clinical indication for the surgical procedure. Surgeries were performed using a standard Da Vinci® Xi/X Endoscope with Camera (8 mm diameter, 30° angle, Intuitive Surgical, Item code 470057) and recorded using the CAST-System (Orpheus Medical GmbH, Frankfurt a.M., Germany). Each record was saved at a resolution of 1920x1080 pixels in MPEG-4 format and lasts between about two and ten hours.	13K	32	Carstens et al. 2023
SurgAI3.8K	The dataset contains the following annotations: uterus segmentation, uterus contours and the regions of the left and right fallopian tube junctions.	3.8K	79	Zadeh et al. 2023

Bleeding segmentation datasets

Dataset	Brief description	Images	Procedures	Paper
Rabbani et al. 2022	From the 60-hour video footage, 750 frames are extracted for training, and 199 for testing. Authors downsample all the images to 854×480 pixels for training.	949 labelled images and over 60 hours of unlabelled video	96	Rabbani et al. 2022

Open surgery

Repositories holding multiple datasets