List of surgical tool datasets organised by task. A list of data repositories is also given at the bottom. Please open an issue if you see a relevant open dataset that is missing or if you find inaccurate information.
Dataset | Brief description | Images | Procedures | Paper |
--- | --- | --- | --- | --- |
Cholec80 | 80 videos of cholecystectomy surgeries performed by 13 surgeons, captured at 25 fps. The dataset is annotated with surgical phases (at 25 fps) and tool presence (at 1 fps). A tool is defined as present in an image if at least half of the tool tip is visible. | 86K | 80 | Twinanda et al. 2016 |
CATARACTS | This dataset consists of 50 cataract surgeries annotated for two main tasks: surgical tool presence detection and surgical activity recognition. It is divided into two sets (train, test) for the tool presence detection task and three sets (train, dev, test) for the activity recognition task. | 900K | 50 | Al Hajj et al. 2019 |
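
Because Cholec80 labels phases at 25 fps but tool presence only at 1 fps, the two label streams have to be paired before they can be used together. The sketch below shows one minimal way to do that. It assumes that the k-th tool-presence row describes frame `k * 25` and that both label lists start at frame 0. The function name, the list-based inputs and the 3-tool toy rows are hypothetical and are not part of the dataset's own tooling.

```python
from typing import List, Sequence, Tuple

FPS = 25  # Cholec80 videos are captured at 25 fps


def align_phase_with_tools(
    phase_labels: Sequence[str],
    tool_labels: Sequence[Tuple[int, ...]],
    fps: int = FPS,
) -> List[Tuple[int, str, Tuple[int, ...]]]:
    """Pair each 1-fps tool-presence row with the phase label of the same frame.

    Assumption (hypothetical layout): tool_labels[k] describes frame k * fps,
    and phase_labels[i] describes frame i, both counted from frame 0.
    Returns (frame_index, phase, tool_presence) triples.
    """
    aligned = []
    for k, tools in enumerate(tool_labels):
        frame = k * fps
        if frame >= len(phase_labels):
            break  # this tool row points past the last phase-labelled frame
        aligned.append((frame, phase_labels[frame], tuple(tools)))
    return aligned


# Toy example: 60 frames of phase labels, 3 tool rows over 3 hypothetical tools.
phases = ["Preparation"] * 30 + ["CalotTriangleDissection"] * 30
tools = [(1, 0, 0), (1, 1, 0), (0, 1, 0)]
print(align_phase_with_tools(phases, tools))
# [(0, 'Preparation', (1, 0, 0)), (25, 'Preparation', (1, 1, 0)), (50, 'CalotTriangleDissection', (0, 1, 0))]
```

If your copy of the annotations uses a different frame offset, only the `frame = k * fps` line needs to change.
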
Dataset | Brief description | Images | Procedures | Paper |
--- | --- | --- | --- | --- |
RMIT | This dataset consists of three image sequences recorded during retinal microsurgery. For each image sequence, the instrument position and size have been hand-annotated. | 1.5K | 4 | Sznitman et al. 2012 |
InstrumentCrowd | The training data was generated from a total of 6 surgical procedures, three from laparoscopic adrenalectomies and three from laparoscopic pancreatic resections. From each surgery, 20 images containing one or several medical instruments were extracted, yielding 120 images in total. | 120 | 6 | Maier-Hein et al. 2014 |
NeuroSurgicalTools | Consists of 2476 monocular images (1221 for training and 1255 for testing) coming from in vivo neurosurgeries. The resolution of the images varies from 612×460 to 1920×1080. | 2.5K | 14 | Bouget et al. 2015 |
EndoVis2015 | 40 2D in-vivo images from each of 4 laparoscopic colorectal surgeries (~160 2D images and annotations in total), in which each pixel is labelled as background, shaft or manipulator. In addition, 4x 45-second 2D image sequences of at least one Large Needle Driver instrument in an ex-vivo setup, in which each pixel is labelled as background, shaft, head or clasper. | 9K | 8 | N/A |
EndoVis2017 | 8x 225-frame robotic surgical videos, captured at 2 Hz, with manual labels for the different tool parts and tool types. The test set contains 8x 75-frame videos and 2x 300-frame videos. | 1.8K | 8 | Allan et al. 2019 |
EndoVis2018 | The training dataset is made up of 16 robotic nephrectomy procedures recorded using da Vinci Xi systems in porcine labs (subsampled to 2 fps). Sequences with little or no motion were manually removed, leaving 149 frames per procedure. Video frames are 1280x1024, and both the left and right eye camera images are provided along with the stereo camera calibration parameters. Labels are only provided for the left image. | 2.4K | 16 | Allan et al. 2020 |
ROBUST-MIS2019 | Rectal resection and proctocolectomy procedures. A training case comprises a 10-second video snippet of 250 endoscopic image frames and a reference annotation for the last frame. In the annotated frame, “0” indicates the absence of a medical instrument and the numbers “1”, “2”, ... represent different instances of medical instruments. | 10K | 30 | Ross et al. 2020 |
Kvasir-Instrument | The Kvasir-Instrument dataset consists of 590 annotated frames of GI procedure tools such as snares, balloons and biopsy forceps. The image resolution varies from 720x576 to 1280x1024. | 590 | N/A | Jha et al. 2020 |
CholecSeg8k | This dataset contains 8080 laparoscopic cholecystectomy image frames extracted and annotated from 17 video clips in Cholec80. | 8K | 17 | Hong et al. 2020 |
RoboTool | 514 images extracted from the videos of 20 freely available robotic surgical procedures and annotated for binary tool-background segmentation. | 514 | 20 | Garcia-Peraza-Herrera et al. 2021 |
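
In ROBUST-MIS2019 the reference mask encodes instances directly. A value of 0 marks pixels with no instrument, and 1, 2, ... mark distinct instrument instances. The minimal sketch below splits such a mask into one binary mask per instance, and also collapses it into a single tool/background mask like the binary one used by RoboTool. It assumes the mask has already been decoded into a 2D list of integers. File reading is not shown, and both helper names are hypothetical.

```python
from typing import Dict, List

Mask = List[List[int]]


def split_instances(mask: Mask) -> Dict[int, Mask]:
    """Return one binary mask per instrument instance.

    0 is treated as "no instrument"; every other value is an instance id.
    """
    ids = sorted({v for row in mask for v in row if v != 0})
    return {
        inst: [[1 if v == inst else 0 for v in row] for row in mask]
        for inst in ids
    }


def binary_tool_mask(mask: Mask) -> Mask:
    """Collapse all instances into a single tool (1) / background (0) mask."""
    return [[1 if v != 0 else 0 for v in row] for row in mask]


# Toy 3x3 annotation with two instrument instances.
mask = [
    [0, 1, 1],
    [0, 2, 0],
    [2, 2, 0],
]
print(split_instances(mask))
# {1: [[0, 1, 1], [0, 0, 0], [0, 0, 0]], 2: [[0, 0, 0], [0, 1, 0], [1, 1, 0]]}
print(binary_tool_mask(mask))
# [[0, 1, 1], [0, 1, 0], [1, 1, 0]]
```

A frame with no instrument yields an empty dictionary from `split_instances`, which makes "no tool present" easy to test for.
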
Dataset | Brief description | Images | Procedures | Paper |
--- | --- | --- | --- | --- |
CholecT50 | Every frame is annotated with action triplet labels of the form (instrument, verb, target). | N/A | 50 | Nwoye et al. 2022 |
SARAS-MESAD2021 | The dataset contains monocular digital recordings from the da Vinci Xi robotic system and is split into two sub-datasets: MESAD-Real and MESAD-Phantom. MESAD-Real consists of prostatectomy procedures recorded on human patients: four sessions of complete prostatectomy performed by expert surgeons on real patients. MESAD-Phantom is also designed for surgeon action detection during prostatectomy, but is composed of videos captured during procedures on phantoms used for surgeon training. MESAD-Real comprises 21 action classes and MESAD-Phantom a smaller set of 14 action classes; the two sub-datasets share 11 action classes. | N/A | 4 | N/A |
PSI-AVA | PSI-AVA is a dataset designed for holistic surgical scene understanding. It contains approximately 20.45 hours of robot-assisted radical prostatectomy video performed by three expert surgeons, with annotations for both long-term reasoning (phase and step recognition) and short-term reasoning (instrument detection and novel atomic action recognition). | N/A | 8 | Valderrama et al. 2022 |
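
MESAD-Real has 21 action classes and MESAD-Phantom has 14, with 11 shared, so a model trained on both sub-datasets needs 21 + 14 - 11 = 24 distinct labels. The minimal sketch below builds such a joint label index. It assumes each sub-dataset exposes its classes as a list of names and that shared classes have identical names in both. The class names used here are placeholders, not the real MESAD class names, and `build_joint_index` is a hypothetical helper.

```python
from typing import Dict, Iterable


def build_joint_index(*label_sets: Iterable[str]) -> Dict[str, int]:
    """Assign one integer id per distinct class name across all label sets.

    Classes that appear in several sets (matched by name) get a single id;
    ids follow order of first appearance.
    """
    index: Dict[str, int] = {}
    for labels in label_sets:
        for name in labels:
            if name not in index:
                index[name] = len(index)
    return index


# Placeholder class names: 11 shared, 10 Real-only, 3 Phantom-only.
shared = [f"shared_{i}" for i in range(11)]
real = shared + [f"real_only_{i}" for i in range(10)]        # 21 classes
phantom = shared + [f"phantom_only_{i}" for i in range(3)]   # 14 classes

joint = build_joint_index(real, phantom)
print(len(real), len(phantom), len(joint))  # 21 14 24
```

If the two sub-datasets spell a shared class differently, the names have to be normalised first, otherwise the union will be larger than 24.
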
Dataset | Brief description | Images | Procedures | Paper |
--- | --- | --- | --- | --- |
JIGSAWS | The JIGSAWS dataset consists of three components: kinematic data (Cartesian positions, orientations, velocities, angular velocities and gripper angle describing the motion of the manipulators), video data (stereo video captured from the endoscopic camera), and manual annotations of gestures (atomic surgical activity segment labels) and skill (global rating score using modified objective structured assessments of technical skills). | N/A | N/A | Gao et al. 2014 |
Cataract-101 | This dataset contains 101 videos of cataract surgeries annotated with two kinds of information: an anonymous ID and the experience level of the operating surgeon, and the starting points of quasi-standardized operation phases in each video. | 1.3M | 101 | Schoeffmann et al. 2018 |
HeiCo | The dataset contains data from the ROBUST-MIS 2019 challenge and the Surgical Workflow challenges from EndoVis 2017 and 2018. | 10K | 30 | Maier-Hein et al. 2020 |
MISAW | The dataset contains 27 micro-anastomosis training sequences and is composed of the following information: stereoscopic video, kinematic data, and workflow annotations at 3 levels of granularity (phases, steps, and activities). | N/A | 27 | Huaulmé et al. 2021 |
PETRAW | Dataset for online automatic recognition of surgical workflow by using both kinematic and stereoscopic video information on a micro-anastomosis training task. | N/A | 100 | N/A |
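
Cataract-101 annotates the starting point of each operation phase rather than labelling every frame. Most training code expects one label per frame, so these start points have to be expanded. The minimal sketch below does that. It assumes the annotations have already been parsed into `(start_frame, phase_id)` pairs and that the total frame count of the video is known. Frames before the first start point get `None`, which is a placeholder choice and not part of the dataset. The helper name is hypothetical.

```python
from bisect import bisect_right
from typing import List, Optional, Sequence, Tuple


def expand_phase_starts(
    starts: Sequence[Tuple[int, int]], num_frames: int
) -> List[Optional[int]]:
    """Turn (start_frame, phase_id) pairs into one label per frame.

    Each phase lasts until the next start point (or the end of the video).
    Frames before the first start point are labelled None.
    """
    ordered = sorted(starts)
    start_frames = [s for s, _ in ordered]
    labels: List[Optional[int]] = []
    for frame in range(num_frames):
        i = bisect_right(start_frames, frame) - 1  # index of last start <= frame
        labels.append(ordered[i][1] if i >= 0 else None)
    return labels


print(expand_phase_starts([(2, 1), (5, 3), (7, 2)], 9))
# [None, None, 1, 1, 1, 3, 3, 2, 2]
```

Using `bisect_right` keeps each lookup logarithmic in the number of phases. That matters little for a single video, but it scales well when thousands of frames are labelled per video.
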
Dataset | Brief description | Paper |
--- | --- | --- |
Laparoscopic Image to Image Translation | Synthetic images of a 3D environment roughly resembling laparoscopic liver surgery scenes. A group of Generative Adversarial Networks (GANs) is trained to translate these images to look like real laparoscopic images. After training, the translated images along with their labels can be used as training data for a given target task. | Pfeiffer et al. 2019 |
Dataset | Brief description | Images/Videos | Procedures | Paper |
--- | --- | --- | --- | --- |
ART-Net | This dataset contains images of non-robotic tools annotated for tool presence, tool segmentation, and instrument geometric primitives (mid-line, edge-line, tooltip). The images come from laparoscopic hysterectomy videos. The dataset also contains tool presence annotations for another set of 3000 images (1500 positive and 1500 negative), some of the positive images containing multiple tools. 4270 images are labelled for tool detection: if the tool shaft is not visible at all, the image is marked as negative; when even a small part of the tool shaft is visible, the image is marked as positive. For segmentation and geometric primitive extraction, 635 images are annotated. | Different for each task | 29 | Hasan et al. 2021 |
HeiSurF | Surgical workflow analysis and full scene segmentation. The recorded surgeries are laparoscopic gallbladder removals (cholecystectomies). All surgeries were annotated framewise for surgical phases by surgical experts; surgical actions, instrument usage and surgical skill levels were also annotated. The segmentation data consists of two parts. The first part of the training dataset provides frames at 2-minute intervals from 24 operations (the same operations as for the workflow challenge). The second part of the training dataset will consist of brief sequences taken from each video, segmented at 1 fps. To ensure anonymity, frames showing extra-abdominal views are replaced by entirely white (RGB 255 255 255) frames. The testing dataset of 9 videos will not be released. | 24 videos | 30 | HeiSurf Presentation |
AutoLaparo | AutoLaparo contains videos of laparoscopic hysterectomy. Three sub-datasets are designed for three tasks: surgical workflow recognition, laparoscope motion prediction, and instrument and key anatomy segmentation. The videos are recorded at 25 fps with a standard resolution of 1920×1080 pixels. Video duration ranges from 27 to 112 minutes owing to the varying difficulty of the surgeries. After pre-processing, the average duration is 66 minutes and the total duration is 1388 minutes. | Different for each task | 21 | Wang et al. 2022 |
SurgToolLoc | This dataset contains clips of surgical training exercises using the da Vinci robotic system, in which trainees perform standard activities such as dissecting tissue and suturing. There are 24,695 video clips, each 30 seconds long and captured at 60 fps with a resolution of 1280x720 pixels. | 44M | N/A | N/A |
SAR-RARP50 | SAR-RARP50 is a multitask dataset that provides action recognition and surgical instrumentation segmentation labels for video segments recorded during 50 Robot-Assisted Radical Prostatectomies (RARP). The operations were performed by 8 surgeons of different surgical seniority (experienced consultant, senior registrar, and junior registrar). The selected segments focus on the suturing of the dorsal vascular complex (DVC), an array of veins and arteries that is sutured to keep bleeding under control after the connection of the prostate to the bladder and urethra is cut. Surgical operations were performed using a da Vinci Si robot, recording at 60 frames per second in 1080i stereo video format. After data acquisition, the stereo video channels were time-synchronized and de-interlaced. The 50 videos are grouped into 2 sets with balanced class proportions: one for training (40 interventions) and one for testing (10 interventions). Action classes: (0, other), (1, picking up the needle), (2, positioning the needle tip), (3, pushing the needle through the tissue), (4, pulling the needle out of the tissue), (5, tying a knot), (6, cutting the suture), (7, returning/dropping the needle). Annotators: the action gesture classes were decided in collaboration with an expert surgeon, and annotations were manually generated by an engineer with experience in surgical action recognition. During labelling, the annotator was instructed to assign only one class per frame, chosen from a list of predefined actions. The action recognition labels may include imprecise gesture boundaries or ambiguities linked to non-standard surgical gestures and the particularities of each surgeon's technique. The tool segmentations are provided for the left camera view of the stereo endoscope for all 50 RARP procedures at a rate of 1 Hz. Semantic labels are provided as PNG images in which each pixel value corresponds to a class: (1, tool clasper), (2, tool wrist), (3, tool shaft), (4, suturing needle), (5, thread), (6, suction tool), (7, needle holder), (8, clamps), (9, catheter). The tool segmentation annotations were generated by non-medical professional annotators and were validated independently by the challenge organizers. Segmentation annotations may be inaccurate when videos are out of focus, camera lenses are not clean, objects move fast (resulting in ghosting), there are video compression artifacts, surgical instrumentation is not fully visible, or areas are poorly lit. | 10K | 50 | Psychogyios et al. 2023 |
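
SAR-RARP50 stores tool segmentations as PNG images whose pixel values index the semantic classes listed in its description, and it stores actions as integer ids 0 to 7. The minimal sketch below writes both lookup tables out and counts pixels per class in one mask. It assumes the PNG has already been decoded into a 2D list of integers. It also assumes that pixel value 0, which the dataset description does not list, denotes background. `class_pixel_counts` is a hypothetical helper.

```python
from collections import Counter
from typing import Dict, List

# Segmentation classes as listed in the SAR-RARP50 description.
SEG_CLASSES: Dict[int, str] = {
    1: "tool clasper", 2: "tool wrist", 3: "tool shaft", 4: "suturing needle",
    5: "thread", 6: "suction tool", 7: "needle holder", 8: "clamps", 9: "catheter",
}

# Action classes as listed in the SAR-RARP50 description.
ACTION_CLASSES: Dict[int, str] = {
    0: "other",
    1: "picking up the needle",
    2: "positioning the needle tip",
    3: "pushing the needle through the tissue",
    4: "pulling the needle out of the tissue",
    5: "tying a knot",
    6: "cutting the suture",
    7: "returning/dropping the needle",
}


def class_pixel_counts(mask: List[List[int]]) -> Dict[str, int]:
    """Count pixels per named class, in ascending pixel-value order.

    Value 0 is ASSUMED to be background (not stated in the dataset description);
    values outside 1-9 are reported as unknown.
    """
    counts = Counter(v for row in mask for v in row)
    result: Dict[str, int] = {}
    for value, n in sorted(counts.items()):
        if value == 0:
            result["background (assumed)"] = n
        else:
            result[SEG_CLASSES.get(value, f"unknown ({value})")] = n
    return result


print(class_pixel_counts([[0, 3, 3], [0, 2, 1], [5, 5, 0]]))
# {'background (assumed)': 3, 'tool clasper': 1, 'tool wrist': 1, 'tool shaft': 2, 'thread': 2}
print(ACTION_CLASSES[5])  # tying a knot
```

Reporting unexpected values as "unknown" rather than raising an error makes it easy to spot decoding problems, such as a mask that was saved as an RGB colour image instead of a single channel.
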
Dataset | Brief description | Images | Procedures | Paper |
--- | --- | --- | --- | --- |
Dresden Surgical Anatomy Dataset | The Dresden Surgical Anatomy Dataset provides semantic segmentations of eight abdominal organs (colon, liver, pancreas, small intestine, spleen, stomach, ureter, vesicular glands), the abdominal wall and two vessel structures (inferior mesenteric artery, intestinal veins) in laparoscopic view. The majority of patients (26/32) were male, the overall average age was 63 years and the mean body mass index (BMI) was 26.75 kg/m2 (Table 1). All included patients had a clinical indication for the surgical procedure. Surgeries were performed using a standard Da Vinci® Xi/X Endoscope with Camera (8 mm diameter, 30° angle, Intuitive Surgical, Item code 470057) and recorded using the CAST-System (Orpheus Medical GmbH, Frankfurt a.M., Germany). Each recording was saved at a resolution of 1920x1080 pixels in MPEG-4 format and lasts between about two and ten hours. | 13K | 32 | Carstens et al. 2023 |
SurgAI3.8K | The dataset contains the following annotations: uterus segmentation, uterus contours and the regions of the left and right fallopian tube junctions. | 3.8K | 79 | Zadeh et al. 2023 |
Dataset | Brief description | Images | Procedures | Paper |
--- | --- | --- | --- | --- |
Rabbani et al. 2022 | From 60 hours of video footage, 750 frames are extracted for training and 199 for testing. The authors downsample all images to 854×480 pixels for training. | 949 labelled images and over 60 hours of unlabelled video | 96 | Rabbani et al. 2022 |