[Under Review] Knowledge Extraction and Distillation from Large-Scale Image-Text Colonoscopy Reports Leveraging Large Language and Vision Models


We propose leveraging free-text reports, processed with large language models (LLMs), together with colonoscopy image representations to produce pixel-level annotations of polyps, thereby tackling the data annotation challenges in colonoscopy.

  • Feb. 29th, 2024: EndoKED is under review.
Overview of the EndoKED design and its application to polyp diagnosis. (a) Intrinsic supervision is extracted from raw colonoscopy reports using large language and vision models. A large language model first extracts the report-level lesion label from the free-text description. Multiple instance learning (MIL) then propagates the report-level label to the image level. Region-level bounding boxes are obtained from class activation maps (CAMs). A large vision model takes the region-level boxes as prompts and generates pixel-level lesion segmentations. (b) The image classification model for optical biopsy is developed in a data-efficient way: pre-training on multi-centre colonoscopy reports and fine-tuning with limited pathology annotations.
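The CAM-to-box step in (a) can be illustrated with a minimal sketch. It assumes the box is taken as the tightest rectangle around activations above a fraction of the CAM's peak; the function name `cam_to_box` and the `rel_threshold` value are hypothetical and are not taken from the EndoKED code.

```python
import numpy as np

def cam_to_box(cam, rel_threshold=0.5):
    """Return (x_min, y_min, x_max, y_max) around the high-activation area of a CAM.

    rel_threshold is a hypothetical cut-off, expressed as a fraction of the
    CAM's maximum activation; it is not a value taken from EndoKED.
    """
    cam = np.asarray(cam, dtype=np.float32)
    peak = cam.max()
    if peak <= 0:
        return None  # no positive activation, so there is nothing to box
    mask = cam >= rel_threshold * peak       # keep only strong activations
    ys, xs = np.nonzero(mask)                # row (y) and column (x) indices
    return int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())

# Toy 6x6 CAM with a single bright blob.
cam = np.zeros((6, 6), dtype=np.float32)
cam[2:4, 1:5] = 1.0
cam[3, 2] = 2.0
box = cam_to_box(cam, rel_threshold=0.5)
```

The box is returned in XYXY order, which is the layout Segment Anything expects for box prompts, so a box of this form could be passed on as the region-level prompt.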

To clone all files:

git clone https://github.com/zwyang6/ENDOKED.git

To install Python dependencies:

pip install -r requirements.txt


Training Dataset

Evaluation Dataset

EndoKED is evaluated on five public out-of-domain datasets: CVC-ClinicDB, Kvasir-SEG, ETIS, CVC-ColonDB, and CVC-300. Following the common experimental setup, the training sets of CVC-ClinicDB and Kvasir-SEG are not used during training, and we evaluate our model only on the test sets for a fair comparison. Detailed descriptions of the datasets are given in the table below.

The five datasets are publicly available at https://pan.baidu.com/s/1A4e7kmvAShaz3BCitpunFA?pwd=s5t5.

Dataset Year Resolution Training Testing Total
CVC-ClinicDB 2015 384x384 550 62 612
Kvasir-SEG 2020 332x487~1920x1072 900 100 1000
ETIS 2014 1225x966 N/A 196 196
CVC-ColonDB 2016 574x500 N/A 380 380
CVC-300 2017 574x500 N/A 60 60

Semantic Results

The results on five public datasets for EndoKED-SEG are reported in the following Table.

Models Kvasir ClinicDB ColonDB CVC-300 ETIS
U-Net 0.818 0.823 0.504 0.710 0.398
U-Net++ 0.821 0.794 0.482 0.707 0.401
C2FNet 0.886 0.919 0.724 0.874 0.699
DCRNet 0.886 0.896 0.704 0.856 0.556
LDNet 0.887 0.881 0.740 0.869 0.645
Polyp-PVT 0.917 0.948 0.808 0.900 0.787
EndoKED-SEG 0.908 0.920 0.809 0.893 0.818
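The table does not name its metric. Assuming the scores are Dice coefficients, as is common on these polyp benchmarks, the following minimal sketch shows how such a score could be computed for one binary mask pair. The function name `dice` and the smoothing term `eps` are illustrative choices, not part of the EndoKED evaluation code.

```python
import numpy as np

def dice(pred, target, eps=1e-8):
    """Dice coefficient between two binary masks (assumed metric, see lead-in)."""
    pred = np.asarray(pred).astype(bool)
    target = np.asarray(target).astype(bool)
    intersection = np.logical_and(pred, target).sum()
    # eps keeps the ratio defined when both masks are empty
    return float((2.0 * intersection + eps) / (pred.sum() + target.sum() + eps))

pred = np.array([[1, 1], [0, 0]])
target = np.array([[1, 0], [0, 0]])
score = dice(pred, target)  # 2 * 1 / (2 + 1)
```

A dataset-level score would then typically be the mean of per-image values, although the averaging used for the table is not stated here.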

Training of EndoKED

1. EndoKED-MIL

python ./EndoKED_MIL/train_Endo_BagDistillation_SharedEnc_Similarity_StuFilter.py
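For intuition about what this stage does, here is a minimal sketch of propagating a report-level label to image-level pseudo labels. It assumes the standard MIL setting with max pooling (a report is positive if at least one of its images is positive). The training script's name suggests a bag-distillation scheme, which this sketch does not reproduce; `bag_score`, `propagate_label`, and the `threshold` value are hypothetical.

```python
import numpy as np

def bag_score(instance_scores):
    """Max-pooling MIL: a report is as suspicious as its most suspicious image."""
    return float(np.max(instance_scores))

def propagate_label(instance_scores, report_label, threshold=0.5):
    """Derive image-level pseudo labels from a report-level label.

    Negative report: every image is negative.
    Positive report: images scoring at least `threshold` are positive, and the
    top-scoring image is always positive so the bag keeps one positive instance.
    `threshold` is an illustrative value, not one used by EndoKED.
    """
    scores = np.asarray(instance_scores, dtype=np.float32)
    labels = np.zeros(len(scores), dtype=np.int64)
    if report_label == 1:
        labels[scores >= threshold] = 1
        labels[int(np.argmax(scores))] = 1
    return labels

scores = [0.1, 0.9, 0.4]  # per-image lesion scores for one report
image_labels = propagate_label(scores, report_label=1)
```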


2. EndoKED-WSSS

  • 2.1 Data processing
    bash ./EndoKED_WSSS/launch/1_data_processing.sh
  • 2.2 Generating Class Activation Maps (CAMs)
    bash ./EndoKED_WSSS/launch/run_ALL.sh
  • 2.3 Refine CAMs to Pseudo Labels
    bash ./EndoKED_WSSS/launch/3_refine_CAM_2_Pseudo.sh

3. EndoKED-SEG

  • 3.1 Train EndoKED-SEG
    bash ./EndoKED_SEG/train.sh
  • 3.2 Refine Preds to Pseudo Labels
    bash ./EndoKED_WSSS/launch/5_refine_Preds_2_Pseudo.sh
  • Iterate Step 3.1-3.2 to optimize EndoKED-SEG
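
The iteration over Steps 3.1 and 3.2 can be scripted. The sketch below is a hypothetical driver, assuming both scripts can be re-run without extra arguments between rounds and that the number of rounds is chosen by the user; neither assumption is confirmed by the repository.

```python
import subprocess

# Commands copied from Steps 3.1 and 3.2 above.
TRAIN_SEG = ["bash", "./EndoKED_SEG/train.sh"]
REFINE_PREDS = ["bash", "./EndoKED_WSSS/launch/5_refine_Preds_2_Pseudo.sh"]

def build_schedule(num_rounds):
    """Return the ordered list of commands for `num_rounds` train/refine rounds."""
    schedule = []
    for _ in range(num_rounds):
        schedule.append(TRAIN_SEG)     # Step 3.1: train EndoKED-SEG
        schedule.append(REFINE_PREDS)  # Step 3.2: refine predictions to pseudo labels
    return schedule

def run_schedule(num_rounds, runner=subprocess.run):
    """Run every command in order and stop at the first failure (check=True)."""
    for cmd in build_schedule(num_rounds):
        runner(cmd, check=True)
```

Calling `run_schedule(num_rounds=3)` from the repository root would run three train/refine rounds; the value 3 is only an example.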

Evaluation of EndoKED

1. EndoKED-MIL

2. EndoKED-SEG

python ./EndoKED_WSSS/eval_tools/a1_eval_pseuo_labels_from_SAM_byPreds_fromDecoder.py

Model logs and checkpoints

We provide the model logs and checkpoints for EndoKED-SEG, which can be downloaded from https://pan.baidu.com/s/1HaxIZf281lWFpk2USXs6OQ (extraction code: a9d4) or from Google Drive at https://drive.google.com/drive/folders/1QPGI7T9fa2ogC6_ZB9TChJg2DHIwCvub?usp=drive_link.


We borrowed Polyp-PVT as our segmentation model. Segment Anything and its pre-trained weights are used to refine the pseudo labels. ToCo inspired our approach to generating CAMs. Many thanks to the authors for their brilliant work!


