xiaoman-zhang/KAD

Extra dataset?

Closed this issue · 2 comments

Impressive work!

In your work, KAD achieves substantial improvements on all tasks. The paper mentions that RadGraph is used in the entity extraction stage. According to the official RadGraph paper, that model is trained on 500 clinician-annotated radiology reports.
Does this mean the improvement comes from the knowledge graph extractor, which was pretrained on annotated data?
If not, can we use any entity extractor or a general knowledge graph to replace RadGraph during pretraining?

That's a very good question, which was also raised by a reviewer. To analyze the effect of incorporating RadGraph (vs. not), we conducted an ablation experiment on KAD without RadGraph; the main difference is in the entity extraction section (https://arxiv.org/pdf/2302.14042.pdf, Sec 3.4).
We can simply use the Unified Medical Language System (UMLS) for entity extraction. For each sentence in the radiology report, we extracted a sequence of entities (entity, concept, CUI, TUI) with spaCy, where CUI stands for “Concept Unique Identifier” and TUI stands for “Type Unique Identifier”. Here, we provide the pseudo-code as follows:
[image: pseudo-code for UMLS-based entity extraction]
We filtered the entity list for each sentence based on TUI, retaining only the entities whose TUI is T033 (Finding) or T047 (Disease or Syndrome). Next, we matched these entities against our entity set Q, excluding 'normal'. To determine the presence of an entity, we adopted a straightforward rule: if the sentence contains the words 'no', 'none', or 'normal', the label is set to 0 (absent); otherwise, the label is set to 1 (present). For entities that are not mentioned in the report, we assigned a label of 0.
If all entities have a label of 0, we set the label of 'normal' to 1. In this way, we replaced every use of RadGraph in our paper with UMLS, and the result is shown in Table 1.
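
For readers who want to try the rule described above, here is a minimal Python sketch of the filtering and labeling step. It is not the authors' code. It assumes the entities have already been extracted per sentence as (name, TUI) pairs, that matching against Q is done by exact lowercase name, and that an entity counts as present if it appears in at least one sentence without a negation word. The function name `label_report` and the input layout are hypothetical.

```python
import re

# TUIs kept in the thread: T033 (Finding) and T047 (Disease or Syndrome).
KEPT_TUIS = {"T033", "T047"}
# Words that mark a sentence as negated under the rule described above.
NEGATION_WORDS = {"no", "none", "normal"}


def label_report(sentences, entity_set):
    """Assign a 0/1 label to every entity in entity_set for one report.

    sentences:  list of (sentence_text, entities), where entities is a list
                of (name, tui) pairs already extracted for that sentence
                (hypothetical input layout).
    entity_set: lowercase entity names (the set Q), including 'normal'.
    """
    # Entities never mentioned in the report keep the default label 0.
    labels = {name: 0 for name in entity_set}

    for text, entities in sentences:
        words = set(re.findall(r"[a-z]+", text.lower()))
        negated = bool(words & NEGATION_WORDS)
        for name, tui in entities:
            name = name.lower()
            # Keep only the two semantic types, skip 'normal' itself,
            # and ignore anything outside Q.
            if tui not in KEPT_TUIS or name == "normal" or name not in labels:
                continue
            # Assumption: one non-negated mention is enough for "present".
            if not negated:
                labels[name] = 1

    # If every other entity is absent, mark the report as normal.
    if "normal" in labels and all(
        value == 0 for name, value in labels.items() if name != "normal"
    ):
        labels["normal"] = 1
    return labels


# Toy report with hand-written entities, for illustration only.
report = [
    ("There is a small left pleural effusion.", [("pleural effusion", "T047")]),
    ("No pneumothorax.", [("pneumothorax", "T047")]),
]
Q = ["pleural effusion", "pneumothorax", "normal"]
print(label_report(report, Q))
```

Note that the word-level check is deliberately crude: any sentence containing 'no', 'none', or 'normal' marks all of its entities as absent, which matches the rule stated above but can mislabel sentences that mix positive and negative findings.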

Also, we can just use ChatGPT, as presented in the paper.

Thanks for the detailed explanation. For the checkpoint on Google Drive, would you mind releasing the model without an access request? I checked the link, but it says I have no permission to download the model.