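    CBLUE (Chinese Biomedical Language Understanding Evaluation) is a benchmark for Chinese medical information processing. It was launched by the Medical Health and Bioinformatics Processing Committee of the Chinese Information Processing Society of China under the principle of lawful open sharing, and is hosted on the Alibaba Cloud Tianchi platform. It is co-organized by institutions engaged in intelligent healthcare research, including Yidu Cloud (Beijing) Technology Co., Ltd., Ping An Healthcare Technology, Peking University, Zhengzhou University, Peng Cheng Laboratory, Harbin Institute of Technology (Shenzhen), Tongji University, Quark, and Alibaba DAMO Academy, with the aim of advancing Chinese medical NLP technology and its community. The benchmark was designed along two dimensions, task type and task difficulty: the goal is broad coverage of task types while keeping the tasks difficult. It therefore incorporates tasks from previous CHIP academic evaluations and also adds industry datasets, which are real and noisy and so place higher demands on model robustness. The first phase comprises 8 subtasks in 5 categories: medical text information extraction (entity recognition, relation extraction), medical term normalization, medical text classification, medical sentence relation judgment, and medical QA.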

    paper github website


    BLURB is the Biomedical Language Understanding and Reasoning Benchmark. A collection of resources for biomedical natural language processing.

    paper website

  • 2004
  • 2006
  • 2009

    • n2c2 2009: Medication Extraction Challenge

      The medication extraction challenge aims to encourage the development of natural language processing systems that extract medication-related information from narrative patient records. Targeted information includes medications, dosages, modes of administration, frequency of administration, and the reason for administration.


  • 2012
  • 2015
    • BioCreative V Track 2-CHEMDNER-patents

      Automatic extraction of chemical and biological data from medicinal chemistry patents.

      The CHEMDNER-patents corpora will consist of a training, development and test set, each comprising a total of 7000 manually annotated records.

      CEMP (chemical entity mention in patents, main task)

      CPD (chemical passage detection, text classification task)

      GPRO (gene and protein related object task)


  • 2017
  • 2018
  • 2019
    • BioNLP-OST 2019 CRAFT-CA task: Concept Annotation Task

      Chemical Entities of Biological Interest (CHEBI), Cell Ontology (CL), Gene Ontology Biological Process (GO_BP), Gene Ontology Cellular Component (GO_CC), Gene Ontology Molecular Function (GO_MF), Molecular Process Ontology (MOP), NCBI Taxonomy (NCBITaxon), Protein Ontology (PR), Sequence Ontology (SO), Uberon (UBERON).

    • BioNLP-OST 2019 PharmaCoNER task

      Entity types: Normalizables, No_Normalizables, Proteinas, Unclear

    • BioNLP-OST 2019 AGAC task

      Task 1 is a conventional NER task with 12 labels covering molecular phenomena related to gene mutation: Variation (Var), Molecular Physiological Activity (MPA), Interaction, Pathway, Cell Physiological Activity (CPA), Regulation (Reg), Positive Regulation (PosReg), Negative Regulation (NegReg), Disease, Gene, Protein, Enzyme.

      Task 2 is a relation extraction task that captures the thematic roles between entities: ThemeOf, CauseOf.

      Task 3 is a prediction task for novel link discovery that extracts (gene, function change, disease) triples from the corpus texts.

    • BioNLP-OST 2019 Bacteria-Biotope Task

      The BB task is an information extraction task involving entity recognition, entity normalization, and relation extraction.

      4 entity types: Microorganism, Habitat, Geographical, Phenotype.

      2 relation types: Lives_in, Exhibits.

    • CCKS 2019: Named Entity Recognition for Chinese Electronic Medical Records





  • 2004
  • 2006
  • 2010
    • BioCreative III: PPI: Protein-Protein Interactions

      The aim of this task is to promote the development of automated systems that are able to extract biologically relevant information directly from the literature, in this case related to protein-protein interaction (PPI) annotation information.

    • n2c2 2010: Relations Challenge

      Three tasks: 1) extraction of medical problems, tests, and treatments; 2) classification of assertions made on medical problems as present, absent, or possible; 3) identification of relations among medical problems, tests, and treatments.

      A total of 394 training reports, 477 test reports, and 877 unannotated reports were de-identified and released to challenge participants with data use agreements.


  • 2011
    • BioNLP Shared Task 2011: Entity Relations Supporting Task (REL)

      The task concerns the detection of relations stated to hold between a gene or gene product and a related entity such as a protein domain or protein complex.

      Entities: human-annotated gene and gene product entities, annotated as "Protein"

      Relation Type: Subunit-Complex, Protein-Component

  • 2012
    • n2c2 2012: Temporal Relations Challenge

      The 2012 i2b2 temporal relations challenge data include 310 discharge summaries consisting of 178,000 tokens. Clinically relevant events include clinical concepts, clinical departments, evidentials, and occurrences. Temporal relations: BEFORE, AFTER, SIMULTANEOUS, OVERLAP, BEGUN_BY, ENDED_BY, DURING, BEFORE_OVERLAP.


  • 2013
  • 2017
  • 2018
    • n2c2 2018 — Track 2: Adverse Drug Events and Medication Extraction in EHRs

      This task aims to answer the question: “Can NLP systems automatically discover drug to adverse event (ADE) relations in clinical narratives?” It has three subtasks: 1) Concepts: identifying drug names, dosages, durations, and other entities. 2) Relations: identifying relations of drugs with adverse drug events (ADEs) and other entities, given gold-standard entities. 3) End-to-end: identifying relations of drugs with ADEs and other entities on system-predicted entities.
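
The difference between the Relations and End-to-end subtasks above is only where the entity spans come from; in both cases a predicted relation can be compared against the gold annotations. The following is a minimal sketch of exact-match micro precision, recall, and F1 over relation tuples. It is not the official challenge scorer, and the `Relation` fields, the label names, and the sample offsets are illustrative assumptions.

```python
from typing import NamedTuple


class Relation(NamedTuple):
    """A typed relation between a drug mention and another entity mention."""
    doc_id: str
    rel_type: str      # illustrative label name, e.g. "ADE-Drug"
    drug_span: tuple   # (start, end) character offsets of the drug mention
    other_span: tuple  # (start, end) character offsets of the related entity


def micro_prf(gold: set, predicted: set) -> tuple:
    """Return micro-averaged (precision, recall, F1) over exact-match relations."""
    true_pos = len(gold & predicted)
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


# Hypothetical annotations for a single record.
gold = {
    Relation("doc1", "ADE-Drug", (10, 19), (40, 44)),
    Relation("doc1", "Dosage-Drug", (10, 19), (20, 25)),
}
predicted = {
    Relation("doc1", "ADE-Drug", (10, 19), (40, 44)),
    Relation("doc1", "Dosage-Drug", (10, 19), (21, 25)),  # entity span is off by one
}

precision, recall, f1 = micro_prf(gold, predicted)
print(f"P={precision:.2f} R={recall:.2f} F1={f1:.2f}")  # P=0.50 R=0.50 F1=0.50
```

In an end-to-end setting, an entity span that is slightly wrong (as in the second predicted relation) also costs the relation, which is why scores on system-predicted entities are typically lower than scores on gold-standard entities.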



  • 2004
  • 2011
    • BioNLP Shared Task 2011: GENIA Event Extraction (GENIA)

      The GENIA task aims at extracting events occurring upon genes or gene products, which are typed as "Protein" without differentiating genes from gene products. Other types of physical entities, e.g. cells and cell components, are not differentiated from each other, and their type is given as "Entity".

    • BioNLP Shared Task 2011: Epigenetics and Post-translational Modifications Task (EPI)

      This task focuses on events relating to epigenetic change, including DNA methylation and histone modification, as well as other common post-translational protein modifications.

      Event types: Hydroxylation, Phosphorylation, Ubiquitination, DNA methylation, Glycosylation, Acetylation, Methylation, Catalysis.

    • BioNLP Shared Task 2011: Infectious Diseases Task (ID)

      This task focuses on the biomolecular mechanisms of infectious diseases.

      Five entities: Genes and gene products, Two-component systems, Chemicals, Organisms, Regulons/Operons.

      Ten events: Gene expression, Transcription, Protein catabolism, Phosphorylation, Localization, Binding, Regulation, Positive regulation, Negative regulation, Process.

    • BioNLP Shared Task 2011: Bacteria Biotopes (BB)

      The task consists of extracting bacteria localization events, that is, mentions of a given species and the place where it lives.

      Entities: Host, HostPart, Geographical, Environment, Food, Medical, Soil, Water.

      Events: Localization, PartOf.

    • BioNLP Shared Task 2011: Bacteria Gene Interactions (BI)

      This task consists in a full extraction of genetic processes mentioned in scientific texts concerning the bacterium Bacillus subtilis.

      Entities: GeneProduct, Protein, PolymeraseComplex, Gene, ProteinFamily, GeneFamily, GeneComplex, Regulon, Site, Promoter, Action, Transcription, Expression.

      Events: RegulonDependence, BindTo, TranscriptionFrom, RegulonMember, SiteOf, TranscriptionBy, PromoterOf, PromoterDependence, ActionTarget, Interaction.

    • BioNLP Shared Task 2011: Bacteria Gene Renaming (RENAME)

      The task consists in extracting gene renaming acts and gene synonymy reminders in scientific texts about bacteria.

      Entities: All gene and protein names have been annotated as text-bound entities of type Gene.

      Events: The only type of event is Renaming where both arguments are of type Gene.

  • 2013
    • BioNLP-ST 2013: Cancer Genetics (CG) Task

      The CG task aims to advance the automatic extraction of information from statements on the biological processes relating to the development and progression of cancer.

    • BioNLP-ST-2013: Pathway Curation (PC) task

      The PC task aims to evaluate the applicability of event extraction systems to support the curation, evaluation and maintenance of biomolecular pathway models and to encourage the further development of methods for these tasks.

    • BioNLP-ST-2013: Bacteria Biotopes (BB)

      Entity recognition of bacteria taxa and bacteria habitats. Bacteria habitat categorization through the OntoBiotope-Habitat ontology. Extraction of localization relations between bacteria and habitats.

  • 2016
  • 2019
    • BioNLP-OST 2019 SeeDev Task

      The SeeDev representation scheme defines 16 entity types. Task 1: binary relation extraction. Task 2: full event extraction, in which these entities participate in 21 types of events that can be grouped into five categories.
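
Entities (text-bound annotations) and events such as those listed in this section are often stored as tab-separated standoff annotations. The following is a minimal parsing sketch, assuming the brat-style layout in which `T` lines carry a type, character offsets, and the covered text, and `E` lines carry a `Type:trigger` pair followed by `Role:target` arguments. It handles contiguous spans only; the class and function names and the sample lines are illustrative and not taken from any particular corpus.

```python
from dataclasses import dataclass, field


@dataclass
class TextBound:
    """A text-bound annotation line such as 'T1<TAB>Protein 19 24<TAB>BRCA1'."""
    ann_id: str
    ann_type: str
    start: int
    end: int
    text: str


@dataclass
class Event:
    """An event line such as 'E1<TAB>Phosphorylation:T2 Theme:T1'."""
    ann_id: str
    event_type: str
    trigger_id: str
    arguments: list = field(default_factory=list)  # (role, target_id) pairs


def parse_standoff(lines):
    """Split standoff lines into text-bound annotations and events, keyed by id."""
    entities, events = {}, {}
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        fields = line.split("\t")
        ann_id = fields[0]
        if ann_id.startswith("T"):
            # Assumes a single contiguous span: "<type> <start> <end>".
            ann_type, start, end = fields[1].split(" ")
            entities[ann_id] = TextBound(ann_id, ann_type, int(start), int(end), fields[2])
        elif ann_id.startswith("E"):
            trigger, *args = fields[1].split(" ")
            event_type, trigger_id = trigger.split(":")
            roles = [tuple(arg.split(":")) for arg in args]
            events[ann_id] = Event(ann_id, event_type, trigger_id, roles)
    return entities, events


# Hypothetical annotations for the text "Phosphorylation of BRCA1".
sample = [
    "T1\tProtein 19 24\tBRCA1",
    "T2\tPhosphorylation 0 15\tPhosphorylation",
    "E1\tPhosphorylation:T2 Theme:T1",
]

entities, events = parse_standoff(sample)
theme_id = events["E1"].arguments[0][1]
print(events["E1"].event_type, "->", entities[theme_id].text)
```

Keeping triggers and arguments as ids rather than resolved objects mirrors the file layout and lets events refer to other events as arguments, which nested regulation events require.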




  • 2006
    • n2c2 2006: Deidentification and Smoking Challenge

      The two challenge questions are studied on the same data. Task 1: automatic de-identification of patient records. Task 2: identification of the smoking status of patients, classifying patient records into five possible smoking status categories: Past Smoker, Current Smoker, Smoker, Non-Smoker, Unknown.


  • 2008
    • n2c2 2008: Obesity Challenge

      The obesity challenge is a multi-class, multi-label classification task focused on obesity and its co-morbidities. The data for the challenge consist of discharge summaries from Partners Healthcare. All records have been fully de-identified. Obesity information and co-morbidities have been marked at a document level as present, absent, questionable, or unmentioned in the documents.


  • 2019
    • CHIP 2019 Task 3: Short Text Classification of Clinical Trial Screening Criteria
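
For document-level tasks in this section, such as the n2c2 2008 Obesity Challenge, each record receives one judgment per disease, so a record carries several labels at once (multi-label) and each label is one of several classes (multi-class). The following is a minimal sketch of that data layout with a simple per-disease accuracy. The record ids, the disease names, and the choice of accuracy as the measure are assumptions for illustration, not the official evaluation.

```python
from collections import Counter

# The four document-level judgments named in the challenge description.
LABELS = ("present", "absent", "questionable", "unmentioned")

# Hypothetical gold and predicted judgments: record id -> {disease: label}.
gold = {
    "rec1": {"Obesity": "present", "Diabetes": "absent"},
    "rec2": {"Obesity": "unmentioned", "Diabetes": "present"},
}
predicted = {
    "rec1": {"Obesity": "present", "Diabetes": "questionable"},
    "rec2": {"Obesity": "unmentioned", "Diabetes": "present"},
}


def per_disease_accuracy(gold, predicted):
    """Fraction of records judged correctly, computed separately per disease."""
    correct, total = Counter(), Counter()
    for record_id, judgments in gold.items():
        for disease, label in judgments.items():
            assert label in LABELS, f"unknown label: {label}"
            total[disease] += 1
            if predicted.get(record_id, {}).get(disease) == label:
                correct[disease] += 1
    return {disease: correct[disease] / total[disease] for disease in total}


print(per_disease_accuracy(gold, predicted))  # {'Obesity': 1.0, 'Diabetes': 0.5}
```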







  • 2018
  • 2019
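    • CHIP 2019 Task 2: Ping An Healthcare Technology Disease Question Answering Transfer Learning Competition

      The main goal of this evaluation task is transfer learning across disease types on Chinese disease question-answering data. Specifically, given question pairs from 5 different disease types, systems must decide whether the two sentences have the same or similar meaning. All of the data consists of real patient questions from the internet, which were filtered and manually annotated for intent matching. The disease types are: diabetes, hypertension, hepatitis, AIDS, and breast cancer.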
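
The question-pair judgment described above can be illustrated with a very simple lexical baseline: compare the character bigrams of the two questions and call them equivalent when the overlap is high enough. This is only a sketch for orientation, not a method used in the evaluation. The example questions are invented, and the function names and the 0.4 threshold are arbitrary assumptions.

```python
def char_bigrams(text: str) -> set:
    """Return the set of character bigrams, a simple unit for unsegmented Chinese."""
    text = text.strip()
    return {text[i:i + 2] for i in range(len(text) - 1)}


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity of two sets; defined as 0.0 when both are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def same_meaning(question_a: str, question_b: str, threshold: float = 0.4) -> bool:
    """Predict that two questions match when their bigram overlap reaches the threshold."""
    return jaccard(char_bigrams(question_a), char_bigrams(question_b)) >= threshold


# Invented examples, not taken from the competition data.
pairs = [
    # "Can people with diabetes eat fruit?" (two phrasings of the same question)
    ("糖尿病能吃水果吗", "糖尿病可以吃水果吗"),
    # "How is hypertension treated?" vs. "Is hepatitis B contagious?"
    ("高血压怎么治疗", "乙肝会传染吗"),
]

for question_a, question_b in pairs:
    print(question_a, question_b, same_meaning(question_a, question_b))
```

A lexical baseline like this ignores disease type entirely, so it gives a rough floor against which transfer across the five disease types can be compared.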




  • 2010
  • 2018
    • n2c2 2018 — Track 1: Cohort Selection for Clinical Trials

      This task aims to answer the question, “Can NLP systems use narrative medical records to identify which patients meet selection criteria for clinical trials?” The task requires NLP systems to compare each patient to a list of selection criteria, and determine if the patients meet, do not meet, or possibly meet each criterion.


  • 2019
    • BioNLP-OST 2019 RDoC Task

      Task 1 (RDoC-IR) is about retrieving PubMed abstracts related to RDoC constructs, with 250 abstracts for training and 200 for testing. Task 2 (RDoC-SE) is about extracting the most relevant sentences for an RDoC construct from a relevant abstract, with 250 abstracts for training and 50 for testing.



  • 2020
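    • CCKS 2020: COVID-19 Knowledge Graph Construction and Question Answering

      Four subtasks: 1) type inference for the COVID-19 encyclopedia knowledge graph; 2) hypernym-hyponym relation prediction for the COVID-19 concept graph; 3) link prediction for the COVID-19 antiviral drug research graph; 4) question answering evaluation over the COVID-19 encyclopedia knowledge graph.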


  • BioBERT: a pre-trained biomedical language representation model for biomedical text mining

    paper github

  • BERTCNER: Chinese clinical named entity recognition (CNER) using pre-trained BERT model

    paper github

  • BlueBERT: pre-trained on PubMed abstracts and clinical notes (MIMIC-III)

    paper github

  • ClinicalBERT: Modeling Clinical Notes and Predicting Hospital Readmission

    paper github

  • LinkBERT: Pretraining Language Models with Document Links

    paper github

  • SciBERT: A Pretrained Language Model for Scientific Text

    paper github

  • PubMedBERT: Domain-Specific Language Model Pretraining for Biomedical Natural Language Processing

    paper website


