
  • guidelines (in Dutch)
  • DONE human-annotated data (NAF)
  • DONE machine-annotated data (NAF)
  • lexicon based on annotated data (LMF)
  • overview of annotation tags and emotion hierarchies
    • short description per dataset
  • data license
  • translation between Dutch and English labels
  • Corpus metadata?
  • anything else

The dataset consists of four subsets:

  1. Annotation corpus: texts selected from Nederlab and manually annotated with HEEM labels (including humor modifiers and intensifiers) (29 texts)
  2. Ceneton: texts selected from Ceneton (34 texts)
  3. Corpus big: other texts selected from Nederlab (149 texts)
  4. EDBO: texts selected from Early Dutch Books Online (67 texts)


The naf directory contains the manual annotations and the predicted labels in NAF format. The emotions are stored in the emotions layer:

  <emotion id="emo0">
      <!-- kop -->
      <target id="t1874"/>
      <externalRef reference="conceptType:bodyPart" resource="heem"/>
      <externalRef reference="head" resource="heem:bodyParts"/>
  </emotion>
  <emotion id="emo1">
      <!-- maek my de kop niet warm -->
      <target id="t1871"/>
      <target id="t1872"/>
      <target id="t1873"/>
      <target id="t1874"/>
      <target id="t1875"/>
      <target id="t1876"/>
      <externalRef reference="conceptType:bodilyProcess" resource="heem"/>
      <externalRef reference="emotionType:anger" resource="heem"/>
      <externalRef reference="anger" resource="heem:clusters"/>
      <externalRef reference="negative" resource="heem:posNeg"/>