The Project

This dataset was developed at the Department of Computer Science, University of Copenhagen (DIKU), in connection with the following article:

Evaluation Measures for Relevance and Credibility in Ranked Lists
Christina Lioma, Jakob Grue Simonsen, and Birger Larsen (2017)
ACM SIGIR International Conference on the Theory of Information Retrieval, pp. 91-98.

The article is available for download from arXiv.

The Data

The file data is a comma-separated values (CSV) file with the following format:

pid, qid, rank, url_id, rel, cred, comments

Below is a brief description of the columns.

  COLUMN   VALUE RANGE   DESCRIPTION
     pid   [1,10]        unique identifier for each participant
     qid   [1,10]        unique identifier for each query
    rank   [1,5]         rank of the result in Google's result list for the query
  url_id   [101,225]     unique identifier for each URL
     rel   [1,4]         relevance score
    cred   [1,4]         credibility score
comments                 free-text comments some participants gave for their
                         scores; the token <NA> appears where none was given
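A minimal Python sketch for loading the data file and checking each value against the ranges in the table above. It assumes that comments containing commas are quoted in the usual CSV way, that a header row may or may not be present, and that an empty comment field means the same as <NA>; the names Judgment, parse_rows and load_data are hypothetical helpers, not part of the dataset.

```python
import csv
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class Judgment:
    pid: int                # participant id
    qid: int                # query id
    rank: int               # rank of the result for the query
    url_id: int             # id of the judged URL
    rel: int                # relevance score
    cred: int               # credibility score
    comment: Optional[str]  # None where the file has <NA> (or nothing)


# Inclusive value ranges copied from the column table.
RANGES = {"pid": (1, 10), "qid": (1, 10), "rank": (1, 5),
          "url_id": (101, 225), "rel": (1, 4), "cred": (1, 4)}
NUMERIC = ["pid", "qid", "rank", "url_id", "rel", "cred"]


def parse_rows(lines: Iterable[str]) -> List[Judgment]:
    """Parse lines of the data file into validated Judgment records."""
    judgments = []
    # skipinitialspace handles the ", " separators shown in the format line.
    for row in csv.reader(lines, skipinitialspace=True):
        if not row:
            continue  # ignore blank lines
        if row[0].strip() == "pid":
            continue  # skip a header row if the file has one (assumption)
        if len(row) < len(NUMERIC):
            raise ValueError(f"too few fields: {row!r}")
        values = {}
        for name, raw in zip(NUMERIC, row):
            value = int(raw)
            lo, hi = RANGES[name]
            if not lo <= value <= hi:
                raise ValueError(f"{name}={value} outside [{lo},{hi}]")
            values[name] = value
        # Rejoin unquoted commas inside a comment (assumption); the original
        # spacing after those commas is not preserved.
        comment = ",".join(row[len(NUMERIC):]).strip()
        judgments.append(Judgment(**values,
                                  comment=None if comment in ("", "<NA>") else comment))
    return judgments


def load_data(path: str) -> List[Judgment]:
    """Read and parse the data file at the given path."""
    with open(path, newline="", encoding="utf-8") as f:
        return parse_rows(f)
```

For example, load_data("data") returns one Judgment per line, and a ValueError points at the first value that falls outside its documented range.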

The file urls maps each url_id to its corresponding URL in plain text.
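The exact layout of the urls file is not specified here, so the Python sketch below assumes only that each line starts with the numeric url_id, followed by a comma, tab or spaces and then the URL; lines that do not match, such as a possible header, are skipped. The helpers parse_urls and load_urls are hypothetical.

```python
import re
from typing import Dict, Iterable

# Assumed line layout: "<url_id><separator><url>", where the separator is
# a comma, a tab or one or more spaces.
LINE_RE = re.compile(r"^\s*(\d+)[\s,]+(\S.*?)\s*$")


def parse_urls(lines: Iterable[str]) -> Dict[int, str]:
    """Build a url_id -> URL dictionary, skipping non-matching lines."""
    mapping = {}
    for line in lines:
        match = LINE_RE.match(line)
        if match:
            mapping[int(match.group(1))] = match.group(2)
    return mapping


def load_urls(path: str) -> Dict[int, str]:
    """Read the urls file at the given path into a dictionary."""
    with open(path, encoding="utf-8") as f:
        return parse_urls(f)
```

With urls = load_urls("urls"), the URL judged in any row of the data file is urls[url_id]; if the real file uses another layout, only LINE_RE needs to change.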

The Task

The exact instructions given to the annotators are as follows:

For each of the 10 queries listed below, please do the following:

Submit the query to Google
Click on each of the top 5 results for that query, read it, and assign separately:
- a score of relevance of that result to the query (using the scale specified below)
- a score of credibility of that result (using the scale specified below)

How relevant the clicked webpage is to the query should not affect your assessment of its credibility (relevance and credibility are unrelated). Please use your own understanding of relevance and credibility.

If you do not understand the query, or if you are unsure about the credibility of the webpage, you can open a separate browser and try to gather more information on the topic of the query.

Queries

Smoking not bad for health
Princess Diana alive
Trump scientologist
UFO sightings
Loch Ness monster sightings
Vaccines bad for children
Time travel proof
Brexit illuminati
Climate change not dangerous
Digital tv surveillance

Relevance scale

Not relevant at all
Marginally relevant
Medium relevant
Completely relevant

Credibility scale