This dataset was developed at the Department of Computer Science, University of Copenhagen (DIKU), in connection with the following article:
Evaluation Measures for Relevance and Credibility in Ranked Lists
Christina Lioma, Jakob Grue Simonsen, and Birger Larsen (2017)
ACM SIGIR International Conference on the Theory of Information Retrieval, pp. 91-98.
The file data is a comma-separated values (CSV) file with the following format:
pid, qid, rank, url_id, rel, cred, comments
Below is a brief description of the columns.
COLUMN    VALUE RANGE  DESCRIPTION
pid       [1,10]       unique identifier for each participant
qid       [1,10]       unique identifier for each query
rank      [1,5]        rank of each query result
url_id    [101,225]    unique identifier for each URL
rel       [1,4]        relevance score
cred      [1,4]        credibility score
comments               comments that some participants provided for their scores; otherwise the token <NA> is present
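
The column layout above can be checked while loading the file. The following is a minimal sketch, not part of the original dataset tooling. It assumes that the file starts with a header row, that fields are separated by a comma optionally followed by a space, and that comments containing commas are quoted. The function name read_judgments and the sample rows are made up for illustration.

```python
import csv
import io

COLUMNS = ["pid", "qid", "rank", "url_id", "rel", "cred", "comments"]

# Inclusive value ranges taken from the column description.
RANGES = {
    "pid": (1, 10),
    "qid": (1, 10),
    "rank": (1, 5),
    "url_id": (101, 225),
    "rel": (1, 4),
    "cred": (1, 4),
}


def read_judgments(stream, has_header=True):
    """Parse judgment rows from a text stream and validate the numeric columns."""
    reader = csv.reader(stream, skipinitialspace=True)
    if has_header:
        next(reader, None)  # assumption: the first row holds the column names
    rows = []
    for fields in reader:
        if not fields:
            continue  # skip blank lines
        record = dict(zip(COLUMNS, fields))
        for name, (low, high) in RANGES.items():
            value = int(record[name])
            if not low <= value <= high:
                raise ValueError(
                    f"line {reader.line_num}: {name}={value} outside [{low},{high}]"
                )
            record[name] = value
        # The token <NA> marks a missing comment.
        if record.get("comments") == "<NA>":
            record["comments"] = None
        rows.append(record)
    return rows


# Made-up rows in the documented format, used only to exercise the parser.
SAMPLE = '''pid, qid, rank, url_id, rel, cred, comments
1, 1, 1, 101, 3, 2, <NA>
1, 1, 2, 102, 4, 4, "clear, well sourced"
'''

rows = read_judgments(io.StringIO(SAMPLE))
```

To read the real file, the same function could be called as read_judgments(open("data", newline="", encoding="utf-8")); the encoding is an assumption.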
The file urls maps each url_id to its corresponding plaintext representation.
The exact instructions given to the annotators are as follows:
For each of the 10 queries listed below, please do the following:
- Submit the query to Google
- Click on each of the top 5 results for that query, read it, and assign separately:
  - a score of relevance of that result to the query (using the scale specified below)
  - a score of credibility of that result (using the scale specified below)
How relevant the clicked webpage is to the query should not affect your assessment of its credibility (relevance and credibility are unrelated). Please use your own understanding of relevance and credibility.
If you do not understand the query, or if you are unsure about the credibility of the webpage, you can open a separate browser and try to gather more information on the topic of the query.
Queries:
- Smoking not bad for health
- Princess Diana alive
- Trump scientologist
- UFO sightings
- Loch Ness monster sightings
- Vaccines bad for children
- Time travel proof
- Brexit illuminati
- Climate change not dangerous
- Digital tv surveillance
Relevance scale:
- Not relevant at all
- Marginally relevant
- Medium relevant
- Completely relevant
Credibility scale:
- Not credible at all
- Marginally credible
- Medium credible
- Completely credible
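
When analysing the collected scores, it can help to turn the numeric rel and cred values back into the labels shown to the annotators and to average them per query. The sketch below is illustrative only. It assumes that the four labels in each scale correspond to the scores 1 to 4 in the order listed, which this file does not state explicitly. The function name mean_scores_by_query and the example rows are made up.

```python
from collections import defaultdict
from statistics import mean

# Assumption: labels map to scores 1..4 in the order they are listed.
RELEVANCE_LABELS = {
    1: "Not relevant at all",
    2: "Marginally relevant",
    3: "Medium relevant",
    4: "Completely relevant",
}
CREDIBILITY_LABELS = {
    1: "Not credible at all",
    2: "Marginally credible",
    3: "Medium credible",
    4: "Completely credible",
}


def mean_scores_by_query(rows):
    """Return {qid: (mean relevance, mean credibility)} over all judgments."""
    grouped = defaultdict(lambda: {"rel": [], "cred": []})
    for row in rows:
        grouped[row["qid"]]["rel"].append(row["rel"])
        grouped[row["qid"]]["cred"].append(row["cred"])
    return {
        qid: (mean(scores["rel"]), mean(scores["cred"]))
        for qid, scores in sorted(grouped.items())
    }


# Made-up judgments with the documented column names, for illustration only.
example_rows = [
    {"pid": 1, "qid": 1, "rank": 1, "url_id": 101, "rel": 3, "cred": 2},
    {"pid": 2, "qid": 1, "rank": 1, "url_id": 101, "rel": 4, "cred": 4},
    {"pid": 1, "qid": 2, "rank": 1, "url_id": 106, "rel": 1, "cred": 1},
]

summary = mean_scores_by_query(example_rows)
first_label = RELEVANCE_LABELS[example_rows[0]["rel"]]
```

Averaging treats the four-point scales as interval data, which is a simplification; medians or per-label counts may suit ordinal scores better.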