TREC Conversational Assistance Track (CAsT)

There are currently few datasets appropriate for training and evaluating models for Conversational Information Seeking (CIS). The main aim of TREC CAsT is to advance research on conversational search systems. The goal of the track is to create a reusable benchmark for open-domain, information-centric conversational dialogues.

The track will run in 2022 and establish a concrete and standard collection of data with information needs to make systems directly comparable.

This is the fourth year of TREC CAsT, which will run as a track in TREC. This year we aim to focus on candidate information ranking in context:

Read the dialogue context: Track the evolution of the information need in the conversation, identifying salient information needed for the current turn in the conversation
Retrieve Candidate Response Information: Perform retrieval over a large collection of paragraphs (or knowledge base content) to identify relevant information

This year, there is an optional mixed-initiative subtask that evaluates how well systems use mixed initiative to make conversations more effective. For more information, see the guidelines linked below.

Year 4 (TREC 2022)

Important Dates

Guidelines release: Available
Evaluation topics release: Available
MI task submission deadline: August 22nd, 2022
Main task submission deadline: September 1st, 2022

Data

Topics

Evaluation topics for Year 4 - Primary evaluation topics in JSON format.

Mixed Initiative Question Pool

Question Pool - Over 4000 candidate questions to be used for the mixed-initiative subtask

Corpora

Washington Post 2020 - Same as Year 3.
KILT Wikipedia - Same as Year 3.
MS MARCO V2 (Documents) - MS MARCO V2 document corpus used in the 2021 TREC Deep Learning Track.

Participants have the option of processing the collection (to generate passage splits) themselves using the provided tools or requesting the processed corpus from the organizers. You can make the request to the organization team via Slack or Google Groups (see below).

If you processed the corpora yourself, please verify that you have the right passage splits by comparing the hash of each passage with the master version.
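
The hash comparison described above can be sketched roughly as follows. This is only an illustration: the use of MD5 over the UTF-8 passage text, the in-memory dictionaries, and the names passage_hash and find_mismatches are assumptions, not the method used by the organizers. The hashing scheme and file layout of the master version should be taken from the provided tools.

```python
import hashlib
from typing import Dict, List


def passage_hash(text: str) -> str:
    """Hex MD5 digest of the passage text (MD5 is an assumed scheme)."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def find_mismatches(local_passages: Dict[str, str],
                    master_hashes: Dict[str, str]) -> List[str]:
    """Return passage ids that differ from, or are missing on, either side."""
    # Local passages whose hash is absent from or different in the master.
    mismatched = {
        pid for pid, text in local_passages.items()
        if master_hashes.get(pid) != passage_hash(text)
    }
    # Passages listed in the master version but never produced locally.
    mismatched.update(pid for pid in master_hashes if pid not in local_passages)
    return sorted(mismatched)
```

An empty result means every local passage matches the master version; any returned id points at a split that should be regenerated.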

Baselines

Question ranking using BM25

Duplicate handling

Duplicate Files

Guidelines

Year 4 task guidelines
Note: Participants MUST REGISTER to submit.

Run Validations

Scripts are available in the tools repository to check that your main task and mixed-initiative runs are correctly formatted.

Contact

Twitter: @treccast
Slack: treccast.slack.com
Google Groups: trec-cast@googlegroups.com

Organizers

Jeff Dalton, University of Glasgow
Mohammad Aliannejadi, University of Amsterdam
Leif Azzopardi, University of Strathclyde
Paul Owoicho, University of Glasgow
Johanne Trippas, The University of Melbourne
Svitlana Vakulenko, University of Amsterdam / Amazon

Year 3 (TREC 2021)

Important Dates

Test topic release: Available
Run submission deadline: August 18th, 2021

Data

Topics

NEW Evaluation topics for Year 3 V1.0 - 25 primary evaluation topics in JSON and Protocol Buffer format. There are two variants: automatic and manual.

Corpora

Washington Post 2020 - Note: This is a new dump including January 2012 through December 2020.
KILT Wikipedia - From the 2019/08/01 Wikipedia dump.
MS MARCO (Documents) - MS MARCO document corpus used in DL 2019/2020. NOTE: This is NOT the recently released V2 corpus.

Baselines

Automatic and manual baseline runs are provided. See README for details.
Interactive web UI - A web UI supporting BM25 document retrieval and T5 passage reranking. Rewriting with the T5-Canard model is supported, using the query and result passage context; this format is the "raw rewrite". A manual that explains how to use the UI can be found here.

Duplicate handling

The provided tools support removing duplicate documents from the collection, including both MARCO and WaPo duplicates.
Duplicate Files

Guidelines

Year 3 task guidelines
Note: Participants MUST REGISTER to submit.

Contact

Twitter: @treccast
Slack: treccast.slack.com
Google Groups: trec-cast@googlegroups.com

Organizers

Jeff Dalton, University of Glasgow
Chenyan Xiong, Microsoft Research
Jamie Callan, Carnegie Mellon University

Year 2 (TREC 2020)

News

May 2020: Year 2 guidelines released
July 2020: Year 2 evaluation topics released

Data

Topics

Evaluation topics for Year 2 V1.0 - 25 primary evaluation topics in JSON and Protocol Buffer format. There are two variants: automatic and manual.

Baselines

NEW - BM25 + BERT baseline - We provide a BM25 + BERT reranked baseline run for the raw utterances, automatically rewritten utterances, and the manually rewritten utterances.
NEW - Interactive web UI - A simple web UI with the BM25 + BERT model used to create the baseline runs. No rewriting is performed.

Collection

The corpus is a combination of two standard TREC collections: MARCO Ranking passages and Wikipedia (TREC CAR).
The MS MARCO Passage Ranking collection - This file only includes the passage id and passage text. For convenience, we also provide a passage id -> URL mapping file in TSV format (pid to URL file).
The TREC CAR paragraph collection v2.0

Year 1 (TREC 2019)

Read the TREC 2019 Overview paper.

2019 Data

Topics

Training topics year 1 V1.0 - 30 example training topics
Training judgments - We provide limited (incomplete) training data for 5 topics (approximately 50 turns). These are judged from the baseline retrieval run (below). The judgments are graded on a three-point scale (2 very relevant, 1 relevant, and 0 not relevant).
Evaluation topics year 1 V1.0 - 50 evaluation topics
Additional resources: MS MARCO Conversational Search Sessions - Conversational search data and training data have been released.
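
As an illustration of working with the graded training judgments, the sketch below loads judgments and counts how many fall at each grade. It assumes the judgments use the standard trec_eval qrels layout (turn id, iteration, document id, grade, separated by whitespace), which the text above does not state. The function names and the sample ids used with it are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable


def load_qrels(lines: Iterable[str]) -> Dict[str, Dict[str, int]]:
    """Parse lines of the assumed form 'turn_id iteration doc_id grade'."""
    qrels: Dict[str, Dict[str, int]] = {}
    for line in lines:
        parts = line.split()
        if len(parts) != 4:
            continue  # skip blank or malformed lines
        turn_id, _iteration, doc_id, grade = parts
        qrels.setdefault(turn_id, {})[doc_id] = int(grade)
    return qrels


def grade_counts(qrels: Dict[str, Dict[str, int]]) -> Counter:
    """Count judged passages per grade (2, 1, or 0) across all turns."""
    return Counter(
        grade for judged in qrels.values() for grade in judged.values()
    )
```

Because the training judgments are incomplete, passages missing from the file are unjudged rather than known to be non-relevant.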

Resolved Topic Annotations

To facilitate work on passage ranking alone, we manually resolved coreference as well as conversational ambiguity for the topics. We make these annotations available to participants who may not have access to automatic methods. Runs using this data are manual runs. The annotations are provided in a tab-separated format with the turn id (query id) and the rewritten query in text form.
TRAIN: Sample annotations on two training queries (for exemplars)
EVALUATION: Complete annotations on the evaluation topics for the year 1 evaluation queries.
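
Reading the tab-separated annotations could look roughly like the sketch below. It assumes one annotation per line, with the turn id in the first column and the rewritten query in the second, and no header row. The function name load_rewrites is hypothetical.

```python
import csv
from typing import Dict, Iterable


def load_rewrites(lines: Iterable[str]) -> Dict[str, str]:
    """Map each turn id to its manually rewritten query."""
    rewrites: Dict[str, str] = {}
    # QUOTE_NONE keeps quotation marks inside queries as literal characters.
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) < 2:
            continue  # skip blank or incomplete lines
        turn_id, query = row[0].strip(), row[1].strip()
        rewrites[turn_id] = query
    return rewrites
```

The function accepts any iterable of lines, so an open file handle can be passed directly.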

Baselines

Indri search interface - We provide an Indri index of the CAsT collection. See the help page for details on indexing parameters and statistics. It includes a standard batch search API limited to 50 queries per batch.
Baseline retrieval - We provide an Indri Query Likelihood baseline run, including both the queries and run files in trec eval format: train queries, train run file, test queries, test run file. Queries are generated by running AllenNLP coreference resolution to perform rewriting, and stopwords are removed using the Indri stopword list.
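
For readers unfamiliar with the trec eval run format mentioned above, the sketch below writes and reads one run line using the standard six-column layout (query id, the literal Q0, document id, rank, score, run tag). The RunEntry type, the function names, and the four-decimal score formatting are illustrative choices, not part of the provided baselines.

```python
from typing import NamedTuple


class RunEntry(NamedTuple):
    turn_id: str
    doc_id: str
    rank: int
    score: float
    run_tag: str


def format_run_line(entry: RunEntry) -> str:
    """Serialize one ranked result as a trec_eval run line."""
    return (f"{entry.turn_id} Q0 {entry.doc_id} "
            f"{entry.rank} {entry.score:.4f} {entry.run_tag}")


def parse_run_line(line: str) -> RunEntry:
    """Parse one whitespace-separated trec_eval run line."""
    turn_id, _q0, doc_id, rank, score, run_tag = line.split()
    return RunEntry(turn_id, doc_id, int(rank), float(score), run_tag)
```

Ranks start at 1 and scores should be non-increasing within a turn, since trec_eval orders results by score rather than by the rank column.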

Collection

The corpus is a combination of three standard TREC collections: MARCO Ranking passages, Wikipedia (TREC CAR), and News (Washington Post)
The MS MARCO Passage Ranking collection - This file only includes the passage id and passage text. For convenience, we also provide a passage id -> URL mapping file in TSV format (pid to URL file).
The TREC CAR paragraph collection v2.0
The TREC Washington Post Corpus version 2 - Note that this is behind a password and requires an organizational agreement. To obtain it, see: https://ir.nist.gov/wapo/

Document ID format

The document id format is [collection_id_paragraph_id] with collection id and paragraph id separated by an underscore.
The collection ids are in the set: {MARCO, CAR, WAPO}.
For MARCO and CAR, the paragraph ids are the standard ids provided by those collections. For WAPO, the paragraph id is [article_id-paragraph_index], with the two parts separated by a single dash, where paragraph_index is the 1-based index of the paragraph according to the provided paragraph markup.
Example WaPo combined document id: [WAPO_903cc1eab726b829294d1abdd755d5ab-1], or CAR: [CAR_6869dee46ab12f0f7060874f7fc7b1c57d53144a]
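
The id format above can be split back into its parts with a few lines of code. The sketch below is a minimal example that assumes the square brackets in the examples are notation rather than part of the id; the DocId type and parse_doc_id function are hypothetical names. The WAPO paragraph id is split on the last dash so that article ids that themselves contain dashes would still parse.

```python
from typing import NamedTuple, Optional

COLLECTION_IDS = {"MARCO", "CAR", "WAPO"}


class DocId(NamedTuple):
    collection: str
    paragraph_id: str
    article_id: Optional[str] = None       # WAPO only
    paragraph_index: Optional[int] = None  # WAPO only, 1-based


def parse_doc_id(doc_id: str) -> DocId:
    """Split 'collection_paragraph' ids into their components."""
    # Only the first underscore separates collection id from paragraph id.
    collection, sep, paragraph_id = doc_id.partition("_")
    if not sep or collection not in COLLECTION_IDS or not paragraph_id:
        raise ValueError(f"unrecognised document id: {doc_id!r}")
    if collection != "WAPO":
        return DocId(collection, paragraph_id)
    # WAPO paragraph ids end in '-<1-based paragraph index>'.
    article_id, dash, index = paragraph_id.rpartition("-")
    if not dash or not index.isdigit() or int(index) < 1:
        raise ValueError(f"bad WAPO paragraph id: {paragraph_id!r}")
    return DocId(collection, paragraph_id, article_id, int(index))
```

For the WaPo example above, this yields the article id 903cc1eab726b829294d1abdd755d5ab and paragraph index 1.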

Duplicate handling

Early analysis found that the MARCO and WaPo corpora both contain a significant number of near-duplicate paragraphs. We have run near-duplicate detection to cluster results; only one result per duplicate cluster will be evaluated. We suggest that you remove duplicates (keeping the canonical document) from your indices.
A README with the process and file format.
Washington Post duplicate file
MARCO duplicate file
Note: The tools in the repository below require these files as input for processing the collection and perform deduplication when the data is generated.
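
To make the idea of keeping one canonical document per duplicate cluster concrete, here is a small sketch that collapses a ranked list using a duplicate mapping. The line layout assumed here (a canonical id, a colon, then a comma-separated list of duplicate ids) is an assumption for illustration only; the README describes the actual file format. The function names are hypothetical.

```python
from typing import Dict, Iterable, List


def load_duplicates(lines: Iterable[str]) -> Dict[str, str]:
    """Map each duplicate id to its canonical id.

    Assumes lines of the form 'canonical_id:dup_id1,dup_id2'.
    """
    canonical_of: Dict[str, str] = {}
    for line in lines:
        canonical, sep, dups = line.strip().partition(":")
        if not sep:
            continue  # blank line or line without a cluster
        for dup in dups.split(","):
            dup = dup.strip()
            if dup and dup != canonical:
                canonical_of[dup] = canonical
    return canonical_of


def dedupe_ranking(doc_ids: List[str],
                   canonical_of: Dict[str, str]) -> List[str]:
    """Replace duplicates by their canonical id, keeping first occurrences."""
    seen = set()
    result = []
    for doc_id in doc_ids:
        canonical = canonical_of.get(doc_id, doc_id)
        if canonical not in seen:
            seen.add(canonical)
            result.append(canonical)
    return result
```

Removing duplicates at indexing time, as suggested above, avoids the need for this post-processing step; the sketch is mainly useful for cleaning up an existing run.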

Code and tools

TREC-CAsT Tools repository with code and scripts for processing data.
The tools contain scripts for parsing the collection into standard indexing formats. They also provide APIs for working with the topics (in text, JSON, and protocol buffer formats).
Note: This will evolve over time; it currently contains topic definition files and scripts for reading and loading topics.
