FTAD - Fine-grained Turn-taking Action Dataset
Dataset for the paper "Human-to-Human Conversation Dataset for Learning Fine-grained Turn-taking Action".
Overview
Data statistics
Dataset summary
Statistic | FTAD-sw |
---|---|
# of sessions | 2438 |
Avg. session length | 6.33 min |
Words per minute | 201 |
# of utterances per session | 141.20 |
# of utterances per minute | 22.27 |
# of decision points (DPs) per session | 195.32 |
# of DPs per minute | 30.85 |
# of DPs (w/o Wait) per minute | 13.43 |
Avg. DP alignment error | 420 ms |
Action Distribution
Action | FTAD-sw |
---|---|
Grab_Response | 23.7% |
Grab_Backchannel | 6.1% |
Grab_Silence | 0.03% |
Break_Ignore | 3.77% |
Break_Release | 2.17% |
Keep | 0.56% |
Release | 0.88% |
Grab_Backchannel_Break | 4.71% |
Grab_Response_Break | 1.34% |
Wait&Wait_Silence | 56.66% |
Files
All data files are under data:
- utterrances: contains the structural annotation of the dialogue transcriptions
  - utter.txt: the dialogue corpus, with one IPU (inter-pausal unit) per line and columns separated by tabs
  - decision.txt: the list of turn-taking decision points (DPs) and the corresponding actions for each dialogue, with columns separated by tabs
- tasks: contains the turn-taking prediction task data constructed from the utterrances directory. The train, dev, and test splits each contain the following task data files:
  - eot.txt: end-of-turn prediction
  - break.txt: response prediction at the opponent's interruption
  - word_backchannel.txt: sequential prediction task for backchannels
  - response_latency.txt: expected response time prediction
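As a quick way to check a local copy against the layout described above, the following minimal sketch builds the expected path of every task file. It assumes that train, dev, and test are subdirectories of data/tasks; the function name `task_paths` is hypothetical and not part of the dataset.

```python
from pathlib import Path

SPLITS = ("train", "dev", "test")
TASK_FILES = ("eot.txt", "break.txt", "word_backchannel.txt", "response_latency.txt")


def task_paths(data_root):
    """Map (split, file name) to its expected path.

    Assumption: the layout is <data_root>/tasks/<split>/<file name>.
    """
    root = Path(data_root) / "tasks"
    return {
        (split, name): root / split / name
        for split in SPLITS
        for name in TASK_FILES
    }


paths = task_paths("data")
# Files that are expected but not present in the local copy.
missing = [p for p in paths.values() if not p.exists()]
```

If `missing` is non-empty, either the download is incomplete or the layout assumption does not hold for your copy.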
Data format
utter.txt
Schema for utter.txt:
- session_id: session id from Switchboard
- type: speaker role from one speaker's perspective; 'user' stands for the opponent and 'agent' stands for the speaker themselves
- id: sequential utterance id for each speaker, starting from 0
- begin_time: begin time (in milliseconds) of the current utterance
- end_time: end time (in milliseconds) of the current utterance
- begin_decision_id: id of the corresponding decision point at the utterance begin; -1 means no DP can be associated with this utterance. Only available for agent utterances.
- end_decision_id: id of the corresponding decision point at the utterance end; -1 means no DP can be associated with this utterance. Only available for agent utterances.
- text: the utterance text
- ext_msg: additional annotations generated by the pipeline, such as the shrink tag
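The schema above can be read with a small parser such as the sketch below. It assumes that the columns appear in the listed order, that there is no header row, that times are integers, and that an empty decision id field (for example on user utterances) can be treated as -1. The names `Utterance`, `parse_utter_line`, and `load_utterances` are hypothetical, and the example line is made up.

```python
from dataclasses import dataclass


@dataclass
class Utterance:
    session_id: str
    type: str               # 'user' or 'agent'
    id: int
    begin_time: int         # milliseconds
    end_time: int           # milliseconds
    begin_decision_id: int  # -1 if no DP is associated
    end_decision_id: int    # -1 if no DP is associated
    text: str
    ext_msg: str


def _to_int(value, default=-1):
    # Assumption: an empty field means "not available" and maps to -1.
    value = value.strip()
    return int(value) if value else default


def parse_utter_line(line):
    """Parse one tab-separated line of utter.txt into an Utterance."""
    fields = line.rstrip("\r\n").split("\t")
    fields += [""] * (9 - len(fields))  # pad missing trailing columns
    sid, role, uid, begin, end, begin_dp, end_dp, text, ext = fields[:9]
    return Utterance(sid, role, _to_int(uid), _to_int(begin), _to_int(end),
                     _to_int(begin_dp), _to_int(end_dp), text, ext)


def load_utterances(path):
    """Read all non-empty lines of an utter.txt file."""
    with open(path, encoding="utf-8") as f:
        return [parse_utter_line(line) for line in f if line.strip()]


# Made-up line for illustration only; real session ids will differ.
example = parse_utter_line("s1\tagent\t0\t1200\t3400\t5\t-1\thello there\t")
duration_ms = example.end_time - example.begin_time
```

Because begin_time and end_time are both in milliseconds, the utterance duration is simply their difference.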
decision.txt
Schema for decision.txt:
- session_id: session id from Switchboard
- id: id of the decision point
- time: timestamp of the decision point
- state: duplex state of the dialogue at the moment of the DP (illustrated at the top)
- bias: the alignment error (in milliseconds) between the DP and the closest utterance; a negative value means the utterance's event comes before the DP
- act: the action the subject has taken at this DP
- ext_msg: additional annotations generated by the pipeline
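One use of decision.txt is to recompute the share of each action, as in the Action Distribution table. The sketch below assumes the columns appear in the order listed above with no header row. The helper names are hypothetical, and the sample lines (including the state values and the exact action strings) are made up.

```python
from collections import Counter

DECISION_COLUMNS = ("session_id", "id", "time", "state", "bias", "act", "ext_msg")


def parse_decision_line(line):
    """Parse one tab-separated line of decision.txt into a dict of strings."""
    values = line.rstrip("\r\n").split("\t")
    values += [""] * (len(DECISION_COLUMNS) - len(values))  # pad missing columns
    return dict(zip(DECISION_COLUMNS, values))


def action_distribution(lines):
    """Return the percentage of decision points per action."""
    counts = Counter(parse_decision_line(l)["act"] for l in lines if l.strip())
    total = sum(counts.values())
    return {act: 100.0 * n / total for act, n in counts.items()}


# Made-up lines for illustration only.
sample = [
    "s1\t0\t100\tA\t-20\tWait\t",
    "s1\t1\t200\tA\t0\tWait\t",
    "s1\t2\t300\tB\t15\tKeep\t",
    "s1\t3\t400\tB\t5\tWait\t",
]
dist = action_distribution(sample)
```

On a real file, pass the open file object instead of `sample`, since it also yields one line at a time.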
break.txt
Schema for break.txt:
- context: the last three utterances before the DP, separated by '|'
- subject_utterrance: the subject's utterance being interrupted
- opponent_utterrance: the opponent's utterance that interrupts the subject
- label: 1 means the subject accepts the interruption and stops speaking, while 0 means the opposite
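A line of break.txt can be turned into a training example roughly as follows. This is a sketch that assumes the four columns are tab-separated (the source states this only for utter.txt and decision.txt) and appear in the order listed above. The function name `parse_break_line` is hypothetical and the example line is made up.

```python
def parse_break_line(line):
    """Parse one line of break.txt into a dict.

    Assumption: columns are tab-separated, in schema order.
    """
    context, subject, opponent, label = line.rstrip("\r\n").split("\t")
    return {
        # The context column holds up to three utterances joined by '|'.
        "context": [u.strip() for u in context.split("|") if u.strip()],
        "subject_utterrance": subject,
        "opponent_utterrance": opponent,
        "label": int(label),  # 1: subject yields, 0: subject keeps speaking
    }


# Made-up line for illustration only.
ex = parse_break_line(
    "how are you|fine thanks|so anyway\ti was going to say\twait one second\t1"
)
```

The same pattern should carry over to eot.txt and response_latency.txt, which differ only in which utterance column is present and in the label values.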
eot.txt
Schema for eot.txt:
- context: the last three utterances before the DP, separated by '|'
- subject_utterrance: the subject's utterance being interrupted
- label: 1 means end of turn, 0 means not end of turn
response_latency.txt
Schema for response_latency.txt:
- context: the last three utterances before the DP, separated by '|'
- opponent_utterrance: the opponent's utterance that interrupts the subject
- label: 0, 1, or 2, denoting one of three time segments
word_backchannel.txt
Schema for word_backchannel.txt:
- word_seq: the word sequence of the opponent's utterance
- label_seq: the label sequence aligned with word_seq; 1 marks a position where the subject produces a backchannel
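For the backchannel task, word_seq and label_seq have to be aligned position by position. The sketch below assumes the two columns are tab-separated and that words and labels within a column are separated by whitespace; neither detail is stated in the source. The function name `backchannel_positions` is hypothetical and the example line is made up.

```python
def backchannel_positions(line):
    """Return (index, word) pairs whose label is 1.

    Assumptions: word_seq and label_seq are tab-separated columns, and
    tokens inside each column are whitespace-separated.
    """
    word_seq, label_seq = line.rstrip("\r\n").split("\t")
    words = word_seq.split()
    labels = [int(x) for x in label_seq.split()]
    if len(words) != len(labels):
        raise ValueError(
            f"length mismatch: {len(words)} words vs {len(labels)} labels"
        )
    return [(i, w) for i, (w, y) in enumerate(zip(words, labels)) if y == 1]


# Made-up line for illustration only.
positions = backchannel_positions("i went to the store yesterday\t0 0 0 0 1 0")
```

The explicit length check is there because a silent mismatch between the two sequences would shift every label onto the wrong word.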