Suspicious labels
glicerico opened this issue · 2 comments
Hey @macabdul9, I see that you used the act_label_1 column of the SwDA data shared in your repo as the labels for training.
That column doesn't seem like a good source of labels, as these utterance/label pairs from the first rows of the test data show:
"Okay." - Other
"I guess" - Info-request:Yes-No-Question
"What kind of experience do you, do you have, then with child care ?" - Other:Segment-(multi-utterance)
These classes don't match the SwDA classes, and I am not sure how they were obtained.
On the other hand, the column act_tag is not a good option either, as it contains 276 different classes. I think the data needs some cleaning.
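For anyone who wants to check this themselves, here is a minimal sketch of how the distinct classes in a label column could be counted. It uses only the three pairs quoted above as toy rows. The "text" column name and the label_summary helper are my own assumptions for illustration, not something taken from the repo.

```python
import csv
import io
from collections import Counter


def label_summary(rows, column):
    """Return (number of distinct labels, Counter of label frequencies)."""
    counts = Counter(row[column] for row in rows)
    return len(counts), counts


# Toy rows built from the three pairs quoted in this issue.
# The "text" header is an assumed column name; only act_label_1 comes from the data.
sample_csv = '''text,act_label_1
Okay.,Other
I guess,Info-request:Yes-No-Question
"What kind of experience do you, do you have, then with child care ?",Other:Segment-(multi-utterance)
'''

rows = list(csv.DictReader(io.StringIO(sample_csv)))
n_classes, counts = label_summary(rows, "act_label_1")

print(f"distinct classes: {n_classes}")
for label, n in counts.most_common():
    print(f"{n:5d}  {label}")
```

Running the same helper on the real file with "act_tag" as the column should make it easy to see how the 276 classes are distributed and which ones could be merged.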
Hi @glicerico, can you create a PR with clean data?
Working on it ;)