revdotcom/speech-datasets

Off-by-one labeling in 4341191.nlp in Earnings21

ryanwesterman-zoom opened this issue · 0 comments

Starting at line 10876 in 4341191.nlp, the labels for every field except token appear to be shifted down by one line.

For example, the token uh- here is tagged as 1649, which corresponds to PERSON, and is also punctuated with .... Both of these make more sense on the preceding token, Dean. This continues from line 10876 to the end of the file.
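
In case it helps with a fix, here is a rough sketch of the realignment I have in mind. It is not tested against the real file. The three-column layout (token, punctuation, tags) is a simplified stand-in for the actual .nlp columns, and `shift_labels_up` and the `start` index are hypothetical names of my own. It assumes each affected token should take its non-token fields from the row directly below it.

```python
def shift_labels_up(rows, start, token_col=0):
    """Return a copy of `rows` in which every row from index `start` onward
    keeps its own token but takes all other fields from the following row.

    Assumption: the labels are shifted down by exactly one row, so the
    correct labels for row i currently sit on row i + 1.
    The final row has no successor and is left unchanged for manual review.
    """
    fixed = [list(row) for row in rows]
    for i in range(start, len(rows) - 1):
        following = rows[i + 1]
        fixed[i] = [
            rows[i][col] if col == token_col else following[col]
            for col in range(len(rows[i]))
        ]
    return fixed


# Toy data in a simplified (token, punctuation, tags) layout, not the real columns.
rows = [
    ["Thanks", "", ""],
    ["Dean", "", ""],
    ["uh-", "...", "1649"],  # labels that look like they belong to "Dean"
    ["so", "", ""],
]

fixed = shift_labels_up(rows, start=1)
for row in fixed:
    print("|".join(row))
```

With the toy rows above, the `...` punctuation and the `1649` tag move from `uh-` up to `Dean`, and rows before `start` are untouched. The right `start` for the real file, and what to do with the last row, would need to be checked against the data.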
