MontrealCorpusTools/ISCAN

Glaswasian Corpus fails to import in ISCAN


Glaswasian corpus fails to import. Error message (from MG):

[2019-07-11 10:32:00,928: INFO/MainProcess] Received task: iscan.tasks.import_corpus_task[e5a5676d-10b8-4753-a61f-006284f84e42]
[2019-07-11 10:32:02,515: WARNING/ForkPoolWorker-17] loading /projects/spade/repo/git/spade-Glaswasian with <polyglotdb.io.parsers.labbcat.LabbCatParser object at 0x7f5f26464a90>
[2019-07-11 10:32:02,932: WARNING/ForkPoolWorker-17] 3s6_72b_anon.TextGrid
[2019-07-11 10:32:02,997: WARNING/ForkPoolWorker-17] Object class = "TextGrid"
[2019-07-11 10:32:02,998: WARNING/ForkPoolWorker-17] class = "IntervalTier"
[2019-07-11 10:32:02,998: WARNING/ForkPoolWorker-17] name = "Maisa"
[2019-07-11 10:32:03,003: WARNING/ForkPoolWorker-17] class = "IntervalTier"
[2019-07-11 10:32:03,003: WARNING/ForkPoolWorker-17] name = "Ulfa"
[2019-07-11 10:32:03,006: WARNING/ForkPoolWorker-17] class = "IntervalTier"
[2019-07-11 10:32:03,007: WARNING/ForkPoolWorker-17] name = "Romeeza"
[2019-07-11 10:32:03,029: WARNING/ForkPoolWorker-17] Overlap for interval cause like we just left it at like a really good time: (291.235000, 294.508000)
[2019-07-11 10:32:03,029: WARNING/ForkPoolWorker-17] There was an issue parsing /projects/spade/repo/git/spade-Glaswasian/audio_and_transcripts/3s6_72b_anon.TextGrid:

So evidently the file 3s6_72b_anon.TextGrid causes an issue, though it's unclear why at this point.

Okay, so I've looked into it, and it seems that 43 out of 48 TextGrids are broken.
Here's a list:

2s5s3_20c_FA_anon.TextGrid
2s6_43b_anon.TextGrid
2s6_44c_anon.TextGrid
2s5_21a_anon.TextGrid
3s6_72c_anon.TextGrid
3s5_69c_FA_anon.TextGrid
2s5_21c_anon.TextGrid
2s6_35b_anon.TextGrid
2s6_34b_anon.TextGrid
2s6_44b_anon.TextGrid
3s5_64c_FA.TextGrid
2s6_29a_anon.TextGrid
3s6_75a_FA_anon.TextGrid
2s6_34a_anon.TextGrid
3s5_51a_anon.TextGrid
3s5_64b_FA.TextGrid
2s6_37c_anon.TextGrid
3s5_57a.TextGrid
3s6_75c_FA_anon.TextGrid
3s5_69d_FA_anon.TextGrid
1s6s5_4_anon.TextGrid
3s5_57b_anon.TextGrid
2s6_29d_FA_anon.TextGrid
3s5_69b_FA_anon.TextGrid
3s6_72a_anon.TextGrid
3s6_72b_anon.TextGrid
2s6_29c_FA__anon.TextGrid
2s5_22b_FA_anon.TextGrid
2s5_21b_anon.TextGrid
3s5_55a_FA_anon.TextGrid
3s6_72d_anon.TextGrid
2s6_42aa_anon.TextGrid
2s6_35a_anon.TextGrid
3s6_75b_FA_anon.TextGrid
2s5s3_20a_FA_anon.TextGrid
1s6_1_anon.TextGrid
2s5_21e_anon.TextGrid
3s6_72f_anon.TextGrid
2s6_43c_FA_anonRF.TextGrid
2s5_22c_FA_anon.TextGrid
2s6_29b_anon.TextGrid
2s6_27a_anon.TextGrid

The issue with all of them is that the TextGrids contain multiple overlapping intervals. In some files the same utterance is repeated twice (with the same times); in others, two words overlap. The files seem to open fine in Praat, but they don't open in textgrid.py, which is why the corpus can't be imported.
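
For anyone wanting to check their own files, here is a rough sketch of how overlaps like these could be flagged before import. It assumes the intervals of a single tier have already been read into plain (start, end, label) tuples by some other means; `find_overlaps` and its `tolerance` parameter are made-up names for illustration and are not part of textgrid.py, PolyglotDB, or ISCAN.

```python
from typing import List, Tuple

# One interval on a tier: (start time, end time, label). Hypothetical layout.
Interval = Tuple[float, float, str]


def find_overlaps(
    intervals: List[Interval], tolerance: float = 1e-6
) -> List[Tuple[Interval, Interval]]:
    """Return every pair of intervals on one tier whose time spans overlap.

    Intervals that merely touch (one ends where the next begins) are not
    reported. Exact duplicates (same start and end) are reported as overlaps.
    """
    # Sort by start time so each interval only needs comparing to later ones.
    ordered = sorted(intervals, key=lambda iv: (iv[0], iv[1]))
    overlaps = []
    for i, current in enumerate(ordered):
        for later in ordered[i + 1:]:
            # Once a later interval starts at or after the current one ends,
            # no further interval in the sorted list can overlap it.
            if later[0] >= current[1] - tolerance:
                break
            overlaps.append((current, later))
    return overlaps


if __name__ == "__main__":
    # Made-up example tier: one duplicated interval and one partial overlap.
    tier = [
        (0.0, 1.0, "hello"),
        (1.0, 2.0, "there"),
        (1.0, 2.0, "there"),
        (1.5, 2.5, "world"),
    ]
    for first, second in find_overlaps(tier):
        print("Overlap:", first, second)
```

This covers both cases described above: a repeated utterance with identical times, and two different intervals whose times partially overlap. The small tolerance is only there to avoid flagging adjacent intervals because of floating-point rounding.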