Segfault on FoLiA in to FoLiA out (speech data with events and utterances)
proycon opened this issue · 7 comments
Frog (libfolia) segfaults on the attached FoLiA input upon FoLiA serialisation.
<?xml version="1.0" encoding="utf-8"?>
<FoLiA xmlns="http://ilk.uvt.nl/folia" version="2.5" xml:id="example">
<metadata>
<annotations>
<text-annotation>
<annotator processor="p1" />
</text-annotation>
<utterance-annotation>
<annotator processor="p1" />
</utterance-annotation>
<event-annotation set="speech">
<annotator processor="p1" />
</event-annotation>
</annotations>
<provenance>
<processor xml:id="p1" name="proycon" type="manual" />
</provenance>
</metadata>
<text xml:id="example.speech">
<event xml:id="turn.1" class="turn" src="piet.wav" begintime="00:00:00.720" endtime="00:00:53.230">
<utt xml:id="example.utt.1" speaker="Piet">
<t>Het is vandaag 1 januari 2019. Mijn naam is Piet voor het project Diplomatieke Getuigenissen heb ik vandaag een gesprek met Piet. Ook met ons in de kamer is Piet die voor ons het geluid en de video verzorgt. Meneer Piet misschien dat we gewoon kunnen beginnen met dat u iets over uw opleiding vertelt en hoe u bij Buitenlandse Zaken bent komen te werken?</t>
</utt>
<utt xml:id="example.utt.2" speaker="Piet">
<t>Ja ik ben geboren in 1936. Volgens de boeken het heilige jaar voor de Chinezen. 1936. In 2036 is er weer zo'n heilig jaar. Ik ben ... </t>
</utt>
</event>
</text>
</FoLiA>
Call: frog --skip=pac -x anon_1.folia.xml -X anon_1.out.folia.xml
All actual processing goes fine; it is the FoLiA serialisation at the end that fails.
gdb backtrace:
Thread 1 "frog" received signal SIGSEGV, Segmentation fault.
0x0000000000000000 in ?? ()
(gdb) bt
#0 0x0000000000000000 in ?? ()
#1 0x00007fa4eae08999 in folia::AbstractElement::append (this=<optimized out>, this@entry=0x7fa4e700a580, child=<optimized out>, child@entry=0x7fa4e659a7f0) at folia_impl.cxx:3129
#2 0x00007fa4eae98ee2 in folia::AbstractStructureElement::append (this=0x7fa4e700a580, child=0x7fa4e659a7f0) at folia_subclasses.cxx:784
#3 0x00007fa4eae306fc in folia::AbstractElement::AbstractElement (this=this@entry=0x7fa4e659a7f0, __vtt_parm=__vtt_parm@entry=0x7fa4eb5abfc0 <VTT for folia::Paragraph+16>, p=..., el=el@entry=0x7fa4e700a580, __in_chrg=<optimized out>) at folia_impl.cxx:293
#4 0x00007fa4eb4cd949 in folia::AbstractStructureElement::AbstractStructureElement (p=0x7fa4e700a580, props=..., __vtt_parm=0x7fa4eb5abfb8 <VTT for folia::Paragraph+8>, this=0x7fa4e659a7f0, __in_chrg=<optimized out>)
at /usr/local/include/libfolia/folia_subclasses.h:59
#5 folia::Paragraph::Paragraph (p=0x7fa4e700a580, a=..., this=0x7fa4e659a7f0, __in_chrg=<optimized out>, __vtt_parm=<optimized out>) at /usr/local/include/libfolia/folia_subclasses.h:626
#6 folia::FoliaElement::add_child<folia::Paragraph> (args=..., this=0x7fa4e700a580) at /usr/local/include/libfolia/folia_impl.h:125
#7 FrogAPI::handle_one_text_parent (this=0x7ffc1bc9e600, os=..., e=0x7fa4e700a580, sentence_done=<optimized out>) at FrogAPI.cxx:2567
#8 0x00007fa4eb4ce462 in FrogAPI::run_folia_engine (this=0x7ffc1bc9e600, infilename=..., output_stream=...) at FrogAPI.cxx:2661
#9 0x00007fa4eb4d0bf1 in FrogAPI::FrogFile (this=0x7ffc1bc9e600, infilename=...) at FrogAPI.cxx:2743
#10 0x00007fa4eb4d3cbd in FrogAPI::run_on_files (this=0x7ffc1bc9e600) at FrogAPI.cxx:1175
#11 0x000055c8b0feafd2 in main (argc=<optimized out>, argv=<optimized out>) at Frog.cxx:229
Well, a quick analysis showed me that Frog creates a paragraph and then attempts to append it to the <utt>. This is forbidden (folia_properties.cxx).
The append will throw in libfolia, but then the exception is not handled correctly.
Needs more investigation.
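To make the expected behaviour concrete, here is a minimal C++ sketch of an append that rejects a forbidden child and leaves the parent untouched. It is not libfolia code: `Node`, `accepted`, and `try_append` are hypothetical names, and the set of tags allowed under `utt` is only illustrative.

```cpp
#include <memory>
#include <set>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for an element tree; none of these names are libfolia's.
struct Node {
    std::string tag;
    std::set<std::string> accepted;               // child tags this node allows
    std::vector<std::unique_ptr<Node>> children;

    Node(std::string t, std::set<std::string> a)
        : tag(std::move(t)), accepted(std::move(a)) {}

    // Validate before taking ownership, so a rejected append leaves the
    // parent unchanged and the rejected child is freed automatically.
    Node* append(std::unique_ptr<Node> child) {
        if (accepted.count(child->tag) == 0) {
            throw std::invalid_argument(
                "<" + tag + "> does not accept <" + child->tag + ">");
        }
        children.push_back(std::move(child));
        return children.back().get();
    }
};

// Returns an empty string on success, otherwise the error message.
std::string try_append(Node& parent, const std::string& child_tag) {
    try {
        parent.append(std::make_unique<Node>(child_tag, std::set<std::string>{}));
        return "";
    } catch (const std::invalid_argument& e) {
        return e.what();
    }
}
```

With a parent built as `Node utt("utt", {"s", "w", "t"})`, calling `try_append(utt, "p")` reports the error as a message instead of crashing, and `utt.children` stays as it was.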
Bottom line: do we want <p> nodes in an <utt>?
No, we don't, just sentences and words.... (ucto seems to do it properly)
Ok, but in this example the <t> in the <utt> contains more than one sentence. Ergo a PARAGRAPH.
Not sure why ucto delivers only a sequence of <s> and NO paragraph.
So I added a small fix to libfolia. Now the exception is handled correctly.
That leaves the problem of Frog creating an unwanted <p>.
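The rule agreed on earlier in this thread (no <p> under <utt>, only sentences and words) could be expressed roughly as below. This is a hypothetical sketch and not Frog's actual logic: `wrap_in_paragraph` is an invented helper, and treating every non-`utt` parent as accepting a paragraph is a simplifying assumption.

```cpp
#include <cstddef>
#include <string>

// Hypothetical rule, not Frog's actual code: wrap sentences in a <p> only
// when there is more than one sentence and the parent is not an <utt>.
bool wrap_in_paragraph(const std::string& parent_tag, std::size_t sentence_count) {
    if (parent_tag == "utt") {
        return false;  // per this thread: only <s> and <w> belong under <utt>
    }
    return sentence_count > 1;
}
```

For the input in this issue, both <utt> elements hold several sentences, so under this rule the sentences would be attached directly to the <utt>, which matches what ucto is reported to do.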
A small fix is now in Git. Seems to work.
Thanks! That fixes it indeed. Are things ready enough for a new release? I see you've been hacking some more lately.
assuming it is done