Problems inserting big positions?
andreasbaumann opened this issue · 2 comments
andreasbaumann commented
DEBUG: field 482:11'html_meta_file': './data/etext/10556.html', @192151315
DEBUG: lookup expression for field 'html_meta_file'
DEBUG: got expression number for 'html_meta_file' to be 0
token positions of document '1055610556' are out or range (document too big, 150199 token positions assigned)
DEBUG: buffer reset, rest: 1055710557 Brooke, L. Leslie (Leonard Leslie), 1862-1940 Johnny Crow's Party English PZ: Language and Literatures:
failed to process document 'gutenberg.tsv': failed to process document 'gutenberg.tsv': error closing document in transaction: corrupt data (unpackInt32_ 1)
done
The positions produced by the experimental TSV segmenter with the ZIP-file @zipinclude function are
quite big, because each one is basically the position within the TSV file and the position of the
file within the uncompressed ZIP stream.
See
https://github.com/andreasbaumann/strusExamples/tree/master/gutenberg
and
https://github.com/andreasbaumann/strusAnalyzer/tree/tsv_extensions
andreasbaumann commented
I'm actually never resetting the position in the segmenter. Is it possible to reset it to 0 when we
start a new document section?
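
To make the idea concrete, here is a minimal sketch of such a reset. It assumes the segmenter knows the absolute offset at which each document section begins. The class `DocumentPositionRebaser` and its methods are hypothetical names for illustration and are not part of strus or strusAnalyzer. The offset 192151315 is only borrowed from the debug output above as an example of a large absolute value.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not a strus API): turns absolute stream offsets into
// positions that restart at 0 for every document section.
class DocumentPositionRebaser
{
public:
    // Remember the absolute offset at which the current document section starts.
    void startDocument( std::uint64_t absoluteStart)
    {
        m_docStart = absoluteStart;
    }

    // Map an absolute offset to a document-relative position (0-based).
    // Assumes the offset lies at or after the start of the current document.
    std::uint64_t relative( std::uint64_t absolute) const
    {
        assert( absolute >= m_docStart);
        return absolute - m_docStart;
    }

private:
    std::uint64_t m_docStart = 0;
};
```

With this, a segmenter would call `startDocument(192151315)` when a section begins, and a token at absolute offset 192151315 + 4175 would be reported at position 4175, independent of how far into the ZIP stream the document sits. Whether the storage expects positions to start at 0 or at 1 is not settled here and would need to be checked.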
andreasbaumann commented
For reference, the gdb stack trace:
(gdb) bt
#0 __cxxabiv1::__cxa_throw (obj=obj@entry=0x7fffb7376600,
tinfo=0x7ffff6260ab0 <typeinfo for std::runtime_error>,
dest=0x7ffff5f8b050 <std::runtime_error::~runtime_error()>)
at /build/gcc-multilib/src/gcc/libstdc++-v3/libsupc++/eh_throw.cc:62
#1 0x00007ffff51b8d4f in unpackInt32_ (end=end@entry=0x7fffbf44d591 "",
itr=@0x7fffe7ffc760: 0x7fffbf44d590 "\335")
at /home/abaumann/strus/strus/src/storage/indexPacker.cpp:32
#2 strus::unpackIndex (itr=@0x7fffe7ffc760: 0x7fffbf44d590 "\335",
end=end@entry=0x7fffbf44d591 "")
at /home/abaumann/strus/strus/src/storage/indexPacker.cpp:81
#3 0x00007ffff51b3590 in strus::ForwardIndexBlock::position_at (
ref=0x7fffbf44d590 "\335", this=0x7fffe7ffc840)
at /home/abaumann/strus/strus/src/storage/forwardIndexBlock.cpp:22
#4 strus::ForwardIndexBlock::append (this=this@entry=0x7fffe7ffc840,
pos=@0x7fffe0197f90: 4072, item="\225")
at /home/abaumann/strus/strus/src/storage/forwardIndexBlock.cpp:71
#5 0x00007ffff51b3f1b in strus::ForwardIndexMap::closeCurblock (
this=this@entry=0x7fffe0000bf0, typeno=@0x7fffbd78f370: 5,
elemlist=std::vector of length 128, capacity 128 = {...})
at /home/abaumann/strus/strus/src/storage/forwardIndexMap.cpp:29
#6 0x00007ffff51b5206 in strus::ForwardIndexMap::defineForwardIndexTerm (
this=0x7fffe0000bf0, typeno=@0x7fffbd78f370: 5,
typeno@entry=@0x7fffbd78f370: <optimized out>, pos=@0x7fffbd78f374: 4175,
pos@entry=@0x7fffbd78f374: <optimized out>, termstring="\273")
at /home/abaumann/strus/strus/src/storage/forwardIndexMap.cpp:159
#7 0x00007ffff519e6ec in strus::StorageTransaction::defineForwardIndexTerm (
this=<optimized out>, typeno=@0x7fffbd78f370: 5,
pos=@0x7fffbd78f374: 4175, termstring="\273")
at /home/abaumann/strus/strus/src/storage/storageTransaction.cpp:164
#8 0x00007ffff51d60d9 in strus::StorageDocument::done (this=0x7fffe0de1980)
at /home/abaumann/strus/strus/src/storage/storageDocument.cpp:157
#9 0x00000000004206c2 in strus::InsertProcessor::run (this=0x65e8a0)
at /home/abaumann/strus/strusUtilities/src/strusInsert/insertProcessor.cpp:230
#10 0x00007ffff6cb098d in ?? () from /usr/lib/libboost_thread.so.1.63.0
#11 0x00007ffff57a52e7 in start_thread () from /usr/lib/libpthread.so.0
#12 0x00007ffff54e654f in clone () from /usr/lib/libc.so.6
and
(gdb)
#7 0x00007ffff519e6ec in strus::StorageTransaction::defineForwardIndexTerm (this=<optimized out>, typeno=@0x7fffbd78f370: 5,
pos=@0x7fffbd78f374: 4175, termstring="\273") at /home/abaumann/strus/strus/src/storage/storageTransaction.cpp:164
164 m_forwardIndexMap.defineForwardIndexTerm( typeno, pos, termstring);
(gdb) p typeno
$3 = (const strus::Index &) @0x7fffbd78f370: 5
(gdb) p pos
$4 = (const strus::Index &) @0x7fffbd78f374: 4175
(gdb) p termstring
$5 = "\273"