premature EOF when reading DawgLexicon file
jlutgen opened this issue · 1 comments
jlutgen commented
My copy of EnglishWords.dat (a lexicon data file in the DAWG format) contains the byte 0x1A at offset 0x80 (fairly early in the file, in one of the first few edge structs). On Windows, when an istream's read() reads from a file opened in "text" mode, 0x1A (Ctrl-Z) is treated as an end-of-file marker, so input.read((char*) edges, numBytes) in DawgLexicon::readBinaryFile(std::istream& input) stops early and does not read all numBytes bytes. This leads to a segmentation fault in DawgLexicon::countDawgWords(Edge* ep).
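For illustration, a short read like this can be detected by comparing gcount() with the requested size right after read(). The sketch below is not the library's code: readExactly and demoShortRead are hypothetical helpers, and an in-memory stream stands in for the truncated file.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper (not part of the library): read exactly numBytes
// bytes into buffer and report whether the stream delivered all of them.
bool readExactly(std::istream& input, char* buffer, std::size_t numBytes) {
    assert(buffer != nullptr);
    input.read(buffer, static_cast<std::streamsize>(numBytes));
    // gcount() is the number of characters extracted by the last
    // unformatted read, so a premature EOF shows up as a smaller count.
    return static_cast<std::size_t>(input.gcount()) == numBytes;
}

// Hypothetical demo: the stream holds only 8 bytes but 16 are requested,
// which mimics a read that is cut short by an early end-of-file.
bool demoShortRead() {
    std::istringstream truncated(std::string(8, 'x'));
    char buffer[16];
    return readExactly(truncated, buffer, 16);  // false: only 8 bytes arrived
}
```

A check of this kind would turn the later segmentation fault into an error that can be reported at the point of the read.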
My fix is to pass the std::ios::binary flag when opening files in Lexicon::addWordsFromFile(string &filename) and in DawgLexicon::addWordsFromFile(string &filename).
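A minimal sketch of the idea, assuming the file is opened with std::ifstream. The helper names writeSampleFile and readBinaryCount are hypothetical and are not taken from the library; the sample file just places a 0x1A byte at offset 0x80, as in the case described above.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper: write `size` filler bytes to `filename`, with the
// byte 0x1A at offset 0x80.
void writeSampleFile(const std::string& filename, std::size_t size) {
    assert(size > 0x80);
    std::vector<char> data(size, 'A');
    data[0x80] = 0x1A;
    std::ofstream out(filename.c_str(), std::ios::out | std::ios::binary);
    out.write(data.data(), static_cast<std::streamsize>(data.size()));
}

// Hypothetical helper: open the file in binary mode, request numBytes
// bytes, and return how many were actually read.
std::size_t readBinaryCount(const std::string& filename, std::size_t numBytes) {
    // std::ios::binary disables text-mode translation, so 0x1A is read
    // as an ordinary data byte.
    std::ifstream input(filename.c_str(), std::ios::in | std::ios::binary);
    if (!input) {
        return 0;
    }
    std::vector<char> buffer(numBytes);
    input.read(buffer.data(), static_cast<std::streamsize>(numBytes));
    return static_cast<std::size_t>(input.gcount());
}
```

With the binary flag set, readBinaryCount should return the full requested size even though the file contains 0x1A. On POSIX systems text and binary modes behave the same, so the flag is harmless there.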