Patch for distance.c: minor off-by-one error
This is really nitpicky, but when populating the vocab array, distance.c
begins skipping characters after index max_w (having read 51 characters), when
it should have stopped after index max_w - 1. Consequently, the string
terminator for a long string is written into the space reserved for the
subsequent string and is overwritten when the next string is read in, causing
the two to be mashed together.
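The effect can be reproduced in isolation. The following is a minimal sketch, not the code from distance.c or the attached patch: read_vocab, MAX_W and limit are hypothetical names, MAX_W is set to 8 instead of 50 to keep the example short, and the input comes from a string rather than a file. Passing limit = MAX_W mimics the behaviour described above; limit = MAX_W - 1 is one possible fix, which keeps the terminator inside the entry's own slot (and therefore keeps at most MAX_W - 1 characters).

```c
#include <stdio.h>
#include <string.h>

#define MAX_W 8 /* hypothetical stand-in for max_w (50 in the report) */

/* Reads `words` space-separated entries from `src` into fixed-width slots of
   MAX_W bytes each. `limit` bounds the write index within a slot.
   The caller must provide words * MAX_W + 1 bytes so that the overflowing
   terminator write with limit == MAX_W stays inside the buffer. */
static void read_vocab(const char *src, char *vocab, int words, int limit) {
  int a, b;
  for (b = 0; b < words; b++) {
    a = 0;
    while (1) {
      char c = *src;
      if (c == '\0') break;            /* end of input */
      src++;
      vocab[b * MAX_W + a] = c;        /* write first, then decide to advance */
      if (c == ' ') break;             /* entry separator */
      if (a < limit && c != '\n') a++; /* past the limit, extra chars are skipped */
    }
    vocab[b * MAX_W + a] = 0;          /* with limit == MAX_W this can land in slot b + 1 */
  }
}
```

With the input "abcdefghijkl xyz" and two words, limit = MAX_W leaves slot 0 reading "abcdefghxyz", because the terminator of the long entry lands on the first byte of slot 1 and is then overwritten by "x". With limit = MAX_W - 1, slot 0 holds "abcdefg" and slot 1 holds "xyz".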
For example, when searching for Cash_Flow given the current (as of 2015-06-15)
GoogleNews-vectors-negative300.bin, two results overflow the printf format
field, which is padded for strings up to length 50. These two strings do not
appear in the vocabulary; they are constructed when two vocabulary entries, a
long one followed by a normal one, are mashed together as described above.
After applying the attached patch, the printf formatting looks fine, as only
the first 50 characters of the long entries are printed.
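The reason the mashed-together strings break the column layout is that a printf field width is a minimum, not a maximum. The sketch below assumes the results are printed with a "%50s" conversion, as the padding described above suggests; padded_width is a hypothetical helper, not a function from distance.c.

```c
#include <stdio.h>
#include <string.h>

/* Returns how many characters a "%50s" conversion produces for `word`.
   snprintf reports the full length needed, even if `out` were too small. */
static int padded_width(const char *word) {
  char out[256];
  return snprintf(out, sizeof out, "%50s", word);
}
```

A short entry such as "Cash_Flow" is padded out to 50 characters, while a 60-character string produces 60 characters and pushes the following column to the right. A precision, as in "%50.50s", would truncate instead of overflowing.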
Original issue reported on code.google.com by daniel.j...@gmail.com
on 17 Jun 2015 at 12:24