Repeated keys
hulikau opened this issue · 2 comments
import marisa_trie
a = [(u'1', '1'), (u'1', '2')]
tr = marisa_trie.BytesTrie(a)
print tr.keys()
This will output [u'1', u'1'].
I guess, that, this function should returns a [u'1']
I'm ready to fix it, if someone consider, that this is bug.
Woah, looks like a bug to me. Interestingly, it is not present in marisa_trie.Trie
:
>>> marisa_trie.Trie(["foo", "foo"]).keys()
['foo']
>>> marisa_trie.BytesTrie([("foo", b"1"), ("foo", b"2")]).keys()
['foo', 'foo']
@superbobry
In the same time in tests this situation is expectable: https://github.com/pytries/marisa-trie/blob/0.7.5/tests/test_bytes_trie.py#L106.
By the way I understood why this happens:
For Trie class everything is perfect. For BytesTrie and others inherited from him, in the constructor pairs (key, value)
transform to new_key = key + separator + value
. Inside the methods keys and iterkeys you iterate over the set of new_keys and then cut old key back.
for i in range(0, ag.key().length()):
if raw_key[i] == self._c_value_separator:
key = raw_key[:i].decode('utf8')
And then you add the same key to the res
several times.