/pysettrie

python3 package supporting efficient storage and querying of sets of sets using the trie data structure. Supports finding all the supersets/subsets of a given set from a collection of sets. Also includes a trie-based mapping container where the keys are sets.

Primary LanguagePythonGNU Lesser General Public License v3.0LGPL-3.0

pysettrie

Build Status

https://github.com/mmihaltz/pysettrie

pysettrie is a python3 package that provides support for efficient storage and querying of sets of sets using the trie data structure, supporting operations like finding all the supersets/subsets of a given set from a collection of sets. The original motivation for this module was to provide efficient search for supersets of sets of feature-value pairs in our natural language parser project (e.g. matching nouns against verb argument positions).

The following classes are included:

  • SetTrie: set-trie container for sets; supports efficient supersets/subsets of a given search set calculations.
  • SetTrieMap: mapping container using sets as keys; supports efficient operations like SetTrie but also stores values associated to the key sets.
  • SetTrieMultiMap: like SetTrieMap, but supports multiple values associated to each key.

For further information, please see documentation

Module test_settrie.py contains unittests for all the containers.

Author: Márton Miháltz https://sites.google.com/site/mmihaltz/

This package depends on the sortedcollection module. One recommended way to install (tested on Ubuntu):

sudo pip3 install sortedcontainers

If you don't have pip3:

sudo apt-get install python3-setuptools
sudo easy_install3 pip

pysettrie is partly based on: I.Savnik: Index data structure for fast subset and superset queries. CD-ARES, IFIP LNCS, 2013. http://osebje.famnit.upr.si/~savnik/papers/cdares13.pdf Remarks on paper:

  • Algorithm 1. does not mention to sort children (or do sorted insert) in insert operation (line 5)
  • Algorithm 4. is wrong, will always return false, line 7 should be: "for (each child of node labeled l: word.currentElement <= l) & (while not found) do"
  • the descriptions of getAllSubSets and getAllSuperSets operations are wrong, would not produce all sub/supersets

See also:

Changes:

  • Version 0.1.3:
    • SetTrieMultiMap.assign() returns number of values associated to key after assignment.

Licensed under the GNU LESSER GENERAL PUBLIC LICENSE, Version 3.