/ConnectedTrie

A data structure for efficient matching of two strings to the longest pair of prefixes

Primary LanguageC#

ConnectedTrie

A data structure for efficient matching of two strings to the longest pair of prefixes

This data structure is based on the trie or prefix tree. It was created to facilitate a lookup to match a pair of strings to the longest pair of prefixes. The data structure consists of two tries, the A-Trie (orange) and B-Trie (blue), with linkages between the leaf notes. The leaf nodes of the B-Trie and the A-Trie are connected by sharing the same entity. Entities are generic objects of type T and can store any desired data type. Additionally, entities contain an A-string and a B-String which specifies where in the A-Trie and B-Trie, respectively, the entity is stored.

When checking if a particular pair of strings match to an entity we traverse the B-Trie to find the entity with the longest matching B-string and then “climb” back up the A-Trie to reach the root node. If we are successful, then we have matched the longest B-string and the longest A-string and the entity is returned. If not we traverse back up the B-Trie, and at each entity we find, we again try to “climb” back up the A-Trie to reach the root node. If we never get to the A-Trie root node no matches are found.

image

Above is an image that will demonstrate the lookup process. Given the entities shown in the image, and given an A-string of “cantxyz” and a B-string of “dontxyz” we would traverse the B-Trie down four nodes, “d”, “o”, ”n” and “t”. We cannot descend any further, so we move to the connected leaf node in the A-Trie and try to “climb” to the root node. The first check will fail as the node contains the character “d” and the 5th character in the A-string “cantxyz” is “x”. We are comparing the 5th character as the “d” node is 5 deep in the A-Trie and therefore should be matched to the 5th character. As we did not make a match with the “could-dont” entity we move back up the B-Trie from the current node “t” to ”n” and then “o”. “o” is a leaf node as it has an entity, so we move to the connected leaf node in the A-Trie and try to “climb” to the root node. This time the climb from “t” to “n” to “a” and finally to “c” is successful. And we consider A-String of “cantxyz” and a B-string of “dontxyz” as matched to the “cant:do” entity.