Hash table possibility instead of a list
mtmr0x opened this issue · 8 comments
Premise
For performance and to let the glossary scale better, hash-table-based data makes key<->value matching faster and easier to scale when matching data through endpoints.
Current situation
Today the GET endpoint returns a list of objects for the dictionary. If I want to list them all, that's fine, but if I need to match one specific word I have to run through the whole list looking for it.
Use of hash table
If data could be organised like this:
{
barril: { dialect: 'barril', meanings: [ ... ], examples: [ ... ] },
migue: { dialect: 'Migué', meanings: [ ... ], examples: [ ... ] }
}
If I want to look up a word, I would just do:
const barril = baianes['barril'];
That way, key<->value matching would make finding a word inside the dictionary faster and easier.
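To make the difference concrete, here is a minimal sketch comparing the two lookups. The sample entries and the `indexByKey` helper are hypothetical and made up for illustration; the real service may shape its data differently.

```javascript
// Hypothetical sample data in the list shape the endpoint returns today.
const list = [
  { dialect: 'barril', meanings: ['...'], examples: ['...'] },
  { dialect: 'Migué', meanings: ['...'], examples: ['...'] },
];

// List lookup: scans entries one by one, O(n) in the worst case.
const fromList = list.find((entry) => entry.dialect === 'barril');

// Hypothetical helper: turn the list into a key -> entry table.
// Keys are lowercased with accents stripped, so 'Migué' becomes 'migue'.
function indexByKey(entries) {
  const table = Object.create(null); // no inherited keys such as 'constructor'
  for (const entry of entries) {
    const key = entry.dialect
      .normalize('NFD')
      .replace(/[\u0300-\u036f]/g, '')
      .toLowerCase();
    table[key] = entry;
  }
  return table;
}

const baianes = indexByKey(list);

// Hash table lookup: a single keyed access, O(1) on average.
const barril = baianes['barril'];
const migue = baianes['migue'];
```

Both lookups return the same entry object; the difference is only in how much work it takes to reach it as the dictionary grows.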
Other "wins" from this decision
Organising .json files would be way easier, and so would adding new words. Instead of keeping everything in one big file, you can place each word in a folder like:
dialects
\_ baianes
   \_ barril.json
   \_ migue.json
   ...etc
- The root endpoint would get everything as a collection and display it in a hash table structure in order to show the full collection;
- And now you can easily provide a deeper "foldering" endpoint to get specific words, like:
GET https://dialetus-service.now.sh/dialects/baianes/barril-dobrado
response example:
{
  "barrilDobrado": {
    "dialect": "Barril Dobrado",
    "meanings": [
      "Problema muito grande",
      "Situação muito complicada",
      "Pessoa de grande Qualidade"
    ],
    "examples": [
      "Isso ai é barril vey",
      "Você é barril dobrado meu pivete",
      "Eu sou barril dobrado"
    ]
  }
}
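As a rough sketch of how the two endpoints could sit on top of the per-word files, assuming the files have already been read and parsed: the `files` object, the file names, and the `slugToKey`, `buildCollection` and `getWord` helpers are all hypothetical, not part of the current service.

```javascript
// Hypothetical stand-in for the parsed contents of dialects/baianes/*.json.
// In a real service these would be read from disk; file names are assumptions.
const files = {
  'barril.json': { dialect: 'barril', meanings: ['...'], examples: ['...'] },
  'barril-dobrado.json': { dialect: 'Barril Dobrado', meanings: ['...'], examples: ['...'] },
};

// 'barril-dobrado' -> 'barrilDobrado', matching the key in the response example.
function slugToKey(slug) {
  return slug.replace(/-([a-z])/g, (_, letter) => letter.toUpperCase());
}

// Root endpoint: merge every file into one hash table keyed by word.
function buildCollection(parsedFiles) {
  const collection = Object.create(null);
  for (const [fileName, entry] of Object.entries(parsedFiles)) {
    const slug = fileName.replace(/\.json$/, '');
    collection[slugToKey(slug)] = entry;
  }
  return collection;
}

// Word endpoint: resolve one slug to a single-key response, or null if unknown.
function getWord(collection, slug) {
  const key = slugToKey(slug);
  return key in collection ? { [key]: collection[key] } : null;
}

const baianesCollection = buildCollection(files);
const response = getWord(baianesCollection, 'barril-dobrado');
```

Here `response` has the same single-key shape as the response example, and an unknown slug yields `null`, which a handler could map to a 404.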
I'll work on that, but I'm kinda slow these days, as I'm working on something quite complex. I will start writing the implementation and tech specs for it. 🎉
Hi there :)
When you're dealing with text search, it is more interesting to have a prefix tree (trie) or an n-gram tree rather than a hash table, so you can run partial searches on terms.
A trie is fairly simple to implement with only vanilla libs, but specializing it to n-grams may need some external libs or way more work.
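For reference, a minimal trie with prefix search in vanilla JavaScript could look like the sketch below. The `Trie` class and the sample words are illustrative assumptions, not code from the project.

```javascript
// Minimal prefix tree (trie) using only built-in JavaScript features.
class Trie {
  constructor() {
    this.root = { children: new Map(), isWord: false };
  }

  // Add a word one character at a time, creating nodes as needed.
  insert(word) {
    let node = this.root;
    for (const char of word) {
      if (!node.children.has(char)) {
        node.children.set(char, { children: new Map(), isWord: false });
      }
      node = node.children.get(char);
    }
    node.isWord = true;
  }

  // Return every stored word that starts with the given prefix.
  startsWith(prefix) {
    let node = this.root;
    for (const char of prefix) {
      node = node.children.get(char);
      if (!node) return [];
    }
    const results = [];
    const walk = (current, path) => {
      if (current.isWord) results.push(path);
      for (const [char, child] of current.children) walk(child, path + char);
    };
    walk(node, prefix);
    return results;
  }
}

const trie = new Trie();
for (const word of ['barril', 'barril dobrado', 'migue']) trie.insert(word);

const matches = trie.startsWith('bar'); // ['barril', 'barril dobrado']
```

This only covers matches from the start of a term; matching in the middle of a word is where n-grams come in.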
Some inspirational examples:
https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-ngram-tokenizer.html
https://whoosh.readthedocs.io/en/latest/ngrams.html
One other thing concerns the data/search index model. In most search engines, you have a clear separation between the data you store and the search indexes you use. So a good way to build on this idea is to turn the data model into a K/V store, but keep an index for it in another data structure, so that the data is easily addressable via key, but searchable via a more elaborate search index :)
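A small sketch of that separation, assuming a plain object as the K/V store and a character trigram index kept beside it. The `store` entries and the `ngrams`, `buildIndex` and `search` helpers are hypothetical, and real engines such as the ones linked above do considerably more (tokenizing, ranking, verifying candidates).

```javascript
// Data model: a plain key -> entry store (hypothetical sample entries).
const store = {
  barril: { dialect: 'barril', meanings: ['...'] },
  barrilDobrado: { dialect: 'Barril Dobrado', meanings: ['...'] },
  migue: { dialect: 'Migué', meanings: ['...'] },
};

// Split a term into overlapping character n-grams (trigrams by default).
function ngrams(text, n = 3) {
  const normalized = text.toLowerCase();
  const grams = [];
  for (let i = 0; i + n <= normalized.length; i++) {
    grams.push(normalized.slice(i, i + n));
  }
  return grams;
}

// Search index: n-gram -> set of store keys, kept separate from the data.
function buildIndex(data, n = 3) {
  const index = new Map();
  for (const [key, entry] of Object.entries(data)) {
    for (const gram of ngrams(entry.dialect, n)) {
      if (!index.has(gram)) index.set(gram, new Set());
      index.get(gram).add(key);
    }
  }
  return index;
}

// Partial search: keep the keys whose term contains every n-gram of the query
// (candidate matches only), then fetch the full entries from the store by key.
function search(query, index, data, n = 3) {
  const grams = ngrams(query, n);
  if (grams.length === 0) return []; // query shorter than n: nothing to match on
  let keys = null;
  for (const gram of grams) {
    const hits = index.get(gram) || new Set();
    keys = keys === null ? new Set(hits) : new Set([...keys].filter((k) => hits.has(k)));
  }
  return [...keys].map((key) => data[key]);
}

const searchIndex = buildIndex(store);
const found = search('dobra', searchIndex, store);
```

The store stays addressable by key (`store.barril`), while `search` goes through the index first and only touches the store to fetch the matching entries.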
@mateusduboli You're absolutely right about using trees for search purposes. I was being simplistic in my solution and didn't consider a more searchable structure for looking up words. 🤦‍♂️
Instead of having a user-readable JSON file, we would design some logic to retrieve data from the searched characters. It makes sense to me. Is this aligned with the project's long-run expectations, @mvfsillva?
I found it super interesting. I did not know about the n-gram tree, and I think it completely aligns with the expectations of the project.
If I understand this correctly, it will improve the performance of looking up words and also the semantics of the data storage.
Hey, guys, @mtmr0x @mateusduboli let's do it \0/