Word prediction and correction
elhennig opened this issue · 9 comments
Is there a possibility to have a predictive and corrective row on the keyboard or is there no chance for such an option?
The keyboard really lacks this feature but unfortunately, it's a huge amount of work and I don't plan to do it this year. Of course, contributions are welcome.
This keyboard is the best one I've tried so far for users who need precise control; this feature would make it perfect for everyday use. FlorisBoard is currently implementing this feature in their v4 branch, maybe we can get some inspiration? I'm surprised there is no library to modularize this common need of all mobile keyboards.
/EDIT: Also, OpenBoard already implements word correction and prediction and is, to my knowledge, the only open-source keyboard where this feature works reasonably well for production use, although it is much less effective than AI-based prediction such as SwiftKey's.
OK, so I did a little literature review (I am a machine learning scientist and I worked on a few NLP models in the past). There are 3 interesting approaches, listed from easiest to hardest to implement, with prediction quality improving as the implementation effort goes up:
- For GBoard/OpenBoard-like performance, I think a simple n-grams approach is sufficient. It's a very low-effort implementation because there is an awesome tutorial and open-source implementation for Android in Kotlin that dates from 2019: https://proandroiddev.com/android-predictive-keyboard-e6c9df01e527 and https://github.com/mccorby/SmartKeyboardNgram-Android
- An evolution of this classical model is the Stupid Backoff n-gram model, first published in 2007 and apparently what SwiftKey and other keyboards with great word prediction used. Original paper (T. Brants et al., 2007): https://aclanthology.org/D07-1090.pdf and an implementation in R, which mentions it was done in partnership with SwiftKey: https://github.com/RenatoPdosSantos/word-predictor . That implementation is quite complex, so it will likely be easier to reimplement from scratch, but the R code can help check tricky implementation details. Worth noting that the paper itself calls this a large language model, so it is an early n-gram predecessor of LLMs like GPT.
- Last, and a technological leap, would be to implement a local GPT model. There are so-called distilled GPT models that are much smaller and run locally in the browser in JS and on mobile devices such as Android phones, but to my knowledge they are all based on older GPT generations (e.g. DistilGPT2, distilled from GPT-2), not 3.5 nor 4, which are MUCH better. Once newer ones exist and work on most devices within a reasonable timeframe (i.e. computing predictions within milliseconds, not seconds), this will blow all other models out of the water. To my knowledge no keyboard currently provides such capabilities, so this option is an open one for the future, but it's worth mentioning because in a few years, as the tech matures, GPT-style models will likely be the state of the art for word correction and prediction on mobile.
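To make the second option a bit more concrete, here is a minimal sketch of Stupid Backoff scoring in Python (not the SwiftKey or R implementation). The function names, the toy corpus and the trigram limit are my own assumptions for illustration; the backoff factor 0.4 is the value used in the Brants et al. paper. Ties are broken alphabetically so the output is deterministic for a given corpus.

```python
from collections import Counter

ALPHA = 0.4  # backoff factor used in Brants et al. (2007)


def train(tokens, max_order=3):
    """Count every n-gram up to max_order; keys are tuples of words."""
    counts = Counter()
    for n in range(1, max_order + 1):
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts


def score(counts, total, context, word):
    """Stupid Backoff score S(word | context). A score, not a probability."""
    factor = 1.0
    context = tuple(context)
    while context:
        seen = counts[context + (word,)]
        if seen > 0:
            # Relative frequency at the longest order that has evidence.
            return factor * seen / counts[context]
        context = context[1:]   # drop the oldest word and back off
        factor *= ALPHA
    return factor * counts[(word,)] / total  # unigram fallback


def predict(counts, vocab, total, context, k=3, max_order=3):
    """Return the k best next words; ties are broken alphabetically."""
    context = tuple(context)[-(max_order - 1):] if max_order > 1 else ()
    ranked = sorted(vocab, key=lambda w: (-score(counts, total, context, w), w))
    return ranked[:k]


# Toy corpus, purely illustrative.
tokens = "the cat sat on the mat the cat ate the fish".split()
counts = train(tokens)
vocab = sorted(set(tokens))
suggestions = predict(counts, vocab, len(tokens), ["on", "the"])
```

A real keyboard would need pruned counts and a compact on-disk format; this only shows the scoring rule.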
Thanks for the info! The feature I have in mind would be word correction from several large dictionaries (several languages, custom dictionary, emoji names, etc.) with a deterministic output.
I would give up word prediction in exchange for any of determinism, fast queries or compact dictionaries.
Ah, then a simple distance metric can do the trick, such as Levenshtein distance? The issue is that different distance metrics correct different kinds of errors: some support deletions (i.e. a missing character) whereas others only support replacement (i.e. a character replaced by another, but the word has the same length). Also, I'm not sure about the algorithmic bound; usually they are n² for learning and then n for inference (n being the size of the dictionary), but maybe there are newer algos I am not aware of. (/EDIT: ah, maybe with trees, this may make inference log(n) instead of n)
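As a rough illustration of the distance-based idea, here is a sketch in Python that computes Levenshtein distance and ranks candidates from several dictionaries deterministically. The `suggest` function, its parameters and the sample word lists are hypothetical names made up for this example. It does a plain linear scan over every dictionary and does not attempt the tree-based speed-up mentioned above.

```python
def levenshtein(a, b):
    """Edit distance counting insertions, deletions and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute, or free match
            ))
        previous = current
    return previous[-1]


def suggest(word, dictionaries, max_distance=2, limit=3):
    """Rank candidates by (distance, dictionary priority, alphabetical).

    `dictionaries` is a list of word lists; earlier lists win ties.
    The sort key is total, so the same input always gives the same output.
    """
    best = {}
    for priority, words in enumerate(dictionaries):
        for candidate in words:
            # Cheap filter: the length gap is a lower bound on the distance.
            if abs(len(candidate) - len(word)) > max_distance:
                continue
            distance = levenshtein(word, candidate)
            if distance <= max_distance and candidate not in best:
                best[candidate] = (distance, priority)
    ranked = sorted(best, key=lambda w: (best[w][0], best[w][1], w))
    return ranked[:limit]


# Hypothetical dictionaries: a main language list and a custom user list.
main_words = ["hello", "help", "world"]
custom_words = ["halo", "hero"]
result = suggest("helo", [main_words, custom_words])
```

Because the ordering depends only on the distance, the dictionary order and the word itself, the output stays stable as long as the dictionaries do.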
Also worth noting that n-grams based approaches are deterministic as long as you use the same input dictionary, and you get very fast inference and word prediction for free in addition to word correction.
I was wondering whether it is useful to develop a custom SpellCheckerService (the part that provides the actual suggestions) or to just "plug in" to the Android spell checker framework to begin with. As a first step, it could already be helpful for many users to just use existing SpellCheckerServices (e.g. AOSP or OpenBoard) in the background by following https://developer.android.com/develop/ui/views/touch-and-input/spell-checker-framework#SpellCheckClient. If that is not sufficient, maybe a custom spell checker could be developed afterwards. I don't know if I am missing something here, but in theory this seems like a good and easy first approach to implement spell checking without much overhead. On the other hand, it seems like a custom user-defined dictionary would have to be implemented by the client (here the keyboard) itself.
Of course, all this requires third-party spell checkers to be installed on the system, but if that is not already the case, maybe it's a sacrifice users would make in order to get spell checking working. The TextServicesManager also provides functions to check if spell checkers are available. Maybe a dialog could be shown to inform users about the situation; however, these functions require a pretty high API level.
There are samples available from the AOSP project at https://github.com/Miserlou/Android-SDK-Samples/tree/master/SpellChecker. Maybe it's worth a shot.
@deftkHD Good idea! With that implemented, someone can focus on the dictionary later.
I have a LanguageTool server on my NAS.
It would be nice if I could use that server for language correction.
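For reference, a rough sketch of what talking to such a server could look like, in Python for brevity. It assumes the server exposes LanguageTool's HTTP API (`POST /v2/check` with `text` and `language` form fields, answering with a JSON `matches` list). The host name, port and helper function names are made up for this example, and nothing here is part of the keyboard.

```python
import json
from urllib import parse, request


def build_check_request(base_url, text, language="en-US"):
    """Build (but do not send) a form-encoded POST to /v2/check."""
    body = parse.urlencode({"text": text, "language": language}).encode("utf-8")
    url = base_url.rstrip("/") + "/v2/check"
    return request.Request(url, data=body, method="POST")


def extract_suggestions(response_json, text, limit=3):
    """Return (flagged text, replacement values) pairs from a response body.

    Assumption: offsets index into `text` the same way Python does, which
    may not hold for characters outside the Basic Multilingual Plane.
    """
    result = []
    for match in json.loads(response_json).get("matches", []):
        start = match["offset"]
        end = start + match["length"]
        values = [r["value"] for r in match.get("replacements", [])][:limit]
        result.append((text[start:end], values))
    return result


# Hypothetical NAS address; sending would be request.urlopen(req).read().
req = build_check_request("http://nas.local:8081/", "helo world")

# Hand-written sample shaped like a LanguageTool response, not real output.
sample = json.dumps({"matches": [
    {"offset": 0, "length": 4,
     "replacements": [{"value": "hello"}, {"value": "help"}]},
]})
pairs = extract_suggestions(sample, "helo world")
```

A network round trip per word is probably too slow for a suggestion row while typing, so this would fit better as an on-demand check of already typed text.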
Maybe you could work together with https://anysoftkeyboard.github.io/ on language support?