Update to Unicode 13.0.0 standard
NatVIII opened this issue · 8 comments
Current State
Currently, emoji-copy's database is based on Unicode 12.1.0, thanks to @helena-dev (🏳️‍⚧️ solidarity).
Desired Goal
By bringing emoji-copy in line with the 13.0.0 standard, we'll then be able to update it to 14.0.0, and from there to 15.0.0. My ultimate aim is to set achievable goals to work towards modernizing the available emojis.
I was wondering where these guys were (copied from emojipedia):
🥹
🥲
What needs to be done to update to Unicode 15.1?
Would a PR updating the following files, keeping the same format but adding the new emojis, suffice?
emoji-copy@felipeftn/data/emojisCharacters.js
emoji-copy@felipeftn/data/emojisKeywords.js
I'm thinking I could write a Python parser that takes this and spits out the arrays needed for Emoji-Copy.
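For what it's worth, a minimal sketch of that idea, assuming the published format of Unicode's emoji-test.txt (group/subgroup comment lines plus `codepoints ; status # emoji Ex.y name` data lines); the file names and output layout here are placeholders, not the extension's actual data format:

```python
# Rough sketch: parse Unicode's emoji-test.txt into per-group lists of emojis.
# Only "fully-qualified" entries are kept; output path names are hypothetical.
import json
import re
from collections import defaultdict

LINE_RE = re.compile(
    r"^(?P<codes>[0-9A-F ]+?)\s*;\s*fully-qualified\s*#\s*"
    r"(?P<emoji>\S+)\s+E\d+\.\d+\s+(?P<name>.+)$"
)

def parse_emoji_test(path="emoji-test.txt"):
    groups = defaultdict(list)
    group = None
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.rstrip("\n")
            if line.startswith("# group:"):
                # e.g. "# group: Smileys & Emotion"
                group = line.split(":", 1)[1].strip()
            elif not line.startswith("#"):
                m = LINE_RE.match(line)
                if m and group:
                    groups[group].append(
                        {"emoji": m.group("emoji"), "name": m.group("name")}
                    )
    return groups

if __name__ == "__main__":
    groups = parse_emoji_test()
    # Dump everything as JSON; either JS arrays or a JSON data file
    # could then be generated from this intermediate structure.
    with open("emojis.json", "w", encoding="utf-8") as out:
        json.dump(groups, out, ensure_ascii=False, indent=2)
```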
@NatVIII @FelipeFTN Could you have a look at my proposed solution?
Heyy @pavinjosdev!!
I'm sorry for the late response!
This is actually an amazing idea! Should work perfectly!
I couldn't think of a better solution!
Building a script that parses the latest Unicode emojis into something our extension can read should work nicely!
I'm very excited to see this working, @pavinjosdev!
Go ahead, feel free to open a Pull Request with your changes, and I will take a careful look at it!
I see two ways to achieve this:
- Build a script that updates emojiKeywords.js and emojiCharacters.js directly, keeping the JavaScript file format (see the sketch after this list)
- Build a script that parses the data into a JSON file and stop using JS files; this option also requires updating the extension code to read from the emoji JSON file.
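For the first option, a rough sketch of what the generation step could look like, reusing the `groups` dict from the parsing sketch above; the `emojisCharacters` variable name and array-of-arrays layout are assumptions for illustration, not the actual structure of emoji-copy's data files:

```python
# Rough sketch (option 1): emit a JS data file from the parsed groups.
# The variable name and layout below are assumptions, not the real
# structure of emoji-copy's emojisCharacters.js.
import json

def write_characters_js(groups, path="emojisCharacters.js"):
    categories = sorted(groups)  # stable, reproducible category order
    arrays = [[entry["emoji"] for entry in groups[cat]] for cat in categories]
    with open(path, "w", encoding="utf-8") as out:
        out.write("// Generated from emoji-test.txt; do not edit by hand.\n")
        out.write("const emojisCharacters = ")
        out.write(json.dumps(arrays, ensure_ascii=False, indent=2))
        out.write(";\n")
```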
Both should work fine! What do you think? Do you have another solution in mind, or would you follow one of these?
Thank you so much for your contribution with this issue, @pavinjosdev!
@FelipeFTN Thank you for the update. I think option [2], using the JSON file, is better, as JSON is commonly used for storing data and we can write the parser in any language that supports it natively. I will submit the PR soon!
Actually, I know it's quite late, but I wanted to mention that other, more featureful datasets are available which also include the needed keywords. One possible example is pulling from the muan/emojilib repo, with its JSON available here. There were some issues with that approach, though, that we may run into when implementing this as well:
No categorization
- The current JS file relies on the emojis being grouped into small categories shown at the top of the screen, and I can't figure out how to keep that consistently available if we're re-creating the database on every Unicode update to stay compliant with the latest standard.
ZWJ Integration
- I don't see any way in the code to use ZWJ sequences yet. They matter more and more for emojis introduced all the way up to 15.1, and I just never got a chance to figure that out (see the sketch below).
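On the ZWJ point, a small illustrative check (the helper name is hypothetical): ZWJ sequences are simply emojis whose code point sequence contains U+200D, so a parser built on emoji-test.txt can detect and keep them explicitly:

```python
# Rough sketch: detect ZWJ sequences by looking for ZERO WIDTH JOINER (U+200D)
# in an emoji's code point sequence. The helper name is hypothetical.
ZWJ = "\u200d"

def is_zwj_sequence(emoji: str) -> bool:
    return ZWJ in emoji

# Example: "woman astronaut" is woman + ZWJ + rocket; "grinning face" is not a sequence.
assert is_zwj_sequence("\U0001F469\u200D\U0001F680")
assert not is_zwj_sequence("\U0001F600")
```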
Just my two cents; sorry I never managed to get around to this myself!
@NatVIII The current parser, as implemented by @FelipeFTN, uses the categories from the Unicode test file to automatically categorize emojis into groups and subgroups such as Smileys & Emotion, People & Body, etc. I believe ZWJ emojis are included in there.
The whole thing is saved as an SQLite DB by the Python parser, which is then queried by the extension's JS.
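A minimal sketch of that save step, assuming a hypothetical single-table schema; the real schema is whatever the actual parser creates:

```python
# Rough sketch: persist the parsed emojis to SQLite so the extension can
# query them. Table name and columns are hypothetical placeholders.
import sqlite3

def save_to_sqlite(groups, path="emojis.db"):
    con = sqlite3.connect(path)
    con.execute(
        "CREATE TABLE IF NOT EXISTS emojis ("
        "  emoji TEXT PRIMARY KEY,"
        "  name  TEXT NOT NULL,"
        "  grp   TEXT NOT NULL)"
    )
    con.executemany(
        "INSERT OR REPLACE INTO emojis (emoji, name, grp) VALUES (?, ?, ?)",
        [
            (e["emoji"], e["name"], grp)
            for grp, entries in groups.items()
            for e in entries
        ],
    )
    con.commit()
    con.close()

# Keyword lookups then become simple queries, e.g.:
#   SELECT emoji FROM emojis WHERE name LIKE '%cat%';
```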
The current blocker is with GNOME's libgda library and its corresponding SQLite binding, which causes gnome-shell to crash on an openSUSE Tumbleweed system running the latest of everything. It works on Fedora/Arch, so I doubt it's an issue with the extension code itself. @FelipeFTN is awesome; he did all of the SQL work to make the queries fast!
😮 I had no idea that much work had gone in; as cool as this project is, and as much as I use it daily, I haven't really had a chance to delve into the code much. Good work @FelipeFTN!
Hahahha thank you so much @NatVIII @pavinjosdev ❤️
Actually @pavinjosdev did all the hard work hahaha!
The new feature is almost ready!
Let's keep working!