FelipeFTN/Emoji-Copy

Update to Unicode 13.0.0 standard

NatVIII opened this issue · 8 comments

Current State
Currently emoji-copy's database is based off of Unicode 12.1.0 thanks to @helena-dev (🏳️‍⚧️ solidarity)

Desired Goal
Bringing emoji-copy in line with the 13.0.0 standard first will let us then update it to 14.0.0, and after that to 15.0.0. My ultimate wish with this is to set achievable goals that work towards modernizing the available emojis.

I was wondering where these guys were (copied from emojipedia):
🥹
🥲

What needs to be done to update to Unicode 15.1?
Would a PR that updates the following files, keeping the same format but adding the new emojis, be enough?

emoji-copy@felipeftn/data/emojisCharacters.js
emoji-copy@felipeftn/data/emojisKeywords.js

I'm thinking maybe I could write a Python parser taking this and spitting out the arrays needed for Emoji-Copy.

@NatVIII @FelipeFTN Could you have a look at my proposed solution?

Heyy @pavinjosdev!!
I'm sorry for the late response 😅
This is actually an amazing idea! Should work perfectly! 🎉
I couldn't think of a better solution!

Building a script that parses the latest Unicode emojis into something our extension can read should work nicely!
I'm very excited to see this working, @pavinjosdev 👀
Go on, feel free to open a Pull Request with your changes, I will take a careful look at it! 💯

I see two ways to achieve this:

  1. Build a script that updates emojisKeywords.js and emojisCharacters.js (keeping the JavaScript file format)
  2. Build a script that parses the data and updates a JSON file, and stop using the JS files. This solution also requires updating the extension code to read from the emoji JSON file.

Both should work fine! What do you think? Would you follow one of these, or do you have another solution in mind?

Thank you so much for your contribution to this issue, @pavinjosdev!

@FelipeFTN Thank you for the update. I think option [2] using the JSON file is better as it's normally used for storing data and we can make the parser in any language that natively supports JSON. I will submit the PR soon 🙂
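As a rough illustration of option [2], here is a minimal Python sketch of what writing the emoji data to JSON could look like. The field names (`char`, `name`, `keywords`) and the sample keywords are hypothetical placeholders, not the project's actual schema.

```python
import json

# Hypothetical records: the field names and keywords are illustrative only.
emojis = [
    {"char": "\U0001F972", "name": "smiling face with tear", "keywords": ["example"]},
    {"char": "\U0001F979", "name": "face holding back tears", "keywords": ["example"]},
]


def dump_emojis(records):
    """Serialize records as JSON, keeping emoji as literal characters."""
    # ensure_ascii=False writes the emoji themselves rather than \uXXXX escapes,
    # which keeps the file readable and easy to diff.
    return json.dumps(records, ensure_ascii=False, indent=2)


text = dump_emojis(emojis)
```

Any language with a native JSON parser can read the result back, which is the portability argument made above.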

Actually, I know it's quite late, but I wanted to add that other, more featureful datasets are available which also include the needed keywords. One possible example is pulling from the muan/emojilib repo, with its json available here. There were some issues with that approach, though, which we may also run into when implementing this:

No categorization

  • The current JS file relies on the emojis being grouped into the little categories shown at the top of the screen, and I can't figure out how to keep that grouping consistently available if we re-create the database on every Unicode update to stay compliant with the latest standard.

ZWJ Integration

Just my two cents, sorry I was never able to get around to this so far!

@NatVIII The current parser as implemented by @FelipeFTN uses the categories from the Unicode test file to automatically categorize emojis into groups and subgroups such as Smileys & Emotion, People & Body, etc. I believe ZWJ emojis are included in there.
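To make the grouping idea concrete, here is a minimal Python sketch of reading group and subgroup headers from text in the style of Unicode's emoji-test.txt. It is not the project's actual parser: the function name, the dict keys, and the embedded sample are assumptions for illustration.

```python
# A few lines in the style of emoji-test.txt (illustrative sample, not the full file).
SAMPLE = """\
# group: Smileys & Emotion

# subgroup: face-smiling
1F600                       ; fully-qualified     # 😀 E1.0 grinning face

# group: Flags

# subgroup: flag
1F3F3 FE0F 200D 26A7 FE0F   ; fully-qualified     # 🏳️‍⚧️ E13.0 transgender flag
1F3F3 200D 26A7 FE0F        ; minimally-qualified # 🏳‍⚧️ E13.0 transgender flag
"""


def parse_emoji_test(text):
    """Return one dict per fully-qualified emoji, tagged with group and subgroup."""
    group = subgroup = None
    rows = []
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("# group:"):
            group = line.split(":", 1)[1].strip()
        elif line.startswith("# subgroup:"):
            subgroup = line.split(":", 1)[1].strip()
        elif line and not line.startswith("#"):
            # Data line: "<code points> ; <status> # <emoji> E<version> <name>"
            data, _, comment = line.partition("#")
            codepoints, _, status = data.partition(";")
            if status.strip() != "fully-qualified":
                continue  # skip minimally-qualified / unqualified variants
            _, version, name = comment.split(None, 2)
            rows.append({
                # Build the string from the code points, so ZWJ sequences stay intact.
                "char": "".join(chr(int(cp, 16)) for cp in codepoints.split()),
                "name": name,
                "version": version,
                "group": group,
                "subgroup": subgroup,
            })
    return rows


rows = parse_emoji_test(SAMPLE)
```

Because the emoji is rebuilt from the listed code points, a ZWJ sequence such as the transgender flag comes through as a single entry with its group attached.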

The whole thing is saved as an SQLite DB by the Python parser, which is queried by the extension's JS.
The current blocker is that GNOME's libgda library and its corresponding SQLite binding cause gnome-shell to crash on an openSUSE Tumbleweed system running the latest of everything. It works on Fedora/Arch so I doubt it's an issue with the extension code itself. @FelipeFTN is awesome, he did all of the SQL work to make the queries fast 🚀
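For readers unfamiliar with the approach, here is a minimal sketch of storing parsed emojis in SQLite and searching them by name. The table and column names are hypothetical, and it uses Python's built-in `sqlite3` module purely for illustration; the extension itself queries the database from JS through libgda, and its real schema is not shown in this thread.

```python
import sqlite3


def build_db(rows, path=":memory:"):
    """Create a small emoji table (hypothetical schema) and fill it."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE emojis ("
        "char TEXT PRIMARY KEY, name TEXT NOT NULL, grp TEXT NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO emojis (char, name, grp) VALUES (?, ?, ?)", rows
    )
    conn.commit()
    return conn


def search(conn, term):
    """Return emoji characters whose name contains the search term."""
    cur = conn.execute(
        "SELECT char FROM emojis WHERE name LIKE ? ORDER BY name",
        (f"%{term}%",),
    )
    return [row[0] for row in cur]


# Two sample rows; an in-memory database keeps the sketch self-contained.
sample_rows = [
    ("\U0001F972", "smiling face with tear", "Smileys & Emotion"),
    ("\U0001F979", "face holding back tears", "Smileys & Emotion"),
]
conn = build_db(sample_rows)
```

Calling `search(conn, "holding")` would return only the second emoji. The parameterized `?` placeholders keep user-typed search terms from being interpreted as SQL.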

😮 I had no idea that much work had gone in. As cool as this project is, and as much as I use it daily, I haven't really had a chance to delve into the code too much. Good work @FelipeFTN

Hahahha Thank you so much @NatVIII @pavinjosdev ❤️
Actually @pavinjosdev did all the hard work hahaha!

The new feature is almost ready!
Let's keep working! 🫂