Kozea/Pyphen

Import/loading takes a long time. How to speed up loading?

Wikinaut opened this issue · 8 comments

I use pyphen for my Raspberry Pi Zero-powered internet radio (https://github.com/Wikinaut/pinetradio).

Import of pyphen started.
pyphen imported, loading of de_DE took 43.77 seconds on Raspberry Pi Zero

Loading always takes a very long time. Is there a way to decrease the loading time?

liZe commented

I use pyphen for my Raspberry Pi Zero-powered internet radio

Cool!

pyphen imported, loading of de_DE took 43.77 seconds on Raspberry Pi Zero

That’s a lot. Even if the Raspberry Pi’s CPU is slow, it shouldn’t take that much time. Profiling on my computer doesn’t give interesting results; could you please provide profiling information from your Raspberry Pi? You can get it by launching python -c "import pyphen; pyphen.Pyphen(lang='de_DE')" -m cProfile -o /tmp/cprofile and then sending the /tmp/cprofile file here.

(I hope I’ll be able to read it even if it’s not the same platform; otherwise I’ll ask you to run an additional command!)

Done. The command did not work, but I put the commands into a file and ran that. Here is the full output:
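The one-liner fails because `-c` terminates the interpreter's option list: everything after the quoted snippet (`-m cProfile -o /tmp/cprofile`) is passed to the snippet in `sys.argv`, so the code runs unprofiled and no /tmp/cprofile file is written. Putting the two statements into a file and running `python -m cProfile -o /tmp/cprofile thatfile.py`, as done here, avoids the problem. The following is a minimal sketch of the same thing done from inside Python; `profile_call` and `load_de_de` are hypothetical helper names, and apart from `pyphen.Pyphen(lang=...)` from the thread it uses only standard-library calls (`cProfile.Profile.runcall`, `dump_stats`, `pstats.Stats`).

```python
import cProfile
import io
import pstats
import time


def profile_call(func, outfile=None, top=15):
    """Run func() under cProfile and return (result, wall seconds, text report).

    If outfile is given, the raw stats are also saved there, so the file can be
    shared and inspected later with pstats.Stats(outfile).
    """
    profiler = cProfile.Profile()
    start = time.perf_counter()
    result = profiler.runcall(func)  # profiling is enabled only around func()
    elapsed = time.perf_counter() - start
    if outfile is not None:
        profiler.dump_stats(outfile)
    buf = io.StringIO()
    stats = pstats.Stats(profiler, stream=buf)
    stats.sort_stats("cumulative").print_stats(top)  # top entries by cumulative time
    return result, elapsed, buf.getvalue()


def load_de_de():
    import pyphen  # imported here so that the import time is part of the profile

    return pyphen.Pyphen(lang="de_DE")


if __name__ == "__main__":
    try:
        _, seconds, report = profile_call(load_de_de, outfile="/tmp/cprofile")
    except ImportError:
        print("pyphen is not installed, nothing to profile")
    else:
        print(report)
        print(f"import + loading of de_DE took {seconds:.2f} seconds")
```

Sorting by cumulative time puts the expensive top-level steps (listing dictionaries, parsing one) first, which is the kind of breakdown discussed further down in this thread.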

(available until March 2024)
https://dpaste.com/7RA2RN2ES.txt

Here is just one example of the usage (the purpose of hyphenation is to allow the largest possible font size on the tiny display of https://github.com/Wikinaut/pinetradio). The first and third hyphens came from the automatic hyphenation. Currently, I apply at most one automatic hyphenation per word.
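Applying at most one hyphenation per word amounts to picking the latest break point that still fits on the line. pyphen itself offers `Pyphen.wrap(word, width)` for this, which returns the longest possible first part (hyphen attached) and the rest, or None. The following standalone sketch shows the same logic without needing pyphen installed; `split_once` is a hypothetical helper, the break positions are assumed to come from something like pyphen's `Pyphen.positions(word)`, and width is assumed to be counted in characters rather than display pixels.

```python
def split_once(word, positions, width, hyphen="-"):
    """Break word at most once so the first part, hyphen included, fits in width.

    positions: candidate break indices, e.g. from pyphen's Pyphen.positions(word).
    width: available space in characters (an assumption; a real display would
    measure the rendered pixel width instead).
    Returns (first, rest); rest is "" when no break is needed or none fits.
    """
    if len(word) <= width:
        return word, ""
    fitting = [p for p in positions if 0 < p < len(word) and p + len(hyphen) <= width]
    if not fitting:
        return word, ""  # no usable break: the caller has to pick a smaller font
    cut = max(fitting)  # the latest break keeps the first line as full as possible
    return word[:cut] + hyphen, word[cut:]


if __name__ == "__main__":
    # Hypothetical break points for "Internetradio" (In-ter-net-ra-dio).
    print(split_once("Internetradio", [2, 5, 8, 10], width=9))  # ('Internet-', 'radio')
```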
[screenshot]

liZe commented

You may get slightly better results with version 0.14.0, as it may be a bit faster if your storage is slow (and it probably is). That could help with the 17 seconds spent mainly listing dictionaries, and with the 30 seconds spent in the __init__ function code when a dictionary is parsed.

But apart from this change, your time distribution is almost the same as mine. There may be optimizations to find, but nothing is obvious to me right now.

0.14.0 is not much better:

Import of pyphen started.
pyphen imported, loading of de_DE took 39.62 seconds on Raspberry Pi Zero

new profile:

https://dpaste.com/CG5F52TKZ.txt

liZe commented

0.14.0 is not much better

A 10% improvement is good news, and it’s what I was hoping for, but it’s not enough.

new profile:

It looks like we saved some time listing dictionaries; importing the module seems to be much faster.

For ~50s, there’s:

It should be possible to save some time, but there’s nothing obvious from what I see there :/.

Perhaps offline preprocessing of the dictionary in use? Could this help?

liZe commented

Perhaps offline preprocessing of the dictionary in use? Could this help?

I’ve tried loading a JSON file (generated from the dictionary) on my laptop, and it’s ~5 times faster than loading the Hunspell dictionary. With pickle, it’s ~10 times faster. The benefits would probably be greater on slower systems.
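A comparison like this can be repeated on the Raspberry Pi itself. The following is a minimal sketch that times deserializing the same data from JSON and from pickle. It uses synthetic data shaped loosely like parsed hyphenation patterns (string keys, lists of small ints), not the real de_DE dictionary, so its numbers will not match the measurements above; `make_fake_patterns` and `compare_load_times` are hypothetical helpers.

```python
import json
import pickle
import random
import string
import timeit


def make_fake_patterns(n=20000, seed=0):
    """Build synthetic pattern data: short letter keys -> lists of small ints.

    Hypothetical stand-in for a parsed dictionary; real data will differ.
    """
    rng = random.Random(seed)
    patterns = {}
    while len(patterns) < n:
        key = "".join(rng.choice(string.ascii_lowercase) for _ in range(rng.randint(2, 6)))
        patterns[key] = [rng.randint(0, 5) for _ in range(len(key) + 1)]
    return patterns


def compare_load_times(patterns, repeat=5):
    """Return best-of-`repeat` seconds to deserialize the data from JSON and pickle."""
    as_json = json.dumps(patterns)
    as_pickle = pickle.dumps(patterns, protocol=pickle.HIGHEST_PROTOCOL)
    json_s = min(timeit.repeat(lambda: json.loads(as_json), number=1, repeat=repeat))
    pickle_s = min(timeit.repeat(lambda: pickle.loads(as_pickle), number=1, repeat=repeat))
    return json_s, pickle_s


if __name__ == "__main__":
    json_s, pickle_s = compare_load_times(make_fake_patterns())
    print(f"json: {json_s * 1000:.1f} ms, pickle: {pickle_s * 1000:.1f} ms")
```

Taking the minimum of several runs reduces noise from other processes, which matters on a single-core board like the Pi Zero.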

We could consider including these preprocessed dictionaries. Pickle and JSON are probably not the best solutions (for different reasons); good ideas are welcome 😁.
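As an illustration of the offline-preprocessing idea, here is a minimal sketch that parses a hyphenation .dic file once and reuses a pickled copy until the .dic file changes. This is not pyphen's implementation. `parse_pattern`, `parse_dic_text` and `load_patterns_cached` are hypothetical helpers, and the parser is deliberately simplified: it assumes the first line names the encoding and that the remaining lines hold TeX-style patterns, and it skips comments, upper-case option lines such as LEFTHYPHENMIN, and non-standard patterns containing "/", which a full parser would handle.

```python
import os
import pickle


def parse_pattern(pattern):
    """Split a TeX-style pattern such as '.ab1c' into ('.abc', [0, 0, 0, 1, 0])."""
    letters, values = [], [0]
    for ch in pattern:
        if ch.isdigit():
            values[-1] = int(ch)  # a digit sits between the previous and next letter
        else:
            letters.append(ch)
            values.append(0)
    return "".join(letters), values


def parse_dic_text(text):
    """Simplified parser: first line is the encoding name, then one pattern per line.

    Assumption: comments ('%'), upper-case option lines and '/'-patterns are skipped.
    """
    patterns = {}
    for line in text.splitlines()[1:]:
        line = line.strip()
        if not line or line.startswith("%") or "/" in line or line.isupper():
            continue
        key, values = parse_pattern(line)
        patterns[key] = values
    return patterns


def load_patterns_cached(dic_path, cache_path):
    """Return parsed patterns, re-parsing the .dic only when it is newer than the cache."""
    if (os.path.exists(cache_path)
            and os.path.getmtime(cache_path) >= os.path.getmtime(dic_path)):
        with open(cache_path, "rb") as f:
            return pickle.load(f)  # only ever unpickle cache files you wrote yourself
    with open(dic_path, "rb") as f:
        raw = f.read()
    encoding = raw.split(b"\n", 1)[0].strip().decode("ascii")  # e.g. "UTF-8"
    patterns = parse_dic_text(raw.decode(encoding))
    with open(cache_path, "wb") as f:
        pickle.dump(patterns, f, protocol=pickle.HIGHEST_PROTOCOL)
    return patterns
```

A call such as `load_patterns_cached("hyph_de_DE.dic", "hyph_de_DE.pickle")` (hypothetical paths) would pay the parsing cost only on the first run. One commonly cited drawback of pickle is that loading a pickle file can execute arbitrary code, so a cache like this is only safe for files the application produced itself.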