Import/loading takes a long time. How to speed up loading?

Question

Wikinaut opened this issue 2 years ago · 8 comments

I use pyphen for my Raspberry Pi Zero powered Internet radio: https://github.com/Wikinaut/pinetradio

Import of pyphen started.
pyphen imported, loading of de_DE took 43.77 seconds on Raspberry Pi Zero

Loading always takes a very long time. Is there a way to decrease the loading time?

Answer 1 · 2023-03-22T08:32:26.000Z

I use pyphen for my Raspberry Pi Zero powered Internet radio

Cool!

pyphen imported, loading of de_DE took 43.77 seconds on Raspberry Pi Zero

That’s a lot. Even if the Raspberry Pi’s CPU is slow, it shouldn’t take so much time. Profiling on my computer doesn’t give interesting results. Could you please provide profiling information from your Raspberry Pi? You can get it by launching python -c "import pyphen; pyphen.Pyphen(lang='de_DE')" -m cProfile -o /tmp/cprofile, and then you can send the /tmp/cprofile file here.

(I hope I’ll be able to read it even if it’s not the same platform, otherwise I’ll ask you to launch an additional command!)
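
For anyone who wants to reproduce the profile from a file instead of the one-liner: as far as I know, `python -m cProfile` takes a script path (or `-m module`) rather than `-c`. Below is a minimal sketch that produces the same kind of /tmp/cprofile file. The helper name `profile_to_file` is made up for this example, and the script assumes pyphen is installed.

```python
import cProfile
import pstats


def profile_to_file(func, output_path):
    """Run func() under cProfile and write the raw stats to output_path."""
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        func()
    finally:
        profiler.disable()
    profiler.dump_stats(output_path)


if __name__ == "__main__":
    try:
        import pyphen  # assumption: pyphen is installed on this machine
    except ImportError:
        print("pyphen is not installed; nothing to profile")
    else:
        # Same call as in the thread: load the German dictionary once.
        profile_to_file(lambda: pyphen.Pyphen(lang="de_DE"), "/tmp/cprofile")
        # Print the 15 entries with the highest cumulative time.
        stats = pstats.Stats("/tmp/cprofile")
        stats.sort_stats("cumulative").print_stats(15)
```

Saved as a file (say, profile_pyphen.py, a name chosen only for illustration), it runs with a plain `python profile_pyphen.py`.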

Answer 2 · 2023-03-22T09:29:37.000Z

Done. The command did not work, but I put the commands into a file and ran that. Here is the full output:

(available until March 2024)
https://dpaste.com/7RA2RN2ES.txt

Answer 3 · 2023-03-22T09:41:19.000Z

Here is just one example of the usage (purpose of hyphenation: to allow the maximum font size on the tiny display of https://github.com/Wikinaut/pinetradio ). The first and third "-" came from the hyphenation. Currently, I use at most one automatic hyphenation per word.

Answer 4 · 2023-03-22T13:18:26.000Z

You may get slightly better results using the 0.14.0 version, as it may be a bit faster if your storage is slow (and it probably is). That could help with the 17 seconds spent mainly listing dictionaries, and the 30 seconds in the __init__ function code when a dictionary is parsed.

But apart from this change, you have almost the same distribution of time as I do. It may be possible to find optimizations, but nothing’s obvious to me right now.

Answer 5 · 2023-03-22T17:40:54.000Z

0.14.0 is not much better:

Import of pyphen started.
pyphen imported, loading of de_DE took 39.62 seconds on Raspberry Pi Zero

new profile:

https://dpaste.com/CG5F52TKZ.txt

Answer 6 · 2023-03-23T14:30:30.000Z

0.14.0 is not much better

A 10% improvement is good news, that’s what I was hoping for, but it’s not enough.

new profile:

It looks like we saved some time listing dictionaries, and importing the module seems to be much faster.

Out of ~50s, there’s:

~10s in parse (the regex matching),
~10s elsewhere in the list comprehension (10s more than the 10s spent in the regex),
~30s elsewhere in the HyphDict.__init__ method, including ~18s in CPython’s C code for Python primitives.

It should be possible to save some time, but there’s nothing obvious from what I see there :/.
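
A breakdown like the one above can be read from the profile file with the standard pstats module. This is a sketch with a hypothetical helper name, `summarize_profile`. Sorting by `tottime` ranks functions by the time spent in their own code, excluding time spent in the functions they call, which is how "elsewhere in the list comprehension" can be told apart from the regex matching it calls.

```python
import io
import pstats


def summarize_profile(stats_path, limit=10):
    """Return the `limit` entries with the most own time (tottime) as text."""
    buffer = io.StringIO()
    stats = pstats.Stats(stats_path, stream=buffer)
    # strip_dirs() shortens file paths so the table stays readable.
    stats.strip_dirs().sort_stats("tottime").print_stats(limit)
    return buffer.getvalue()


if __name__ == "__main__":
    import os

    # /tmp/cprofile is the output path used earlier in this thread.
    if os.path.exists("/tmp/cprofile"):
        print(summarize_profile("/tmp/cprofile"))
```

Replacing "tottime" with "cumulative" gives the other view: time spent in a function including everything it calls.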

Answer 7 · 2023-03-23T15:17:07.000Z

Perhaps an offline-preprocessing of the (used) dictionary? Could this help?

Answer 8 · 2023-03-25T17:21:48.000Z

Perhaps an offline-preprocessing of the (used) dictionary? Could this help?

I’ve tried to load a JSON (generated from the dictionary) on my laptop and it’s ~5 times faster than loading the Hunspell dictionary. With Pickle, it’s ~10 times faster. The benefits would probably be higher on slower systems.

We could consider including these pre-processed dictionaries. Pickle and JSON are probably not the best solutions (for different reasons), good ideas are welcome 😁.
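
Until something like that exists, the idea can be tried on the user side. The sketch below is not pyphen's API: `load_cached` is a hypothetical helper that builds a value once, stores it with pickle, and reuses the file on later runs. Whether a `pyphen.Pyphen` object pickles cleanly is an assumption I have not verified.

```python
import os
import pickle


def load_cached(build, cache_path):
    """Return build()'s result, reusing a pickle at cache_path when present."""
    if os.path.exists(cache_path):
        # Only load pickles you created yourself: unpickling can run code.
        with open(cache_path, "rb") as cache_file:
            return pickle.load(cache_file)
    value = build()
    # Write under a temporary name first so an interrupted run
    # does not leave a truncated cache file behind.
    temp_path = cache_path + ".tmp"
    with open(temp_path, "wb") as cache_file:
        pickle.dump(value, cache_file, protocol=pickle.HIGHEST_PROTOCOL)
    os.replace(temp_path, cache_path)
    return value
```

Usage would look like `load_cached(lambda: pyphen.Pyphen(lang="de_DE"), "de_DE.pickle")`, with the cache path chosen freely. The cache file should be deleted after upgrading pyphen or changing the dictionary, since the sketch never checks whether it is stale.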