Import/loading takes a long time. How to speed up loading?
Wikinaut opened this issue · 8 comments
I use pyphen for my Raspberry Pi Zero powered internet radio https://github.com/Wikinaut/pinetradio .
Import of pyphen started.
pyphen imported, loading of de_DE took 43.77 seconds on Raspberry Pi Zero
Loading always takes a very long time. Is there a way to decrease the loading time?
> I use pyphen for my Raspberry Pi Zero powered internet radio
Cool!
> pyphen imported, loading of de_DE took 43.77 seconds on Raspberry Pi Zero
That’s a lot. Even if the Raspberry Pi’s CPU is slow, it shouldn’t take so much time. Profiling on my computer doesn’t give interesting results. Could you please provide profiling information from your Raspberry? You can get profiling information by launching python -c "import pyphen; pyphen.Pyphen(lang='de_DE')" -m cProfile -o /tmp/cprofile, and you can send the /tmp/cprofile file here.
(I hope I’ll be able to read it even if it’s not the same platform, otherwise I’ll ask you to launch an additional command!)
Done. The command did not work, but I put the commands into a file and ran that. Here is the full output:
(available until March 2024)
https://dpaste.com/7RA2RN2ES.txt
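Putting the commands into a file, as described above, could look like the following minimal sketch. It uses only the standard-library cProfile module. The helper name profile_to_file, the toy workload, and the output path are made up for illustration, and the pyphen call appears only in a comment so that the sketch runs without the library installed.

```python
import cProfile
import os
import tempfile


def profile_to_file(func, path, *args, **kwargs):
    """Run func(*args, **kwargs) under cProfile and save the stats to path."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    profiler.dump_stats(path)  # binary file, readable later with pstats.Stats(path)
    return result


def demo_workload(n):
    # Stand-in for the slow call. On the Raspberry Pi this would be, e.g.:
    #   import pyphen
    #   profile_to_file(pyphen.Pyphen, "/tmp/cprofile", lang="de_DE")
    return sum(i * i for i in range(n))


# Hypothetical output location, chosen only for this example.
profile_path = os.path.join(tempfile.gettempdir(), "cprofile_demo")
demo_result = profile_to_file(demo_workload, profile_path, 1000)
```

Running a script file with python -m cProfile -o /tmp/cprofile script.py produces the same kind of output file. With -c, Python passes everything after the command string to that command as arguments, which is presumably why the one-liner did not produce a profile.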
Here is just one example of the usage (purpose of hyphenation: allow use of the maximum font size on the tiny display of https://github.com/Wikinaut/pinetradio ). The first - and third - came from the hyphenation. Currently, I use at most one automatic hyphenation per word.
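The "at most one hyphenation per word" rule can be sketched without depending on any library: given candidate break positions for a word (for example from pyphen's positions method), keep only the last one whose first part still fits the display width. The function name split_once and the break positions in the usage lines are illustrative assumptions, not output from a real dictionary.

```python
def split_once(word, positions, width, hyphen="-"):
    """Split word at most once so that the first part plus hyphen fits in width.

    positions: candidate break indexes into word.
    Returns (first_part_with_hyphen, rest), or None if no break fits.
    """
    fitting = [
        p for p in positions
        if 0 < p < len(word) and p + len(hyphen) <= width
    ]
    if not fitting:
        return None
    cut = max(fitting)  # longest first part that still fits
    return word[:cut] + hyphen, word[cut:]


# Hypothetical break positions, chosen by hand for this example.
example = split_once("Internetradio", [2, 5, 8, 11], width=9)
no_fit = split_once("radio", [2], width=2)
```

pyphen's own wrap method appears to serve a similar purpose, so a hand-written helper like this is mainly useful when extra rules (such as a minimum remainder length) are needed.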
You may get slightly better results using the 0.14.0 version, as it may be a bit faster if your storage is slow (and it probably is). That could help with the 17 seconds spent mainly listing dictionaries, and the 30 seconds in the __init__ function code when a dictionary is parsed.
Apart from this change, your distribution of time is almost the same as mine. It may be possible to find optimizations, but nothing is obvious to me right now.
0.14.0 is not much better:
Import of pyphen started.
pyphen imported, loading of de_DE took 39.62 seconds on Raspberry Pi Zero
new profile:
> 0.14.0 is not much better
A 10% improvement is good news, that’s what I was hoping for, but it’s not enough.
> new profile:
It looks like we saved some time listing dictionaries: importing the module seems to be much faster.
For ~50s, there’s:
- ~10s in parse (the regex matching),
- ~10s elsewhere in the list comprehension (10s more than the 10s spent in the regex),
- ~30s elsewhere in the HyphDict.__init__ method, including ~18s in CPython’s C code for Python primitives.
It should be possible to save some time, but there’s nothing obvious from what I see there :/.
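The split between time spent in parse and time spent "elsewhere" in a function corresponds to what cProfile reports as cumulative time (the function plus everything it calls) versus total time (the function's own body only). Here is a minimal sketch for reading both views from a profile, using a toy workload in place of the real dictionary loader. parse_line and load_lines are hypothetical stand-ins, not pyphen functions, and the regex is arbitrary.

```python
import cProfile
import io
import pstats
import re

PATTERN = re.compile(r"(\d?)(\D?)")  # arbitrary regex, for illustration only


def parse_line(line):
    # Stand-in for the regex matching step.
    return PATTERN.findall(line)


def load_lines(lines):
    # Stand-in for the constructor: a list comprehension around the parser.
    return [parse_line(line) for line in lines]


def report(profiler, sort_key, limit=10):
    """Return the top entries of a profile as text, sorted by sort_key."""
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    stats.sort_stats(sort_key).print_stats(limit)
    return stream.getvalue()


profiler = cProfile.Profile()
profiler.runcall(load_lines, ["a1b2c3"] * 2000)

by_cumulative = report(profiler, "cumulative")  # includes time in callees
by_own_time = report(profiler, "tottime")       # excludes time in callees
```

A function with a large cumulative time but a small total time is mostly waiting on what it calls. A large total time, as reported for the constructor here, means the cost sits in the function's own statements.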
Perhaps an offline-preprocessing of the (used) dictionary? Could this help?
> Perhaps an offline-preprocessing of the (used) dictionary? Could this help?
I’ve tried loading a JSON file (generated from the dictionary) on my laptop and it’s ~5 times faster than loading the Hunspell dictionary. With Pickle, it’s ~10 times faster. The benefits would probably be higher on slower systems.
We could consider including these pre-processed dictionaries. Pickle and JSON are probably not the best solutions (for different reasons), good ideas are welcome 😁.
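One way to picture the offline preprocessing idea is a cache file sitting next to the dictionary: parse the text format once, store the parsed result, and reload that on later runs. The sketch below uses pickle from the standard library. parse_patterns is a deliberately simplified, hypothetical parser and not pyphen's real one, and the file names and file format are made up for the example.

```python
import os
import pickle
import tempfile


def parse_patterns(path):
    """Toy parser (hypothetical format): one pattern per line, '%' starts a comment."""
    patterns = {}
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line or line.startswith("%"):
                continue
            # Key: the pattern without its digits. Value: the raw pattern.
            letters = "".join(c for c in line if not c.isdigit())
            patterns[letters] = line
    return patterns


def load_with_cache(source_path, cache_path):
    """Return (patterns, from_cache), reusing the cache if it is not older than the source."""
    if (os.path.exists(cache_path)
            and os.path.getmtime(cache_path) >= os.path.getmtime(source_path)):
        with open(cache_path, "rb") as handle:
            return pickle.load(handle), True
    patterns = parse_patterns(source_path)
    with open(cache_path, "wb") as handle:
        pickle.dump(patterns, handle, protocol=pickle.HIGHEST_PROTOCOL)
    return patterns, False


# Build a tiny throwaway dictionary file to exercise the cache.
workdir = tempfile.mkdtemp()
source = os.path.join(workdir, "toy.dic")
cache = os.path.join(workdir, "toy.pickle")
with open(source, "w", encoding="utf-8") as handle:
    handle.write("% comment\n.ab1c\nde2f\n")

first, first_cached = load_with_cache(source, cache)     # parses and writes the cache
second, second_cached = load_with_cache(source, cache)   # served from the cache
```

Pickle files should only be loaded from trusted sources, since unpickling can execute arbitrary code. JSON avoids that, but it only supports string keys and a small set of value types.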