LonamiWebs/Telethon

Connection slow due to crypto

chstem opened this issue · 19 comments

As the connection to the Telegram servers is quite slow on ARM devices, I did some profiling. As a test, I download the icons for 4 chats (located on 3 DCs, with a total size of about 250 KB). So Telethon needs to export the connection to change DCs and download the files.

On my desktop CPU this takes about 10 seconds. But on ARM you can see how the crypto slows this down substantially: it took 90 seconds! Here are the first couple of lines from the profiling output:

         1165699 function calls (1165600 primitive calls) in 90.484 seconds

   Ordered by: internal time


   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    94278   29.987    0.000   29.987    0.000 /usr/lib/python3.4/site-packages/telethon/crypto/factorization.py:41(gcd)
        2   24.796   12.398   54.784   27.392 /usr/lib/python3.4/site-packages/telethon/crypto/factorization.py:5(find_small_multiplier_lopatin)
    16151   18.542    0.001   23.551    0.001 /home/nemo/.local/lib/python3.4/site-packages/pyaes/aes.py:237(decrypt)
       60    6.032    0.101    6.032    0.101 {built-in method sleep}
        6    2.503    0.417    2.503    0.417 {built-in method pow}
   213291    1.919    0.000    3.591    0.000 /usr/lib/python3.4/copy.py:67(copy)
       31    1.910    0.062   25.594    0.826 /usr/lib/python3.4/site-packages/telethon/crypto/aes.py:6(decrypt_ige)
   213291    1.299    0.000    1.299    0.000 /usr/lib/python3.4/copy.py:125(_copy_with_constructor)

So crypto/factorization.py and pyaes are the bottlenecks here. Well, surprise: factorizing numbers is computationally expensive. (What I don't get is why this is needed, considering one usually exploits the computational cost of factorization to make crypto secure. But this seems to be how Telegram works.)

Of course this is not an issue on powerful x86 CPUs, but on ARM this is slow and drains battery.
Is there any way to make this more efficient? Maybe a faster factorization algorithm? One could consider using Cython, but this would require some building on every platform you want to use Telethon.
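One candidate for a faster algorithm is Pollard's rho, which finds a factor of n in roughly the square root of its smallest prime factor many steps. Below is a minimal sketch, not Telethon's code: the names pollard_rho and factor_pq are hypothetical, the input is assumed to be composite (the loop never ends for a prime), and math.gcd needs Python 3.5 or newer.

```python
import math
import random


def pollard_rho(n, seed=0):
    """Return one non-trivial factor of a composite n (hypothetical helper)."""
    if n % 2 == 0:
        return 2
    rng = random.Random(seed)
    while True:
        c = rng.randrange(1, n)          # constant of the polynomial x^2 + c
        x = y = rng.randrange(2, n)      # tortoise and hare start together
        d = 1
        while d == 1:
            x = (x * x + c) % n          # tortoise: one step
            y = (y * y + c) % n          # hare: two steps
            y = (y * y + c) % n
            d = math.gcd(abs(x - y), n)
        if d != n:
            return d
        # d == n means the cycle closed without revealing a factor;
        # retry with a different constant and starting point.


def factor_pq(pq):
    """Split pq into (smaller, larger), assuming pq is a product of two primes."""
    p = pollard_rho(pq)
    q = pq // p
    return (p, q) if p < q else (q, p)
```

For example, factor_pq(104729 * 1299709) gives back the two primes in ascending order. Most of the time is still spent in gcd and modular multiplication, so pure Python will remain slow on weak CPUs; this only reduces the number of steps.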

We could potentially use an existing prime factorization module if available, or fall back to the default if it's not. sympy? pyprimes' factors method seems really slow compared to the current implementation.

This seems to be a lot faster, if you could test it and leave some feedback, or maybe more ideas before closing, that would be nice :P

Did not expect that sympy does factorization as well. My test now takes 56 seconds, definitely better!
I briefly tested pyprimes, but this is much slower.

So this makes pyaes the bottleneck now. Any ideas about that?

So this makes pyaes the bottleneck now.

More like my little bit of AES code, notice the percall time difference:

ncalls  tottime  percall  cumtime  percall filename:lineno(function)
 16151   18.542    0.001   23.551    0.001 pyaes/aes.py:237(decrypt)
    31    1.910    0.062   25.594    0.826 telethon/crypto/aes.py:6(decrypt_ige)

I'm not sure if this strange AES-IGE mode is implemented in pyaes, or if another Python library supports it by default. That'd be a matter of looking for it, just like I found sympy ^^
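For readers unfamiliar with IGE, here is a rough sketch of the chaining, which also shows why the block cipher gets called once per 16-byte block from Python. Assumptions: the 32-byte IV is split into "previous ciphertext" (first half) and "previous plaintext" (second half), and the block cipher is passed in as a function. The toy_* functions are NOT AES and not secure; they exist only so the sketch runs without dependencies.

```python
BLOCK = 16


def _xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))


def ige_encrypt(plain, iv, encrypt_block):
    """c_i = E(p_i XOR c_{i-1}) XOR p_{i-1}"""
    if len(plain) % BLOCK or len(iv) != 2 * BLOCK:
        raise ValueError("need whole 16-byte blocks and a 32-byte IV")
    prev_cipher, prev_plain = iv[:BLOCK], iv[BLOCK:]
    out = bytearray()
    for i in range(0, len(plain), BLOCK):
        block = plain[i:i + BLOCK]
        cipher = _xor(encrypt_block(_xor(block, prev_cipher)), prev_plain)
        out += cipher
        prev_cipher, prev_plain = cipher, block
    return bytes(out)


def ige_decrypt(cipher_text, iv, decrypt_block):
    """p_i = D(c_i XOR p_{i-1}) XOR c_{i-1}"""
    if len(cipher_text) % BLOCK or len(iv) != 2 * BLOCK:
        raise ValueError("need whole 16-byte blocks and a 32-byte IV")
    prev_cipher, prev_plain = iv[:BLOCK], iv[BLOCK:]
    out = bytearray()
    for i in range(0, len(cipher_text), BLOCK):
        block = cipher_text[i:i + BLOCK]
        plain = _xor(decrypt_block(_xor(block, prev_plain)), prev_cipher)
        out += plain
        prev_cipher, prev_plain = block, plain
    return bytes(out)


# Toy, insecure stand-in for a real block cipher (illustration only).
_TOY_KEY = b"0123456789abcdef"


def toy_encrypt_block(block):
    return _xor(block, _TOY_KEY)[::-1]


def toy_decrypt_block(block):
    return _xor(block[::-1], _TOY_KEY)
```

In real use, encrypt_block and decrypt_block would wrap an actual AES block operation (for instance the pyaes decrypt seen in the profile above), and the Python-level loop around it is where the per-call overhead adds up.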

Slightly off topic, how did you get these timings?
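Output in that shape (the "Ordered by: internal time" header and the ncalls/tottime/percall columns) is what Python's built-in cProfile and pstats modules print when sorted by tottime. A minimal sketch, assuming that is how the numbers above were collected; profile_call is a hypothetical helper, not something from Telethon:

```python
import cProfile
import io
import pstats


def profile_call(func, *args, top=8, **kwargs):
    """Run func under cProfile; return (result, text report of the top entries)."""
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        result = func(*args, **kwargs)
    finally:
        profiler.disable()
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    # "tottime" is time spent inside each function, excluding sub-calls.
    stats.sort_stats("tottime").print_stats(top)
    return result, stream.getvalue()
```

For a whole script, running it as python -m cProfile -s tottime your_script.py prints the same kind of table without changing any code (your_script.py being a placeholder).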

How about interfacing to openssl?

import ctypes
libssl = ctypes.cdll.LoadLibrary('libssl.so')

On windows the library is probably called differently, so one might need to add some checks here.
Then one can use libssl.AES_set_encrypt_key() and libssl.AES_ige_encrypt(). I could try to hack something together, but maybe someone with more experience in C++/crypto would be better suited for this job.

I didn't know it was so easy to load external C libraries into Python, that's neat.

On windows the library is probably called differently

And it may not be installed by default either? Either way, we can just fall back to the current Python implementation if using the SSL library fails, in a similar fashion to sympy's approach.
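A rough sketch of that "load it if possible, otherwise fall back" check, assuming the two functions named above are exported by whatever OpenSSL library the system finds. load_libcrypto and the candidate names are hypothetical; nothing here calls into the library, it only checks that the symbols exist.

```python
import ctypes
import ctypes.util


def load_libcrypto(candidates=("crypto", "ssl")):
    """Return a ctypes handle exposing the AES IGE functions, or None."""
    for name in candidates:
        path = ctypes.util.find_library(name)
        if path is None:
            continue                      # not installed under this name
        try:
            lib = ctypes.cdll.LoadLibrary(path)
        except OSError:
            continue                      # found but could not be loaded
        # ctypes raises AttributeError for missing symbols, so hasattr works.
        if hasattr(lib, "AES_set_encrypt_key") and hasattr(lib, "AES_ige_encrypt"):
            return lib
    return None


libssl = load_libcrypto()
USE_OPENSSL = libssl is not None          # otherwise keep the pure Python path
```

The rest of the code can then branch on USE_OPENSSL once at import time instead of failing when the library is missing.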

I could try to hack something together

That'd be nice ^^

On top of that, I tried uploading some big-a$$ files to the Telegram servers, and it was really slow with Telethon! The max upload speed to the Telegram servers is almost 17 Mb, but Telethon reached just 1/16 of that speed, with CPU load at 93%! I think the problem is the SSL encryption calculations, which cost a lot. I did the same job with https://github.com/sochix/TLSharp and it uploads perfectly at full upload speed, with reasonable CPU usage. However, it falls short on memory management, which in Telethon's case is awesome!

TLSharp encrypts with AES IGE just like Telethon. If you could @mojindri profile the time it takes like @feodoran did, that would help determine if the bottleneck when uploading also is the crypto part.

What bottleneck? He said TLSharp uploads at full speed. But the crypto in TLSharp is implemented in C#, so of course this is much faster than what pyaes can do. That is why I want to try and use openssl.

What bottleneck?

Never mind, I thought maybe I had messed up somewhere in the code and it wasn't as good as it could be, but yes, it's probably just the language used.

@Lonami Can't you use OpenSSL in the project?

Can't you use OpenSSL in the project?

If you scroll up a little bit, that was my suggestion ;)

My tests @feodoran @Lonami:
- Telethon upload speed: [image: py-speed]
- TLSharp upload speed: [image: tl-speed]
- TLSharp CPU and RAM usage: [image: tl-specs]
- Python specs: [image: py-specs]

I managed to put something together which correctly decrypts and encrypts these test vectors, just as the current pure Python implementation does. But when I include this in Telethon, I get the following error when trying to connect: aes_ige.c(88): OpenSSL internal error, assertion failed: (length % AES_BLOCK_SIZE) == 0

@Lonami Could you give me some typical test cases that Telethon needs to de/encrypt?

edit: Never mind, I forgot the random padding that is needed when the length of plain_text is not already a multiple of 16. Download speed looks very nice now. But factorizing is still somewhat slow. However, this is only done once for each DC, since the connection is cached, right?
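The padding step mentioned in the edit can be sketched as follows. This is an illustration, not the code from the PR: pad_to_block is a hypothetical name, and the block size of 16 matches the AES_BLOCK_SIZE in the assertion message above.

```python
import os

AES_BLOCK_SIZE = 16


def pad_to_block(plain_text):
    """Append random bytes so the length becomes a multiple of 16."""
    remainder = len(plain_text) % AES_BLOCK_SIZE
    if remainder:
        plain_text += os.urandom(AES_BLOCK_SIZE - remainder)
    return plain_text
```

Data that is already aligned passes through unchanged, so the OpenSSL length assertion is satisfied either way.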

Download speed looks very nice now.

Awesome job :) If you make a PR I'll be happy to both test and merge.

this is only done once for each DC, since the connection is cached, right?

Correct.

Awesome job :) If you make a PR I'll be happy to both test and merge.

I was just about to do this.

For reference, here is the new profile output:

         1762025 function calls (1761693 primitive calls) in 28.364 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
       59    5.948    0.101    5.948    0.101 {built-in method sleep}
   322234    5.589    0.000    5.589    0.000 {built-in method pow}
    75508    5.223    0.000    5.503    0.000 /home/nemo/.local/lib/python3.4/site-packages/sympy/core/numbers.py:212(igcd2)
   100784    3.981    0.000   10.950    0.000 /home/nemo/.local/lib/python3.4/site-packages/sympy/core/numbers.py:160(igcd)
   302325    1.727    0.000    4.296    0.000 /home/nemo/.local/lib/python3.4/site-packages/sympy/ntheory/factor_.py:463(<lambda>)
        8    1.496    0.187   16.743    2.093 /home/nemo/.local/lib/python3.4/site-packages/sympy/ntheory/factor_.py:368(pollard_rho)
       31    0.952    0.031    0.954    0.031 /home/nemo/github/Telethon/telethon/crypto/libssl.py:45(decrypt_ige)
   176364    0.804    0.000    0.804    0.000 /home/nemo/.local/lib/python3.4/site-packages/sympy/core/compatibility.py:307(as_int)

Anyone tried this here?

@feodoran good job, it does seem quite a lot faster at downloading files ^^'

Anyone tried this here?

No, I haven't, but:

if operation == "decrypt":

# [...]

elif operation == "encrypt":

Doesn't seem very efficient in a tight loop like that to begin with…

One more remark about Windows:
You need to install some OpenSSL for Windows first. The required library had a different name in my case (libcrypto-1_1-x64). But I don't know whether this is consistent among the OpenSSL implementations available for Windows.

However, find_library() did not find the file, unless you give its full name (which is kinda silly). So I don't know how to consistently implement this in libssl.py. If somebody knows more, it would be nice to include this. But one could still provide the name by hand to make it work.

So with libssl = ctypes.cdll.LoadLibrary('libcrypto-1_1-x64') it worked for me.