psi-im/plugins

[Windows x64] Crash when OTR "Generate new key" or "Start private conversation"

Neustradamus opened this issue · 36 comments

Since a long time, there is a problem with OTR on Windows x64.

It is impossible to:

  • "Generate new key"
  • "Start private conversation"

No problem on Windows x86.

It is linked to libotr/libgcrypt, it is possible to have help?
@tehnick has tried to debug it without success.

@jeroen, @jkivilin, @dd9jn, @dkg, @dtzWill, @mstorsjo, @smuellerDD
@arlolra, @dgoulet
@claucece @DrWhax @juniorz

Thanks in advance

Mmm.. interesting. I don't use Windows.. but if I find a way to reproduce this.. I'll let you know :)

I don't use Windows.. but if I find a way to reproduce this..

Personally I prefer to debug MS Windows related bugs in WINE first.
And this bug is 100% reproducible in 64-bit version of WINE.

Some additional info:

  • By default all libraries for 64-bit Window in my build environment are built using MinGW with posix threads and seh exception handling. In this case this bug is 100% reproducible.
  • I have tried to rebuild all using sjlj exception handling and have faced with completely another bug: generation of new key never ends. But in this case I am not sure that problem is not in our plugin. (We use QtConcurrentRun for calling of related functions from libotr.)

I'm not familiar with Psi, is otr plugin packaged with the psi bundle(https://psi-im.org/download/)?

Are you using mingw-w64?

is otr plugin packaged with the psi bundle

Yes.

The most recent builds of Psi+ are here. From time to time I make Psi IM builds from master branch and upload them here. Difference between Psi+ and Psi IM and our development model are documented here.

Are you using mingw-w64?

Yes. I use MXE + Sibuserv projects for cross-compilation for MS Windows. Build scripts are here.

But I do not think that there is need in debugging of full Psi+ program + OTR plugin: we may prepare more simple example for reproducing of the bug based on small piece of code from plugin.

BTW, our builds using MSVC had similar bug as far as I remember. But they were prepared and tested by other people, so I am not sure.

@tehnick Thanks for this! I'll check it out tomorrow.

BTW, there is a weird ASLR bug (with a workaround its fixable!) with mingw-w64 that renders ASLR.. useless: https://insights.sei.cmu.edu/cert/2018/08/when-aslr-is-not-really-aslr---the-case-of-incorrect-assumptions-and-bad-defaults.html

But that might need a seperate ticket..

@DrWhax Thanks for info. I have added workaround proposed in this article. This will be enough I hope.

@DrWhax: Have you checked?
Have you detected the problem?
What is your point of view?

Hey @tehnick !

I saw that you have a workaround.. is that enough?
It seems like this is mostly a libgrcrypt problem. If you try to install that library, does it succeed?

Thanks!

I saw that you have a workaround.. is that enough?

What workaround are you talking about?

It seems like this is mostly a libgrcrypt problem.

We not sure if this a libotr or libgrcrypt problem, because libotr may use libgrcrypt in a wrong way.

If you try to install that library, does it succeed?

Sorry, I didn't understand this. Could you explain what do you mead in more details?

I saw that you have a workaround.. is that enough?

If you are talking about #45 (comment) that was completely another issue, not related to OTR. Sorry for off-topic comments here.

Hey,

What workaround are you talking about?

This one: "I have added workaround proposed in this article. This will be enough I hope."

We not sure if this a libotr or libgrcrypt problem, because libotr may use libgrcrypt in a wrong way.

Ok, let me clarify, there is still a crash when generating new keys or starting a conversation, right? In windows64, right?

If you are talking about #45 (comment) that was completely another issue, not related to OTR. Sorry for off-topic comments here.

No problem.

If you have some debug logs of when the crash happens, paste them here. It can be a problem within threads of the UI and the generation of the keys.

Thanks!

Ok, so I have looked at the code. For me, this can be a problem with threads. I see that on the main thread 'otrl_privkey_generate_start' and 'otrl_privkey_generate_finish' are called. In the back, 'otrl_privkey_generate_calculate' is called. Or at least that it how it should behave, according to the documentation in libotr. if this is related to libgcrypt, the place where it can fail is: https://github.com/psi-im/plugins/blob/master/generic/otrplugin/src/otrinternal.cpp#L851, as 'otrl_privkey_generate_calculate' is the only one that calles libgcrypt funcs. What can be tried is:

@tehnick if you can send me the log of when it crashes, it will be super useful.

Thanks!

Ok, let me clarify, there is still a crash when generating new keys or starting a conversation, right?

When interlocutor tries to start OTR session with you and you do not have a key for current account special dialog window will be opened with suggestion to generate a key. So the exactly the same code is called in both cases.

In windows64, right?

Yes, the problem is reproducible only in 64-bit MS Windows executables. And possibly only in MinGW builds.

If you have some debug logs of when the crash happens, paste them here. It can be a problem within threads of the UI and the generation of the keys.

Unfortunately currently I have debug build of program with debug symbols only for Psi+ and its plugins, but not for libraries which it uses. So the backtrace is not enough imformative. You may look on it here.

Unfortunately currently I have debug build of program with debug symbols only for Psi+ and its plugins, but not for libraries which it uses. So the backtrace is not enough imformative. You may look on it here.

It is still very useful. I'll check if something is strange there.

Thanks!

Ok. I have reviewed the log. The important parts are:

No symbol table info available.
#6  0x000000001eee130f in OtrInternal::create_privkey (
    this=this@entry=0x1ba6ee60,
    accountname=0x1c000708 "{4d70ff68-ac36-4c21-9872-448d352d1e19}",
    protocol=protocol@entry=0x1eef68a0 "prpl-jabber")
    at /home/boradmin/Tmp/Psi+/psi-plus-snapshots/src/plugins/generic/otrplugin/
src/otrinternal.cpp:854
        qMB = <incomplete type>
        newkeyp = 0x1c0294e0
        loop = <incomplete type>
        watcher = {<QFutureWatcherBase> = {<No data fields>}, m_future = {
            d = {<QFutureInterfaceBase> = {<No data fields>}, <No data fields>}}
}
        future = {
          d = {<QFutureInterfaceBase> = {<No data fields>}, <No data fields>}}
        fingerprint = "\020}T\033\000\000\000\000Т\020wf\000\000\000\000\bnX\033
\000\000\000\000'ьvf\000\000\000\000\000\001\000\000\000\000\000\000>cШ\025"
#7  0x000000001eee24ff in OtrInternal::generateKey (this=0x1ba6ee60,
    account=...)
    at /home/boradmin/Tmp/Psi+/psi-plus-snapshots/src/plugins/generic/otrplugin/
src/otrinternal.cpp:773

It fails at loop.exec();. Unsure why; but this makes me think it is not a problem from libgcrypt or libotr.

Sorry for delay with reply: I was busy with other tasks in the project.

It fails at loop.exec();. Unsure why; but this makes me think it is not a problem from libgcrypt or libotr.

Sorry, but your assumption is wrong. Probably you do not know how QtConcurrent works...

This is quite simple layer which allows to call a functions in a separate thread. Race conditions and other thread specific errors are excluded, because all functions in that thread are called in series according to the pool of "events" (functions and their arguments) prepared before start of that thread.

Debug note about loop.exec(); means exactly that a separate thread was created and QtConcurrent began to call functions from his event loop.

Segfault had happened in that new thread. The most important debug notes are:

[Switching to Thread 2944.0x554]
_gcry_mpih_lshift () at mpih-lshift-asm.S:46
46      mpih-lshift-asm.S: No such file or directory.

(gdb) thr a all bt full

Thread 27 (Thread 2944.0x554):
#0  _gcry_mpih_lshift () at mpih-lshift-asm.S:46
No locals.
#1  0x000000001bf6d7e0 in ?? ()
No symbol table info available.
Backtrace stopped: previous frame inner to this frame (corrupt stack?)

I have prepared a minimal reproducible example for you as promised:
https://github.com/tehnick/example-of-libotr-crash

BTW it is written in plain C...

Here are statically linked 32- and 64-bit executables for MS Windows:
https://tehnick.net/tmp/test-libotr-1.0_x64.exe
https://tehnick.net/tmp/test-libotr-1.0.exe
Unfortunately, they are still without debug symbols...

gdb log for test-libotr-1.0.exe: https://susepaste.org/view/raw/47951425
gdb log for test-libotr-1.0_x64.exe: https://susepaste.org/view/raw/85746535

All executables are built using MXE project.

Hey!

Sorry, but your assumption is wrong. Probably you do not know how QtConcurrent works...

Sure. I'm not a developer of this project or C++.

As the debug says:

[Switching to Thread 2944.0x554]
_gcry_mpih_lshift () at mpih-lshift-asm.S:46
46      mpih-lshift-asm.S: No such file or directory.

You are missing the mpih-lshift-asm.S file. Can you check if:

  1. It exists in the libgrcrypt library version you are using: https://git.gnupg.org/cgi-bin/gitweb.cgi?p=libgcrypt.git;a=blob;f=mpi/amd64/mpih-lshift.S;h=9e8979b1060393842c0505e338f5ef7305dc51ce;hb=refs/heads/master
  2. Is been linked correctly: something like config.status: linking mpi/amd64/mpih-lshift.S to mpi/mpih-lshift-asm.S.

As libotr does not come with libgcrypt, you should be installing it to build it. Maybe you just need to upgrade the version for libgcrypt. Or if not, submit an issue to the MINGW-packages: https://github.com/msys2/MINGW-packages/issues or libgcrypt: https://dev.gnupg.org/

That is why I said:

It seems like this is mostly a libgrcrypt problem. If you try to install that library, does it succeed?

Have you try on only builiding libcrypt (or libotr) and checking if the tests pass? Have you try building them from source?

We not sure if this a libotr or libgrcrypt problem, because libotr may use libgrcrypt in a wrong way.

If libotr can't find a file, then this looks like a libgcrypt linking problem.

And, if you manage to install libgrcrypt and build it, can you paste the build output?

As I noted above:

All executables are built using MXE project.

This project allows to easily prepare SDK with subset of libraries you need for building of applications.

Current versions of libraries:

libgcrypt | 1.8.4
libgpg_error | 1.35
libotr | 4.1.1

All libraries are built from sources taken from upstreams. Here are build rules for them:

You are missing the mpih-lshift-asm.S file.

Hmm, are you sure it is required for release (not debug) version of library?

It exists in the libgrcrypt library version you are using

I do not know how to check this in a binary file. (I completely forgot everything about ASM and low-level programming...)

Is been linked correctly: something like config.status: linking mpi/amd64/mpih-lshift.S to mpi/mpih-lshift-asm.S.

Ok, I'll look at build logs.

And, if you manage to install libgrcrypt and build it, can you paste the build output?

Yes, sure:

Also I see the additional patch for libgcrypt in MXE project: libgcrypt-1-fixes.patch.

Also I see the additional patch for libgcrypt in MXE project: libgcrypt-1-fixes.patch.

Wow! I have found the root of problem: if I drop ATTR_ABI related part of patch then segfault is not happen.

@claucece Thanks a lot for your help!

Ri0n commented

Hola Sofía.
I'm just curious. Is it you
https://mobile.twitter.com/i/web/status/985890984939671552 ?

I'm going to implement secure dtls based based file transfer for Psi. Do you want to join?

Wow! I have found the root of problem: if I drop ATTR_ABI related part of patch then segfault is not happen.

Nice @tehnick ! Everything working now?

Hola Sofía.
I'm just curious. Is it you
https://mobile.twitter.com/i/web/status/985890984939671552 ?

Yeah, that is me.

I'm going to implement secure dtls based based file transfer for Psi. Do you want to join
?

Interesting. I don't have much time; but if can collaborate on something, let me know. I'll be definetly interested in joining.

Ri0n commented

Interesting. I don't have much time; but if can collaborate on something, let me know. I'll be definetly interested in joining.

Yeah. Time is always a problem when we are talking about free projects :)
Btw, applying otv4 would nice too. If you could investigate a possibility of migration to the new protocol, that would be great. Probably we could propose required changed to XSF then.
Btw do you speak Spanish tambien?

Anyway join our xmpp conference at psi-dev@conference.jabber.ru

Nice @tehnick ! Everything working now?

Yes. I'll close this issue once my PR in MXE will be merged.

Thanks a lot to @claucece for this help about this "big" bug in libgcrypt from @mxe and not only in @psi-im / @psi-plus.

And @tehnick for all tests requested since a very long time (more one year), and for changes in the @mxe libgcrypt-1-fixes.patch.

@Ri0n I have already requested for OTRv4 in Psi (in more than OTRv3 of course) ah ah
For have not only the Pidgin client: https://github.com/otrv4/.

Yeah. Time is always a problem when we are talking about free projects :)
Btw, applying otv4 would nice too. If you could investigate a possibility of migration to the new protocol, that would be great. Probably we could propose required changed to XSF then.

Yes! That would we awesome! Once we are done with the design of OTRv4, we can start implementing :) I'll check what is needed.

Btw do you speak Spanish tambien?

Pero, claro! :)

Anyway join our xmpp conference at psi-dev@conference.jabber.ru

I will :)

Thanks a lot to @claucece for this help about this "big" bug in libgcrypt from @mxe and not only in @psi-im / @psi-plus.

Np :)

@Ri0n I have already requested for OTRv4 in Psi (in more than OTRv3 of course) ah ah
For have not only the Pidgin client: https://github.com/otrv4/.

Yes! :)