aws/aws-sdk-cpp

transcribestreaming SIGSEGV of library in CRTHttpClient::MakeRequest -> ostream::write

sem32 opened this issue · 4 comments

sem32 commented

Describe the bug

We use the C++ SDK to transcribe audio streams in real time, and the SDK library crashes in some cases. The crash reproduces 100% of the time when the AWS_SECRET_ACCESS_KEY environment variable holds a wrong value.

Why are we using the CRT HTTP client?
We use it because we had performance issues with lib CURL.

  • With CURL 7.87, transcription quality was good, but CPU usage was too high (a spike to 100% every 3-5 sec). That is more or less OK for one transcribing process, but not for 30.
  • With CURL 7.88, we had no performance issue, but transcription quality degraded (it looks like the CURL library does some optimization).
  • With the CRT HTTP client, we have no issue with either quality or performance.

GDB output: gdb_dump.txt

I tried to use libsanitizer to catch the issue, and here is the result: libsanitizer_res.txt

logs.txt

Expected Behavior

The library does not crash.

Current Behavior

The library crashes.

Reproduction Steps

The issue reproduces in rare cases with no changes, but reproduces 100% of the time when we put a wrong symbol into the value of the AWS_SECRET_ACCESS_KEY environment variable.
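One way to script this reproduction step is to corrupt the variable from inside the test program before the SDK is initialized. The following is a minimal sketch, not part of the SDK or of the original report: `corrupt_env_var` is a hypothetical helper, appending `!` is an assumed choice of "wrong symbol" (the report does not say which symbol was used), and `setenv` is POSIX rather than standard C++.

```cpp
#include <cstdlib>
#include <stdlib.h>  // setenv (POSIX, available on Debian)
#include <string>

// Hypothetical helper: appends one character to the current value of an
// environment variable so that the stored credential no longer matches.
// Returns false if the variable is unset or if setenv fails.
inline bool corrupt_env_var(const char* name, char junk = '!') {
    const char* current = std::getenv(name);
    if (current == nullptr) {
        return false;
    }
    // Copy first: setenv may invalidate the pointer returned by getenv.
    std::string corrupted(current);
    corrupted.push_back(junk);
    // Third argument 1 means "overwrite the existing value".
    return setenv(name, corrupted.c_str(), 1) == 0;
}
```

Calling `corrupt_env_var("AWS_SECRET_ACCESS_KEY")` before `Aws::InitAPI` should leave the process with a secret key that no longer matches, assuming the credentials are resolved from the environment.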

Possible Solution

No response

Additional Information/Context

No response

AWS CPP SDK version used

1.11.184 (latest master)

Compiler and Version used

gcc (Debian 10.2.1-6) 10.2.1 20210110

Operating System and version

Debian 11

jmklix commented

I'm trying to reproduce the error you are getting, and I have a few questions:

I just want to make sure we are both trying to solve the same problem. This similar looking issue was caused by a permission access error in my AWS credential, and I want to make sure we're not debugging an error added artificially by changing the AWS_SECRET_ACCESS_KEY.

sem32 commented

Are the logs/sanitizer/dump from when you reproduce the error without any changes (i.e. with the normal AWS_SECRET_ACCESS_KEY)?

The only change I made is to the default requestTimeoutMs, because the SDK default is too small.

diff --git a/src/aws-cpp-sdk-core/source/client/ClientConfiguration.cpp b/src/aws-cpp-sdk-core/source/client/ClientConfiguration.cpp
index 30e4fbabc0..ba73b788b1 100644
--- a/src/aws-cpp-sdk-core/source/client/ClientConfiguration.cpp
+++ b/src/aws-cpp-sdk-core/source/client/ClientConfiguration.cpp
@@ -122,7 +122,7 @@ void setLegacyClientConfigurationParameters(ClientConfiguration& clientConfig)
     clientConfig.useFIPS = false;
     clientConfig.maxConnections = 25;
     clientConfig.httpRequestTimeoutMs = 0;
-    clientConfig.requestTimeoutMs = 3000;
+    clientConfig.requestTimeoutMs = 30000;
     clientConfig.connectTimeoutMs = 1000;
     clientConfig.enableTcpKeepAlive = true;
     clientConfig.tcpKeepAliveIntervalMs = 30000;

Are you getting CRC Mismatch in both error cases?

Yes, I get the same CRC Mismatch error even with the correct AWS_SECRET_ACCESS_KEY. With one transcribing session it's okay, but when I start 10-20 transcribing sessions, after some time (20-30 sec) I get the same error (CRC Mismatch) and the crash.
So changing AWS_SECRET_ACCESS_KEY is the simplest way to reproduce the issue, but it's not the production case. In production, I get the same error (and crash) under a small load.
Here are the logs/dumps:
crash2.zip

Under a load of ~30 transcribing sessions I've also hit other crashes:
crash3.txt

and one more:
crash4.txt
crash5.zip

Can you confirm you are using the unmodified sample found here: https://github.com/awsdocs/aws-doc-sdk-examples/tree/main/cpp/example_code/transcribe

Yes, correct. I tried to reproduce the issue with the wrong AWS_SECRET_ACCESS_KEY, and the crash looks the same.

Have you tried to reproduce this on any other OSes?

No, we are using Debian 11.

I'm developing a multithreaded application for real-time transcription of VoIP calls. When my module loads, I call Aws::InitAPI(options), and for each SIP call that I need to transcribe I start a separate thread in which I create
m_client = Aws::MakeUnique<TranscribeStreamingServiceClient>("TAG", config);
StartStreamTranscriptionRequest m_request;
set all the callbacks, and call
m_client->StartStreamTranscriptionAsync(m_request, OnStreamReady, OnResponseCallback, nullptr);

With 5 calls to transcribe there are no issues, but with 10-20 calls the SDK library starts crashing.
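The one-thread-per-call model described above can be exercised outside the VoIP application with a small load harness, which makes it easier to find the session count at which failures start. The sketch below is standard-library only and is an illustration, not the reporter's code: `run_load` and `LoadResult` are hypothetical names, and `run_session` is a hypothetical callable standing in for the code that creates a TranscribeStreamingServiceClient and calls StartStreamTranscriptionAsync.

```cpp
#include <atomic>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Outcome of one load run: sessions started and sessions that reported failure.
struct LoadResult {
    std::size_t started;
    std::size_t failed;
};

// Starts `sessions` threads, one per simulated call, mirroring the
// one-thread-per-SIP-call model. `run_session` is a hypothetical stand-in
// for the real transcription code: it receives the session index and
// returns true on success.
inline LoadResult run_load(std::size_t sessions,
                           const std::function<bool(std::size_t)>& run_session) {
    std::atomic<std::size_t> failed{0};
    std::vector<std::thread> workers;
    workers.reserve(sessions);
    for (std::size_t i = 0; i < sessions; ++i) {
        workers.emplace_back([&failed, &run_session, i] {
            if (!run_session(i)) {
                failed.fetch_add(1, std::memory_order_relaxed);
            }
        });
    }
    // Join every thread before returning so that no session outlives
    // the objects it captured by reference.
    for (auto& worker : workers) {
        worker.join();
    }
    return LoadResult{sessions, failed.load()};
}
```

Running `run_load(5, ...)` and then `run_load(20, ...)` with the real session code plugged in would correspond to the 5-call and 10-20-call scenarios described in this comment.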

I build the SDK with:
cmake ../aws-sdk-cpp -DCMAKE_BUILD_TYPE=Release -DCMAKE_PREFIX_PATH=/usr/local/ -DCMAKE_INSTALL_PREFIX=/usr/local/ -DBUILD_ONLY="transcribestreaming" -DUSE_CRT_HTTP_CLIENT=1

sem32 commented

@SergeyRyabinin
The fix is working. Thank you!
