oracle/dotnet-db-samples

Oracle.ManagedDataAccess.Core version 23.6.1 - ORA-50000 : connection storm

ksantoso opened this issue · 9 comments

As a background, I'm using .Net 8 and ODP version 23.6.1 to connect to Oracle 19c version 19.10.0.
I have set Minimum Pool of 50, Maximum Pool of 200 and default for Increment Pool Size.
I use the async version for all methods related to database operation such as OpenAsync, ExecuteNonQueryAsync, etc.

I didn't get this ORA-50000 issue when using ODP version 3.21.160 with the same connection pool parameters and similar transaction load.
On the nuget page for ODP version 23.6.1 it is stated that [Bug 37144163 - ORA-50000: CONNECTION REQUEST TIME OUT USING OPENASYNC UNDER HIGH LOAD] has been fixed.
Can you suggest some options to investigate this issue?
cc @alexkeh

@ksantoso, please turn on ODP.NET tracing and provide access to the trace file. The trace fill will provide details about the root cause.

In ODP.NET 21c, async had yet to be implemented—those async APIs used synchronous calls under the covers.

@alexkeh Which trace level should I set?

@alexkeh I've managed to get the trace files during ORA-50000. Where should I upload them to?
*edit: I've tried to read the trace files and search for "ORA-50000" but can't seem to find any ora exceptions

You can create an Oracle support service request and upload the trace securely. You can make the trace available on a public share for download. If the attachment isn't too big for email, email it to dotnet_us(at)oracle.com.

I've uploaded the trace file to Google Drive and added your email for shared access.

@ksantoso We reviewed the trace. Before the ORA-50000 connection timeout occurs, we see that OracleConnection.Open() is being called, not OpenAsync(). You can try one or more of the following:

  • Change the Open() to OpenAsync()
  • Increase the connection timeout level
  • Use the Oracle listener connection rate limiter to limit requests from overwhelming the listener during a connection storm.

If you would like our team to conduct a more in-depth profile, we'll need the test app/case itself to profile.

Thanks for the recommendation.
Closing this issue as I've managed to avoid the ORA-50000 by adding ConnectionTimeout parameter and adjusting pool values.

Furthermore, I observed during stress test that given the same parameter setup, sync methods generally yield better throughput and more stable than the async counterparts using version 23.6.1.
Is this as expected @alexkeh?

The question of stability is a matter of what resources you have at various points in time. If you try to request multiple connections in quick succession without the ability to scale, then you will see instability. If you have the potential to scale, then you will remain stable even with many requests in a short period of time.

You use async when there are resources to process more requests concurrently. That leads to better throughput.

If you have a bottleneck that forces these async requests to wait, then you can see slower performance than sync. A common example is a connection storm when the DB is inundated with connection requests all of a sudden. If the DB and networking architecture hasn't been sized to handle this load, you can see connection requests pile up with async but not with sync.