spandex-project/spandex_datadog

Bug when reporting ecto errors

BlueHotDog opened this issue · 9 comments

Here's a trace:

lib/spandex_datadog/api_server.ex:271 :erlang.apply({:badmatch, %Api.Storylines.Screen{__meta__: #Ecto.Schema.Metadata<...>, edits: #Ecto.Association.NotLoaded<association :edits is not loaded>, id: "123-123-123-123", inserted_at: ~U[2020-11-10 13:13:21.000000Z], last_edited: ~U[2020-11-10 13:13:21Z], name: "screen name", original_dimensions: nil, s3_object_name: "xxxxx/yyyyy", screenshot_image_uri: "some_url", storyline: #Ecto.Association.NotLoaded<...>, storyline_id: "123-123-123-123", updated_at: ~U[2020-12-01 07:35:27.000000Z], url: "www.example.com"}}, :__struct__, [])
lib/spandex_datadog/api_server.ex:271 SpandexDatadog.ApiServer.add_error_type/2
lib/spandex_datadog/api_server.ex:264 SpandexDatadog.ApiServer.add_error_data/2
lib/spandex_datadog/api_server.ex:244 SpandexDatadog.ApiServer.meta/1
lib/spandex_datadog/api_server.ex:230 SpandexDatadog.ApiServer.format/3
lib/enum.ex:1399 Enum."-map/2-lists^map/1-0-"/2
lib/enum.ex:1399 Enum."-map/2-lists^map/1-0-"/2
lib/spandex_datadog/api_server.ex:196 SpandexDatadog.ApiServer.send_and_log/2

We receive this error every time an exception happens. It seems that Spandex tries to encode the Ecto entity and fails.

Thanks for the bug report! I will try to reproduce and figure out what we can do about that.

If you're able to clarify any more details about how to reproduce the problem, that would be helpful. I'm guessing that you're using spandex_ecto, but what version of spandex_ecto and ecto_sql? Are you doing anything custom with the Ecto spans, or just connecting it to Ecto's telemetry per the docs? Can you tell how to trigger this error, or does it just happen for all traces in your app?

From what I can tell, Spandex is expecting an error type, but it's getting a Struct from your application (Api.Storylines.Screen) instead.

Hey, for sure, these are the versions:

{:ecto_sql, "~> 3.4"},
{:spandex, "~> 3.0.3"},
{:spandex_ecto, "~> 0.6.2"},
{:spandex_phoenix, "~> 1.0.5"},
{:spandex_datadog, "~> 1.0.0"}

Not doing anything custom, bare minimum implementation per the docs.
I'm not sure how to trigger this error since the logs are lacking, but it seems that a changeset error propagates all the way to the web layer, where it crashes; Spandex then tries to report the crash (with the changeset) and fails.

Great, thanks! I'll see if I can guess what's causing it and what we can do about it.

We just hit the same bug as well. It looks like some errors are not Exception structs but plain atoms:

UndefinedFunctionError: function :function_clause.__struct__/0 is undefined (module :function_clause is not available)
  Module "function_clause", in :function_clause.__struct__/0
  File "lib/spandex_datadog/api_server.ex", line 271, in SpandexDatadog.ApiServer.add_error_type/2
  File "lib/spandex_datadog/api_server.ex", line 264, in SpandexDatadog.ApiServer.add_error_data/2
  File "lib/spandex_datadog/api_server.ex", line 244, in SpandexDatadog.ApiServer.meta/1
  File "lib/spandex_datadog/api_server.ex", line 230, in SpandexDatadog.ApiServer.format/3
  File "lib/enum.ex", line 1399, in Enum."-map/2-lists^map/1-0-"/2
  File "lib/enum.ex", line 1399, in Enum."-map/2-lists^map/1-0-"/2
  File "lib/spandex_datadog/api_server.ex", line 196, in SpandexDatadog.ApiServer.send_and_log/2

Probably just need to add a clause to

@spec add_error_type(map, Exception.t() | nil) :: map
defp add_error_type(meta, nil), do: meta
defp add_error_type(meta, exception), do: Map.put(meta, "error.type", exception.__struct__)
to handle atom/tuple errors.
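
As a minimal sketch of what such extra clauses might look like (not the library's actual fix): the module name `ErrorTypeSketch` is hypothetical, the function is made public here only so it can be called directly, and reporting the bare atom or the tuple's tag as the error type is an assumption about what would be useful in Datadog.

```elixir
defmodule ErrorTypeSketch do
  # Hypothetical stand-in for the private helper in SpandexDatadog.ApiServer.
  @spec add_error_type(map, term) :: map
  def add_error_type(meta, nil), do: meta

  # Real exception structs: report the module name, as the current code does.
  def add_error_type(meta, %{__struct__: module, __exception__: true}),
    do: Map.put(meta, "error.type", module)

  # Bare atoms such as :function_clause or :badarith.
  def add_error_type(meta, reason) when is_atom(reason),
    do: Map.put(meta, "error.type", reason)

  # Tagged tuples such as {:badmatch, term}: report only the tag, so the
  # (possibly large) term is never touched.
  def add_error_type(meta, reason)
      when is_tuple(reason) and tuple_size(reason) > 0 and is_atom(elem(reason, 0)),
      do: Map.put(meta, "error.type", elem(reason, 0))

  # Anything else: leave the metadata unchanged rather than crash the sender.
  def add_error_type(meta, _other), do: meta
end
```

With these clauses, `ErrorTypeSketch.add_error_type(%{}, {:badmatch, some_struct})` would yield `%{"error.type" => :badmatch}` instead of raising, and an unrecognized value falls through to the last clause.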

From what I can tell, Spandex is expecting an error type, but it's getting a Struct from your application (Api.Storylines.Screen) instead.

Instead of getting an Elixir struct as an Exception, it got a tuple:

{:badmatch, %Api.Storylines.Screen{__meta__: #Ecto.Schema.Metadata<...>, edits: #Ecto.Association.NotLoaded<association :edits is not loaded>, id: "123-123-123-123", inserted_at: ~U[2020-11-10 13:13:21.000000Z], last_edited: ~U[2020-11-10 13:13:21Z], name: "screen name", original_dimensions: nil, s3_object_name: "xxxxx/yyyyy", screenshot_image_uri: "some_url", storyline: #Ecto.Association.NotLoaded<...>, storyline_id: "123-123-123-123", updated_at: ~U[2020-12-01 07:35:27.000000Z], url: "www.example.com"}}

I am also seeing api_server crash when trying to report errors, due to getting a :badarith Erlang error:

function :badarith.__struct__/0 is undefined (module :badarith is not available)
unknown:?:in `__struct__/0'
lib/spandex_datadog/api_server.ex:273:in `add_error_type/2'
lib/spandex_datadog/api_server.ex:266:in `add_error_data/2'
lib/spandex_datadog/api_server.ex:246:in `meta/1'
lib/spandex_datadog/api_server.ex:175:in `format/3'
lib/enum.ex:1411:in `-map/2-lists^map/1-0-/2'
lib/spandex_datadog/api_server.ex:141:in `send_and_log/2'
lib/spandex_datadog/api_server.ex:214:in `-maybe_flush_traces/1-fun-7-/2'
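
One possible way to handle raw Erlang reasons like :badarith, sketched here under assumptions: Elixir's standard `Exception.normalize/3` converts Erlang error reasons into exception structs, so the result always has a `__struct__`. The module `NormalizeSketch` and its `error_type/1` function are hypothetical names, and treating every reason as kind `:error` is an assumption, since the span may also carry exits or throws.

```elixir
defmodule NormalizeSketch do
  # Hypothetical helper: map a raw error reason to an exception module name.
  def error_type(reason) do
    # Exceptions pass through unchanged; Erlang reasons such as :badarith or
    # {:badmatch, term} are wrapped in the matching Elixir exception struct.
    %module{} = Exception.normalize(:error, reason)
    module
  end
end

NormalizeSketch.error_type(:badarith)          # ArithmeticError
NormalizeSketch.error_type({:badmatch, :term}) # MatchError
NormalizeSketch.error_type(:function_clause)   # FunctionClauseError
```

Reasons that Elixir does not recognize come back as `ErlangError`, so the error type is always an exception module.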

Any luck with this or how to handle it? I'm seeing this as well.