spandex-project/spandex

Docs for Spandex.Ecto.Trace disappeared

vovayartsev opened this issue · 14 comments

Hey!

The documented way of tracing Ecto was lost somewhere between v1.3 and v1.6... However, googling for Spandex Ecto still takes you to that outdated page. Would it be possible to describe or link to the proper way of tracing Ecto right in the main README?

Thanks for doing your great job!

I actually ended up removing the ecto tracer from spandex, as it felt out of the purview of this library. The ability to construct traces still exists, but is slightly different than the old source. I'll put together a gist (and put a link to it in the readme). I won't be able to do anything about the old dead documentation that appears on google though :(

I'm currently traveling, so I'm not in a position to test this out, but I did my best to adapt it to the current version. Mind giving it a shot?

defmodule MyApp.Ecto.Trace do
  @moduledoc """

  A trace builder that can be given to ecto as a logger. It will try to get
  the trace_id and span_id from the caller pid in the case that the particular
  query is being run asynchronously (as in the case of parallel preloads).

  Traces will default to the service name `:ecto` but can be configured:

  config :spandex, :ecto,
    service: :my_ecto

  To configure, set it up as an ecto logger like so:

  config :my_app, MyApp.Repo,
    loggers: [{Ecto.LogEntry, :log, [:info]}, {MyApp.Ecto.Trace, :trace, []}]

  """
  @default_service_name :ecto

  defmodule Error do
    defexception [:message]
  end

  def trace(log_entry) do
    # Put in your own configuration here
    otp_app = :otp_app
    tracer = MyApp.Tracer
    service = :ecto
    database = "some_database"

    unless Application.get_env(otp_app, tracer) do
      now = Spandex.Adapters.Utils.now()
      _ = setup(log_entry, tracer)
      query = string_query(log_entry)
      num_rows = num_rows(log_entry)

      queue_time = get_time(log_entry, :queue_time)
      query_time = get_time(log_entry, :query_time)
      decoding_time = get_time(log_entry, :decode_time)

      start = now - (queue_time + query_time + decoding_time)

      tracer.update_span(
        start: start,
        completion_time: now,
        service: service,
        resource: query,
        type: :db,
        sql_query: [
          query: query,
          rows: inspect(num_rows),
          db: database
        ]
      )

      _ = report_error(log_entry)

      if queue_time != 0 do
        _ = tracer.start_span("queue")
        _ = tracer.update_span(start: start, completion_time: start + queue_time)
        _ = tracer.finish_span()
      end

      if query_time != 0 do
        _ = tracer.start_span("run_query")

        _ =
          tracer.update_span(
            start: start + queue_time,
            completion_time: start + queue_time + query_time
          )

        _ = tracer.finish_span()
      end

      if decoding_time != 0 do
        _ = tracer.start_span("decode")
        _ = tracer.update_span(start: start + queue_time + query_time, completion_time: now)
        _ = tracer.finish_span()
      end

      finish_ecto_trace(log_entry, tracer)
    end

    log_entry
  end

  defp finish_ecto_trace(%{caller_pid: caller_pid}, tracer) do
    if caller_pid != self() do
      tracer.finish_trace()
    else
      tracer.finish_span()
    end
  end

  defp finish_ecto_trace(_), do: :ok

  defp setup(%{caller_pid: caller_pid}, tracer) when is_pid(caller_pid) do
    if caller_pid == self() do
      Logger.metadata(trace_id: tracer.current_trace_id(), span_id: tracer.current_span_id())

      tracer.start_span("query")
    else
      trace = Process.info(caller_pid)[:dictionary][:spandex_trace]

      if trace do
        trace_id = trace.id

        span_id =
          trace
          |> Map.get(:stack)
          |> Enum.at(0, %{})
          |> Map.get(:id)

        Logger.metadata(trace_id: trace_id, span_id: span_id)

        tracer.continue_trace("query", trace_id, span_id)
      else
        tracer.start_trace("query")
      end
    end
  end

  defp setup(_) do
    :ok
  end

  defp report_error(%{result: {:ok, _}}), do: :ok

  defp report_error(%{result: {:error, error}}) do
    tracer.span_error(%Error{message: inspect(error)}, nil)
  end

  defp string_query(%{query: query}) when is_function(query),
    do: Macro.unescape_string(query.() || "")

  defp string_query(%{query: query}) when is_bitstring(query), do: Macro.unescape_string(query)
  defp string_query(_), do: ""

  defp num_rows(%{result: {:ok, %{num_rows: num_rows}}}), do: num_rows
  defp num_rows(_), do: 0

  def get_time(log_entry, key) do
    value = Map.get(log_entry, key)

    if is_integer(value) do
      to_nanoseconds(value)
    else
      0
    end
  end

  defp to_nanoseconds(time), do: System.convert_time_unit(time, :native, :nanoseconds)

  defp config do
    :spandex
    |> Confex.get_env(:ecto, [])
    |> Keyword.put_new(:service, @default_service_name)
  end
end

There is a bug there. It should read unless Application.get_env(otp_app, tracer)[:disabled?] do

@zachdaniel Sorry for along response loop, I've been on vacations.

Generally, the snipped worked, after fixing two errors:

  1. function Spandex.Adapters.Utils.now/0 is undefined
  2. tracer variable is undefined within report_error() function

There's one confusion though: somehow I see my OTP app's spans within an Ecto trace.
Maybe I need to create a separate Tracer module for Ecto stuff?

Alt text

Thank you 👍

@vovayartsev Could you share your version? In my case the call to tracer.start_span("query") returns {:error, :no_trace_context}

@vovayartsev I think you want to override the top level span's service. Hopefully your version helps @christopheonce also! Sorry I've been so busy lately I haven't had a chance to work on this library.

@christopheonce @zachdaniel Below is my version which works fine for me now, and my prior question about "Why my OTP app in Ecto logs" is fully resolved:

defmodule Core.Tracer do
  @moduledoc """
  DataDog tracer
  """
  use Spandex.Tracer, otp_app: :my_app
end
defmodule Core.Ecto.Trace do
  @moduledoc """
  A trace builder that can be given to ecto as a logger. It will try to get
  the trace_id and span_id from the caller pid in the case that the particular
  query is being run asynchronously (as in the case of parallel preloads).
  Traces will default to the service name `:ecto` but can be configured:
  config :spandex, :ecto,
    service: :my_ecto
  To configure, set it up as an ecto logger like so:
  config :my_app, MyApp.Repo,
    loggers: [{Ecto.LogEntry, :log, [:info]}, {MyApp.Ecto.Trace, :trace, []}]
  """
  @default_service_name :ecto

  defmodule Error do
    defexception [:message]
  end

  def trace(log_entry, database) do
    # Put in your own configuration here
    otp_app = :my_app
    tracer = Core.Tracer
    service = @default_service_name

    unless Application.get_env(otp_app, tracer)[:disabled?] do
      now = :os.system_time(:nano_seconds)
      _ = setup(log_entry, tracer)
      query = string_query(log_entry)
      num_rows = num_rows(log_entry)

      queue_time = get_time(log_entry, :queue_time)
      query_time = get_time(log_entry, :query_time)
      decoding_time = get_time(log_entry, :decode_time)

      start = now - (queue_time + query_time + decoding_time)

      tracer.update_span(
        start: start,
        completion_time: now,
        service: service,
        resource: query,
        type: :db,
        sql_query: [
          query: query,
          rows: inspect(num_rows),
          db: database
        ]
      )

      _ = report_error(tracer, log_entry)

      if queue_time != 0 do
        _ = tracer.start_span("queue")

        _ =
          tracer.update_span(service: service, start: start, completion_time: start + queue_time)

        _ = tracer.finish_span()
      end

      if query_time != 0 do
        _ = tracer.start_span("run_query")

        _ =
          tracer.update_span(
            service: service,
            start: start + queue_time,
            completion_time: start + queue_time + query_time
          )

        _ = tracer.finish_span()
      end

      if decoding_time != 0 do
        _ = tracer.start_span("decode")

        _ =
          tracer.update_span(
            service: service,
            start: start + queue_time + query_time,
            completion_time: now
          )

        _ = tracer.finish_span()
      end

      finish_ecto_trace(log_entry, tracer)
    end

    log_entry
  end

  defp finish_ecto_trace(%{caller_pid: caller_pid}, tracer) do
    if caller_pid != self() do
      tracer.finish_trace()
    else
      tracer.finish_span()
    end
  end

  defp finish_ecto_trace(_, _), do: :ok

  defp setup(%{caller_pid: caller_pid}, tracer) when is_pid(caller_pid) do
    if caller_pid == self() do
      Logger.metadata(trace_id: tracer.current_trace_id(), span_id: tracer.current_span_id())

      tracer.start_span("query")
    else
      trace = Process.info(caller_pid)[:dictionary][:spandex_trace]

      if trace do
        trace_id = trace.id

        span_id =
          trace
          |> Map.get(:stack)
          |> Enum.at(0, %{})
          |> Map.get(:id)

        Logger.metadata(trace_id: trace_id, span_id: span_id)

        tracer.continue_trace("query", trace_id, span_id)
      else
        tracer.start_trace("query")
      end
    end
  end

  defp setup(_, _) do
    :ok
  end

  defp report_error(_tracer, %{result: {:ok, _}}), do: :ok

  defp report_error(tracer, %{result: {:error, error}}) do
    tracer.span_error(%Error{message: inspect(error)}, nil)
  end

  defp string_query(%{query: query}) when is_function(query),
    do: Macro.unescape_string(query.() || "")

  defp string_query(%{query: query}) when is_bitstring(query), do: Macro.unescape_string(query)
  defp string_query(_), do: ""

  defp num_rows(%{result: {:ok, %{num_rows: num_rows}}}), do: num_rows
  defp num_rows(_), do: 0

  def get_time(log_entry, key) do
    value = Map.get(log_entry, key)

    if is_integer(value) do
      to_nanoseconds(value)
    else
      0
    end
  end

  defp to_nanoseconds(time), do: System.convert_time_unit(time, :native, :nanoseconds)
end

and in prod.exs:

config :my_app, MyApp.MyRepo,
   ....
   loggers: [
      {Ecto.LogEntry, :log, [:debug]},
      {Core.Ecto.Trace, :trace, ["MyRepo"]}
  ]

@christopheonce please be aware of this issue when changing loggers: https://stackoverflow.com/questions/48676316/ecto-repo-loggers-config-ignored

@vovayartsev thank you for sharing the code that works for you.
@zachdaniel I understand you have limited time to work on this. I finally understand why it does not work for me. All my interactions with the database go through a genserver. While debugging I figured out the trace is kept in the process spawned by Ranch. As the process logging the time spent in the database is the genserver. That's why it tells me no trace was started. I will work around this so it behaves as if it was a remote server.

Awesome! I think what we should have is a spandex_ecto library, that leverages this code and keeps it out of core.

I've added a note about needing this feature in a separate lib to the README, and added a link to this issue so others can use that code.

@zachdaniel the code above has a problem: the completion time gets overridden and all traces ends at the same time even though the Ecto log says otherwise. I traced the issue to the call to finish_span() which overrides the completion time. https://github.com/zachdaniel/spandex/blob/master/lib/spandex.ex#L160

What do you advise to avoid this?

Interesting. finish_span should definitely not override the completion time like that, so I think the solution is most likely just to fix that bug. I think I'd really like to see this moved into a spandex_ecto repository, so we can track issues separately there. I'll look into the bug shortly.

The bug that was brought up here recently has been resolved and is available in version 2.0.0

@christopheonce and @vovayartsev - I haven't had a chance to test this, but if you'd like to give it a shot: https://github.com/spandex-project/spandex_ecto