melpon/memoize

Freezes up when get_or_run process crashes

eteeselink opened this issue · 12 comments

Hi there! First off, thanks for a great library, I love the API design.

Now, when a defmemo'd function crashes its process, stuff breaks pretty badly, e.g. in IEx:

defmodule Moo do
  use Memoize
  defmemo foo() do
    exit(1)
  end
end

Now, calling Moo.foo once crashes the calling process (which seems acceptable to me), but calling it a second time totally freezes things up:

iex(2)> Moo.foo
** (exit) 1
    iex:5: Moo.__foo_memoize/0
    (memoize 1.3.0) lib/memoize/cache.ex:107: Memoize.Cache.do_get_or_run/3
iex(2)> Moo.foo
# iex totally hangs now, need to ctrl+c to kill it and get back to the shell

We noticed this when calling a DB query inside a defmemo and the DB connection pool timed out, making the memoized function stop working for the parameter values it was passed when it crashed (while functioning fine for other values).
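For context on why the second call hangs: a get_or_run cache has to mark a key as in-flight so that concurrent callers wait instead of recomputing, and if the computing process dies without clearing that marker, every later caller waits on it forever. Below is a minimal sketch of that pattern with crash cleanup added; it is illustrative only, not Memoize's actual implementation (the ETS layout and the sleep-and-retry wait loop are assumptions):

defmodule NaiveMemo do
  @table __MODULE__

  def init, do: :ets.new(@table, [:named_table, :public, :set])

  def get_or_run(key, fun) do
    case :ets.lookup(@table, key) do
      # Already computed: return the cached value.
      [{^key, {:done, value}}] ->
        value

      # Another process is computing: wait and retry. This is the loop
      # that spins forever if the runner died without cleaning up.
      [{^key, :running}] ->
        Process.sleep(10)
        get_or_run(key, fun)

      [] ->
        if :ets.insert_new(@table, {key, :running}) do
          try do
            value = fun.()
            :ets.insert(@table, {key, {:done, value}})
            value
          catch
            kind, reason ->
              # Clear the marker so waiters retry instead of hanging,
              # then re-raise to the caller.
              :ets.delete(@table, key)
              :erlang.raise(kind, reason, __STACKTRACE__)
          end
        else
          # Lost the insert race: another process became the runner.
          get_or_run(key, fun)
        end
    end
  end
end

Note that catch only covers failures raised inside the runner itself; an exit signal sent from outside terminates the process without running any catch or after block, which matters for the reports further down.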

Fixed and released 1.3.1. Thank you for reporting!

Wow nice! Thanks again for memoize and all the best :-)

This fix made our Semaphore builds fail, because is_exception only appeared in Elixir 1.11, while the requirement in mix.exs says that 1.9 is enough. We're using 1.10.4.

I will try to upgrade; however, I suggest bumping the version requirement in mix.exs so others don't run into the same issue.
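In the meantime, on Elixir versions before 1.11 the same check can be written as a plain pattern match on the __exception__ field that every exception struct carries; the Compat module below is just an illustration of the idiom:

defmodule Compat do
  # Pre-1.11 equivalent of the is_exception/1 guard: all exception
  # structs have __exception__: true.
  def exception?(%_{__exception__: true}), do: true
  def exception?(_), do: false
end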

Fixed and released 1.3.2. Thanks!

This issue still happens. If the function exits the calling process and you execute it again, the memoized function enters an infinite loop.

# The first call exits correctly here, whether run from IEx or from a calling function
Memoize.Cache.get_or_run(:key, fn -> :erlang.exit(self(), :normal) end)

# If you execute it again, Memoize enters an infinite loop.
Memoize.Cache.get_or_run(:key, fn -> :erlang.exit(self(), :normal) end)

Fixed by #14 and released 1.3.3.
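One general way to make waiters robust even when the runner dies without executing any cleanup code is to store the runner's pid in the table and have each waiter monitor it; monitoring an already-dead pid delivers a :DOWN message immediately. A sketch of that idea follows; it is an illustration of the technique, not necessarily what #14 does:

defmodule MonitoredMemo do
  @table __MODULE__

  def init, do: :ets.new(@table, [:named_table, :public, :set])

  def get_or_run(key, fun) do
    case :ets.lookup(@table, key) do
      [{^key, {:done, value}}] ->
        value

      [{^key, {:running, runner}}] ->
        # Monitor the runner: if it is already dead, :DOWN arrives at once.
        ref = Process.monitor(runner)

        receive do
          {:DOWN, ^ref, :process, ^runner, _reason} ->
            # Runner died (crash or kill): drop the stale marker and retry.
            :ets.delete_object(@table, {key, {:running, runner}})
            get_or_run(key, fun)
        after
          100 ->
            # Runner is still alive; check the table again.
            Process.demonitor(ref, [:flush])
            get_or_run(key, fun)
        end

      [] ->
        if :ets.insert_new(@table, {key, {:running, self()}}) do
          value = fun.()
          :ets.insert(@table, {key, {:done, value}})
          value
        else
          get_or_run(key, fun)
        end
    end
  end
end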

jheld commented

It's unclear to us at this time whether this particular issue is still at play (a get_or_run process crash propagating the problem), but we have seen the symptom (freezing up) recently, and we are running 1.3.3.

Please give me the code to reproduce it if possible.

jheld commented

Are you aware of any patterns we can try to make that happen? We're unsure what exactly caused it.

I'm not aware of any. If such a pattern exists, I would like to add it to the tests.

jrogov commented

It seems I've recently encountered this exact bug when caching the results of calls to a third-party service (using an HTTP client). It made the function unresponsive (hanging forever) until I restarted the VM.

I'm trying to reproduce it, but to no avail yet.

jrogov commented

So my experiments haven't led me anywhere, but I'll share the environment that this problem was encountered in.

Basically, my theory is that an erroneously placed Task.shutdown(task, :brutal_kill) interrupted Memoize somewhere it shouldn't have, deadlocking every other call to the memoized function with the same arguments: calls already in flight at the time of the kill keep waiting for the killed process to complete, and future calls hang because Memoize still thinks there's a process computing the value.
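That theory is at least consistent with how exit signals work on the BEAM: :brutal_kill sends an untrappable :kill signal, and the victim terminates without unwinding its stack, so no catch or after cleanup it was inside ever runs. A small self-contained demonstration (unrelated to Memoize's internals):

task =
  Task.async(fn ->
    try do
      Process.sleep(:infinity)
    after
      # Never reached on a brutal kill: the :kill signal terminates
      # the process without running cleanup code.
      IO.puts("cleanup ran")
    end
  end)

Process.sleep(100)
# Task.shutdown/2 unlinks before killing, so the caller survives.
Task.shutdown(task, :brutal_kill)

So if the killed process was the one responsible for removing the "currently computing" marker, that marker would stay behind forever.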

In the original situation Memoize was used on top of HTTPoison, so maybe there's some interplay going on.

Here's a rough draft of code that should reproduce the issue, though it doesn't:

Mix.install(
  # For client
  [:memoize, :httpoison] ++
  # For server
  [:plug_cowboy]
)

defmodule HttpServer do
  use Application

  @impl true
  def start(_type, _args) do
    children = [
      {Plug.Cowboy,
       scheme: :http,
       plug: __MODULE__.Router,
       options: [port: 4040]}
    ]

    opts = [strategy: :one_for_one, name: HttpServer]
    Supervisor.start_link(children, opts)
  end

  defmodule Router do
    use Plug.Router

    plug Plug.Logger
    plug :match
    plug :dispatch

    get "/delay/:delay" do
      response = "Responded after #{delay} ms"
      each_delay = div(String.to_integer(delay), byte_size(response))

      # Here we use chunking to simulate long/big reply
      response
      |> String.graphemes()
      |> Enum.reduce(send_chunked(conn, 200), fn body, conn ->
        Process.sleep(each_delay)
        {:ok, conn} = chunk(conn, body)
        conn
      end)
    end
  end
end

HttpServer.start(nil, nil)

defmodule HttpClient do
  use Memoize
  require Logger

  @server_host "localhost:4040"

  defmemo fetch(delay), expires_in: 60_000 do
    Logger.warn("Sending request")
    body = HTTPoison.get!("#{@server_host}/delay/#{delay}", [], recv_timeout: 1_000 + delay).body
    Logger.warn("Response received")
    body
  end
end

Memoize.invalidate(HttpClient)

delay = 3_000
parent = self()

task = Task.async(fn -> HttpClient.fetch(delay) end)
pid = spawn(fn -> send(parent, HttpClient.fetch(delay)) end)
Process.sleep(div(delay, 2))
Task.shutdown(task, :brutal_kill)

Process.sleep(delay)

# This shouldn't return any messages (flush/0 is an IEx helper, so run this script in an IEx session)
flush()

# This should hang indefinitely
HttpClient.fetch(delay)