xou/elixlsx

What kind of performance is to be expected?

zachdaniel opened this issue · 7 comments

Right now we have a 2.5mb spreadsheet being generated, and the call to write_to_memory/2 takes about 25 seconds. Is this more than normal, or should we be investigating a cause?

Regardless it would be a very high priority for us to make this faster.

I checked, and its not the call to :zip.create/3 causing the slowness its the call to Elixlsx.Writer.create_files/2

xou commented

Thanks for doing the benchmarking! I have not written elixlsx with performance in mind, but I agree that 25secs for ~2.5MB seems a bit slow.

Could you share any specifics on what makes the spreadsheet so large? Is it a lot of sheets, large sheets (and in which dimension)?

Could you share a code snippet that reproduces this behaviour?

Thanks!

Elixlsx currently relies on a lot of string interpolation and string concatenation. I believe that replacing this with iolists will increase performance significantly. I updated the make_sheet/2 to use iolists and it ran 16% faster (as measured with Benchee), even though all the other functions that feed into make_sheet/2 were unchanged. The speed should increase with each function that's changed. The update was simply to output a list of the various strings, instead of concatenating them.

This is a good blog post I found on the topic:
https://www.bignerdranch.com/blog/elixir-and-io-lists-part-1-building-output-efficiently/

@ryanhart2 any chances to get a PR with this improvement?

I am having an issue with a spreadsheet that is only 8MB total, but consumes a whooping 2GB RAM while writing it.

If the IO List is a solution, I would be more than happy to finance a freelancer to do so.

@zachdaniel @ryanhart2

I am investigating the cost (in # of reductions) for very simple spreadsheets with N rows.

The number of reductions/row seems to go up with the number of rows - which looks like quadratic to me. This makes large spreadsheets take much longer.

Is this relatable to the use of IO lists?

My benchmark is naive, so I welcome feedback.

defmodule Example do
  alias Elixlsx.Workbook
  alias Elixlsx.Sheet

  def generate_data(n) do
    for _ <- 1..n do
      %{
        user_id: 200,
        shares: 200,
        strike_price: 200
      }
    end
  end

  def run(n) do
    task =
      Task.async(fn ->
        data = generate_data(n)

        reductions_1 = :erlang.process_info(self)[:reductions]

        workbook =
          %Workbook{}
          |> append_data_sheet(data)

        _ = Elixlsx.write_to_memory(workbook, "foo.xlsx")

        reductions = :erlang.process_info(self)[:reductions] -reductions_1

        reductions
      end)

    reductions = Task.await(task)

    # reds / n
    IO.puts("Reductions per row: #{reductions / n}")
  end

  def append_data_sheet(workbook, data) do
    sheet = %Sheet{name: "Data"}

    sheet =
      Enum.reduce(Enum.with_index(data), sheet, fn {row, index}, sheet ->
        sheet =
          sheet
          |> Sheet.set_at(index + 1, 1, row.user_id)
          |> Sheet.set_at(index + 1, 2, row.shares)
          |> Sheet.set_at(index + 1, 3, row.strike_price)

        Enum.reduce(1..10, sheet, fn j, sheet ->
          Sheet.set_at(sheet, index + 1, 3 + j, 5)
        end)
      end)

    Workbook.append_sheet(workbook, sheet)
  end
end

Did anyone compared Elixlsx with Exceed? https://github.com/synchronal/exceed