What kind of performance is to be expected?
zachdaniel opened this issue · 7 comments
Right now we have a 2.5mb spreadsheet being generated, and the call to write_to_memory/2
takes about 25 seconds. Is this more than normal, or should we be investigating a cause?
Regardless it would be a very high priority for us to make this faster.
I checked, and its not the call to :zip.create/3
causing the slowness its the call to Elixlsx.Writer.create_files/2
Thanks for doing the benchmarking! I have not written elixlsx with performance in mind, but I agree that 25secs for ~2.5MB seems a bit slow.
Could you share any specifics on what makes the spreadsheet so large? Is it a lot of sheets, large sheets (and in which dimension)?
Could you share a code snippet that reproduces this behaviour?
Thanks!
Elixlsx currently relies on a lot of string interpolation and string concatenation. I believe that replacing this with iolists will increase performance significantly. I updated the make_sheet/2
to use iolists and it ran 16% faster (as measured with Benchee), even though all the other functions that feed into make_sheet/2
were unchanged. The speed should increase with each function that's changed. The update was simply to output a list of the various strings, instead of concatenating them.
This is a good blog post I found on the topic:
https://www.bignerdranch.com/blog/elixir-and-io-lists-part-1-building-output-efficiently/
@ryanhart2 any chances to get a PR with this improvement?
I am having an issue with a spreadsheet that is only 8MB total, but consumes a whooping 2GB RAM while writing it.
If the IO List is a solution, I would be more than happy to finance a freelancer to do so.
I am investigating the cost (in # of reductions) for very simple spreadsheets with N rows.
The number of reductions/row seems to go up with the number of rows - which looks like quadratic to me. This makes large spreadsheets take much longer.
Is this relatable to the use of IO lists?
My benchmark is naive, so I welcome feedback.
defmodule Example do
alias Elixlsx.Workbook
alias Elixlsx.Sheet
def generate_data(n) do
for _ <- 1..n do
%{
user_id: 200,
shares: 200,
strike_price: 200
}
end
end
def run(n) do
task =
Task.async(fn ->
data = generate_data(n)
reductions_1 = :erlang.process_info(self)[:reductions]
workbook =
%Workbook{}
|> append_data_sheet(data)
_ = Elixlsx.write_to_memory(workbook, "foo.xlsx")
reductions = :erlang.process_info(self)[:reductions] -reductions_1
reductions
end)
reductions = Task.await(task)
# reds / n
IO.puts("Reductions per row: #{reductions / n}")
end
def append_data_sheet(workbook, data) do
sheet = %Sheet{name: "Data"}
sheet =
Enum.reduce(Enum.with_index(data), sheet, fn {row, index}, sheet ->
sheet =
sheet
|> Sheet.set_at(index + 1, 1, row.user_id)
|> Sheet.set_at(index + 1, 2, row.shares)
|> Sheet.set_at(index + 1, 3, row.strike_price)
Enum.reduce(1..10, sheet, fn j, sheet ->
Sheet.set_at(sheet, index + 1, 3 + j, 5)
end)
end)
Workbook.append_sheet(workbook, sheet)
end
end
Did anyone compared Elixlsx with Exceed? https://github.com/synchronal/exceed