JuliaData/CSV.jl

CSV.write() with append=true allocating a lot of memory


I'm trying to convert a set of JSON database files into a single data.csv file containing all of the data's features.

Currently, I read each JSON file and push!() the data into a DataFrame; after all the pushes I write the DataFrame to a CSV file once. However, I'm exploring writing the data directly to the .csv file with CSV.write() and append=true instead.
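
Roughly, the two code paths I'm comparing look like this (a sketch only: the JSON3 parser, the "db" directory, and the column names are placeholders, not my real schema):

```julia
using CSV, DataFrames, JSON3   # JSON3 is a placeholder; any JSON parser works

json_files = readdir("db"; join=true)   # hypothetical list of JSON database files

# Approach 1: accumulate rows in memory with push!, then write the DataFrame once
df = DataFrame()
for file in json_files
    obj = JSON3.read(read(file, String))
    push!(df, (id = obj.id, value = obj.value))   # placeholder columns
end
CSV.write("data.csv", df)

# Approach 2: append each row directly to the CSV file inside the loop
for file in json_files
    obj = JSON3.read(read(file, String))
    row = (id = obj.id, value = obj.value)        # placeholder columns
    CSV.write("data.csv", [row]; append=true)     # this is where allocations explode
end
```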

In my tests, switching to CSV.write() with append=true increases the memory allocated from 90.204 MiB to 24.205 GiB.

When I run the code with julia --track-allocation=user, it shows that the allocations come from the CSV.write(data, append=true) call. Does append=true load the whole file content into RAM, and could that be the cause?

@time result with push!() using DataFrames, appending the data inside a for loop through the JSON file list:
1.716448 seconds (1.32 M allocations: 90.204 MiB, 0.84% gc time)

@time result with CSV.write() and append=true, appending the data inside a for loop through the JSON file list:
2.116444 seconds (1.53 M allocations: 24.205 GiB, 9.48% gc time)

@time result with Base.write(), appending the data inside a for loop through the JSON file list:
1.976700 seconds (1.36 M allocations: 93.373 MiB, 0.67% gc time)
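
For completeness, the Base.write() variant was along these lines (again a sketch; the manual field formatting and column names are assumptions, with no quoting/escaping):

```julia
# Format each row by hand and append one line per JSON file
for file in json_files
    obj = JSON3.read(read(file, String))
    line = string(obj.id, ',', obj.value, '\n')   # placeholder columns
    open(io -> write(io, line), "data.csv", "a")  # open in append mode and write the line
end
```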

The objective of the change was to reduce allocations; instead they increased dramatically, and I don't understand why.

Versions:
DataFrames version: [a93c6f00] DataFrames v1.6.1
CSV version: [336ed68f] CSV v0.10.14
Julia version: Julia Version 1.10.4 Commit 48d4fd48430 (2024-06-04 10:41 UTC)