add new data by columns into duckdb out of memory
Closed this issue · 2 comments
I have incoming data that I want to store on disk in a database or something similar. The data looks something like this:
incoming_data <- function(ncol=5){
  dat <- sample(1:10, 100, replace = TRUE) |> matrix(ncol = ncol) |> as.data.frame()
  random_names <- sapply(1:ncol(dat), \(x) paste0(sample(letters, 1), sample(1:100, 1)))
  colnames(dat) <- random_names
  dat
}
incoming_data()
This incoming_data is just an example. In reality, each incoming_data set will have around 5k rows and about 50k columns, and the entire final file will be about 200-400 gigabytes.
My question is: how do I add new data as columns to the database without loading the whole file into RAM?
# your way
path <- "D:\\R_scripts\\new\\duckdb\\data\\DB.duckdb"
library(duckdb)
library(duckplyr)
con <- dbConnect(duckdb(), dbdir = path, read_only = FALSE)
# write one piece of data in DB
dbWriteTable(con, "my_dat", incoming_data())
#### how to make something like this ####
# pseudocode: append the columns of incoming_data() to the existing "my_dat" table
my_dat <- cbind("my_dat", incoming_data())
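For what it's worth, one possible sketch of the `cbind`-like step above: register the new chunk as a virtual table and let DuckDB rebuild the table on disk with a `POSITIONAL JOIN`, which pairs rows by position rather than by a key. This is only an illustration (the table names `my_dat_tmp` and `new_chunk` are made up here), and it assumes the `incoming_data()` helper defined earlier; DuckDB still has to rewrite the table, but the join itself runs in the database engine, not in R's memory.

```r
library(duckdb)

con <- dbConnect(duckdb(), dbdir = "DB.duckdb", read_only = FALSE)
dbWriteTable(con, "my_dat", incoming_data())

# next batch of columns; duckdb_register() exposes the data.frame to SQL
# without copying it into the database yet
new_chunk <- incoming_data()
duckdb_register(con, "new_chunk", new_chunk)

# rebuild the table with the extra columns glued on row-by-row
dbExecute(con, "CREATE TABLE my_dat_tmp AS
                SELECT * FROM my_dat POSITIONAL JOIN new_chunk")
dbExecute(con, "DROP TABLE my_dat")
dbExecute(con, "ALTER TABLE my_dat_tmp RENAME TO my_dat")
duckdb_unregister(con, "new_chunk")
```

Note that with ~50k columns per chunk this rebuild gets expensive quickly, which is part of why a wide layout is awkward here.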
Thanks. This is a very broad question, and not a good fit for this issue tracker. Either way, 50k columns sounds like way too many. Any chance you can make the data "longer"?
Thanks for your lightning fast response!
Yes, I can make the data "longer".
I understand that my question doesn't really fit the format, and I apologize for that, but I would be very grateful for your help.
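A minimal sketch of the "longer" layout the maintainer suggests: instead of adding columns, store each chunk as `(row_id, variable, value)` rows and append them, so new data is always an insert and nothing wide ever has to be rebuilt. The table and column names here are illustrative, not anything prescribed by duckdb, and the sketch again assumes the `incoming_data()` helper from above.

```r
library(duckdb)

con <- dbConnect(duckdb(), dbdir = "DB.duckdb", read_only = FALSE)
dbExecute(con, "CREATE TABLE IF NOT EXISTS my_dat_long
                (row_id INTEGER, variable VARCHAR, value INTEGER)")

dat <- incoming_data()   # one incoming chunk (5k rows x 50k cols in reality)

# melt the chunk to long form: one row per (row, column) cell
long <- data.frame(
  row_id   = rep(seq_len(nrow(dat)), times = ncol(dat)),
  variable = rep(colnames(dat), each = nrow(dat)),
  value    = unlist(dat, use.names = FALSE)
)

# appending rows never requires reading the existing table into RAM
dbAppendTable(con, "my_dat_long", long)
```

Each 5k x 50k chunk becomes 250M narrow rows, which DuckDB handles well, and any wide view you need later can be produced with a `PIVOT` query on demand.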