mithrandie/csvq

Can it replace the H2 database engine?

Opened this issue · 15 comments

The functionality is very powerful. I have a question. If a CSV file is 1 TB in size, what would the query efficiency be?
Does csvq need to read all 1 TB of content into memory before further processing?
Can it replace the H2 database engine?

No, csvq reads all the data into memory at runtime, so trying to handle a 1TB file is nearly impossible.

hw2499 commented
  1. Is there any guidance on what file sizes csvq is suited to processing?
  2. How can I handle this 1 TB file scenario?

The file size you can handle depends on the query you want to execute and your system.

csvq is not a DBMS but an SQL interpreter that executes queries against text files such as CSV.
I don't know what you want to do with the file or how you want to handle it, so I can't say which approach would be appropriate. In general, however, a DBMS designed to handle large amounts of data is the appropriate choice.
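For reference, csvq can also be used from a Go program through its database/sql driver (mithrandie/csvq-driver). A minimal sketch, assuming that driver and a hypothetical users.csv under ./data:

```go
package main

import (
	"database/sql"
	"fmt"
	"log"

	_ "github.com/mithrandie/csvq-driver" // registers the "csvq" driver
)

func main() {
	// The data source name is the directory that holds the CSV files.
	db, err := sql.Open("csvq", "./data")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// Query a CSV file directly; the table name is the file name.
	rows, err := db.Query("SELECT id, name FROM `users.csv` WHERE id < 100")
	if err != nil {
		log.Fatal(err)
	}
	defer rows.Close()

	for rows.Next() {
		var id int
		var name string
		if err := rows.Scan(&id, &name); err != nil {
			log.Fatal(err)
		}
		fmt.Println(id, name)
	}
}
```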

hw2499 commented

I want to use this SQL interpreter to process large text files, but I am not sure how efficiently it handles them. If it is very efficient, it could do many of the things an in-memory database does.

Have you tried DuckDB?

hw2499 commented

I am not looking for a database, but for a solution that can process big data in memory. I feel that csvq is very powerful.

I agree that csvq is a very capable tool for querying CSV files using SQL and I still use it where it makes sense. DuckDB can do many of the same things and can process as much data as will fit in memory without ever creating a single table. Like csvq, it can read this data directly from CSV files, but also from Parquet files, and write results to such files. You need not create a DuckDB database to query very large CSV files in memory!
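To make that concrete, here is a minimal sketch of querying a CSV directly from Go and writing the result to Parquet, assuming the marcboeker/go-duckdb binding (which registers a database/sql driver named "duckdb" and needs cgo); the file names and columns are made up:

```go
package main

import (
	"database/sql"
	"fmt"
	"log"

	_ "github.com/marcboeker/go-duckdb" // registers the "duckdb" driver
)

func main() {
	// An empty DSN opens an in-memory DuckDB database; no tables are created below.
	db, err := sql.Open("duckdb", "")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// Query the CSV file directly, just like a table.
	row := db.QueryRow(`SELECT count(*), CAST(sum(amount) AS DOUBLE)
	                    FROM read_csv_auto('sales.csv')`)
	var n int64
	var total float64
	if err := row.Scan(&n, &total); err != nil {
		log.Fatal(err)
	}
	fmt.Printf("rows=%d total=%.2f\n", n, total)

	// Results can also be written straight to a Parquet file.
	_, err = db.Exec(`COPY (SELECT * FROM read_csv_auto('sales.csv') WHERE amount > 0)
	                  TO 'sales_positive.parquet' (FORMAT PARQUET)`)
	if err != nil {
		log.Fatal(err)
	}
}
```

No CREATE TABLE or import step is involved; the CSV and Parquet files themselves act as the tables.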

hw2499 commented

What type of database is DuckDB? I read the documentation and it seems similar to SQLite. What are the main scenarios in which DuckDB is used?

DuckDB is similar to SQLite in that it is an embedded database, but unlike SQLite, DuckDB is a column-oriented database, meaning it stores data column-wise rather than row-wise. Column stores like DuckDB are designed for analysis of large data sets, while SQLite is designed more for transaction processing. Neither has a server component, so both are designed for local data processing on a personal computer. Column-oriented databases use various column data compression methods to store data more efficiently, scan and retrieve only the columns that a query selects (good for querying wide tables), and generally execute aggregate queries on columns more quickly.
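A small sketch of the column-pruning point, under the same go-duckdb assumptions as above and with a hypothetical wide events.parquet; only the two referenced columns are read, however many others the file contains:

```go
package main

import (
	"database/sql"
	"fmt"
	"log"

	_ "github.com/marcboeker/go-duckdb"
)

func main() {
	db, err := sql.Open("duckdb", "")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// Only the 'country' and 'bytes' columns are read from the Parquet file.
	rows, err := db.Query(`
		SELECT country, CAST(sum(bytes) AS BIGINT) AS total_bytes
		FROM read_parquet('events.parquet')
		GROUP BY country
		ORDER BY total_bytes DESC
		LIMIT 10`)
	if err != nil {
		log.Fatal(err)
	}
	defer rows.Close()

	for rows.Next() {
		var country string
		var total int64
		if err := rows.Scan(&country, &total); err != nil {
			log.Fatal(err)
		}
		fmt.Println(country, total)
	}
}
```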

hw2499 commented
  1. Can DuckDB process (query) big data like 1 TB in memory, or are there any other solutions?
  2. Does the DuckDB client API support Go?
  1. Can DuckDB process (query) big data like 1 TB in memory, or are there any other solutions?

DuckDB streams the input and results of many query operations, so many DuckDB queries can process very large data sets, even 1 TB, on computers that have far less memory.
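As a rough sketch of what that looks like, under the same go-duckdb assumptions as above (the file name and limits are made up), you can cap DuckDB's memory well below the input size and let it stream the scan and spill to disk where needed:

```go
package main

import (
	"database/sql"
	"fmt"
	"log"

	_ "github.com/marcboeker/go-duckdb"
)

func main() {
	db, err := sql.Open("duckdb", "")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// Cap DuckDB's memory use far below the input size; operators that need
	// more than this spill to the temp directory instead of failing.
	if _, err := db.Exec(`SET memory_limit = '4GB'`); err != nil {
		log.Fatal(err)
	}
	if _, err := db.Exec(`SET temp_directory = '/tmp/duckdb_spill'`); err != nil {
		log.Fatal(err)
	}

	// The CSV is scanned in a streaming fashion; it is never loaded whole.
	row := db.QueryRow(`
		SELECT count(*), count(DISTINCT customer_id)
		FROM read_csv_auto('huge_1tb_file.csv')`)
	var rowCount, customers int64
	if err := row.Scan(&rowCount, &customers); err != nil {
		log.Fatal(err)
	}
	fmt.Println("rows:", rowCount, "distinct customers:", customers)
}
```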

  2. Does the DuckDB client API support Go?

https://pkg.go.dev/github.com/benjajaja/go-duckdb

hw2499 commented
  1. OK, I will look into using DuckDB and then adapt it to the etl-engine product.
    https://github.com/hw2499/etl-engine
  2. What OLAP functions does DuckDB support?
kpym commented

@hw2499, @derekmahar Maybe the end of this discussion should be moved to the DuckDB discussion forum? IMHO, the csvq issue tracker is not the place for this.

@kpym, I agree. I actually didn't realise that this discussion was about a csvq issue. I thought it was a csvq discussion topic. In any case, in the DuckDB discussion forums on GitHub or the DuckDB Discord server, the DuckDB developers and other users could better answer @hw2499's questions.

hw2499 commented

Okay, thank you. @kpym @derekmahar