Attention: The code in this repository is intended for experimental use only and is not fully tested, documented, or supported by SingleStore. Visit the SingleStore Forums to ask questions about this repository.
The SingleStoreDB Go Arrow driver facilitates the reading of data in Apache Arrow format from SingleStoreDB databases. Note that this is the alpha release of the driver, and there may be changes to the API, type conversion, and other internal implementations in the future.
Run the following command to add the SingleStoreDB Go Arrow driver as a dependency to your Go module:
go get github.com/singlestore-labs/singlestoredb-go-arrow-driver
The MySQL driver is also required as a dependency:
go get github.com/go-sql-driver/mysql@v1.7.2-0.20230809113539-7cf548287682
Use the following code to import dependencies:
import (
	"context"
	"database/sql"
	"fmt"

	_ "github.com/go-sql-driver/mysql"
	s2db_arrow_driver "github.com/singlestore-labs/singlestoredb-go-arrow-driver"
)
The `S2DBArrowReader` interface provides an API for reading Apache Arrow data from SingleStoreDB databases. To create a new instance of `S2DBArrowReader`, use the `NewS2DBArrowReader` function. `S2DBArrowReader` provides the following methods:
- `GetNextArrowRecordBatch`: Retrieves a single `Record` object (`arrow.Record`) from the database. When there are no more records to fetch, it returns `nil` as the first part of the result tuple. You must release the returned `Record` using the `Release()` method after use.
- `Close`: Finalizes the reading of query results and releases all the acquired resources.
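The contract described above (a `nil` batch signals the end of the results, and every batch must be released after use) can be sketched without a database. In this minimal sketch, `batch` and `fakeReader` are hypothetical stand-ins for `arrow.Record` and `S2DBArrowReader`; the `Close() error` signature is also an assumption, not taken from the driver.

```go
package main

import "fmt"

// batch is a hypothetical stand-in for arrow.Record; only Release matters here.
type batch struct {
	rows     int
	released bool
}

func (b *batch) Release() { b.released = true }

// fakeReader is a hypothetical in-memory reader that mimics the documented
// contract: it returns a nil batch once the results are exhausted.
type fakeReader struct {
	pending []*batch
	closed  bool
}

func (r *fakeReader) GetNextArrowRecordBatch() (*batch, error) {
	if len(r.pending) == 0 {
		return nil, nil
	}
	b := r.pending[0]
	r.pending = r.pending[1:]
	return b, nil
}

// Close marks the reader as closed (the error return is an assumption).
func (r *fakeReader) Close() error {
	r.closed = true
	return nil
}

// countRows drains the reader and releases each batch right after using it,
// rather than deferring the release until the function returns.
func countRows(r *fakeReader) (int, error) {
	defer r.Close()
	total := 0
	for {
		b, err := r.GetNextArrowRecordBatch()
		if err != nil {
			return total, err
		}
		if b == nil {
			return total, nil // no more records to fetch
		}
		total += b.rows
		b.Release()
	}
}

func main() {
	r := &fakeReader{pending: []*batch{{rows: 3}, {rows: 4}}}
	total, err := countRows(r)
	fmt.Println(total, err)
}
```

Checking the error before the `nil` test ensures that a failed fetch is not mistaken for the end of the result set.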
The `NewS2DBArrowReader` function takes `S2DBArrowReaderConfig` as a parameter. Here are the supported reader configurations and their explanations:
Name | Default | Description |
---|---|---|
Conn | No default (required) | The sql.DB object used to connect with a SingleStoreDB database. |
Args | nil (no arguments) | Arguments for placeholder parameters in the query. |
RecordSize | 10000 | The maximum number of rows in the resulting records. |
ParallelReadConfig | nil (sequential read) | Additional configurations for parallel read. If this value is non-nil, parallel read is enabled. |
UseClientConvesion | false | Indicates whether the data is converted to the Arrow Record format on the client. It can be set to true for testing purposes; performance optimizations take place only when it is false. |
EnableQueryLogging | false | Controls whether the driver should generate debug logs. Debug logs are printed to the standard output. |
The `S2DBParallelReadConfig` allows you to configure additional settings for parallel read. Here are the additional configurations that can be set:
Name | Default | Description |
---|---|---|
DatabaseName | No default (required) | The name of the SingleStoreDB database. It is used to determine the number of partitions for parallel reading. |
ChannelSize | 10000 | The size of the channel buffer. The channel stores references to Arrow Records while reading is in progress and transfers them to the main goroutine. |
EnableDebugProfiling | false | Controls whether to profile the query. The profiling result is printed to the standard output. |
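
To illustrate what a `ChannelSize`-style buffer does, here is a minimal sketch of several producer goroutines handing items to the main goroutine through one buffered channel. It is not the driver's implementation: `readPartitions`, the one-goroutine-per-partition layout, and the string payloads are assumptions made for the example.

```go
package main

import (
	"fmt"
	"sync"
)

// readPartitions simulates one producer goroutine per partition. Each producer
// pushes batch labels into a shared channel buffered to channelSize, and the
// calling goroutine drains the channel.
func readPartitions(partitions, batchesPerPartition, channelSize int) []string {
	ch := make(chan string, channelSize)
	var wg sync.WaitGroup
	for p := 0; p < partitions; p++ {
		wg.Add(1)
		go func(p int) {
			defer wg.Done()
			for b := 0; b < batchesPerPartition; b++ {
				// Blocks only when the buffer is full.
				ch <- fmt.Sprintf("partition %d, batch %d", p, b)
			}
		}(p)
	}
	// Close the channel once every producer is done so the consumer loop ends.
	go func() {
		wg.Wait()
		close(ch)
	}()
	var received []string
	for item := range ch {
		received = append(received, item)
	}
	return received
}

func main() {
	got := readPartitions(4, 3, 10000)
	fmt.Println(len(got), "batches received")
}
```

A larger buffer lets producers run further ahead of the consumer at the cost of holding more items in memory at once.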
Note: Set the `interpolateParams=true` parameter of the `sql.DB` in order to use parallel read. If this parameter is not set, you will get the following error: `This command is not supported in the prepared statement protocol yet`

Note: Currently, parallel read with `UseClientConvesion = false` is not supported. If you set `ParallelReadConfig`, you must also set `UseClientConvesion` to `true`.
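As a small illustration of the DSN requirement, the sketch below builds a connection string that always carries `interpolateParams=true`. The helper name `buildDSN` is hypothetical, and the credentials are placeholders; the sketch does not escape special characters in the user name or password.

```go
package main

import (
	"fmt"
	"net/url"
)

// buildDSN assembles a DSN in the user:password@tcp(host:port)/database form
// and appends interpolateParams=true as a query parameter.
func buildDSN(user, password, host string, port int, database string) string {
	params := url.Values{}
	params.Set("interpolateParams", "true")
	return fmt.Sprintf("%s:%s@tcp(%s:%d)/%s?%s",
		user, password, host, port, database, params.Encode())
}

func main() {
	// Placeholder values; replace them with real connection details.
	fmt.Println(buildDSN("user", "password", "host", 3306, "database"))
}
```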
// interpolateParams=true is required for parallel read.
dsn := fmt.Sprintf("%s:%s@tcp(%s:%d)/%s?interpolateParams=true", "user", "password", "host", 3306, "database")
dsn += fmt.Sprintf("&connectionAttributes=%s:%s,%s:%s", "program_name", "CompanyName_AppName", "program_version", "1.2.3")
db, err := sql.Open("mysql", dsn)
if err != nil {
	// Handle the error
}

arrowReader, err := s2db_arrow_driver.NewS2DBArrowReader(
	context.Background(),
	s2db_arrow_driver.S2DBArrowReaderConfig{
		Conn:  db,
		Query: "SELECT * FROM t WHERE a > ? AND a < ?",
		Args:  []interface{}{1, 10},
		// Uncomment the lines below to use parallel read instead of Arrow conversion on the server
		// ParallelReadConfig: &s2db_arrow_driver.S2DBParallelReadConfig{
		// 	DatabaseName: "db",
		// },
		// UseClientConvesion: true,
	})
if err != nil {
	// Handle the error
}
defer arrowReader.Close()

for {
	batch, err := arrowReader.GetNextArrowRecordBatch()
	if err != nil {
		// Handle the error
		break
	}
	if batch == nil {
		// No more records to fetch
		break
	}
	// Process the batch
	batch.Release() // Release each batch as soon as it has been processed
}
To achieve maximum performance, consider using parallel read. The performance of parallel read depends on the size of the SingleStore cluster and the number of CPU cores on the machine where the code runs. SingleStore recommends using a machine where the number of CPU cores is equal to the number of partitions in the SingleStoreDB database. While the above holds true, parallel read will be supported in the next release.
Additionally, performance is influenced by the data types in the SingleStoreDB database. Performance tests conducted by the SingleStore team demonstrated that nullable data types are slower than non-nullable types. Therefore, consider using non-nullable data types when appropriate.
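The core-count recommendation above can be checked with a few lines of standard-library Go. This is only a sketch: `coreAdvice` is a hypothetical helper, and the partition count is a placeholder that you would obtain from your own SingleStoreDB database.

```go
package main

import (
	"fmt"
	"runtime"
)

// coreAdvice compares the number of CPU cores with the number of partitions
// and describes the difference.
func coreAdvice(cores, partitions int) string {
	switch {
	case cores == partitions:
		return "cores match partitions"
	case cores < partitions:
		return fmt.Sprintf("%d fewer cores than partitions", partitions-cores)
	default:
		return fmt.Sprintf("%d more cores than partitions", cores-partitions)
	}
}

func main() {
	partitions := 8 // placeholder: use the partition count of your database
	fmt.Println(coreAdvice(runtime.NumCPU(), partitions))
}
```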
The following table maps the SingleStoreDB data types to the corresponding Arrow data types. Note that this mapping is based on the alpha version of the driver and it may change in the future.
SingleStoreDB Data Type | Arrow Data Type |
---|---|
UNSIGNED TINYINT | uint8 |
UNSIGNED SMALLINT | uint16 |
UNSIGNED MEDIUMINT | uint32 |
UNSIGNED INT | uint32 |
UNSIGNED BIGINT | uint64 |
TINYINT | int8 |
SMALLINT | int16 |
MEDIUMINT | int32 |
INT | int32 |
BIGINT | int64 |
FLOAT | float32 |
DOUBLE | float64 |
DECIMAL | string |
YEAR | int16 |
DATE | string |
TIME | string |
DATETIME | string |
TIMESTAMP | string |
CHAR | string |
VARCHAR | string |
TINYTEXT | string |
TEXT | string |
MEDIUMTEXT | string |
LONGTEXT | string |
JSON | string |
BIT | binary |
BINARY | binary |
VARBINARY | binary |
TINYBLOB | binary |
BLOB | binary |
MEDIUMBLOB | binary |
LONGBLOB | binary |
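
For code that needs to branch on the expected Arrow type, a few rows of the table can be held in a lookup map. The sketch below is illustrative only: `arrowTypeFor` and `lookupArrowType` are hypothetical names, the map covers just a subset of the rows, and the mapping itself may change in later driver versions.

```go
package main

import (
	"fmt"
	"strings"
)

// arrowTypeFor holds a subset of the mapping table above.
var arrowTypeFor = map[string]string{
	"UNSIGNED BIGINT": "uint64",
	"INT":             "int32",
	"BIGINT":          "int64",
	"DOUBLE":          "float64",
	"DECIMAL":         "string",
	"DATETIME":        "string",
	"VARCHAR":         "string",
	"BLOB":            "binary",
}

// lookupArrowType normalizes the type name and reports whether it is present
// in the subset covered by this sketch.
func lookupArrowType(s2Type string) (string, bool) {
	name := strings.ToUpper(strings.TrimSpace(s2Type))
	arrowType, ok := arrowTypeFor[name]
	return arrowType, ok
}

func main() {
	for _, name := range []string{"bigint", "DECIMAL", "SOME_OTHER_TYPE"} {
		if arrowType, ok := lookupArrowType(name); ok {
			fmt.Printf("%s -> %s\n", name, arrowType)
		} else {
			fmt.Printf("%s -> not covered by this sketch\n", name)
		}
	}
}
```

Because `DECIMAL` and the date and time types arrive as strings, callers that need numeric or temporal values have to parse them after reading.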