Error reading Bit / Boolean columns
Closed this issue · 6 comments
Hi,
I am testing this tool and it have been great for most of our tables extraction.
However, I am encountering errors while trying to read columns with Bit
data type.
odbc2parquet -vvv query --connection-string "Driver={ODBC Driver 17 for SQL Server};Server=<...>;database=<...>;UID=<...>;PWD=<...>;" Pessoas.parquet "SELECT HabilitadoRiscoSacado_pes FROM Pessoas"
2021-07-30T18:03:35-03:00 - DEBUG - ODBC Environment created.
2021-07-30T18:03:36-03:00 - WARN - State: 01000, Native error: 5701, Message: [Microsoft][ODBC Driver 17 for SQL Server][SQL Server]Contexto do banco de dados alterado para '<...>'.
2021-07-30T18:03:36-03:00 - WARN - State: 01000, Native error: 5703, Message: [Microsoft][ODBC Driver 17 for SQL Server][SQL Server]Definição do idioma alterada para Português (Brasil).
2021-07-30T18:03:36-03:00 - DEBUG - ODBC column description for column 1: ColumnDescription { name: [72, 97, 98, 105, 108, 105, 116, 97, 100, 111, 82, 105, 115, 99, 111, 83, 97, 99, 97, 100, 111, 95, 112, 101, 115], data_type: Bit, nullability: NoNulls }
2021-07-30T18:03:36-03:00 - DEBUG - ODBC buffer description for column 1: BufferDescription { nullable: true, kind: Bit }
2021-07-30T18:03:36-03:00 - INFO - Memory usage per row is 68 bytes. This excludes memory directly allocated by the ODBC driver.
2021-07-30T18:03:36-03:00 - INFO - Batch size set to 31580641
2021-07-30T18:03:37-03:00 - INFO - Fetched batch 1 with 83817 rows.
2021-07-30T18:03:37-03:00 - DEBUG - Writing column with index 0 and name 'HabilitadoRiscoSacado_pes'.
Error: EOF: unable to put boolean value
If I cast it as a Tinyint
it works. Do you have any idea of why is this happening?
Thanks,
René
Hello René,
thanks for taking the time to file a bug report. Based on the error message and the log I would say the bug happens then writing to parquet, rather than reading from the database. I've written a minimal test case, but sadly it failed to reproduce the issue.
I setup an empty table with a bit column, and inserted five values into it.
setup_empty_table(&conn, table_name, &["BIT"]).unwrap();
let insert = format!(
"INSERT INTO {}
(a)
VALUES
(0), (1), (NULL), (1), (0);",
table_name
);
Yet odbc2parquet
runs fine and produces a parquet file for which parquet read shows the following output.
let expected_values = "\
{a: false}\n\
{a: true}\n\
{a: null}\n\
{a: true}\n\
{a: false}\n\
";
I am happy to see that you use a Microsoft SQL Database. Same as I use for the test setup for this tool. Yet my minimal mock data fails to reproduce the issue. I wonder if we need all of your 83817 rows to reproduce the issue. What would happen if you limit the query to only return part of the data? (Syntax for MSSQL can be found is this stack overflow issue: https://stackoverflow.com/questions/13220743/implement-paging-skip-take-functionality-with-this-query).
Your help in figuring this out would be greatly appreciated.
Cheers, Markus
Hi Markus, thanks for the quick response.
It seems like it fails when it tries to fetch more than 4096 rows.
Fetching 4096 rows
odbc2parquet -vvv query --connection-string "Driver={ODBC Driver 17 for SQL Server};---" Pessoas.parquet "SELECT TOP(4096) HabilitadoRiscoSacado_pes FROM Pessoas"
2021-07-31T09:10:19-03:00 - DEBUG - ODBC Environment created.
2021-07-31T09:10:20-03:00 - WARN - State: 01000, Native error: 5701, Message: [Microsoft][ODBC Driver 17 for SQL Server][SQL Server]Contexto do banco de dados alterado para '---'.
2021-07-31T09:10:20-03:00 - WARN - State: 01000, Native error: 5703, Message: [Microsoft][ODBC Driver 17 for SQL Server][SQL Server]Definição do idioma alterada para Português (Brasil).
2021-07-31T09:10:20-03:00 - DEBUG - ODBC column description for column 1: ColumnDescription { name: [72, 97, 98, 105, 108, 105, 116, 97, 100, 111, 82, 105, 115, 99, 111, 83, 97, 99, 97, 100, 111, 95, 112, 101, 115], data_type: Bit, nullability: NoNulls }
2021-07-31T09:10:20-03:00 - DEBUG - ODBC buffer description for column 1: BufferDescription { nullable: true, kind: Bit }
2021-07-31T09:10:20-03:00 - INFO - Memory usage per row is 68 bytes. This excludes memory directly allocated by the ODBC driver.
2021-07-31T09:10:20-03:00 - INFO - Batch size set to 31580641
2021-07-31T09:10:20-03:00 - INFO - Fetched batch 1 with 4096 rows.
2021-07-31T09:10:20-03:00 - DEBUG - Writing column with index 0 and name 'HabilitadoRiscoSacado_pes'.
Fetching 4097 rows
odbc2parquet -vvv query --connection-string "Driver={ODBC Driver 17 for SQL Server};---" Pessoas.parquet "SELECT TOP(4097) HabilitadoRiscoSacado_pes FROM Pessoas"
2021-07-31T09:10:14-03:00 - DEBUG - ODBC Environment created.
2021-07-31T09:10:15-03:00 - WARN - State: 01000, Native error: 5701, Message: [Microsoft][ODBC Driver 17 for SQL Server][SQL Server]Contexto do banco de dados alterado para '---'.
2021-07-31T09:10:15-03:00 - WARN - State: 01000, Native error: 5703, Message: [Microsoft][ODBC Driver 17 for SQL Server][SQL Server]Definição do idioma alterada para Português (Brasil).
2021-07-31T09:10:15-03:00 - DEBUG - ODBC column description for column 1: ColumnDescription { name: [72, 97, 98, 105, 108, 105, 116, 97, 100, 111, 82, 105, 115, 99, 111, 83, 97, 99, 97, 100, 111, 95, 112, 101, 115], data_type: Bit, nullability: NoNulls }
2021-07-31T09:10:15-03:00 - DEBUG - ODBC buffer description for column 1: BufferDescription { nullable: true, kind: Bit }
2021-07-31T09:10:15-03:00 - INFO - Memory usage per row is 68 bytes. This excludes memory directly allocated by the ODBC driver.
2021-07-31T09:10:15-03:00 - INFO - Batch size set to 31580641
2021-07-31T09:10:15-03:00 - INFO - Fetched batch 1 with 4097 rows.
2021-07-31T09:10:15-03:00 - DEBUG - Writing column with index 0 and name 'HabilitadoRiscoSacado_pes'.
Error: EOF: unable to put boolean value
Fetching 4096 rows offsetting 4096
odbc2parquet -vvv query --connection-string "Driver={ODBC Driver 17 for SQL Server};---" Pessoas.parquet "SELECT HabilitadoRiscoSacado_pes FROM Pessoas ORDER BY HabilitadoRiscoSacado_pes OFFSET 4096 ROWS FETCH NEXT 4096 ROWS ONLY"
2021-07-31T09:12:36-03:00 - DEBUG - ODBC Environment created.
2021-07-31T09:12:37-03:00 - WARN - State: 01000, Native error: 5701, Message: [Microsoft][ODBC Driver 17 for SQL Server][SQL Server]Contexto do banco de dados alterado para '---'.
2021-07-31T09:12:37-03:00 - WARN - State: 01000, Native error: 5703, Message: [Microsoft][ODBC Driver 17 for SQL Server][SQL Server]Definição do idioma alterada para Português (Brasil).
2021-07-31T09:12:37-03:00 - DEBUG - ODBC column description for column 1: ColumnDescription { name: [72, 97, 98, 105, 108, 105, 116, 97, 100, 111, 82, 105, 115, 99, 111, 83, 97, 99, 97, 100, 111, 95, 112, 101, 115], data_type: Bit, nullability: NoNulls }
2021-07-31T09:12:37-03:00 - DEBUG - ODBC buffer description for column 1: BufferDescription { nullable: true, kind: Bit }
2021-07-31T09:12:37-03:00 - INFO - Memory usage per row is 68 bytes. This excludes memory directly allocated by the ODBC driver.
2021-07-31T09:12:37-03:00 - INFO - Batch size set to 31580641
2021-07-31T09:12:37-03:00 - INFO - Fetched batch 1 with 4096 rows.
2021-07-31T09:12:37-03:00 - DEBUG - Writing column with index 0 and name 'HabilitadoRiscoSacado_pes'.
Fetching 4097 rows offsetting 4096
odbc2parquet -vvv query --connection-string "Driver={ODBC Driver 17 for SQL Server};Server=172.20.16.4;---" Pessoas.parquet "SELECT HabilitadoRiscoSacado_pes FROM Pessoas ORDER BY HabilitadoRiscoSacado_pes OFFSET 4096 ROWS FETCH NEXT 4097 ROWS ONLY"
2021-07-31T09:12:40-03:00 - DEBUG - ODBC Environment created.
2021-07-31T09:12:41-03:00 - WARN - State: 01000, Native error: 5701, Message: [Microsoft][ODBC Driver 17 for SQL Server][SQL Server]Contexto do banco de dados alterado para '---'.
2021-07-31T09:12:41-03:00 - WARN - State: 01000, Native error: 5703, Message: [Microsoft][ODBC Driver 17 for SQL Server][SQL Server]Definição do idioma alterada para Português (Brasil).
2021-07-31T09:12:41-03:00 - DEBUG - ODBC column description for column 1: ColumnDescription { name: [72, 97, 98, 105, 108, 105, 116, 97, 100, 111, 82, 105, 115, 99, 111, 83, 97, 99, 97, 100, 111, 95, 112, 101, 115], data_type: Bit, nullability: NoNulls }
2021-07-31T09:12:41-03:00 - DEBUG - ODBC buffer description for column 1: BufferDescription { nullable: true, kind: Bit }
2021-07-31T09:12:41-03:00 - INFO - Memory usage per row is 68 bytes. This excludes memory directly allocated by the ODBC driver.
2021-07-31T09:12:41-03:00 - INFO - Batch size set to 31580641
2021-07-31T09:12:41-03:00 - INFO - Fetched batch 1 with 4097 rows.
2021-07-31T09:12:41-03:00 - DEBUG - Writing column with index 0 and name 'HabilitadoRiscoSacado_pes'.
Error: EOF: unable to put boolean value
Hello René,
now there is a smoking gun. Thanks for running the experiment. I'm currently very busy moving. I'll try to reproduce the issue during train rides. In the meantime you may try to query all the data but set the --batch-size-row
option to 4096. Could be a workaround.
Cheers, Markus
Have a nice reproducing test, now. Thanks, I'll keep you in the loop.
Cheers, Markus
Should work now in 0.6.8
. Tests pass fine. I misunderstood what the write batch size property in the parquet library meant.
Thanks Markus, working great now!