Bug Report: BigQuery columns `sort_order` attribute is incorrect
Expected Behavior
When using the `BigQueryMetadataExtractor` to extract table schemas from BigQuery, the values of the `ColumnMetadata.sort_order` attribute should reflect the ordinal position of the column in BigQuery, i.e. the `ordinal_position` of the column as reported by the `<data_set_name>.INFORMATION_SCHEMA.COLUMNS` view. That is, the column with `ordinal_position=1` should get `sort_order=1`, the column with `ordinal_position=2` should get `sort_order=2`, the column with `ordinal_position=3` should get `sort_order=3`, or, more generally, the column at index `i` gets `sort_order=i`.
Current Behavior
While the order of columns seems to be correct, the values in `ColumnMetadata.sort_order` are inaccurate and do not match the `ordinal_position` of the column as specified in the information schema table.
`ColumnMetadata.sort_order` seems to receive only odd numbers: the column with `ordinal_position=1` gets `sort_order=1`, the column with `ordinal_position=2` gets `sort_order=3`, the column with `ordinal_position=3` gets `sort_order=5`, or, generally, a column with `ordinal_position=i` gets `sort_order=(i*2 - 1)`.
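The odd-number pattern is what you would get if the counter were advanced twice per top-level column: once by the caller (passing `total_cols + 1`) and once more in the helper's return value. Below is a minimal, self-contained sketch of that arithmetic for a flat schema. The function names are hypothetical stand-ins written for illustration; this is not the actual databuilder source.

```python
def iterate_over_col_buggy(name, cols, total_cols):
    """Hypothetical stand-in for _iterate_over_cols (flat columns only)."""
    # sort_order is taken directly from the counter that was passed in
    cols.append((name, total_cols))
    # the helper also advances the counter in its return value
    return total_cols + 1


def extract_sort_orders_buggy(column_names):
    """Hypothetical caller that mimics the suspected double increment."""
    cols = []
    total_cols = 0
    for name in column_names:
        # the caller advances the counter too, so it moves by 2 per column
        total_cols = iterate_over_col_buggy(name, cols, total_cols + 1)
    return cols


print(extract_sort_orders_buggy(["a", "b", "c"]))
# [('a', 1), ('b', 3), ('c', 5)]
```

Each `(name, sort_order)` pair follows `sort_order = 2*i - 1`, which matches the values observed above.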
Possible Solution
When calling the `_iterate_over_cols` method, the `total_cols` parameter should be populated with the real number of columns processed so far. For example, in this line pass `total_cols` as is instead of `total_cols + 1`.
And inside `_iterate_over_cols`, when creating the `ColumnMetadata` instance, `sort_order` should be set to `total_cols + 1`, matching the return value of the function.
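A rough sketch of the proposed change, again for a flat schema only and using hypothetical helper names rather than the real extractor code: the caller passes the count of columns processed so far, and the helper derives `sort_order` as `total_cols + 1` and returns that same value.

```python
def iterate_over_col_fixed(name, cols, total_cols):
    """Hypothetical stand-in for _iterate_over_cols with the proposed fix."""
    # total_cols is the number of columns processed so far,
    # so the next column's position is total_cols + 1
    sort_order = total_cols + 1
    cols.append((name, sort_order))
    # return the updated count, which equals the sort_order just assigned
    return sort_order


def extract_sort_orders_fixed(column_names):
    """Hypothetical caller that passes total_cols unchanged."""
    cols = []
    total_cols = 0
    for name in column_names:
        # pass total_cols as is; only the helper advances the counter
        total_cols = iterate_over_col_fixed(name, cols, total_cols)
    return cols


print(extract_sort_orders_fixed(["a", "b", "c"]))
# [('a', 1), ('b', 2), ('c', 3)]
```

With a single point of increment, `sort_order` lines up with `ordinal_position` for top-level columns. How nested `RECORD` fields should be numbered is not covered by this sketch.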
Your Environment
- Amundsen version used: `amundsen-databuilder` version 7.4.4
- Data warehouse stores: BigQuery
- Python: 3.11.3