amundsen-io/amundsen

Bug Report: BigQuery columns `sort_order` attribute is incorrect

Expected Behavior

When using the BigQueryMetadataExtractor to extract table schemas from BigQuery, the value of the ColumnMetadata.sort_order attribute should reflect the ordinal position of the column in BigQuery, i.e. the ordinal_position of the column as reported by the <data_set_name>.INFORMATION_SCHEMA.COLUMNS view. The column with ordinal_position=1 should get sort_order=1, the column with ordinal_position=2 should get sort_order=2, the column with ordinal_position=3 should get sort_order=3, or more generally, the column with ordinal_position=i should get sort_order=i.

Current Behavior

While the order of the columns seems to be correct, the values in ColumnMetadata.sort_order are inaccurate and do not match the ordinal_position of the column as specified in the information schema.

ColumnMetadata.sort_order only receives odd numbers: the column with ordinal_position=1 gets sort_order=1, the column with ordinal_position=2 gets sort_order=3, the column with ordinal_position=3 gets sort_order=5, or generally a column with ordinal_position=i gets sort_order=(i*2 - 1).
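The pattern is consistent with the counter being incremented twice per column, once by the caller and once by the callee. The following is a minimal sketch of that arithmetic only, not the databuilder source: `iterate_over_cols_buggy` and `extract_sort_orders_buggy` are hypothetical stand-ins, and the sketch assumes a flat schema with no nested fields.

```python
from typing import List


def iterate_over_cols_buggy(total_cols: int, sort_orders: List[int]) -> int:
    """Hypothetical stand-in for _iterate_over_cols on a flat column."""
    # The column's sort_order is the value passed in by the caller.
    sort_orders.append(total_cols)
    # The callee increments the counter once...
    return total_cols + 1


def extract_sort_orders_buggy(num_columns: int) -> List[int]:
    """Hypothetical stand-in for the calling loop in the extractor."""
    sort_orders: List[int] = []
    total_cols = 0
    for _ in range(num_columns):
        # ...and the caller increments it again by passing total_cols + 1.
        total_cols = iterate_over_cols_buggy(total_cols + 1, sort_orders)
    return sort_orders


print(extract_sort_orders_buggy(4))  # [1, 3, 5, 7]
```

Each column advances the counter by two, which yields sort_order=(i*2 - 1) for the i-th column.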

Possible Solution

When calling the _iterate_over_cols method, the total_cols parameter should be populated with the real number of columns processed so far. For example, in this line pass total_cols as is instead of total_cols + 1.

Inside _iterate_over_cols, when creating the ColumnMetadata instance, sort_order should be set to total_cols + 1, so that it matches the return value of the function.
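A minimal sketch of the proposed change, assuming the schema is a list of dicts with a `name` key and an optional `fields` list for nested records. The function names, the tuple used in place of ColumnMetadata, and the handling of nested fields (numbered in sequence right after their parent) are assumptions for illustration, not the actual databuilder code.

```python
from typing import Any, Dict, List, Tuple


def iterate_over_cols(parent: str,
                      column: Dict[str, Any],
                      cols: List[Tuple[str, int]],
                      total_cols: int) -> int:
    """Append (name, sort_order) for the column and return the updated count.

    total_cols is the number of columns processed before this call.
    """
    name = f"{parent}.{column['name']}" if parent else column["name"]
    # sort_order is total_cols + 1, the same value returned for a flat column.
    total_cols += 1
    cols.append((name, total_cols))
    # Assumption: nested fields continue the same counter after their parent.
    for field in column.get("fields", []):
        total_cols = iterate_over_cols(name, field, cols, total_cols)
    return total_cols


def extract_sort_orders(schema_fields: List[Dict[str, Any]]) -> List[Tuple[str, int]]:
    cols: List[Tuple[str, int]] = []
    total_cols = 0
    for column in schema_fields:
        # Pass total_cols as is; the callee is the only place that increments.
        total_cols = iterate_over_cols("", column, cols, total_cols)
    return cols


flat_schema = [{"name": "id"}, {"name": "email"}, {"name": "created_at"}]
print(extract_sort_orders(flat_schema))
# [('id', 1), ('email', 2), ('created_at', 3)]
```

With a single point of increment, a flat table gets sort_order=i for the column with ordinal_position=i.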

Your Environment

  • Amundsen version used: amundsen-databuilder version 7.4.4
  • Data warehouse stores: BigQuery
  • Python: 3.11.3