`ds.transform(name='cast',casts=cast_dict)` creates duplicate columns | BigQuery
Closed this issue · 6 comments
Hi,
The preview of `ds.transform(name='cast', casts=cast_dict)` shows a dataset with both the old and the new (cast) columns.
I took a look at `cast.sql` and saw that it starts with a `SELECT *`.
Suggestion: I believe `ds.transform(name='cast', casts=cast_dict)` should cast the provided columns in place, while keeping the other columns as they are.
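A minimal Python sketch of why a template that starts with `SELECT *` returns duplicates, and how the columns could instead be cast in place on BigQuery with its `SELECT * REPLACE (...)` syntax. The helper `build_cast_sql`, its `in_place` flag, the table name, and the `<COLUMN>_<TYPE>` alias for new columns are hypothetical, chosen for illustration; this is not the actual contents of `cast.sql`.

```python
def build_cast_sql(source: str, casts: dict[str, str], in_place: bool = False) -> str:
    """Hypothetical sketch: build a BigQuery SELECT that casts the given columns.

    in_place=False mimics the reported behavior: SELECT * keeps every original
    column, and each cast is appended as a new column named <COLUMN>_<TYPE>.
    in_place=True uses BigQuery's SELECT * REPLACE, so each cast expression
    takes the place of the original column under the same name.
    """
    if in_place:
        replacements = ", ".join(
            f"CAST({col} AS {typ}) AS {col}" for col, typ in casts.items()
        )
        return f"SELECT * REPLACE ({replacements}) FROM {source}"
    # SELECT * already returns the original columns, so the casts become extras.
    extra = ", ".join(
        f"CAST({col} AS {typ}) AS {col}_{typ.upper()}" for col, typ in casts.items()
    )
    return f"SELECT *, {extra} FROM {source}"


casts = {"DATETHATSASTRING": "DATE", "INTTHATSASTRING": "INT"}
print(build_cast_sql("my_dataset.my_table", casts))
# SELECT *, CAST(DATETHATSASTRING AS DATE) AS DATETHATSASTRING_DATE, CAST(INTTHATSASTRING AS INT) AS INTTHATSASTRING_INT FROM my_dataset.my_table
print(build_cast_sql("my_dataset.my_table", casts, in_place=True))
# SELECT * REPLACE (CAST(DATETHATSASTRING AS DATE) AS DATETHATSASTRING, CAST(INTTHATSASTRING AS INT) AS INTTHATSASTRING) FROM my_dataset.my_table
```

`SELECT * REPLACE` is BigQuery syntax and is not available in every SQL dialect; where it isn't, a template would need an explicit column list that leaves out the original columns.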
Thank you!
Hi @amirbtb, thanks for using the package. We've taken our relationship to the next level: from bugs to product experience!
This is interesting feedback, and I'd like to make sure I'm reading it correctly. Can you confirm that this is what you're proposing as the ideal state:
Input columns:
- DATETHATSASTRING (string)
- INTTHATSASTRING (string)

Output columns (current state):
- DATETHATSASTRING (string)
- INTTHATSASTRING (string)
- DATETHATSASTRING_DATE (date)
- INTTHATSASTRING_INT (int)

Output columns (proposed state):
- DATETHATSASTRING (date)
- INTTHATSASTRING (int)

To summarize: currently the cast transform creates a second column cast to the new data type and returns both columns. A better experience would be to return only the cast column, under the original column's name (and not return the original column).
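The current and proposed states can be modeled at the schema level in a few lines of Python. This sketch assumes a schema is a plain `dict` mapping column name to type, and that the current transform names each new column `<COLUMN>_<TYPE>` as in the column lists above; `cast_output_schema` and its `overwrite` flag are hypothetical and not part of RasgoQL.

```python
def cast_output_schema(
    schema: dict[str, str], casts: dict[str, str], overwrite: bool
) -> dict[str, str]:
    """Hypothetical model of the column -> type mapping a cast transform returns."""
    if overwrite:
        # Proposed state: each cast column keeps its name and position and only
        # its type changes; columns that are not cast pass through untouched.
        return {col: casts.get(col, typ) for col, typ in schema.items()}
    # Current state: every original column is kept, and each cast is appended
    # as a new column named <COLUMN>_<TYPE>.
    out = dict(schema)
    for col, typ in casts.items():
        out[f"{col}_{typ.upper()}"] = typ
    return out


schema = {"DATETHATSASTRING": "string", "INTTHATSASTRING": "string"}
casts = {"DATETHATSASTRING": "date", "INTTHATSASTRING": "int"}

print(cast_output_schema(schema, casts, overwrite=False))
# {'DATETHATSASTRING': 'string', 'INTTHATSASTRING': 'string', 'DATETHATSASTRING_DATE': 'date', 'INTTHATSASTRING_INT': 'int'}
print(cast_output_schema(schema, casts, overwrite=True))
# {'DATETHATSASTRING': 'date', 'INTTHATSASTRING': 'int'}
```

Keeping the original name and position in the proposed state means downstream steps that refer to `DATETHATSASTRING` keep working, only with the new type.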
If this is off the mark, please let me know. Thanks!
Hi @griffatrasgo,
Your understanding is correct. Do you agree that this would improve the behavior of the cast transform?
Thanks!
Absolutely agree. We've been envisioning this feature as a toggle on the `.transform()` method that lets users return only the transformed column. This sounds like validation that we should release it quickly. I can ping you here when it's ready in the package. I'm on vacation next week, so expect roughly a two-week timeframe.
The PR that fixes this issue has been merged in RasgoTransforms: rasgointelligence/RasgoTransforms#136
Once we release a new version of RasgoQL, this will be resolved.
@amirbtb, the fix will be to add an `overwrite_columns` parameter to the function, like this:
```python
ds_casted = ds.cast(
    casts={
        'DS_WEATHER_ICON': 'INT',
        'DS_DAILY_HIGH_TEMP': 'STRING',
        'DS_DAILY_LOW_TEMP': 'INT',
    },
    overwrite_columns=True,
)
```
I just tested this and it works great!
```python
ds_casted = ds.transform(
    name='cast',
    casts=cast_dict,
    overwrite_columns=True,
)
```
Thanks a lot for the fix!