Should partition columns be rearranged to the end of the table?
I've read that it's more friendly/effective to place partition columns at the end of a table. Here's how I manually rearrange the columns so I can partition on 'year':
# Put year at the end to make for more friendly partitioning
year_index = df.columns.get_loc('year')
df = df[df.columns[:year_index].to_list()
        + df.columns[year_index+1:].to_list()
        + [df.columns[year_index]]]
If that's really true and life is better with partition columns at the end, would it be a good idea to create a convenience function that takes a data frame and an intended partitioning statement and returns a properly ordered data frame? Alternatively, it could take a data frame's column list and an intended partitioning statement and return a properly ordered list of columns (from which a properly ordered data frame can be constructed).
If, on the other hand, the idea is based on fairy tales, let's close the issue and state emphatically that column order and partitioning are unrelated.
Aha...just noticed this function in the osc_ingest_tools library: enforce_partition_column_order. I'll start using that.
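For anyone reading later, here is a rough sketch of the kind of convenience function I was asking about, covering both the column-list variant and the data-frame variant. The names `partition_columns_last` and `move_partition_columns_last` are hypothetical, and this is not the implementation of `enforce_partition_column_order`. It assumes the partitioning is given as a plain list of column names (not a full partitioning statement) and that column names are unique.

```python
import pandas as pd


def partition_columns_last(columns, partition_columns):
    """Return the column names with the partition columns moved to the end.

    Hypothetical helper: non-partition columns keep their original order,
    and partition columns appear last in the order they were given.
    """
    columns = list(columns)
    missing = [c for c in partition_columns if c not in columns]
    if missing:
        raise KeyError(f"partition columns not found: {missing}")
    others = [c for c in columns if c not in partition_columns]
    return others + list(partition_columns)


def move_partition_columns_last(df, partition_columns):
    """Return a new data frame with the partition columns placed last."""
    return df[partition_columns_last(df.columns, partition_columns)]


# Small made-up data frame, just to show the reordering.
df = pd.DataFrame(
    {"year": [2020, 2021], "country": ["US", "FR"], "value": [1.0, 2.0]}
)
reordered = move_partition_columns_last(df, ["year"])
print(list(reordered.columns))  # ['country', 'value', 'year']
```

Splitting out the list-only helper means the same ordering logic can be reused when only the column names are at hand, for example while building a table schema.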