Idea: add transformation for converting column names to parquet compatible ones
Closed this issue · 3 comments
eclosson commented
May eventually get to creating a pull request for this, but if not and someone else wants to. Here is a snippet of code that isn't well tested:
"Column That {Will} Break\t;".replaceAll("[,;{}()\n\t=]", "").replaceAll(" ", "_").toLowerCase()
eclosson commented
def withParquetCompatibleColumnNames()(df: DataFrame): DataFrame = {
df.columns.foldLeft(df) { (tmpDF, col) =>
val newName = col.replaceAll("[,;{}()\n\t=]", "").replaceAll(" ", "_").toLowerCase()
tmpDF.withColumnRenamed(col, newName)
}
}
Some more hacky stuff.
MrPowers commented