MrPowers/spark-daria

Idea: add transformation for converting column names to parquet compatible ones

Closed this issue · 3 comments

I may eventually get around to creating a pull request for this, but if I don't, someone else is welcome to. Here is a snippet of code that isn't well tested:

"Column That {Will} Break\t;".replaceAll("[,;{}()\n\t=]", "").replaceAll(" ", "_").toLowerCase()
// => "column_that_will_break"
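import org.apache.spark.sql.DataFrame

// Renames every column: drops the characters Spark's Parquet writer rejects
// in column names (,;{}()\n\t=), replaces spaces with underscores, and lowercases.
def withParquetCompatibleColumnNames()(df: DataFrame): DataFrame = {
  df.columns.foldLeft(df) { (tmpDF, colName) =>
    val newName = colName.replaceAll("[,;{}()\n\t=]", "").replaceAll(" ", "_").toLowerCase()
    tmpDF.withColumnRenamed(colName, newName)
  }
}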
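
Since the snippet above isn't well tested, one way to make it easier to test is to pull the renaming rule out into a pure function that needs no Spark session. The sketch below does that and also flags one case the fold does not guard against: two different columns that end up with the same sanitized name. The object and method names (`ParquetColumnNames`, `sanitize`, `collisions`) are hypothetical and not part of spark-daria.

```scala
// Hypothetical helper, not part of spark-daria: a Spark-free sketch of the
// renaming rule so it can be unit tested on plain strings.
object ParquetColumnNames {

  // Same rule as the snippet above: drop , ; { } ( ) newline tab and =,
  // then replace spaces with underscores and lowercase the result.
  def sanitize(name: String): String =
    name.replaceAll("[,;{}()\n\t=]", "").replaceAll(" ", "_").toLowerCase()

  // Groups the original column names by their sanitized form and keeps only
  // the sanitized names that more than one original column maps to.
  def collisions(columns: Seq[String]): Map[String, Seq[String]] =
    columns
      .groupBy(sanitize)
      .filter { case (_, originals) => originals.size > 1 }
}
```

For example, `ParquetColumnNames.collisions(Seq("User Id", "user_id", "amount"))` would report that both `User Id` and `user_id` map to `user_id`, which could be checked on `df.columns` before renaming.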

Some more hacky stuff.

@eclosson - this looks great. I opened a PR: #116

I'll give the other maintainers a little time to share any feedback before merging, and will let you know when a new version is released.

This code has been merged into master.

I am having trouble deploying this code (#119). I'll try to figure out a new deploy process and get a new version released ASAP.