Question
Closed this issue · 2 comments
Why do
def endOfWeek(col: Column, lastDayOfWeek: String = "Sat"): Column = {
// dayOfWeek Case insensitive, and accepts: "Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"
// Saturday is the default week end day
when(dayOfWeekStr(dayofweek(col)) === lit(lastDayOfWeek), col)
.otherwise(next_day(col, lastDayOfWeek))
}
When you cold just do
def endOfWeek(col: Column, lastDayOfWeek: String = "Sat"): Column = {
next_day(col, lastDayOfWeek)
}
As far as I can tell they do the same thing, is there a benefit of the current approach that I'm missing?
I only ask because I just stumbled upon beginningOfWeek func which uses it and it could be simplified to:
def beginningOfWeek(col: Column, lastDayOfWeek: String = "Sat"): Column = {
DariaValidator.validateIsIn[String](lastDayOfWeek, List("Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"))
date_sub(next_day(col, lastDayOfWeek), 6)
}
I think the original implementation was used because date_sub
didn't take column arguments in Spark 2. I am open to the change now that spark-daria doesn't support Spark 2 anymore. I am a bit hesitant to make the change just because I don't work with Spark anymore and don't want to make any refactorings that fall into the "nice to have" bucket. More interested in refactorings for performance gains or bug fixes. Thanks for opening the issue.
Sad to hear the spark news. I've enjoyed your content on it over the last couple of years. What have you moved to out of curiosity?
Gotcha, here's another implementation that does the same thing. I'm sure it'll be a negligible amount more efficient than the current one. I'll leave it here if you want to use it but if not feel free to close this.
def beginningOfWeek(col: Column, lastDayOfWeek: String = "Sat"): Column = {
DariaValidator.validateIsIn[String](lastDayOfWeek, List("Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"))
next_day(col, lastDayOfWeek) - expr(s"interval 6 days")
}