mrpowers-io/spark-daria

Question

Closed this issue · 2 comments

https://github.com/MrPowers/spark-daria/blob/4d24f44a282f98c82f3b34ecf425b84689f958f7/src/main/scala/com/github/mrpowers/spark/daria/sql/functions.scala#L289

Why do

def endOfWeek(col: Column, lastDayOfWeek: String = "Sat"): Column = {
    // dayOfWeek Case insensitive, and accepts: "Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"
    // Saturday is the default week end day
    when(dayOfWeekStr(dayofweek(col)) === lit(lastDayOfWeek), col)
      .otherwise(next_day(col, lastDayOfWeek))
  }

When you cold just do

def endOfWeek(col: Column, lastDayOfWeek: String = "Sat"): Column = {
    next_day(col, lastDayOfWeek)
  }

As far as I can tell they do the same thing, is there a benefit of the current approach that I'm missing?

I only ask because I just stumbled upon beginningOfWeek func which uses it and it could be simplified to:

 def beginningOfWeek(col: Column, lastDayOfWeek: String = "Sat"): Column = {
    DariaValidator.validateIsIn[String](lastDayOfWeek, List("Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"))
    date_sub(next_day(col, lastDayOfWeek), 6)
  }

I think the original implementation was used because date_sub didn't take column arguments in Spark 2. I am open to the change now that spark-daria doesn't support Spark 2 anymore. I am a bit hesitant to make the change just because I don't work with Spark anymore and don't want to make any refactorings that fall into the "nice to have" bucket. More interested in refactorings for performance gains or bug fixes. Thanks for opening the issue.

Sad to hear the spark news. I've enjoyed your content on it over the last couple of years. What have you moved to out of curiosity?

Gotcha, here's another implementation that does the same thing. I'm sure it'll be a negligible amount more efficient than the current one. I'll leave it here if you want to use it but if not feel free to close this.

def beginningOfWeek(col: Column, lastDayOfWeek: String = "Sat"): Column = {
    DariaValidator.validateIsIn[String](lastDayOfWeek, List("Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"))
    next_day(col, lastDayOfWeek) - expr(s"interval 6 days")
  }