ivan-rivera/RedditExtractor

Please consider giving full date time

Closed this issue · 5 comments

Hello,

Please can you consider simplifying:

timestamp_to_date <- function(timestamp) timestamp |>
  as.POSIXct(origin="1970-01-01") |>
  as.Date() |>
  format("%Y-%m-%d")

To

timestamp_to_date <- function(timestamp) timestamp |>
  as.POSIXct(origin="1970-01-01")

So that we can have more fine-grained comparisons between comment dates (and times)? Or add an option to not simplify to dates?

Thank you.

Edit - I forgot to say, your comment at the end of issue #4, doesn't quite solve this issue because converting back to timestamp from a date, when the time has been dropped, doesn't give the original timestamp.

Here the "round trip" works

timestamp_to_date <- function(timestamp) timestamp |>
  as.POSIXct(origin = "1970-01-01") 

as.numeric(as.POSIXct(timestamp_to_date(1644944045), origin = "1970-01-01"))
[1] 1644944045

But here it doesn't

timestamp_to_date <- function(timestamp) timestamp |>
  as.POSIXct(origin="1970-01-01") |>
  as.Date() |>
  format("%Y-%m-%d")

as.numeric(as.POSIXct(timestamp_to_date(1644944045), origin = "1970-01-01"))
[1] 1644883200

Thanks for the suggestion! Yes, perhaps removing the date conversion may be a good idea. I'm afraid that I won't be able to do this for a little while because I'm travelling without my laptop, but feel free to create a PR yourself and I'll approve it. Otherwise, I'll introduce this change myself a little later.

Yeah no worries, will do. Enjoy your travels!

Just looking at this now and a couple of questions:

  1. the simplest edit is just to remove the simplification lines above (and edit the tests accordingly). But then we're creating a function to wrap just as.POSIXct(origin = "1970-01-01"), which seems like overkill. So I could just replace the timestamp_to_date function everywhere with as.POSIXct(origin = "1970-01-01"). That also means we can get rid of the tests so simplifies that a bit. But I'll defer to your preference on that.
  2. the returned data currently contains date_utc. Technically we're now returning a POSIXct so maaaaaybe we should rename the variable accordingly. But I fear (a) I'm being ludicrously pedantic and (b) that might break people's code. So I think probably we should leave that alone.

Let me know what you prefer and I'll finish it off.

Edit: balls, made some bad typos.

How about we simply keep the date fields as they are and add new fields with the timestamps?

Yeah, that's a much better idea!