as.telemetry/as.POSIXct time format options
Timestamp formats are a major issue for users. On import, as.telemetry has a timeformat argument that removes this barrier for a lot of command-line users. Would it be possible to add an optional text box or dropdown menu to the data import page for selecting the timestamp format?
I'm definitely not against this, but it should not be an issue for users whose data have gone through Movebank, since Movebank standardizes the timestamp format.
In addition to the string format, there's also the issue of the time zone. UTC is the default, but some devices record in a local time zone.
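A minimal base-R illustration of why the time zone matters: the same clock reading names two different instants depending on the zone it is interpreted in. The device zone `America/New_York` is an assumption chosen for the example, and the result relies on the system's time zone database being available.

```r
# Same clock reading, interpreted in two zones.
# Assumption: the device was set to US Eastern time (EDT, UTC-4, in June).
reading <- "2019-06-05 10:30"
as_local <- as.POSIXct(reading, format = "%Y-%m-%d %H:%M", tz = "America/New_York")
as_utc   <- as.POSIXct(reading, format = "%Y-%m-%d %H:%M", tz = "UTC")

# The local reading corresponds to 14:30 UTC...
format(as_local, "%Y-%m-%d %H:%M", tz = "UTC")          # "2019-06-05 14:30"
# ...so wrongly treating it as UTC shifts every fix by 4 hours.
as.numeric(difftime(as_local, as_utc, units = "hours")) # 4
```

Shifts like this leave the data looking plausible, which is why the time zone deserves the same attention as the string format.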
Also, there is an R package for detecting timestamp formats, parsedate. I haven't played around with it, but it might be possible to pass the timestamps through parsedate as another failsafe or option. I'm a little hesitant about how to do this, as I would rather the import fail with an error, or throw a warning about the times being out of order, than have everything seem to work when it doesn't.
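One way to get the fail-loudly behaviour described above is to wrap the parse so that unparsed timestamps raise an error and out-of-order times raise a warning. This is a base-R sketch, not parsedate or ctmm code; `parse_times_strict` is a hypothetical name, and the ordering check assumes the timestamps belong to a single individual.

```r
# Hypothetical strict parser: error on unparsed timestamps, warn on
# out-of-order times (assumes x holds one individual's fixes).
parse_times_strict <- function(x, fmt, tz = "UTC") {
  parsed <- as.POSIXct(x, format = fmt, tz = tz)
  # Positions where a non-missing string failed to parse
  bad <- which(is.na(parsed) & !is.na(x))
  if (length(bad) > 0) {
    stop(length(bad), " timestamp(s) do not match '", fmt,
         "', e.g. '", x[bad[1]], "'")
  }
  # Out-of-order times often mean the format was wrong
  if (is.unsorted(as.numeric(parsed), na.rm = TRUE)) {
    warning("timestamps are not in chronological order; check the time format")
  }
  parsed
}

parse_times_strict(c("01-06-19 10:30", "02-06-19 10:30"), "%d-%m-%y %H:%M")
```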
I agree with @chfleming. There are two approaches:
- try to guess automatically and avoid user intervention entirely unless guessing fails.
- specify the format string. This means we need to compile a list of often-used formats, and the user selects the right one for their data. I'm not a huge fan of this method: the commonly used formats can be guessed easily, but we cannot predict specific or user-invented formats.
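lubridate::ymd_hms can already guess many commonly used formats, as long as the components are in year-month-day hour-minute-second order. I think we can just try ymd_hms, dmy, and mdy (or their date-time variants dmy_hms and mdy_hms) in order and pick the one that parses without error. Note that the lubridate parsers return NA with a warning, rather than an error, for strings they cannot parse, so in practice "without error" means "without NAs".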
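A base-R analogue of the try-in-order idea, using explicit `strptime`-style format strings in place of lubridate's parsers (a swapped-in technique, so lubridate is not needed). `guess_time_format` and the `candidates` list are hypothetical. Because `strptime` ignores trailing characters, more specific formats should come before shorter ones.

```r
# Hypothetical helper: try candidate formats in order and keep the first
# one that parses every non-missing timestamp.
guess_time_format <- function(x, formats, tz = "UTC") {
  for (fmt in formats) {
    parsed <- as.POSIXct(x, format = fmt, tz = tz)
    if (!anyNA(parsed[!is.na(x)])) {
      return(list(format = fmt, time = parsed))
    }
  }
  stop("none of the candidate formats parsed all timestamps")
}

# Year-first, then day-first, then month-first (order is a design choice)
candidates <- c("%Y-%m-%d %H:%M:%S", "%d-%m-%Y %H:%M:%S", "%m-%d-%Y %H:%M:%S")

res <- guess_time_format(c("25-12-2019 08:00:00", "26-12-2019 09:30:00"), candidates)
res$format  # "%d-%m-%Y %H:%M:%S"
```

A string such as "05-06-2019 10:30:00" parses under both the day-first and the month-first format, so the order of `candidates` silently decides which one wins; this is exactly the kind of quiet misreading worried about above.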
And there is also the time zone problem. Date-times can be very tricky; it's hard to find a good solution that handles all the corner cases.
```r
> library(lubridate)
> x <- c(20100101120101, "2009-01-02 12-01-02", "2009.01.03 12:01:03",
+ "2009-1-4 12-1-4",
+ "2009-1, 5 12:1, 5",
+ "200901-08 1201-08",
+ "2009 arbitrary 1 non-decimal 6 chars 12 in between 1 !!! 6",
+ "OR collapsed formats: 20090107 120107 (as long as prefixed with zeros)",
+ "Automatic wday, Thu, detection, 10-01-10 10:01:10 and p format: AM",
+ "Created on 10-01-11 at 10:01:11 PM")
> ymd_hms(x)
 [1] "2010-01-01 12:01:01 UTC" "2009-01-02 12:01:02 UTC" "2009-01-03 12:01:03 UTC" "2009-01-04 12:01:04 UTC"
 [5] "2009-01-05 12:01:05 UTC" "2009-01-08 12:01:08 UTC" "2009-01-06 12:01:06 UTC" "2009-01-07 12:01:07 UTC"
 [9] "2010-01-10 10:01:10 UTC" "2010-01-11 22:01:11 UTC"
```
I opened this issue because I was just debugging a problem for a user. The dates and times looked fine and were in the format "%d-%m-%y %H:%M", but they were imported incorrectly, and the animals appeared to have been tracked for 30 years. On the command line this is an easy fix, since you just specify the format, but in the app there is no workaround.
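A minimal sketch of one way a dd-mm-yy timestamp can produce a decades-long track: if it is read year-first, as R's default ISO-style formats do, the day of the month becomes the year, so a month of data spans roughly 30 "years". This illustrates a plausible mechanism only; it is not a reproduction of what happened inside as.telemetry for this user.

```r
x <- c("01-06-19 10:30", "30-06-19 10:30")  # dd-mm-yy: 1 and 30 June 2019

# Read year-first, the day of the month becomes the year
wrong <- as.POSIXct(x, format = "%Y-%m-%d %H:%M", tz = "UTC")
as.POSIXlt(wrong)$year + 1900     # 1 30

# With the format stated explicitly, the track spans 29 days
right <- as.POSIXct(x, format = "%d-%m-%y %H:%M", tz = "UTC")
format(right, "%Y-%m-%d %H:%M")   # "2019-06-01 10:30" "2019-06-30 10:30"
```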
I agree that automatic guessing would be difficult to implement, and the default options work well for most cases. I think text boxes that pass strings to the time format and time zone arguments would offer a path forward for this issue with very little work on our end.
So you mean letting users enter the date-time format string? It's not hard to add a text input to the app and pass it on as a parameter. However, I doubt that users who have this kind of problem will know what to enter there.
If users have the knowledge and skills to work out the format string, they will also be able to convert their timestamps to the ISO 8601 standard format.
The web app will never be as flexible as the command line, and I fear that adding knobs and dials to make it work more like the command line is not the right direction. If this can be done automatically, or by letting users choose from a list of common format examples, I think that's still good. Otherwise, a text box for entering some mysterious format string will not really solve many problems: users will need help to get that format string, and whoever can help with that can also just convert the timestamps.
Yeah, that's fair. I know we're not going to get the full flexibility of the command line, but data import is a pretty important step. Users without command-line skills don't have many options for cleaning up their timestamps, because Excel can ruin their data. So anything that adds a little more flexibility to timestamp formatting would be valuable for users.
Importing is done by ctmm::as.telemetry. To make the app support more formats, we would need to:
- make as.telemetry support a custom format (this is not hard);
- first attempt the import with the default format, then let the user enter a custom format and import again.
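The two-step flow above might look roughly like this in base R. `import_times` is a hypothetical helper, not part of ctmm; it relies on `as.POSIXct()` raising an error when none of its default formats match. It only catches outright failures: a dd-mm-yy string can still be silently misread year-first.

```r
# Hypothetical two-step import: try R's default ISO-style formats first and
# use a user-supplied format string only when one is given.
import_times <- function(x, user_format = NULL, tz = "UTC") {
  if (is.null(user_format)) {
    # as.POSIXct() tries its built-in formats and errors if none matches
    return(tryCatch(
      as.POSIXct(x, tz = tz),
      error = function(e) stop("default parsing failed; please supply a time format")
    ))
  }
  parsed <- as.POSIXct(x, format = user_format, tz = tz)
  if (anyNA(parsed[!is.na(x)])) {
    stop("some timestamps do not match '", user_format, "'")
  }
  parsed
}

# First attempt fails, so the app would ask for a format and retry
try(import_times("01.06.2019 10:30"))
import_times("01.06.2019 10:30", user_format = "%d.%m.%Y %H:%M")
```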
On second thought, if users know the time format, they are already halfway to fixing it. They are better off fixing the data itself, which avoids problems when using the data with all the other packages, or importing it into Movebank to get a standardized format.
If the times are imported incorrectly, the problem should be obvious in the summary table.
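A sketch of the kind of check a summary table makes visible: compute each individual's tracking span and flag implausibly long ones. `check_tracking_span` and its 10-year threshold are assumptions for illustration, not app or ctmm behaviour.

```r
# Hypothetical check: tracking span per individual, flagging spans longer
# than max_years (the 10-year default is an illustrative assumption).
check_tracking_span <- function(time, id, max_years = 10) {
  secs <- as.numeric(time)
  span_days <- tapply(secs, id, function(s) {
    (max(s, na.rm = TRUE) - min(s, na.rm = TRUE)) / 86400
  })
  data.frame(
    id = names(span_days),
    span_days = as.numeric(span_days),
    suspicious = as.numeric(span_days) > max_years * 365.25
  )
}

# Individual "A" has day-of-month values misread as years; "B" is correct.
fix_time <- as.POSIXct(c("0001-06-19 10:30", "0030-06-19 10:30",
                         "2019-06-01 10:30", "2019-06-30 10:30"),
                       format = "%Y-%m-%d %H:%M", tz = "UTC")
check_tracking_span(fix_time, id = c("A", "A", "B", "B"))
```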