influxdata/influxdb

Consider using seconds as the default precision


InfluxDB currently defaults to nanosecond precision for writes and queries. Most other tools and languages use second precision. Given that

  • a common beginner mistake is using timestamps with second precision (resulting in a bunch of 1970 points that trip people up), and
  • significant compression gains can be realized when using seconds as the precision with the TSM engine,

should we consider changing the default precision for timestamps to seconds?

For reference, the default precision is set here:
https://github.com/influxdata/influxdb/blob/master/models/points.go#L1241

I'm a big fan of this. Currently our approach has been to default to nanoseconds on the database, but recommend that people use seconds. I think it makes sense to default to the time precision that we recommend people use (and what is probably the most commonly needed timestamp precision, too).

Also, +1 to most other tools using seconds by default; this would be a boon to new users.

We could add some auto-guessing code for when no precision has been set anywhere. We're talking about three orders of magnitude between adjacent precision settings (s, ms, us, ns). I think JavaScript defaults to milliseconds since the epoch. If a ms timestamp gets interpreted as seconds, it'll be over 46,000 years in the future, which can't even be represented with an int64 of "nanoseconds since epoch", which only has a span of ±292 years. Going the other direction (interpreting seconds as milliseconds) still lands you about 46 years in the past.

We could simply pick the precision that is closest to the current time with an order of magnitude check.
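
A minimal sketch, in Go, of what such a check could look like; `guessTime` and `absDur` are hypothetical helpers for illustration, not InfluxDB's actual implementation:

```go
package main

import (
	"fmt"
	"math"
	"time"
)

// guessTime interprets a raw timestamp of unknown precision at each
// supported scale and keeps the interpretation closest to "now".
// Division (rather than multiplication) keeps large ns values from
// overflowing int64.
func guessTime(raw int64, now time.Time) time.Time {
	candidates := []time.Time{
		time.Unix(raw, 0), // raw is seconds
		time.Unix(raw/1000, (raw%1000)*int64(time.Millisecond)),             // raw is ms
		time.Unix(raw/1000000, (raw%1000000)*int64(time.Microsecond)),       // raw is us
		time.Unix(raw/1000000000, raw%1000000000),                           // raw is ns
	}

	best := candidates[0]
	for _, c := range candidates[1:] {
		if absDur(now.Sub(c)) < absDur(now.Sub(best)) {
			best = c
		}
	}
	return best
}

// absDur returns |d|, saturating at the maximum Duration: time.Time.Sub
// clamps out-of-range results to the minimum Duration, which would
// overflow if negated naively.
func absDur(d time.Duration) time.Duration {
	if d == math.MinInt64 {
		return math.MaxInt64
	}
	if d < 0 {
		return -d
	}
	return d
}

func main() {
	now := time.Now()
	fmt.Println(guessTime(1458000000, now))          // sent as seconds
	fmt.Println(guessTime(1458000000000, now))       // sent as milliseconds
	fmt.Println(guessTime(1458000000000000000, now)) // sent as nanoseconds
}
```

All three calls above resolve to the same instant in March 2016, regardless of which precision the client happened to use.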

We would, of course, still highly recommend setting a precision, but this change would remove some of the shock factor for new users.

@joelegasse An order-of-magnitude check sounds like a straightforward way to handle timestamps that lack a user-supplied precision. We could even generate a log message if a batch contains a timestamp that is close to 1970 even at second precision.

didn't think of that, sounds like a good idea 👍

I'd much prefer the order-of-magnitude check to changing the default.

This is really a problem around the client libraries. Client libraries should make it clear which precision you're using, and then this problem wouldn't come up.

I think the order of magnitude check would mean there is no longer a "default" precision, right? If we go that route, I think we should also write a warning to the logs for each batch of points that doesn't specify a precision. Something that will tell them, "We're guessing what you meant, but you've probably done something bad, and you should feel bad..." 😛

I'd be worried about the logs getting spammed with that message if it was printed on every write without a precision. We already generate a ton of logs.

@joelegasse, @gunnaraasen yeah, logging that on every write would be way too loud

Some of the awesome is lost if users don't experience nanosecond support initially, I think. It's very powerful to SHOW that the database is so modern and powerful that it handles nanoseconds with aplomb. If we default to seconds, that's a power-user feature that almost never gets noticed except by the people explicitly looking for it. Not really a strong argument, I know, but I do think it's important to consider the perceptual impact of this change.

Seconds precision is also the default for devops tools, but what are the defaults in the IoT world? What do data historians typically use? In APM, milliseconds is the default. The default we pick expresses an opinion as to the primary use case. Why not leave it at nanoseconds, which is allegiant to none and forward-looking?

@beckettsean The order-of-magnitude check would replace the concept of a "default precision", and would instead pick the scale that would have the timestamp closest to the current time. Points without a timestamp would still be tagged with the nanosecond-precision time of when they were received by the server.

This check would mitigate some of the confusion/frustration that comes from simply assuming an unlabeled timestamp is in "nanoseconds since epoch". It certainly would not remove support for nanoseconds, but it would mean that users aren't left wondering why their data was "lost" when it's really just stored a couple of minutes into January 1970.
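
To make that failure mode concrete, a quick illustration (the timestamp value is just an example):

```go
package main

import (
	"fmt"
	"time"
)

func main() {
	ts := int64(1458000000) // a seconds-precision timestamp from March 2016

	// Read as nanoseconds since epoch, it lands ~1.5 seconds after 1970:
	fmt.Println(time.Unix(0, ts).UTC()) // 1970-01-01 00:00:01.458 +0000 UTC

	tsMs := ts * 1000 // the same instant, expressed in milliseconds
	// Read as nanoseconds, even that only reaches 24 minutes past the epoch:
	fmt.Println(time.Unix(0, tsMs).UTC()) // 1970-01-01 00:24:18 +0000 UTC
}
```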

@joelegasse I like the order-of-magnitude check, provided that query responses continue to use nanosecond precision unless otherwise explicitly requested.

It would be nice if the line protocol supported a timestamp with a unit [h,m,s,ms,us,ns]:
https://docs.influxdata.com/influxdb/v0.11/write_protocols/write_syntax/#line-protocol
The default should likely stay ns.

```
disk_free,tag=t  value=1  timestamp[s,ms,us,ns]
# if a blank timestamp is given, the precision could still be included
disk_free,tag=t  value=1  [s,ms,us,ns]
```
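
Parsing the suffixes above could be as simple as a lookup table of nanosecond multipliers; this is purely illustrative, since the suffix syntax is only a proposal:

```go
package main

import (
	"fmt"
	"time"
)

// suffixScale maps the proposed unit suffixes to nanosecond multipliers.
var suffixScale = map[string]int64{
	"h":  int64(time.Hour),
	"m":  int64(time.Minute),
	"s":  int64(time.Second),
	"ms": int64(time.Millisecond),
	"us": int64(time.Microsecond),
	"ns": 1,
}

// toNanos converts a value carrying one of the proposed suffixes into
// nanoseconds since epoch; the bool is false for an unrecognized suffix.
func toNanos(raw int64, suffix string) (int64, bool) {
	scale, ok := suffixScale[suffix]
	if !ok {
		return 0, false
	}
	return raw * scale, true
}

func main() {
	ns, _ := toNanos(1458000000, "s")
	fmt.Println(ns) // 1458000000000000000
}
```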

I never knew I could get significant compression gains when using seconds as the precision; I wish that were mentioned in the help page for the line protocol.

@steverweber Specifying the unit on the timestamp will likely be a feature of the next iteration of the line protocol. See the discussion at #6037 for more details. I've added a comment there about allowing a per-point precision without timestamps ([s,ms,us,ns]), since it hadn't been suggested before.

I've also opened influxdata/docs.influxdata.com-ARCHIVE#372 to get the improved compression benefits documented in more places.

Is there a reason you'd prefer the default to remain ns when no precision is provided, versus the order-of-magnitude check suggested above?

@gunnaraasen Thanks for managing the suggestions.

Is there a reason you'd prefer the default to remain ns when no precision is provided, versus the order-of-magnitude check suggested above?

As a beginner I assumed the timestamp was in seconds, and failed. If/when #6037 is resolved, I expect incorrect timestamp usage with respect to the line protocol to be greatly reduced.

Changing the default from ns seems to require some fun code changes to maintain compatibility. Is the added code complexity worth the gains? I don't know.

Also... what happens when someone really does want to use some strange timestamp in the distant past? And then there's the fun of reading documentation with a paragraph describing this time-check nuance.

@steverweber thanks for the feedback!

As a beginner I assumed the timestamp was in seconds, and failed.

This is an initial pitfall that would be greatly mitigated by auto-setting a precision based on the order of magnitude of the timestamp. The order-of-magnitude check will only occur when no precision parameter is set.

The change would add some documentation and code complexity. However, we frequently see issues opened by new users who write seconds-precision timestamps without specifying a precision and are confused when their data shows up at Jan 1, 1970. Doing the right thing in the majority of cases feels like it trumps sticking with an overly precise default that actively causes confusion among new users.

In terms of maintainability, only clients that don't already set a precision and that write points within specific time ranges (1969-1971 and >2400) will need to be updated, and it'll probably be a one-line code change for most clients.
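
For most clients, that one-line change is just adding the precision query parameter to the write request. A minimal sketch against the HTTP write endpoint, assuming a local server and a database named mydb:

```go
package main

import (
	"fmt"
	"net/http"
	"strings"
)

func main() {
	// precision=s tells InfluxDB the trailing timestamp is in seconds,
	// so no default (or guess) is ever applied.
	url := "http://localhost:8086/write?db=mydb&precision=s"
	body := strings.NewReader("disk_free,tag=t value=1 1458000000")

	resp, err := http.Post(url, "text/plain", body)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	fmt.Println(resp.Status) // "204 No Content" on success
}
```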

For the timeline, I think we'd like to get the new version of the line protocol into the 0.13 or 0.14 release and both line protocol versions would be supported for a couple releases to allow a smooth transition.

I was playing devil's advocate. Looks like a good migration strategy. I like how the influxdata devs are not afraid to nip these things in the bud /early/.

We talked about this as a group last week and decided to go ahead and roll forward with this for v0.13.0. As a reminder, this is only applicable when a precision isn't specified. In other words, the specified precision will always be used, but in the absence of that, we'll try to intelligently guess the precision based on timestamp magnitude.

If a precision is not set during the write, will it truncate a timestamp to 10 digits? I'm seeing our Node.js client send ms, but we lose 3 digits in InfluxDB. (We aren't specifying a precision.)

Yet a query like select * from request where time > now() - 30m shows no results, whereas the padded query select * from request where time > now() - 2455 weeks (47 years, ~1970 😭) starts to show results.

@shaunwarman if you aren't specifying precision then InfluxDB thinks you are sending nanoseconds, which is why your metrics are close to the epoch.

thanks @sparrc added the precision flag!
