seeing-things/track

Revamp telemetry database schema


Re-think the InfluxDB schema used for telemetry. In particular:

  • Consider using a dedicated Measurement per channel such that each can have its own unique timestamp. Use the fields within a measurement only for things like azimuth and altitude for a single topocentric position, or x and y for a single camera position.
  • Use tags for metadata that won't be changing frequently or at all during a pass, such as:
    • which class or object the channel originated in
    • if possible, a name or other identifier for the object being tracked

Note that some information is more appropriate for log files than InfluxDB tags. For this log information to be useful, however, we will need a way to find the log file, or the section of a log file, that corresponds to a given measurement in the database. If each program invocation creates a unique log file, perhaps the filename of that log file should be included in a tag. See also #215.
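To make the proposal above concrete, here is a minimal sketch of one measurement per channel, with fields for the values of a single position and tags for slow-changing metadata. The `Point` class here is a hypothetical stand-in, not the project's actual class, and the measurement name, tag keys, target name, and log filename are all made up for illustration. The sketch also skips the escaping that InfluxDB line protocol requires for spaces and commas.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Point:
    """Hypothetical stand-in for a telemetry point (not the project's class)."""
    measurement: str                 # one measurement per channel
    fields: Dict[str, float]         # e.g. azimuth and altitude of one position
    tags: Dict[str, str] = field(default_factory=dict)  # slow-changing metadata
    time_ns: Optional[int] = None    # each channel carries its own timestamp

    def to_line_protocol(self) -> str:
        """Render as InfluxDB line protocol (no escaping in this sketch)."""
        tag_str = ''.join(f',{k}={v}' for k, v in sorted(self.tags.items()))
        field_str = ','.join(f'{k}={v}' for k, v in self.fields.items())
        line = f'{self.measurement}{tag_str} {field_str}'
        if self.time_ns is not None:
            line += f' {self.time_ns}'
        return line


# All names and values below are illustrative assumptions.
point = Point(
    measurement='target_position_topocentric',
    fields={'azimuth': 181.25, 'altitude': 42.5},
    tags={
        'class': 'CameraTarget',                   # where the channel originated
        'target': 'ISS',                           # object being tracked
        'log_file': 'track_20200101T000000.log',   # ties the point to a log file
    },
    time_ns=1577836800000000000,
)
line = point.to_line_protocol()
```

Putting the log filename in a tag costs one extra series per program invocation, which seems acceptable if each invocation already produces its own log file.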

It turns out that this is a bit more difficult to figure out than I first assumed. I began applying changes to the schema, starting in targets.py, and ran into the following questions that don't have obvious answers:

  • How to name values?
    • Are long names okay?
    • Is some notion of hierarchy possible?
    • Should names map to the class hierarchy or be independent of it? Seems like it would be best if the schema for telemetry stays fairly stable so tools can be written to compare performance going back a long time; refactoring the code should probably not always trigger a schema change.
    • For example, certain intermediate steps in calculations for the CameraTarget are not easy to name concisely.
  • What timestamp should be used for intermediate values that are derived from raw sensor measurements?
    • For example, CameraTarget uses both mount encoder positions and camera frame data to estimate the target position. Thus the timestamps associated with either of these raw sensor readings can't be used directly since they will not match.
  • How to group all measurements from a single control cycle?
    • Could add the control cycle count as a field on all measurements, but the classes that are generating these values don't have easy access to that information.

The schema design may also motivate some refactoring of how telemetry is passed around in the application. There are several possible approaches:

  • The legacy approach, where classes inherit from TelemSource and produce a list of Point objects when get_telem_points() is called by a TelemLogger object. If this approach is taken, each TelemSource class would probably need a list that is appended to with new Point objects each time certain processing methods are called and cleared when get_telem_points() is called (with synchronization if needed for thread safety). An awkward situation could arise if get_telem_points() is never called: the list of Points would grow indefinitely and we would have a memory leak. This thought experiment also highlights why asynchronous telemetry polling doesn't really make sense for these objects anyway: the same points will be gathered and ultimately written to the database no matter when the TelemLogger object polls the sources.
  • Each class that previously would have inherited from TelemSource instead is passed a reference to an instance of a TelemLogger object. Whenever appropriate it creates Point objects and immediately posts them to the database. This seems the most straightforward. The main downside perhaps is that certain useful metadata like the cycle count may not be readily available to all objects.
  • Only the Tracker object holds a reference to the TelemLogger object. Other classes construct Point objects but pass those to Tracker, which can then tag them with additional metadata and write them to the database.
  • Only the Tracker object holds a reference to the TelemLogger object, and additionally only Tracker constructs Point objects. The Tracker object uses accessor methods and other means to extract the relevant values from other classes and add them to telemetry. The major downside here is that many intermediate values that are useful in telemetry are typically local variables. These would need to be returned from method calls or stored as member variables so they can be accessed later.
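As an illustration of the third approach above, the following sketch has a processing class return its Point objects to Tracker, which adds the control cycle count as a field and then hands the points to the logger. The class names come from this issue, but every method name (`process_frame`, `run_cycle`, `post_points`), the measurement name, and the in-memory logger are assumptions made for the sketch, not the existing code.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Point:
    """Hypothetical stand-in for a telemetry point."""
    measurement: str
    fields: Dict[str, float]
    tags: Dict[str, str] = field(default_factory=dict)
    time_ns: int = 0


class TelemLogger:
    """Stand-in that collects points in memory instead of writing to InfluxDB."""

    def __init__(self) -> None:
        self.written: List[Point] = []

    def post_points(self, points: List[Point]) -> None:
        self.written.extend(points)


class CameraTarget:
    """Builds points but has no knowledge of the logger or the cycle count."""

    def process_frame(self, x: float, y: float, time_ns: int) -> List[Point]:
        return [Point('camera_target_position', {'x': x, 'y': y},
                      {'class': 'CameraTarget'}, time_ns)]


class Tracker:
    """The only object holding a logger reference; adds per-cycle metadata."""

    def __init__(self, target: CameraTarget, logger: TelemLogger) -> None:
        self.target = target
        self.logger = logger
        self.cycle_count = 0

    def run_cycle(self, x: float, y: float, time_ns: int) -> None:
        points = self.target.process_frame(x, y, time_ns)
        for point in points:
            # Group all measurements from one control cycle.
            point.fields['cycle'] = self.cycle_count
        self.logger.post_points(points)
        self.cycle_count += 1


logger = TelemLogger()
tracker = Tracker(CameraTarget(), logger)
tracker.run_cycle(1.0, 2.0, time_ns=100)
tracker.run_cycle(1.5, 2.5, time_ns=200)
cycles = [p.fields['cycle'] for p in logger.written]  # [0, 1]
```

Because points are returned rather than buffered inside each class, nothing accumulates if a caller ignores them, which sidesteps the memory-leak concern raised for the legacy approach.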

I have enumerated all of the existing telemetry channels in this spreadsheet, along with some proposed changes to the measurement and field names: https://docs.google.com/spreadsheets/d/1JSZjRjXow2QLaunR7kLu93RnbNDEC7nGcem4c45ICqs/edit?usp=sharing

This exercise reminded me that some of the existing telemetry channels may not be essential to keep, and there may be other information that is not captured in telemetry that should be. Ideally telemetry would include enough information such that any values not present in telemetry could be computed afterward using the same source code. As an extension, it should then be possible to reconstruct an entire program run in simulation where values that are normally read from sensors are instead read from telemetry. I think a first pass at refining the list of items included is warranted for now, but any work to prove that the list is complete enough to fully reconstruct a pass in simulation is out of scope since the infrastructure for doing this doesn't exist.

Minimum set of data to reconstruct a pass, I think:

  • All mount encoder readings used in the program
  • All mount slew rates read from the mount driver
  • All raw camera frames or the target positions detected in those frames
  • All timestamps passed to the package that gets positions from the TLE
  • Probably some additional state from the model predictive controller
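One way to picture the simulation idea is a sensor source that serves recorded telemetry instead of live readings, so that downstream code runs unchanged. The sketch below is only an illustration under that assumption: `EncoderSource`, `ReplayEncoderSource`, and `pointing_error` are hypothetical names, and no such infrastructure exists in the project today.

```python
from typing import Iterable, Iterator, Protocol, Tuple

Position = Tuple[float, float]  # (azimuth, altitude) in degrees


class EncoderSource(Protocol):
    """Anything that can supply a mount encoder reading."""

    def read(self) -> Position: ...


class ReplayEncoderSource:
    """Serves encoder readings recorded in telemetry, in their original order."""

    def __init__(self, recorded: Iterable[Position]) -> None:
        self._readings: Iterator[Position] = iter(recorded)

    def read(self) -> Position:
        try:
            return next(self._readings)
        except StopIteration:
            # The run asked for more readings than were recorded, so the
            # telemetry is not sufficient to reconstruct it.
            raise RuntimeError('recorded encoder telemetry exhausted') from None


def pointing_error(source: EncoderSource, target: Position) -> Position:
    """Example consumer: works the same with a live or a replayed source."""
    az, alt = source.read()
    return (target[0] - az, target[1] - alt)


replay = ReplayEncoderSource([(10.0, 20.0), (11.0, 21.0)])
first_error = pointing_error(replay, (12.0, 22.0))
second_error = pointing_error(replay, (12.0, 22.0))
```

Raising when the recording runs out gives a cheap check that every reading the program uses was actually captured, which is the property the list above is trying to guarantee.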

Also needed, but more appropriate for log/config files:

  • Source code version (Git commit hash plus a diff of any uncommitted work, assuming the checkout is on master)
  • The exact TLE
  • The exact mount model parameters
  • Observer position (lat/lon/elevation)
  • Full set of program arguments
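The source-code and program-argument items could be gathered at startup with something like the sketch below. The function name and dictionary keys are assumptions for illustration. The `git` parameter exists only so the Git calls can be swapped out, for example in tests. Note that `git diff HEAD` covers staged and unstaged changes to tracked files but not untracked files.

```python
import subprocess
import sys
from typing import Callable, Dict, List, Optional


def _git(args: List[str]) -> str:
    """Run a git command in the current directory and return its stdout."""
    result = subprocess.run(['git'] + args, capture_output=True,
                            text=True, check=True)
    return result.stdout


def collect_run_metadata(
    argv: Optional[List[str]] = None,
    git: Callable[[List[str]], str] = _git,
) -> Dict[str, object]:
    """Gather what is needed to identify the exact code and invocation."""
    return {
        # Commit the working tree is based on.
        'git_commit': git(['rev-parse', 'HEAD']).strip(),
        # Uncommitted changes to tracked files (empty string if clean).
        'git_diff': git(['diff', 'HEAD']),
        # Full set of program arguments.
        'argv': list(sys.argv if argv is None else argv),
    }
```

In a real run, `collect_run_metadata()` would be called once with its defaults and the result written to the log file, alongside the TLE, mount model parameters, and observer position.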

I think we already have many of these in telemetry. I'm not certain that the mount positions used in telemetry are actually the same ones used in important calculations (e.g., the position may be queried separately for telemetry). I'm also pretty sure the timestamp passed to PyEphem is not recorded anywhere.

The redesign of the schema is complete. I erred on the side of dropping some channels that don't have clear present or future value, on the assumption that if they turn out to be important it shouldn't be too hard to restore them.