FeatureTypes and Array Representations: Strategy for Tackling #845
This issue is meant as a discussion/inquiry.
Investigating issue #845 led me to look closely at the featureType discovery code in the Compliance Checker (CC). Currently, the checker tries to classify each variable as one of several feature types:
compliance-checker/compliance_checker/cfutil.py
Lines 1715 to 1765 in e49265d
However, since CF-1.6, there are only six feature types:
- point
- timeSeries
- profile
- timeSeriesProfile
- trajectory
- trajectoryProfile
It seems that the feature types have become entangled with the grid mappings and specifications of section 5.
My question: is it possible to pare down the featureType checks to only the six specified? While doing so, would that facilitate an easier way to deal with the actual array representation checks?
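To make the question concrete, here is a minimal sketch of what a pared-down check could look like. It is not the existing cfutil.py code; `CF_FEATURE_TYPES` and `normalize_feature_type` are hypothetical names, and the case-insensitive matching is an assumption based on my reading of CF chapter 9.

```python
# The six featureType values listed in CF-1.6 and later.
CF_FEATURE_TYPES = (
    "point",
    "timeSeries",
    "profile",
    "timeSeriesProfile",
    "trajectory",
    "trajectoryProfile",
)

# Lookup from lowercase spelling to canonical spelling.
# Assumption: featureType values are compared case-insensitively.
_BY_LOWERCASE = {name.lower(): name for name in CF_FEATURE_TYPES}


def normalize_feature_type(value):
    """Return the canonical CF spelling of a featureType, or None if invalid."""
    if not isinstance(value, str):
        return None
    return _BY_LOWERCASE.get(value.strip().lower())
```

Anything that is not one of the six (a grid-related label, for example) would come back as None and could be handled by a separate routine.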
Re-reading Appendix H, the examples show each featureType represented as follows:
- point: degenerate case of all four array representations
- timeSeries: all four representations
- profile: all four representations
- trajectory: multidimensional -- if the number of trajectories is the same for each station this would then be "orthogonal multidimensional", otherwise "incomplete"; both ragged array representations also valid
- timeSeriesProfile: orthogonal multidimensional only if same number of times for each feature and same number of elements per profile feature, otherwise incomplete multidimensional; contiguous ragged array for profiles and indexed ragged array for organizing profiles into time series
- trajectoryProfile: orthogonal multidimensional if the same number of trajectories per station and same number of depths per profile, otherwise incomplete multidimensional; contiguous ragged array for profiles and indexed ragged array for organizing profiles along trajectories (the profile data is written all at once, and multiple trajectories are being streamed in one after the other)
Since all six featureType classes can be expressed in all four array representations (I think the featureType necessitates the type of representation, right? Not the other way around?), I believe it's possible to thoroughly disambiguate and disentangle the grid mappings, array representations, and featureType discovery and create new, independent routines for each.
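As a rough illustration of an independent array-representation routine, the sketch below looks only at the CF attributes that mark ragged arrays: `sample_dimension` on a count variable (contiguous ragged) and `instance_dimension` on an index variable (indexed ragged). The function name and the plain-dict input are hypothetical, not anything in the checker, and it does not try to tell orthogonal from incomplete multidimensional, which would need the coordinate variable shapes.

```python
def ragged_representation(variables):
    """Classify a dataset's array representation from variable attributes.

    `variables` is assumed to map variable names to dicts of attributes,
    a stand-in for whatever the real dataset object exposes.
    """
    has_count = any("sample_dimension" in attrs for attrs in variables.values())
    has_index = any("instance_dimension" in attrs for attrs in variables.values())

    if has_count and has_index:
        # Used by timeSeriesProfile and trajectoryProfile in Appendix H.
        return "contiguous and indexed ragged"
    if has_count:
        return "contiguous ragged"
    if has_index:
        return "indexed ragged"
    # No ragged markers: orthogonal or incomplete multidimensional.
    return "multidimensional"
```

Note that this never consults the featureType, which is the point: the two checks could run separately and be compared afterwards.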
Thoughts? @benjwadams
cc @mwengren
I believe it's possible to thoroughly disambiguate and disentangle the grid mappings, array representations, and featureType discovery and create new, independent routines for each.
👍 Thanks @daltonkell This seems like a good idea to me!
Just a bit of confusion I noticed while trying to get my head around these feature types:
trajectory: multidimensional -- if the number of trajectories is the same for each station this would then be "orthogonal multidimensional", otherwise "incomplete"
The concept of stations (inherently fixed points in space) isn't really relevant for trajectories. I think the only way you could use an "orthogonal" representation here is if a collection of trajectories were all sampled at the exact same timestamps (so the obs dimension can be called time instead - similar to the orthogonal rep for timeseries). I guess this would be rare, but technically possible.
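A minimal sketch of that condition, assuming each trajectory's timestamps are available as a plain sequence (the function name is hypothetical):

```python
def share_time_axis(times_per_trajectory):
    """Return True if every trajectory was sampled at identical timestamps."""
    sequences = [tuple(times) for times in times_per_trajectory]
    # Compare every trajectory's timestamps against the first one.
    return all(seq == sequences[0] for seq in sequences[1:])
```

Only when this holds could a single shared time coordinate replace the per-trajectory obs dimension.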
The concept of stations (inherently fixed points in space) isn't really relevant for trajectories. I think the only way you could use an "orthogonal" representation here is if a collection of trajectories were all sampled at the exact same timestamps (so the obs dimension can be called time instead - similar to the orthogonal rep for timeseries). I guess this would be rare, but technically possible.
You're right @mhidas, I think I duplicated timeseries when writing this. From the spec: http://cfconventions.org/Data/cf-conventions/cf-conventions-1.7/cf-conventions.html#_multidimensional_array_representation_of_trajectories
When storing multiple trajectories in the same file, and the number of elements in each trajectory is the same, one can use the multidimensional array representation. This representation also allows one to have a variable number of elements in different trajectories, at the cost of some wasted space. In that case, any unused elements of the data and auxiliary coordinate variables must contain missing data values (section 9.6).
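To illustrate the "wasted space" case from that passage, here is a small sketch that pads trajectories of different lengths into a rectangular (trajectory, obs) layout. The function name and the -9999.0 fill value are assumptions for illustration only; a real file would use whatever `_FillValue` or `missing_value` the variable declares.

```python
def pad_trajectories(trajectories, fill_value=-9999.0):
    """Pad variable-length trajectories to a common length with a fill value."""
    # The obs dimension must be as long as the longest trajectory.
    width = max((len(t) for t in trajectories), default=0)
    return [list(t) + [fill_value] * (width - len(t)) for t in trajectories]


def is_padded(rows, fill_value=-9999.0):
    """Return True if any element holds the fill value (incomplete layout)."""
    return any(value == fill_value for row in rows for value in row)
```

If every trajectory already has the same length, nothing is padded and `is_padded` returns False.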
CC @mylesmc123
If I had to make an educated guess, I would say that the remainder of the feature types probably came from the NOAA NCEI templates here: https://www.nodc.noaa.gov/data/formats/netcdf/v2.0/ .
Closing after merge of #858.