Problems storing both numeric and string data within a single channel
Problem Description
I'm not sure whether this is even supposed to work, but storing both numeric and string data within a single channel results in three issues:
- Doing an export on such a channel will only return the numeric data. However, doing a tile request returns both.
- Inserting a sample with string data for a timestamp which has no numeric data causes a numeric data value to be inserted as well, apparently from the neighboring timestamp.
- Doing a gettile for a data sample at a time that has a string value but no numeric value returns 0 for the value. One would expect null instead.
Before illustrating the steps to reproduce, assume we have the following two JSON data files to insert into the datastore:
data1.json
{
"channel_names" : ["foo"],
"data" : [
[1450000001, 10],
[1450000002, 20],
[1450000004, 40],
[1450000005, 50],
[1450000006, 60],
[1450000007, 70],
[1450000008, 80],
[1450000009, 90]
]
}
data2.json
{
"channel_names" : ["foo"],
"data" : [
[1450000003, "thirty"],
[1450000007, "seventy"]
]
}
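To follow the steps below without copying the listings by hand, the two files can be generated with a short script. This is only a convenience sketch in Python: the helper name write_fixtures is hypothetical and is not part of the datastore tools; the data is copied from the two listings above.

```python
import json
from pathlib import Path

# Contents copied from the data1.json and data2.json listings in this issue.
DATA1 = {
    "channel_names": ["foo"],
    "data": [
        [1450000001, 10], [1450000002, 20], [1450000004, 40],
        [1450000005, 50], [1450000006, 60], [1450000007, 70],
        [1450000008, 80], [1450000009, 90],
    ],
}
DATA2 = {
    "channel_names": ["foo"],
    "data": [[1450000003, "thirty"], [1450000007, "seventy"]],
}


def write_fixtures(directory="."):
    """Write data1.json and data2.json into `directory`; return their paths.

    Hypothetical helper for this reproduction only.
    """
    out = Path(directory)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for name, payload in (("data1.json", DATA1), ("data2.json", DATA2)):
        path = out / name
        path.write_text(json.dumps(payload, indent=2))
        paths.append(path)
    return paths


if __name__ == "__main__":
    for p in write_fixtures():
        print(p)
```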
Steps To Reproduce
Begin by inserting the first data file:
./bin/import --format json ./data-test 100 cpb_device ./data1.json
It should succeed with the following response:
{
"channel_specs" : {
"foo" : {
"channel_bounds" : {
"max_time" : 1450000009,
"max_value" : 90,
"min_time" : 1450000001,
"min_value" : 10
},
"imported_bounds" : {
"max_time" : 1450000009,
"max_value" : 90,
"min_time" : 1450000001,
"min_value" : 10
}
}
},
"failed_records" : 0,
"max_time" : 1450000009,
"min_time" : 1450000001,
"successful_records" : 1
}
Now do an export to verify:
./bin/export --csv ./data-test 100.cpb_device.foo
It should print the following:
EpochTime,100.cpb_device.foo
1450000001,10
1450000002,20
1450000004,40
1450000005,50
1450000006,60
1450000007,70
1450000008,80
1450000009,90
Also verify by requesting a tile:
./bin/gettile ./data-test 100 cpb_device.foo -5 90625000
You should get the following:
{
"data" : [
[1450000001, 10, 0, 1],
[1450000002, 20, 0, 1],
[1450000004, 40, 0, 1],
[1450000005, 50, 0, 1],
[1450000006, 60, 0, 1],
[1450000007, 70, 0, 1],
[1450000008, 80, 0, 1],
[1450000009, 90, 0, 1],
[1450000012.5, -1e308, 0, 0]
],
"fields" : ["time", "mean", "stddev", "count"],
"level" : -5,
"offset" : 90625000
}
So far, so good. Now, insert the second data file. Note that this data file contains two string values, one at time 1450000003 and another at time 1450000007. Looking at data1.json, we see that there's no existing numeric data value for this channel at time 1450000003, but there is one (70) for time 1450000007.
./bin/import --format json ./data-test 100 cpb_device ./data2.json
It should succeed with the following response:
{
"channel_specs" : {
"foo" : {
"channel_bounds" : {
"max_time" : 1450000009,
"max_value" : 90,
"min_time" : 1450000001,
"min_value" : 10
},
"imported_bounds" : {
"max_time" : 1450000007,
"min_time" : 1450000003
}
}
},
"failed_records" : 0,
"max_time" : 1450000007,
"min_time" : 1450000003,
"successful_records" : 1
}
Now do another export to verify:
./bin/export --csv ./data-test 100.cpb_device.foo
EpochTime,100.cpb_device.foo
1450000001,10
1450000002,20
1450000003,40
1450000004,40
1450000005,50
1450000006,60
1450000007,70
1450000008,80
1450000009,90
So, there are the first two problems: no string values are exported at all, and a numeric value that we never inserted (40) is returned at time 1450000003. There is of course the question of how to report two values for a single timestamp (as with time 1450000007), but that's more of an implementation detail. Regardless, one would expect to at least see a string value for time 1450000003.
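Both export problems can be checked mechanically. The following is a rough Python sketch, not part of the datastore tools: it compares the CSV printed above against what was actually inserted. The function name audit_export and the constants are hypothetical, and the CSV text is pasted from the export output in this issue.

```python
import csv
import io

# Numeric samples inserted via data1.json.
INSERTED_NUMERIC = {
    1450000001: 10, 1450000002: 20, 1450000004: 40, 1450000005: 50,
    1450000006: 60, 1450000007: 70, 1450000008: 80, 1450000009: 90,
}
# String samples inserted via data2.json.
INSERTED_STRINGS = {1450000003: "thirty", 1450000007: "seventy"}

# Export output after both imports, pasted from this issue.
EXPORT_CSV = """\
EpochTime,100.cpb_device.foo
1450000001,10
1450000002,20
1450000003,40
1450000004,40
1450000005,50
1450000006,60
1450000007,70
1450000008,80
1450000009,90
"""


def audit_export(csv_text):
    """Return (phantom_numeric_times, missing_string_times) for an export."""
    reader = csv.reader(io.StringIO(csv_text))
    next(reader)  # skip the header row
    exported = {int(row[0]): row[1] for row in reader if row}
    # Timestamps that carry a numeric value although none was inserted there.
    phantom = sorted(t for t in exported if t not in INSERTED_NUMERIC)
    # Inserted string samples that do not appear in the export.
    missing = sorted(t for t, s in INSERTED_STRINGS.items()
                     if exported.get(t) != s)
    return phantom, missing


print(audit_export(EXPORT_CSV))
# prints ([1450000003], [1450000003, 1450000007])
```

The first list is the timestamp with the phantom numeric value. The second shows that neither string sample made it into the export.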
Now do a gettile to see the difference:
./bin/gettile ./data-test 100 cpb_device.foo -5 90625000
{
"data" : [
[1450000001, 10, 0, 1, null],
[1450000002, 20, 0, 1, null],
[1450000002.5, -1e308, 0, 0, null],
[1450000003, 0, 0, 1, "thirty"],
[1450000003.5, -1e308, 0, 0, null],
[1450000004, 40, 0, 1, null],
[1450000005, 50, 0, 1, null],
[1450000006, 60, 0, 1, null],
[1450000007, 70, 0, 1, "seventy"],
[1450000008, 80, 0, 1, null],
[1450000009, 90, 0, 1, null],
[1450000012.5, -1e308, 0, 0, null]
],
"fields" : ["time", "mean", "stddev", "count", "comment"],
"level" : -5,
"offset" : 90625000
}
For gettile, we get the string values, but notice that the value at time 1450000003, which export reported as 40, is reported here as 0. One would expect a null value, just like there are null values for the comment fields.
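The third problem can be spotted in the same way. The sketch below is a hypothetical Python check, not part of the datastore tools. It scans the gettile response above for rows that carry a comment at a timestamp where no numeric sample was inserted, yet still report a non-null mean. The field positions are read from the "fields" array in the response. That null is the right value for "mean" in such rows is the expectation stated in this issue, not confirmed behavior.

```python
import json

# Timestamps with a numeric sample in data1.json.
NUMERIC_TIMES = {
    1450000001, 1450000002, 1450000004, 1450000005,
    1450000006, 1450000007, 1450000008, 1450000009,
}

# gettile response after both imports, pasted from this issue.
TILE_JSON = """
{
  "data": [
    [1450000001, 10, 0, 1, null],
    [1450000002, 20, 0, 1, null],
    [1450000002.5, -1e308, 0, 0, null],
    [1450000003, 0, 0, 1, "thirty"],
    [1450000003.5, -1e308, 0, 0, null],
    [1450000004, 40, 0, 1, null],
    [1450000005, 50, 0, 1, null],
    [1450000006, 60, 0, 1, null],
    [1450000007, 70, 0, 1, "seventy"],
    [1450000008, 80, 0, 1, null],
    [1450000009, 90, 0, 1, null],
    [1450000012.5, -1e308, 0, 0, null]
  ],
  "fields": ["time", "mean", "stddev", "count", "comment"],
  "level": -5,
  "offset": 90625000
}
"""


def string_only_rows_with_mean(tile):
    """Return (time, mean, comment) for string-only samples with a non-null mean."""
    fields = tile["fields"]
    i_time = fields.index("time")
    i_mean = fields.index("mean")
    i_comment = fields.index("comment")
    flagged = []
    for row in tile["data"]:
        has_comment = row[i_comment] is not None
        has_numeric_sample = row[i_time] in NUMERIC_TIMES
        if has_comment and not has_numeric_sample and row[i_mean] is not None:
            flagged.append((row[i_time], row[i_mean], row[i_comment]))
    return flagged


print(string_only_rows_with_mean(json.loads(TILE_JSON)))
# prints [(1450000003, 0, 'thirty')]
```

The row at 1450000007 is not flagged because a numeric sample (70) was inserted there, so a numeric mean is legitimate for it.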