Create a command-line program that calculates statistics from humidity sensor data.
The sensors form a network and are divided into groups. Each sensor submits its data to its group leader, and each leader produces a daily report file for its group. The network periodically rebalances itself, so sensors can change group assignment over time and their measurements can be reported by different leaders. The program should help spot the sensors with the highest average humidity.
- Program takes one argument: a path to directory
- Directory contains many CSV files (*.csv), each with a daily report from one group leader
- Format of the file: 1 header line + many lines with measurements
- Measurement line has sensor id and the humidity value
- Humidity value is an integer in range [0, 100] or NaN (failed measurement)
- The measurements for the same sensor id can be in different files
leader-1.csv
sensor-id,humidity
s1,10
s2,88
s1,NaN
leader-2.csv
sensor-id,humidity
s2,80
s3,NaN
s2,78
s1,98
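The measurement format above can be parsed one line at a time. The following is a minimal sketch, not a prescribed implementation: the function name `parse_measurement` is hypothetical, and representing a failed measurement as `None` and rejecting out-of-range values with `ValueError` are assumptions the task does not state.

```python
from typing import Optional, Tuple


def parse_measurement(line: str) -> Tuple[str, Optional[int]]:
    """Split a 'sensor-id,humidity' line into (sensor id, value).

    A NaN reading is returned as None (an assumption of this sketch).
    """
    sensor_id, raw = line.strip().split(",", 1)
    if raw == "NaN":
        return sensor_id, None
    value = int(raw)
    # The task states the valid range is [0, 100]; rejecting anything
    # else is a choice made here, not a stated requirement.
    if not 0 <= value <= 100:
        raise ValueError(f"humidity out of range: {value}")
    return sensor_id, value
```

For example, `parse_measurement("s1,10")` yields `("s1", 10)` and `parse_measurement("s3,NaN")` yields `("s3", None)`.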
- Program prints statistics to StdOut
- It reports how many files it processed
- It reports how many measurements it processed
- It reports how many measurements failed
- For each sensor it calculates min/avg/max humidity
- NaN values are ignored from min/avg/max
- Sensors with only NaN measurements have min/avg/max as NaN/NaN/NaN
- Program sorts sensors by highest avg humidity (NaN values go last)
Num of processed files: 2
Num of processed measurements: 7
Num of failed measurements: 2
Sensors with highest avg humidity:
sensor-id,min,avg,max
s2,78,82,88
s1,10,54,98
s3,NaN,NaN,NaN
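One way to produce the table above is to keep a small running accumulator per sensor (min, max, sum, count) instead of storing every measurement. The sketch below assumes failed measurements arrive as `None`; the names `SensorStats`, `aggregate`, and `report_lines` are hypothetical. Rounding the average to the nearest integer is also an assumption: the sample output only shows averages that are already whole numbers, so the task does not settle the rounding rule.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional, Tuple


@dataclass
class SensorStats:
    minimum: Optional[int] = None
    maximum: Optional[int] = None
    total: int = 0
    count: int = 0  # successful measurements only

    def add(self, value: Optional[int]) -> None:
        if value is None:  # failed measurement: ignored for min/avg/max
            return
        self.minimum = value if self.minimum is None else min(self.minimum, value)
        self.maximum = value if self.maximum is None else max(self.maximum, value)
        self.total += value
        self.count += 1

    @property
    def average(self) -> Optional[float]:
        return self.total / self.count if self.count else None


def aggregate(measurements: Iterable[Tuple[str, Optional[int]]]) -> Dict[str, SensorStats]:
    stats: Dict[str, SensorStats] = {}
    for sensor_id, value in measurements:
        # setdefault registers the sensor even if all its readings fail
        stats.setdefault(sensor_id, SensorStats()).add(value)
    return stats


def report_lines(stats: Dict[str, SensorStats]) -> List[str]:
    def sort_key(item: Tuple[str, SensorStats]) -> Tuple[bool, float]:
        avg = item[1].average
        # Sensors without an average sort last; others by descending average
        return (avg is None, -(avg or 0.0))

    lines = ["sensor-id,min,avg,max"]
    for sensor_id, s in sorted(stats.items(), key=sort_key):
        if s.count == 0:
            lines.append(f"{sensor_id},NaN,NaN,NaN")
        else:
            # Assumption: average is rounded to the nearest integer
            lines.append(f"{sensor_id},{s.minimum},{round(s.average)},{s.maximum}")
    return lines
```

With this design, memory grows with the number of distinct sensors rather than with the number of measurements.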
- Single daily report file can be very large, and can exceed program memory
- You can use any Open Source library
- Program should only use memory for its internal state (no disk, no database)
- Sensible tests are welcome
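Because a single report can exceed available memory, one possible approach is to stream each file row by row rather than loading it whole. The sketch below uses Python's standard `csv` and `pathlib` modules; the function names `iter_measurements` and `summarize` are hypothetical, and it assumes every data row has exactly the two columns shown in the examples.

```python
import csv
from pathlib import Path
from typing import Iterator, Optional, Tuple


def iter_measurements(directory: str) -> Iterator[Tuple[str, Optional[int]]]:
    """Yield (sensor id, value) pairs lazily; NaN is yielded as None."""
    for path in sorted(Path(directory).glob("*.csv")):
        with path.open(newline="") as handle:
            reader = csv.reader(handle)
            next(reader, None)  # skip the single header line
            for row in reader:  # only one row is held in memory at a time
                if not row:
                    continue  # tolerate blank lines (an assumption)
                sensor_id, raw = row[0], row[1]
                yield sensor_id, None if raw == "NaN" else int(raw)


def summarize(directory: str) -> Tuple[int, int, int]:
    """Return (files processed, measurements processed, measurements failed)."""
    files = sum(1 for _ in Path(directory).glob("*.csv"))
    processed = failed = 0
    for _, value in iter_measurements(directory):
        processed += 1
        if value is None:
            failed += 1
    return files, processed, failed
```

A test in the spirit of the last point could write the two sample reports to a temporary directory and check that `summarize` returns `(2, 7, 2)`.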