SensorStatisticsTask

Create a command line program that calculates statistics from humidity sensor data.

Background story

The sensors are in a network and are divided into groups. Each sensor submits its data to its group leader, and each leader produces a daily report file for its group. The network periodically re-balances itself, so sensors can change their group assignment over time and their measurements can be reported by different leaders. The program should help spot the sensors with the highest average humidity.

Input

The program takes one argument: a path to a directory
The directory contains many CSV files (*.csv), each with a daily report from one group leader
File format: 1 header line followed by many measurement lines
Each measurement line has a sensor id and a humidity value
The humidity value is an integer in the range [0, 100] or NaN (failed measurement)
Measurements for the same sensor id can appear in different files
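The input rules above can be sketched as a small line parser. This is only an illustration under assumptions: the names Measurement, MeasurementParser and parseLine are hypothetical and are not taken from the repository, and treating out-of-range or malformed lines as unparseable is one possible choice that the task does not prescribe.

```scala
// Hypothetical model: humidity is None for a failed (NaN) measurement.
final case class Measurement(sensorId: String, humidity: Option[Int])

object MeasurementParser {
  // Parses one line such as "s1,10" or "s1,NaN".
  // Returns None for the header line and for anything malformed or out of range
  // (an assumption; the task does not say how to treat bad lines).
  def parseLine(line: String): Option[Measurement] =
    line.split(",", -1).map(_.trim) match {
      case Array(id, "NaN") if id.nonEmpty =>
        Some(Measurement(id, None))
      case Array(id, value) if id.nonEmpty =>
        value.toIntOption
          .filter(h => h >= 0 && h <= 100)
          .map(h => Measurement(id, Some(h)))
      case _ =>
        None
    }
}
```

With this sketch the header line "sensor-id,humidity" is rejected because "humidity" is not an integer, so no separate header check is needed, although skipping the first line explicitly would work just as well.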

Example

leader-1.csv

sensor-id,humidity
s1,10
s2,88
s1,NaN

leader-2.csv

sensor-id,humidity
s2,80
s3,NaN
s2,78
s1,98

Expected Output

Program prints statistics to StdOut
It reports how many files it processed
It reports how many measurements it processed
It reports how many measurements failed
For each sensor it calculates min/avg/max humidity
NaN values are excluded from min/avg/max
Sensors with only NaN measurements have min/avg/max as NaN/NaN/NaN
Program sorts sensors by highest avg humidity (NaN values go last)
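One way to meet the rules above is to keep a small running aggregate per sensor and rank the sensors at the end. The sketch below is an assumption about how this could be done, not the repository's implementation; SensorStats, Aggregator, update and ranked are hypothetical names.

```scala
// Hypothetical running aggregate for one sensor; its size does not grow with the input.
final case class SensorStats(min: Int, max: Int, sum: Long, count: Long) {
  def add(h: Int): SensorStats =
    SensorStats(math.min(min, h), math.max(max, h), sum + h, count + 1)
  def avg: Double = sum.toDouble / count
}

object Aggregator {
  // A value of None means the sensor has reported only NaN so far.
  type State = Map[String, Option[SensorStats]]

  // Folds one measurement into the state; NaN (None) leaves the numbers untouched
  // but still registers the sensor id.
  def update(state: State, sensorId: String, humidity: Option[Int]): State = {
    val current = state.getOrElse(sensorId, None)
    val next = (current, humidity) match {
      case (Some(s), Some(h)) => Some(s.add(h))
      case (None, Some(h))    => Some(SensorStats(h, h, h.toLong, 1L))
      case (s, None)          => s
    }
    state.updated(sensorId, next)
  }

  // Highest average first; sensors with only NaN measurements go last.
  def ranked(state: State): List[(String, Option[SensorStats])] = {
    val withData = state.toList
      .collect { case (id, Some(s)) => (id, s) }
      .sortWith((a, b) => a._2.avg > b._2.avg)
    val nanOnly = state.toList.collect { case (id, None) => id }.sorted
    withData.map { case (id, s) => (id, Some(s)) } ++ nanOnly.map(id => (id, None))
  }
}
```

Storing the sum and count instead of a running average keeps the update exact for integer inputs. The counters for processed and failed measurements are left out here; they could be tracked next to the map in the same fold.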

Example

Num of processed files: 2
Num of processed measurements: 7
Num of failed measurements: 2

Sensors with highest avg humidity:

sensor-id,min,avg,max
s2,78,82,88
s1,10,54,98
s3,NaN,NaN,NaN

Notes

A single daily report file can be very large and can exceed the program's memory
The program should only use memory for its internal state (no disk, no database)
Any open source library can be used (besides Spark)
Please use vanilla Scala, akka-stream, monix or similar technology.
You're more than welcome to implement a purely functional solution using cats-effect, fs2 and/or ZIO to impress, but this is not a mandatory requirement.
Sensible tests are welcome
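The memory constraint in the notes above suggests reading each file line by line rather than loading it whole. A minimal sketch using only java.nio and the Scala standard library (2.13 or later) might look like the following; ReportReader, csvFiles and foldLines are hypothetical names, and this is not necessarily how the repository does it.

```scala
import java.nio.file.{Files, Path}
import scala.jdk.CollectionConverters._
import scala.util.Using

object ReportReader {
  // Lists the *.csv files directly inside `dir` (non-recursive), sorted by file name.
  def csvFiles(dir: Path): List[Path] =
    Using.resource(Files.newDirectoryStream(dir, "*.csv")) { stream =>
      stream.iterator().asScala.toList.sortBy(_.getFileName.toString)
    }

  // Folds over the data lines of one file, skipping the header line.
  // Files.lines is lazy, so only the current line and the accumulator are held in memory.
  def foldLines[A](file: Path, zero: A)(step: (A, String) => A): A =
    Using.resource(Files.lines(file)) { lines =>
      lines.iterator().asScala.drop(1).foldLeft(zero)(step)
    }
}
```

For example, ReportReader.foldLines(file, 0)((n, _) => n + 1) counts the measurement lines of a file without keeping any of them. Using.resource closes the underlying stream even if the fold throws.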

Project Build

The project is written in plain Scala with no external libraries, and it makes heavy use of java.nio for file handling.

Installation

Prerequisites

The following should be installed on your system.

Scala
Sbt
Scalafmt

Step 1

Clone the repo using the command below:

git clone https://github.com/hkateu/SensorStatisticsTask.git

This will pull all the files and folders onto your local machine.

Step 2

Change into the project directory and run the program with sbt:

sbt run

The following should be the output of the program:

   _____                              _____ _        _   _     _   _
  / ____|                            / ____| |      | | (_)   | | (_)
 | (___   ___ _ __  ___  ___  _ __  | (___ | |_ __ _| |_ _ ___| |_ _  ___ ___
  \___ \ / _ | '_ \/ __|/ _ \| '__|  \___ \| __/ _` | __| / __| __| |/ __/ __|
  ____) |  __| | | \__ | (_) | |     ____) | || (_| | |_| \__ | |_| | (__\__ \
 |_____/ \___|_| |_|___/\___/|_|    |_____/ \__\__,_|\__|_|___/\__|_|\___|___/


Number of processed files: 3
Number of processed measurements: 11
Number of failed measurements: 4


Sensors with the highiest avg humidity:


sensor-id, min, ave, max
s2, 36.0, 70.4, 88.0
s1, 10.0, 25.0, 40.0
s3, NaN, NaN, NaN