I’m used to working on projects where the system receives tons of data (readings) from various environment sensors and then decides how to act, depending on what it is receiving.
Since I had never worked with gRPC, I decided to create this project. The idea is simple: use gRPC to build a basic system that can digest lots of environment sensor data at once (and scale out if necessary). The messages can be received, read and processed directly through a gRPC client or via the Flask API (which, in turn, uses the gRPC client).
Some aspects of this project (like validation and what to do with the readings received) are quite generic, since the system is not grounded in any real-world use case. I plan on setting up a couple of sensors with an Arduino (or a Raspberry Pi) and having it send the readings to the API.
My programming language of choice. Easy to learn, fast to code in, and help is just a DuckDuckGo (or Google) search away.
In my opinion, Flask is more flexible, easier to learn and more lightweight than Django. Taking that into consideration, along with the scope of this project, it made more sense to choose Flask over Django.
I'd heard some great things about gRPC, but had never tried it. From what I've researched, it's a great option for a low-latency, highly scalable (and language-agnostic) solution. It adds a couple of layers of complexity to the application (compared to not using any kind of RPC at all), but I believe it is worth it, especially considering load balancing and how the application may evolve.
Initially, I chose ZODB so I could use an embedded, fast database. However, I started to notice some erratic behaviour when running bigger benchmark tests, so I dropped the embedded-storage requirement and moved to MongoDB, which is also fast, easy to implement and scalable.
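For illustration, here is a minimal sketch of how a reading could be persisted with pymongo. The database, collection and field names here are assumptions, not the project's actual schema.

```python
# Minimal sketch of persisting one reading with pymongo.
# Database, collection and field names are illustrative assumptions.
from datetime import datetime, timezone

from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
readings = client["sensors_db"]["sensor_readings"]

reading = {
    "sensor_id": "sensor-01",
    "location_id": "location-01",
    "temperature": 23.4,
    "humidity": 0.58,
    "timestamp": datetime.now(timezone.utc),
}

inserted_id = readings.insert_one(reading).inserted_id
print(f"Stored reading {inserted_id}")
```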
- Clients, services, private APIs and other consumers can access the gRPC endpoints directly.
- Meanwhile, a public API can be used (by a frontend or other third-party applications) to reach those same gRPC endpoints (see the sketch below).
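To make the second path concrete, here is a hedged sketch of a Flask route forwarding a reading to the gRPC server. The module, stub and message names (readings_pb2, readings_pb2_grpc, SensorReadingsStub, AddReading) are assumptions for illustration, not the project's actual generated code.

```python
# Hypothetical sketch: a Flask route that forwards a reading to the gRPC server.
import grpc
from flask import Flask, jsonify, request

import readings_pb2        # assumed generated module
import readings_pb2_grpc   # assumed generated module

app = Flask(__name__)
channel = grpc.insecure_channel("localhost:50051")
stub = readings_pb2_grpc.SensorReadingsStub(channel)  # assumed stub name


@app.route("/readings", methods=["POST"])
def add_reading():
    payload = request.get_json()
    # Translate the JSON body into the gRPC request message (assumed fields).
    grpc_request = readings_pb2.SensorReading(
        sensor_id=payload["sensor_id"],
        location_id=payload["location_id"],
        temperature=payload["temperature"],
    )
    response = stub.AddReading(grpc_request)
    return jsonify({"ok": True, "id": response.id}), 201
```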
In its current state, this project has 3 entities:
- Location: Not used for much yet, but included because it makes sense to have even in a basic setup like this.
- Sensor: Represents the physical sensor that's (hypothetically) sending environment readings.
- Sensor Reading: The actual reading from the sensor. Contains a lot of data.
The current version of Sensor Reading contains both the id of its sensor and the id of its location. Normally, a reading would not carry both, but I decided to include them so I could run a few more validations on each reading.
Also, all entities have read-only properties and validation on setters, to make sure everything is as it should be.
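To make that idea concrete, here is a minimal sketch of the read-only/validated-setter pattern; the field names and bounds are illustrative, not the project's actual entity code.

```python
# Sketch of an entity with read-only ids and a validating setter.
# Field names and bounds are illustrative assumptions.
class SensorReading:
    def __init__(self, sensor_id, location_id, temperature):
        self._sensor_id = sensor_id
        self._location_id = location_id
        self.temperature = temperature  # goes through the validating setter

    @property
    def sensor_id(self):
        # Read-only: no setter is defined, so the id cannot be changed later.
        return self._sensor_id

    @property
    def location_id(self):
        return self._location_id

    @property
    def temperature(self):
        return self._temperature

    @temperature.setter
    def temperature(self, value):
        if not isinstance(value, (int, float)):
            raise TypeError("temperature must be numeric")
        if not -90.0 <= value <= 60.0:  # arbitrary plausible bounds
            raise ValueError("temperature out of expected range")
        self._temperature = float(value)
```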
Since this would really depend on the implementation of the API, and since running Flask locally is not the best way to test simultaneous/parallel requests, I'll focus the tests on direct calls to the gRPC server.
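The benchmark boils down to a loop roughly like the sketch below (a simplification, not the actual run_client_benchmark.py code); send_random_reading() is a hypothetical placeholder for the project's gRPC client call.

```python
# Rough sketch of the benchmark loop: WORKERS concurrent workers, each sending
# REQUESTS_PER_CLIENT requests straight to the gRPC server.
import datetime
from concurrent.futures import ThreadPoolExecutor

WORKERS = 10
REQUESTS_PER_CLIENT = 100


def send_random_reading():
    ...  # placeholder: build a random reading and send it through the gRPC stub


def worker(_):
    for _ in range(REQUESTS_PER_CLIENT):
        send_random_reading()


start = datetime.datetime.now()
with ThreadPoolExecutor(max_workers=WORKERS) as pool:
    # map() is consumed with list() so we wait for every worker to finish
    list(pool.map(worker, range(WORKERS)))
elapsed = datetime.datetime.now() - start

total = WORKERS * REQUESTS_PER_CLIENT
print(f"{total} requests in {elapsed} (~{total / elapsed.total_seconds():.0f} req/s)")
```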
Workers | Requests per Client | Total Requests | Elapsed Time | Cap: Req/Second | Cap: Req/Day |
---|---|---|---|---|---|
1 | 10 | 10 | 0:00:00.636513 | 15 | 1,296,000 |
10 | 10 | 100 | 0:00:00.794216 | 125 | 10,800,000 |
10 | 100 | 1000 | 0:00:02.770565 | 360 | 31,104,000 |
50 | 200 | 10000 | 0:00:23.811725 | 420 | 36,288,000 |
100 | 100 | 10000 | 0:00:22.055141 | 453 | 39,139,200 |
200 | 50 | 10000 | 0:00:22.355107 | 447 | 38,620,800 |
1 | 10000 | 10000 | 0:00:51.448303 | 194 | 16,761,600 |
10000 | 1 | 10000 | 0:00:27.520348 | 363 | 31,363,200 |
Columns:

Workers
: number of active workers making requests simultaneously.

Requests per Client
: number of requests each worker made.

Total Requests
: total number of requests processed.

Elapsed Time
: total elapsed time for the test.

Cap: Req/Second
: in the current workload, how many messages were processed each second.

Cap: Req/Day
: given this ratio of messages per client, how many messages could be processed each day.
Workers | Requests per Client | Limit Results to | Total Fetched | Elapsed Time | Rows fetched/sec |
---|---|---|---|---|---|
1 | 10 | 10000 | 100,000 | 0:00:11.031695 | 9,090 |
10 | 10 | 10000 | 1,000,000 | 0:00:52.143278 | 19,230 |
10 | 100 | 10000 | 10,000,000 | 0:08:50.079260 | 18,867 |
10 | 100 | first | 1,000 | 0:00:01.490819 | 1,000 |
10 | 100 | last | 1,000 | 0:00:01.527391 | 1,000 |
10 | 100 | 100 newest | 100,000 | 0:00:07.041533 | 14,285 |
10 | 100 | 100 oldest | 100,000 | 0:00:07.155739 | 14,285 |
100 | 100 | 100 newest | 1,000,000 | 0:01:05.692690 | 15,384 |
100 | 100 | 100 oldest | 1,000,000 | 0:01:06.817975 | 15,151 |
100 | 1000 | first | 100,000 | 0:01:36.958435 | 1,041 |
100 | 1000 | last | 100,000 | 0:01:35.372245 | 1,052 |
100 | 1000 | single row, by id | 100,000 | 0:01:38.419700 | 1,020 |
1000 | 100 | single row, by id | 100,000 | 0:01:46.748245 | 943 |
10000 | 10 | single row, by id | 100,000 | 0:01:46.222926 | 943 |
Columns:

Workers
: number of active workers making requests simultaneously.

Requests per Client
: number of requests each worker made.

Limit Results to
: if any, which limiter was applied to the fetch request.

Total Fetched
: considering all workers and all requests, the number of rows that were fetched.

Elapsed Time
: total elapsed time for the test.

Rows fetched/sec
: considering the total fetched and elapsed time values, how many rows were fetched each second.
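For context, here is a hedged sketch of what those limiters could look like on the MongoDB side; the field names, id and collection handle are illustrative, and the project's actual queries may differ.

```python
# Illustrative pymongo queries for the "Limit Results to" scenarios above.
# Field names (timestamp, _id) and the example ObjectId are assumptions.
from bson import ObjectId
from pymongo import ASCENDING, DESCENDING, MongoClient

readings = MongoClient("mongodb://localhost:27017")["sensors_db"]["sensor_readings"]

# "100 newest": sort by timestamp descending, keep 100 rows.
newest = list(readings.find().sort("timestamp", DESCENDING).limit(100))

# "100 oldest": same idea, ascending order.
oldest = list(readings.find().sort("timestamp", ASCENDING).limit(100))

# "first" / "last": a single boundary row.
first = readings.find_one(sort=[("timestamp", ASCENDING)])
last = readings.find_one(sort=[("timestamp", DESCENDING)])

# "single row, by id": direct lookup on the primary key.
by_id = readings.find_one({"_id": ObjectId("64b64c0f2f8fb0d6a1c0ffee")})
```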
The following scripts are currently available:

run_server.py
: starts the gRPC server. You should run this first.

run_api.py
: starts the Flask API. You should run this after run_server.py, so it can connect to the server.

run_client.py
: sends random readings to the server and then fetches (and prints) all readings.

run_send_api_request.py
: sends a POST request to the API (good for simple tests).

run_print_sensors_and_locations.py
: prints all sensors and locations available.

run_playground.py
: testing area. You don't need to run it for anything.

run_client_benchmark.py
: runs a benchmark test. You may need to configure the scenario.
- Lots of refactoring...
- Create Flask interface (in progress)
- Standardize fetch base result to include item count and is_empty property
- Add Swagger to the Flask interface
- Convert single-use scripts to a command-line interface
- Add unit tests
- Add stress tests
After finishing TODO items 1, 2 and 3, I'll probably go looking for a Raspberry Pi Zero W and a couple of sensors, so I can start the "Audrey II" project.