surfacescan

Attack surface scanner

Run locally

Prerequisites

  • docker
  • docker-compose
  • poetry (global or virtualenv)
  • locust (optional, global or virtualenv)

Docker way

Build images

foo@bar:~$ docker-compose build

Run api

foo@bar:~$ docker-compose up -d api

Run lint (flake8)

foo@bar:~$ docker-compose run --rm lint

Run tests (pytest)

foo@bar:~$ docker-compose run --rm tests

To run locust, execute

foo@bar:~$ docker-compose up -d locust

Open http://localhost:8089 in the browser and specify http://api:8001 as the host to be tested.

Non-Docker way

Install dependencies

foo@bar:~$ poetry install

Run flake8

foo@bar:~$ flake8

Run tests

foo@bar:~$ pytest

Start service

foo@bar:~$ uvicorn surfacescan.main:app --reload --port 8001

Notes

  • The requirement "Statistics should be from process startup" makes it hard to scale the application across processes and especially across nodes, because in that case the processes will have different lifetimes. At the same time, the calculations for /attack are CPU intensive, so scaling by threads does not work either because of Python's GIL. We think the best choice would be to update the statistics requirements (for example, /stats returns statistics accumulated from the initial service startup and allows querying by time ranges). In that case we could accumulate statistics in statsd, send them to some backend, and query that backend. Another option is to build the attack surface in a process pool (await run_in_executor, see the sketch after this list) and make the request handlers async, but that approach restricts third-party library usage because of pickle serialization, so we think updating the statistics requirements would be preferable.
  • We assume the attack relation is transitive, i.e. if A can attack B and B can attack C, then A can attack C too.
  • "Search" endpoints parametrized by query parameters often return a 200 status code when nothing is found, but here we return 404 to distinguish the case when a VM is not found from the case when a VM is isolated and its attack surface is empty.
  • We use FastAPI for this project for several reasons: Django seems like overkill for such a small project, while FastAPI provides rich abilities for parsing, validating, and handling HTTP requests, a ready-made test client, API documentation out of the box, the ability to run the app in different modes (ASGI, WSGI), apparently good performance, and built-in tools for dependency instantiation.
  • As FastAPI makes intensive use of modern Python typing, we use it too.
  • We use middleware to track each request, so adding a new request handler does not require any statistics-related work (like registering or decorating the handler); see the middleware sketch in the surfacescan.tracking section below.
  • As we keep the statistics as a single instance and FastAPI can handle requests in different threads, the statistics must be updated in a thread-safe manner. We use atomic counters instead of locks to reduce the performance impact, so updating both counters together is not an atomic operation: e.g. with 2 simultaneous scanning requests and 1 statistics request, /stats can return a request count updated twice but a request duration updated only once (eventually it will be updated twice as well). We assume such discrepancies are insignificant and acceptable.
  • The /stats response does not include itself -- we cannot track a request while it is not yet finished.
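
As a rough illustration of the process-pool alternative mentioned in the first note, here is a minimal sketch of offloading a CPU-intensive scan via run_in_executor; the scan, env and vm_id names are hypothetical, not the actual module API.

    import asyncio
    from concurrent.futures import ProcessPoolExecutor

    pool = ProcessPoolExecutor()

    def scan(env, vm_id):
        ...  # hypothetical CPU-intensive attack surface computation

    async def attack_async(env, vm_id):
        loop = asyncio.get_running_loop()
        # arguments and the return value travel through pickle,
        # which is what restricts third-party library usage
        return await loop.run_in_executor(pool, scan, env, vm_id)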

Project structure

A service with 2 simple endpoints could be implemented in a single file, but one of the major goals of this project is to show our approach to structuring a service application.

surfacescan.main

Application initialization, entry point.

surfacescan.api

Routes and HTTP handler definitions. We tried to keep the approach where an HTTP handler is only responsible for "HTTP-related stuff" and relies on "business logic" modules to do the main work.
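
A minimal sketch of that split, also showing the 404 behaviour described in the notes; the endpoint path and the get_environment and scan names are assumptions, not the actual API:

    from fastapi import APIRouter, Depends, HTTPException

    from surfacescan.registry import get_environment  # assumed name
    from surfacescan.scanning import scan             # assumed name

    router = APIRouter()

    @router.get("/attack")
    def attack(vm_id: str, env: dict = Depends(get_environment)):
        # HTTP concerns only: validation, status codes, serialization
        if vm_id not in env:
            # 404 distinguishes "unknown VM" from "known VM with an empty surface"
            raise HTTPException(status_code=404, detail="VM not found")
        return scan(env, vm_id)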

surfacescan.registry

As the statistics are bound to the process lifetime, we keep them in memory as a single instance. Obviously the environment data should also be kept in memory to avoid reading it on each request. So we created a separate module which is responsible for "instantiating" the dependencies required by request handlers.
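
A minimal sketch of such a registry, assuming dependencies are exposed as cached factory functions usable with FastAPI's Depends; load_environment is an illustrative name:

    from functools import lru_cache

    from surfacescan.tracking import Tracker
    from surfacescan.scanning import load_environment  # assumed helper

    @lru_cache(maxsize=None)
    def get_tracker() -> Tracker:
        # single instance for the whole process lifetime
        return Tracker()

    @lru_cache(maxsize=None)
    def get_environment():
        # read the environment data once and keep it in memory
        return load_environment()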

surfacescan.tracking

Module responsible for statistics accumulation. The middleware just calls the tracking function it is given, so we can easily change the implementation (for example, send data to statsd) without touching the middleware itself.
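
A minimal sketch of such a middleware, assuming the tracking function takes the request duration; the track name is illustrative:

    import time

    from fastapi import FastAPI, Request

    from surfacescan.tracking import track  # assumed name

    app = FastAPI()

    @app.middleware("http")
    async def track_requests(request: Request, call_next):
        start = time.monotonic()
        response = await call_next(request)
        # the middleware only measures; the tracking function decides
        # where the numbers go (in-memory counters, statsd, ...)
        track(time.monotonic() - start)
        return response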

surfacescan.scanning

Module responsible for scanning the attack surface. The test cases described in tests/data_scanning.py illustrate our understanding (or lack thereof) of the problem.
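
Under the transitivity assumption from the notes, building an attack surface is a reachability problem. A minimal sketch using BFS over a hypothetical adjacency mapping (the data model and edge direction are assumptions, not the actual module API):

    from collections import deque

    def reachable(graph: dict[str, list[str]], start: str) -> set[str]:
        """All VMs connected to `start` by a chain of attack edges (BFS)."""
        seen: set[str] = set()
        queue = deque([start])
        while queue:
            vm = queue.popleft()
            for neighbour in graph.get(vm, ()):
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        return seen

    # A can attack B and B can attack C, so C is reachable from A:
    assert reachable({"A": ["B"], "B": ["C"]}, "A") == {"B", "C"}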

gendata.py

Allows generating large data sets (actually any size that fits into available memory) for performance testing purposes.
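
For illustration only, a sketch of what such a generator might look like; the actual output format of gendata.py is not shown here, so the JSON layout below is an assumption:

    import json
    import random

    def gen(n_vms: int, n_edges: int) -> dict:
        """Generate a random environment with n_vms VMs and n_edges attack edges."""
        vms = [f"vm-{i}" for i in range(n_vms)]
        attacks = [
            {"source": random.choice(vms), "target": random.choice(vms)}
            for _ in range(n_edges)
        ]
        return {"vms": vms, "attacks": attacks}

    if __name__ == "__main__":
        print(json.dumps(gen(10_000, 50_000)))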

What would be nice to have but is not implemented due to time constraints

  • mypy validation
  • a test that surfacescan.tracking.Tracker increments its counters in a thread-safe manner
  • validation of the data loaded on startup, for example raising an error when a duplicate vm_id is found (this would also require additional tests)