
This system requires Docker Compose to start the infrastructure.

Versions Required:

Docker Engine: 18.02.0+

Docker Compose: 1.21.1

Both can be obtained from Docker.
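If you want to check the installed versions before starting, a minimal sketch along these lines may help. It assumes the usual `X.Y.Z` version number appears in the output of `docker --version` and `docker-compose --version`, and it treats the Compose version listed above as a minimum, which this README does not state explicitly. The function names are illustrative and not part of this project.

```python
import re

# Minimums taken from the "Versions Required" list above.
# Treating the Compose version as a minimum is an assumption.
MIN_ENGINE = (18, 2, 0)
MIN_COMPOSE = (1, 21, 1)


def parse_version(text):
    """Return the first X.Y.Z number found in text as a tuple of ints."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in match.groups())


def meets_minimum(version_output, minimum):
    """Compare a version line against a (major, minor, patch) minimum."""
    return parse_version(version_output) >= minimum
```

Pass the text printed by `docker --version` to `meets_minimum(..., MIN_ENGINE)` and the text printed by `docker-compose --version` to `meets_minimum(..., MIN_COMPOSE)`.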


To run the system with full logging:

export DOCKER_KAFKA_HOST=$(ipconfig getifaddr en0)
docker-compose up --scale processor=8
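Note that `ipconfig getifaddr en0` is a macOS command and assumes the active interface is `en0`. On other platforms, one possible way to find an address to export as DOCKER_KAFKA_HOST is the sketch below. It is an assumption-laden helper, not part of this project: the probe address `8.8.8.8` is arbitrary, and the script name `local_ip.py` used afterwards is hypothetical.

```python
import socket


def local_ip(probe_host="8.8.8.8", probe_port=80):
    """Best-effort guess at this machine's outbound IPv4 address."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        # connect() on a UDP socket sends no packets; it only selects
        # the local interface that would be used to reach the probe host.
        sock.connect((probe_host, probe_port))
        return sock.getsockname()[0]
    except OSError:
        # No route available; this fallback is unlikely to be useful
        # for containers that need to reach the host.
        return "127.0.0.1"
    finally:
        sock.close()


if __name__ == "__main__":
    print(local_ip())
```

Saved as `local_ip.py`, it could stand in for the macOS command: `export DOCKER_KAFKA_HOST=$(python3 local_ip.py)`.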

To run the system in the background:

export DOCKER_KAFKA_HOST=$(ipconfig getifaddr en0)
docker-compose up --scale processor=8 -d

Accessing background process logs

You can access logs for the processor using the following:

docker-compose logs -f processor

You can access all logs with the following:

docker-compose logs -f

To submit files and run searches, use the helper scripts. They require some dependencies to be installed on the machine you submit from:

pip3 install -r processor/requirements.txt

Now you can submit a file or files using the helper tool:

python3 /path/to/file/or/directory/or/file/including/paths/to/other/files

You can query the database using the helper tool:

python3 --name name

python3 --lat 37.7749 --lon -122.419 --radius 5 # radius in miles
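To give a feel for what a radius in miles means for a location search, here is a small sketch using the haversine great-circle formula. This is only an illustration: the search itself is presumably carried out by Elasticsearch, and the function names and Earth-radius constant are assumptions, not part of the helper tool. The coordinates in the usage note come from the sample location search output in this README.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius, an approximation


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two lat/lon points (degrees)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def within_radius(center, point, radius_miles):
    """True if point lies within radius_miles of center; both are (lat, lon)."""
    return haversine_miles(*center, *point) <= radius_miles
```

For example, `within_radius((37.7749, -122.419), (37.7966, -122.40858), 5)` checks whether the Chinatown record falls inside a 5 mile radius of the query point.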

Search results can be paginated with the optional --start and --size options, which default to 0 and 20, respectively.


$ python3 --name 'san francisco' --start 50 --size 5
[            root] INFO        2018-05-21 12:18:19 Name Search:
[            root] INFO         2018-05-21 12:18:19 [
  {
    "latitude": 19.34564,
    "country": "MX",
    "longitude": -98.86034,
    "shape": "AQAAAH8w8Nx7WDNAM9yAzw+3WMA=",
    "name": "San Francisco Acuautla",
    "admin_2": "039",
    "admin_1": "15",
    "search_location": {
      "lat": 19.34564,
      "lon": -98.86034
    }
  },
  {
    "latitude": 20.55254,
    "country": "MX",
    "longitude": -98.00209,
    "shape": "AQAAAFq77UJzjTRAg2kYPiKAWMA=",
    "name": "San Francisco",
    "admin_2": "083",
    "admin_1": "30",
    "search_location": {
      "lat": 20.55254,
      "lon": -98.00209
    }
  },
  {
    "latitude": 20.65082,
    "country": "MX",
    "longitude": -98.57522,
    "shape": "AQAAAC2VtyOcpjRAVACMZ9CkWMA=",
    "name": "Tlahuelompa (San Francisco Tlahuelompa)",
    "admin_2": "081",
    "admin_1": "13",
    "search_location": {
      "lat": 20.65082,
      "lon": -98.57522
    }
  },
  {
    "latitude": 19.44279,
    "country": "MX",
    "longitude": -99.34398,
    "shape": "AQAAAO/+eK9acTNAmZ6wxAPWWMA=",
    "name": "San Francisco Chimalpa",
    "admin_2": "",
    "admin_1": "17",
    "search_location": {
      "lat": 19.44279,
      "lon": -99.34398
    }
  },
  {
    "latitude": 19.28333,
    "country": "MX",
    "longitude": -99.80917,
    "shape": "AQAAAMb5m1CISDNA4Ln3cMnzWMA=",
    "name": "Loma de San Francisco",
    "admin_2": "118",
    "admin_1": "15",
    "search_location": {
      "lat": 19.28333,
      "lon": -99.80917
    }
  }
]
[            root] INFO         2018-05-21 12:18:19 Starting at 50, displaying 5 of 114

$ python3 --start 0 --size 10 --lon -122.419 --lat 37.7749 --radius 5
[            root] INFO        2018-05-21 12:18:57 Location Search:
[            root] INFO         2018-05-21 12:18:57 [
  {
    "latitude": 37.7966,
    "country": "US",
    "longitude": -122.40858,
    "shape": "AQAAAC7/If325UJALnO6LCaaXsA=",
    "name": "Chinatown",
    "admin_2": "075",
    "admin_1": "CA",
    "search_location": {
      "lat": 37.7966,
      "lon": -122.40858
    }
  },
  {
    "latitude": 37.71715,
    "country": "US",
    "longitude": -122.40433,
    "shape": "AQAAAMcpOpLL20JAq7LviuCZXsA=",
    "name": "Visitacion Valley",
    "admin_2": "075",
    "admin_1": "CA",
    "search_location": {
      "lat": 37.71715,
      "lon": -122.40433
    }
  },
  {
    "latitude": 37.75018,
    "country": "US",
    "longitude": -122.43369,
    "shape": "AQAAAIAO8+UF4EJAi6azk8GbXsA=",
    "name": "Noe Valley",
    "admin_2": "075",
    "admin_1": "CA",
    "search_location": {
      "lat": 37.75018,
      "lon": -122.43369
    }
  },
  {
    "latitude": 37.77493,
    "country": "US",
    "longitude": -122.41942,
    "shape": "AQAAADpY/+cw40JAdNL7xteaXsA=",
    "name": "San Francisco",
    "admin_2": "075",
    "admin_1": "CA",
    "search_location": {
      "lat": 37.77493,
      "lon": -122.41942
    }
  },
  {
    "latitude": 37.75993,
    "country": "US",
    "longitude": -122.41914,
    "shape": "AQAAAOif4GJF4UJAghyUMNOaXsA=",
    "name": "Mission District",
    "admin_2": "075",
    "admin_1": "CA",
    "search_location": {
      "lat": 37.75993,
      "lon": -122.41914
    }
  }
]
[            root] INFO         2018-05-21 12:18:57 Starting at 0, displaying 5 of 5
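The --start and --size options behave like an offset and a page length over the full result set. The sketch below mimics that with a plain list slice and reproduces the summary line seen in the output above. It is a simplified model, not the helper's code: the real tool presumably asks Elasticsearch for the page rather than slicing in memory, and the function name is made up.

```python
def paginate(hits, start=0, size=20):
    """Return one page of hits plus a summary line like the helper prints."""
    page = hits[start:start + size]
    summary = f"Starting at {start}, displaying {len(page)} of {len(hits)}"
    return page, summary
```

With 114 matching records, `paginate(hits, start=50, size=5)` yields records 50 through 54. When fewer records remain than --size asks for, the page is simply shorter, as in the location search above where a size of 10 returned all 5 matches.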


If you have an issue with Kafka or Elasticsearch when restarting, run the following:

docker-compose rm -fs # to stop and remove all containers
rm -rf volumes  # to remove saved data

Then rerun the docker-compose up command.


Passing the --scale processor=8 option to docker-compose spawns 8 processor containers in a single consumer group, which share the load of reading from Kafka. With the docker-compose setup as it stands, the bottleneck is in Kafka and Elasticsearch, which are both single-node instances.
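To picture how a consumer group shares the load, the sketch below spreads topic partitions across consumers round-robin. It is a toy model only: the real assignment is made by Kafka according to the client's configured assignment strategy, and the partition count of the topic used here is not stated in this README, so the numbers in the usage note are hypothetical.

```python
def assign_partitions(num_partitions, num_consumers):
    """Spread partition ids across consumer ids in round-robin order."""
    assignment = {consumer: [] for consumer in range(num_consumers)}
    for partition in range(num_partitions):
        assignment[partition % num_consumers].append(partition)
    return assignment
```

With 16 partitions and 8 processors, each processor would read from 2 partitions. With only 4 partitions, 4 of the 8 processors would receive nothing, since a partition is consumed by at most one member of a group. Scaling processors past the partition count therefore adds no read throughput.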

docker-compose isn't the best platform for scaling Kafka and Elasticsearch, however, so adding processors does not change the overall wall-clock time, which remains linear. A more robust solution would move to Kubernetes to better manage the scaling of Kafka and Elasticsearch.

docker-compose does give a clear picture of how the system might work in production, and it is a quick way to set up the environment for developers.

Without scaling both Elasticsearch and Kafka beyond a single node, we are also susceptible to data loss during an outage, which isn't ideal.

All of this being said, given sufficiently well-provisioned Kafka and Elasticsearch clusters, I'm pretty confident that this solution would work well.