Dockerized REST service to look up URLs in Google Safe Browsing v4 API

Primary language: Shell. License: Apache License 2.0.

gglsbl-rest

This repository implements a Dockerized REST service to look up URLs in Google Safe Browsing v4 API based on gglsbl using Flask and gunicorn.

Basic Design

The main challenge with running gglsbl in a REST service is that updating the local sqlite database takes several minutes. Because sqlite locks the database during writes by default, updating it in place would cause either very noticeable downtime or a race condition.

Since version 1.4.0, gglsbl-rest sets the sqlite database to write-ahead logging mode so that readers and writers can work concurrently. A cron job runs every 30 minutes to update the database and then performs a full checkpoint to ensure readers have optimal performance.
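To illustrate the mechanism, here is a minimal sketch using Python's standard sqlite3 module. It is not the code gglsbl-rest runs; the table name `hashes` and the function names are made up for the example. It shows a reader querying the database while a writer holds an open transaction, followed by a full checkpoint.

```python
import sqlite3


def enable_wal(conn):
    """Switch the database to write-ahead logging; returns the active journal mode."""
    return conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]


def full_checkpoint(conn):
    """Run a full checkpoint; returns (busy, wal_frames, checkpointed_frames)."""
    return conn.execute("PRAGMA wal_checkpoint(FULL)").fetchone()


def concurrent_read_demo(db_path):
    """Read from one connection while another has an uncommitted write."""
    writer = sqlite3.connect(db_path)
    reader = None
    try:
        mode = enable_wal(writer)
        # Hypothetical table, only for this illustration.
        writer.execute("CREATE TABLE IF NOT EXISTS hashes (prefix TEXT)")
        writer.commit()

        reader = sqlite3.connect(db_path)
        # The INSERT opens a write transaction that stays uncommitted for now.
        writer.execute("INSERT INTO hashes VALUES ('abcd')")
        # In WAL mode the reader is not blocked; it sees the last committed state.
        rows_during = reader.execute("SELECT COUNT(*) FROM hashes").fetchall()[0][0]
        writer.commit()
        rows_after = reader.execute("SELECT COUNT(*) FROM hashes").fetchall()[0][0]

        # Move the WAL contents back into the main database file.
        busy = full_checkpoint(writer)[0]
        return mode, rows_during, rows_after, busy
    finally:
        if reader is not None:
            reader.close()
        writer.close()
```

Note that WAL mode needs a file-backed database, so the sketch expects a path on disk rather than `:memory:`.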

Versions before 1.4.0 maintained two sets of files on disk and switched between them, which is why the output of the status endpoint lists "alternatives". The current approach is preferable, as it reuses freshly downloaded data and cached full hash data across updates.

For security reasons, even though crond is run as root, both the background task of updating the database and the gunicorn process are executed as a non-root user called gglsbl.

Environment Variables

The configuration of the REST service can be done using the following environment variables:

  • GSB_API_KEY is required and should contain your Google Safe Browsing v4 API key.

  • WORKERS controls how many gunicorn workers to instantiate. Defaults to 8 times the number of detected cores plus one.

  • TIMEOUT controls how many seconds before gunicorn times out on a request. Defaults to 120.

  • MAX_REQUESTS controls how many requests a worker can serve before it is restarted, as per the max_requests gunicorn setting. Defaults to restarting workers after they serve 16,384 requests.

  • LIMIT_REQUEST_LINE controls the maximum size of the HTTP request line (operation, protocol version, URI and query parameters), as per the limit_request_line gunicorn setting. Defaults to 8190; set it to 0 to allow any length.

  • KEEPALIVE controls how long a persistent connection can be idle before it is closed, as per the keepalive gunicorn setting. Defaults to 60 seconds.

  • MAX_RETRIES controls how many times the service should retry performing the request if an error occurs. Defaults to 3.

  • HTTPS_PROXY sets the proxy URL if the service is running behind a proxy. Not set by default. (HTTP_PROXY is not necessary, as gglsbl-rest only queries HTTPS URLs.)
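As a quick way to see the documented defaults side by side, the sketch below resolves these variables in Python. It is an illustration only, not the service's actual startup code: the function name `load_settings` and the dictionary keys are hypothetical, and the WORKERS default is read here as (8 × cores) + 1, which is an assumption about the wording above.

```python
import multiprocessing
import os


def load_settings(environ=None, cores=None):
    """Resolve the documented environment variables into a plain dict."""
    env = os.environ if environ is None else environ
    cores = multiprocessing.cpu_count() if cores is None else cores

    if not env.get("GSB_API_KEY"):
        raise ValueError("GSB_API_KEY is required")

    return {
        "api_key": env["GSB_API_KEY"],
        # Assumption: "8 times the number of detected cores plus one" = 8 * cores + 1.
        "workers": int(env.get("WORKERS", 8 * cores + 1)),
        "timeout": int(env.get("TIMEOUT", 120)),
        "max_requests": int(env.get("MAX_REQUESTS", 16384)),
        "limit_request_line": int(env.get("LIMIT_REQUEST_LINE", 8190)),
        "keepalive": int(env.get("KEEPALIVE", 60)),
        "max_retries": int(env.get("MAX_RETRIES", 3)),
        # None means no proxy is configured.
        "https_proxy": env.get("HTTPS_PROXY"),
    }
```

Passing a dict instead of relying on `os.environ` makes it easy to check a configuration before handing it to `docker run`.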

Running

You can run the latest automated build from Docker Hub as follows:

docker run -e GSB_API_KEY=<your API key> -p 127.0.0.1:5000:5000 mlsecproject/gglsbl-rest 

This will cause the service to listen on port 5000 of the host machine, bound to 127.0.0.1. Note that when the service first starts, it downloads a new local partial hash database from scratch before starting the REST service, so it might take several minutes to become available.

You can run docker logs --follow <container name/ID> to tail the output and determine when the gunicorn workers start, if necessary.

In production, you might want to mount /home/gglsbl/db in a tmpfs RAM disk for improved performance. The recommended size is 4+ gigabytes, which is roughly twice the size of a freshly initialized database, though your mileage may vary.

Querying the REST Service

The REST service responds to queries for /gglsbl/v1/lookup/<URL>. Make sure you percent-encode the URL you are querying. If no sign of maliciousness is found, the service returns a 404 status. Otherwise, it returns a 200 response with a JSON body describing the matches.

Here's an example query and response:

$ curl "http://127.0.0.1:5000/gglsbl/v1/lookup/http%3A%2F%2Ftestsafebrowsing.appspot.com%2Fapiv4%2FANY_PLATFORM%2FSOCIAL_ENGINEERING%2FURL%2F"
{
  "matches": [
    {
      "platform": "ANY_PLATFORM",
      "threat": "SOCIAL_ENGINEERING",
      "threat_entry": "URL"
    },
    {
      "platform": "WINDOWS",
      "threat": "SOCIAL_ENGINEERING",
      "threat_entry": "URL"
    },
    {
      "platform": "CHROME",
      "threat": "SOCIAL_ENGINEERING",
      "threat_entry": "URL"
    },
    {
      "platform": "LINUX",
      "threat": "SOCIAL_ENGINEERING",
      "threat_entry": "URL"
    },
    {
      "platform": "ALL_PLATFORMS",
      "threat": "SOCIAL_ENGINEERING",
      "threat_entry": "URL"
    }
  ],
  "url": "http://testsafebrowsing.appspot.com/apiv4/ANY_PLATFORM/SOCIAL_ENGINEERING/URL/"
}
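The same lookup can be scripted. The following is a minimal client sketch using only the Python standard library; it assumes the service is reachable at http://127.0.0.1:5000 as in the docker run example, and the function names are made up for illustration. It percent-encodes the whole URL (including slashes) and treats a 404 as "no matches".

```python
import json
import urllib.error
import urllib.parse
import urllib.request

# Assumption: the service is published on localhost:5000, as in the docker run example.
BASE_URL = "http://127.0.0.1:5000"


def build_lookup_url(url, base_url=BASE_URL):
    """Percent-encode every reserved character, including ':' and '/'."""
    return base_url + "/gglsbl/v1/lookup/" + urllib.parse.quote(url, safe="")


def lookup(url, base_url=BASE_URL, timeout=10):
    """Return the list of matches for url, or an empty list on a 404."""
    try:
        with urllib.request.urlopen(build_lookup_url(url, base_url), timeout=timeout) as resp:
            return json.load(resp)["matches"]
    except urllib.error.HTTPError as err:
        if err.code == 404:
            # The service reports "nothing found" as a 404 status.
            return []
        raise
```

For example, `lookup("http://testsafebrowsing.appspot.com/apiv4/ANY_PLATFORM/SOCIAL_ENGINEERING/URL/")` would issue the same request as the curl command above.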

There's an additional /gglsbl/v1/status URL that you can access to check whether the service is running and to get some indication of how old the current sqlite database is:

$ curl "http://127.0.0.1:5000/gglsbl/v1/status"
{
  "alternatives": [
    {
      "active": true,
      "ctime": "2017-10-30T20:20:55+0000", 
      "mtime": "2017-10-30T20:20:55+0000", 
      "name": "/home/gglsbl/db/sqlite.db", 
      "size": 2079985664
    }
  ], 
  "environment": "prod"
}
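To turn that output into an age check, a small helper along these lines could be used. It is a sketch that assumes the response shape and timestamp format shown above; `database_age_seconds` is a hypothetical name, not part of the service.

```python
from datetime import datetime, timezone


def database_age_seconds(status, now=None):
    """Seconds elapsed since the active database file was last modified."""
    now = datetime.now(timezone.utc) if now is None else now
    # Pick the entry flagged as active in the "alternatives" list.
    active = [alt for alt in status["alternatives"] if alt["active"]]
    if not active:
        raise ValueError("no active database reported")
    # Timestamps look like 2017-10-30T20:20:55+0000.
    mtime = datetime.strptime(active[0]["mtime"], "%Y-%m-%dT%H:%M:%S%z")
    return (now - mtime).total_seconds()
```

Since the cron job updates the database every 30 minutes, an age well beyond that interval may indicate that updates are failing.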

Who's using gglsbl-rest

If your project or company is using gglsbl-rest and you would like it to be listed here, please open a GitHub issue and we'll include you.