Argus

Argus is a platform for aggregating incidents across network management systems, and sending notifications to users. Users build notification profiles that define which incidents they subscribe to.

This repository hosts the backend built with Django, while the frontend is hosted here: https://github.com/Uninett/Argus-frontend.

Setup

Prerequisites

Python 3.7+
pip

Dataporten setup

Register a new application with the following redirect URL: {server_url}/oidc/complete/dataporten_feide/
- {server_url} must be replaced with the URL to the server running this project, like http://localhost:8000
Add the following permission scopes:
- profile
- userid
- userid-feide

Project setup

Create a Python 3.7+ virtual environment
pip install -r requirements.txt
python manage.py migrate
python manage.py initial_setup

Start the server with python manage.py runserver.

Alternative setup using Docker Compose

docker-compose up
docker-compose exec argus-api django-admin initial_setup
Visit http://localhost:8000/

Site- and deployment-specific settings

Site-specific settings are set as per 12 factor, with environment variables. For more details, see the relevant section in the docs: Setting site-specific settings.

A recap of the environment variables that can be set by default follows.

Environment variables

ARGUS_DATAPORTEN_KEY, which holds the id/key for using dataporten for authentication.
ARGUS_DATAPORTEN_SECRET, which holds the password for using dataporten for authentication.
ARGUS_COOKIE_DOMAIN, the domain the cookie is set for
ARGUS_FRONTEND_URL, for redirecting back to frontend after logging in through Feide, and also CORS. Must either be a subdomain of or the same as ARGUS_COOKIE_DOMAIN
ARGUS_SEND_NOTIFICATIONS, True in production and False by default, to allow suppressing notifications
DEBUG, 1 for True, 0 for False
TEMPLATE_DEBUG. By default set to the same as DEBUG.
DEFAULT_FROM_EMAIL, the email From-address used for notifications sent via email
EMAIL_HOST, smarthost (domain name) to send email through
EMAIL_HOST_USER, (optional) if the host in EMAIL_HOST needs authentication
EMAIL_HOST_PASSWORD, (optional) password if the smarthost needs that
EMAIL_PORT, in production by default set to 587
SECRET_KEY, used internally by django, should be about 50 chars of ascii noise (but avoid backspaces!)
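
As an illustration of how a settings file might read a few of these variables, here is a minimal sketch. The helper names `env_bool` and `load_settings` are hypothetical, as is the set of strings accepted as "true"; this is not Argus' actual settings code, only one way to honour the defaults described above.

```python
import os

# Strings treated as "true". This set is an assumption made for the sketch.
TRUE_VALUES = {"1", "true", "yes", "on"}


def env_bool(environ, name, default=False):
    """Read a boolean flag from a mapping of environment variables."""
    raw = environ.get(name, "").strip().lower()
    if not raw:
        # Unset or empty: fall back to the default.
        return default
    return raw in TRUE_VALUES


def load_settings(environ=os.environ):
    """Collect a few of the documented settings into a dict (hypothetical helper)."""
    debug = env_bool(environ, "DEBUG")
    return {
        "DEBUG": debug,
        # TEMPLATE_DEBUG follows DEBUG unless set explicitly.
        "TEMPLATE_DEBUG": env_bool(environ, "TEMPLATE_DEBUG", default=debug),
        # Notifications are suppressed unless explicitly enabled.
        "SEND_NOTIFICATIONS": env_bool(environ, "ARGUS_SEND_NOTIFICATIONS"),
        # 587 is the documented production default for EMAIL_PORT.
        "EMAIL_PORT": int(environ.get("EMAIL_PORT", "587")),
    }
```

Passing a plain dict instead of `os.environ` makes the behaviour easy to check, for instance `load_settings({"DEBUG": "1"})`.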

There are also settings (not env-variables) for which notification plugins to use:

DEFAULT_SMS_MEDIA, which by default is unset, since there is no standardized way of sending SMSes. See Notifications and notification plugins.

DEFAULT_EMAIL_MEDIA, which is included and uses Django's email backend. It is better to switch out the email backend than to replace this plugin.

A Gmail account with "Allow less secure apps" turned on was used during the development of this project.

Production gotchas

The frontend and backend currently need to be either on the same domain or on subdomains of the same domain (ARGUS_COOKIE_DOMAIN).

When running on localhost for dev and test, ARGUS_COOKIE_DOMAIN may be empty.

Running tests

python manage.py test src

Mock data

Generating

PYTHONPATH=src python src/argus/incident/fixtures/generate_fixtures.py

This creates the file src/argus/incident/fixtures/incident/mock_data.json.

Loading

python manage.py loaddata incident/mock_data

Running in development

The fastest way is to use virtualenv, virtualenvwrapper or similar to create a safe place to stash all the dependencies.

Create the virtualenv
Fill the activated virtualenv with dependencies:

$ pip install -r requirements/prod.txt
$ pip install -r requirements/dev.txt

Copy the cmd.sh-template to a new name ending with ".sh", make it executable and set the environment variables within. This file must not be checked in to version control, since it contains passwords. You must set DATABASE_URL, DJANGO_SETTINGS_MODULE and SECRET_KEY. If you want to test the frontend you must also set all the DATAPORTEN-settings. Get the values from https://dashboard.dataporten.no/ or create a new application there.

For the database we recommend postgres as we use a postgres-specific feature in the Incident-model.

DJANGO_SETTINGS_MODULE can be set to "argus.site.settings.dev" but we recommend having a localsettings.py in the same directory as manage.py with any overrides. This file also does not belong in version control since it reflects a specific developer's preferences. Smart things first tested in a localsettings can be moved to the other settings-files later on. If you copy the entire logging-setup from "argus.site.settings.dev" to "localsettings.py" remember to set "disable_existing_loggers" to True or logentries will occur twice.

Debugging tips

To test/debug notifications as a whole, use the email subsystem (Media: Email in a NotificationProfile). Set EMAIL_HOST to "localhost", EMAIL_PORT to "1025", and run a dummy mailserver:

$ python3 -m smtpd -n -c DebuggingServer localhost:1025

Notifications sent will then be dumped to the console where the dummy server runs.
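
To confirm that the dummy server is reachable before involving Argus at all, a message can be sent to it directly. The sketch below uses only the Python standard library; the function names, addresses and subject line are made up for illustration, and the default host and port assume the values suggested above.

```python
import smtplib
from email.message import EmailMessage


def build_test_message(sender, recipient):
    """Build a small plain-text message (illustrative content only)."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = "Argus notification test"
    msg.set_content("If this shows up in the dummy server console, SMTP delivery works.")
    return msg


def send_test_message(msg, host="localhost", port=1025):
    """Send the message to the dummy mailserver started above."""
    with smtplib.SMTP(host, port) as smtp:
        smtp.send_message(msg)
```

Calling `send_test_message(build_test_message("argus@example.org", "me@example.org"))` should make the message appear in the console of the dummy server.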

Endpoints

/admin/ to access the project's admin pages.

All endpoints require requests to contain a header with key Authorization and value Token {token}, where {token} is replaced by a registered auth token; these are generated per user by logging in through Feide, and can be found at /admin/authtoken/token/.
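
A minimal sketch of attaching that header from Python, using only the standard library. The function name `authorized_request` and the token value are placeholders; the request is only built here, not sent.

```python
import urllib.request


def authorized_request(url, token):
    """Build a GET request carrying the Authorization header described above."""
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Token {token}"},
        method="GET",
    )


# Example: prepare a request for the logged in user's details.
request = authorized_request("http://localhost:8000/api/v1/auth/user/", "abc123")
```

The prepared request can then be passed to `urllib.request.urlopen` against a running server.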

Auth endpoints

GET to /api/v1/auth/user/: returns the logged in user
GET to /api/v1/auth/users/<int:pk>/: returns a user by PK
POST to /oidc/api-token-auth/: returns an auth token for the posted user
- Note that this token will expire after 14 days, and can be replaced by posting to the same endpoint.
- Example request body: { username: <username>, password: <password> }
/oidc/login/dataporten_feide/: redirects to Feide login

/api/v1/auth/phone-number/:

GET: returns the phone numbers of the logged in user

Example response body:

[
  {
    "pk": 2,
    "user": 1,
    "phone_number": "+4767676767"
  },
  {
    "pk": 1,
    "user": 1,
    "phone_number": "+4790909090"
  }
]

POST: creates and returns a phone number for the logged in user
Example request body:
```
{
  "pk": 2,
  "phone_number": "+4767676767"
}
```

/api/v1/auth/phone-number/<int:pk>/:
- GET: returns the specific phone number of the logged in user
  Example response body:
```
{
  "pk": 2,
  "user": 1,
  "phone_number": "+4767676767"
}
```
- PUT: updates and returns one of the logged in user's phone numbers by PK
  - Example request body: same as POST to /api/v1/auth/phone-number/
- DELETE: deletes one of the logged in user's phone numbers by PK
The phone number is validated with a Python port of Google's libphonenumber library. It checks that the number is in a valid number series, so a random number with enough digits that is not in a valid series will be rejected.

Incident endpoints

/api/v1/incidents/:

GET: returns all incidents - both open and historic

Query parameters:
All query parameters are optional. If a query parameter is omitted or empty, for instance `acked=`, that filter is not applied and rows with any value are returned, for instance both "acked" and "unacked" in the case of `acked=`.
Filtering parameters:

acked=true|false

Fetch only acked (true) or unacked (false) incidents.

open=true|false

Fetch only open (true) or closed (false) incidents.

stateful=true|false

Fetch only stateful (true) or stateless (false) incidents.

source__id__in=ID1[,ID2,..]

Fetch only incidents with a source with numeric id ID1 or ID2 or..

source__name__in=NAME1[,NAME2,..]

Fetch only incidents with a source with name NAME1 or NAME2 or..

source_incident_id=ID

Fetch only incidents with source_incident_id set to ID.

tags=key1=value1,key1=value2,key2=value

Fetch only incidents with one or more of the tags. Tag-format is "key=value". If there are multiple tags with the same key, only one of the tags need match. If there are multiple keys, one of each key must match.

So: /api/v1/incidents/?acked=false&open=true&stateful=true&source__id__in=1&tags=location=broomcloset,location=understairs,problem=onfire will fetch incidents that are all of "open", "unacked" and "stateful", from source number 1, with "location" either "broomcloset" or "understairs", and that are on fire (problem=onfire).
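
The same URL can be assembled programmatically. The helper below is a hypothetical sketch built on `urllib.parse.urlencode`; it keeps `=` and `,` unescaped so the output reads like the example above, and it skips filters that are `None` so they stay unapplied.

```python
from urllib.parse import urlencode


def incidents_url(base_url, **filters):
    """Build an /api/v1/incidents/ URL from keyword filters (hypothetical helper)."""
    params = {}
    for name, value in filters.items():
        if value is None:
            # Omitted filters do not restrict the result.
            continue
        if isinstance(value, bool):
            value = "true" if value else "false"
        elif isinstance(value, (list, tuple)):
            # Multi-valued filters are comma-separated.
            value = ",".join(str(item) for item in value)
        params[name] = value
    query = urlencode(params, safe="=,")
    url = f"{base_url}/api/v1/incidents/"
    return f"{url}?{query}" if query else url


url = incidents_url(
    "http://localhost:8000",
    acked=False,
    open=True,
    stateful=True,
    source__id__in=[1],
    tags=["location=broomcloset", "location=understairs", "problem=onfire"],
)
```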

Paginating parameters:

cursor=LONG RANDOM STRING|null

Go to the page of that cursor. The cursor string for next and previous page is part of the response body.

page_size=INTEGER

The number of rows to return. Default is 100.

So: /api/v1/incidents/?cursor=cD0yMDIwLTA5LTIzKzEzJTNBMDIlM0ExNi40NTU4MzIlMkIwMCUzQTAw&page_size=10 will go to the page indicated by "cD0yMDIwLTA5LTIzKzEzJTNBMDIlM0ExNi40NTU4MzIlMkIwMCUzQTAw" and show the next 10 rows from that point onward. Do not attempt to guess the cursor string. null means there is no more to fetch.
Example response body:
```
{
    "next": "http://localhost:8000/api/v1/incidents/?cursor=cD0yMDIwLTA5LTIzKzEzJTNBMDIlM0ExNi40NTU4MzIlMkIwMCUzQTAw&page_size=10",
    "previous": null,
    "results": [
        {
            "pk": 10101,
            "start_time": "2011-11-11T11:11:11+02:00",
            "end_time": "2011-11-11T11:11:12+02:00",
            "source": {
                "pk": 11,
                "name": "Uninett GW 3",
                "type": {
                    "name": "nav"
                },
                "user": 12,
                "base_url": "https://somenav.somewhere.com"
            },
            "source_incident_id": "12345",
            "details_url": "https://uninett.no/api/alerts/12345/",
            "description": "Netbox 11 <12345> down.",
            "ticket_url": "https://tickettracker.com/tickets/987654/",
            "tags": [
                {
                    "added_by": 12,
                    "added_time": "2011-11-11T11:11:11.111111+02:00",
                    "tag": "object=Netbox 4"
                },
                {
                    "added_by": 12,
                    "added_time": "2011-11-11T11:11:11.111111+02:00",
                    "tag": "problem_type=boxDown"
                },
                {
                    "added_by": 200,
                    "added_time": "2020-08-10T11:26:14.550951+02:00",
                    "tag": "color=red"
                }
            ],
            "stateful": true,
            "open": false,
            "acked": false
        }
    ]
}
```
Pagination-support:

`next`
The link to the next page, according to the cursor, or `null` if on the last page.

`previous`
The link to the previous page, according to the cursor, or `null` if on the first page.

`results`
An array of the resulting subset of rows, or an empty array if no results.
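
Since cursors must not be guessed, a client walks the result set by following `next` until it is `null`. The sketch below shows that loop; `fetch_page` and `iter_incidents` are hypothetical names, and the fetching function is passed in so the loop can be exercised without a running server.

```python
import json
import urllib.request


def fetch_page(url, token):
    """Fetch one page from a running Argus server and decode the JSON body."""
    request = urllib.request.Request(url, headers={"Authorization": f"Token {token}"})
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def iter_incidents(first_url, fetch):
    """Yield every incident, following `next` links until there are none left."""
    url = first_url
    while url:
        page = fetch(url)
        yield from page["results"]
        # `next` is null (None) on the last page, which ends the loop.
        url = page["next"]
```

Against a live server this could be used as `iter_incidents(url, lambda u: fetch_page(u, token))`.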

Refer to the "Explanation of terms" section for an explanation of the other fields.

POST: creates and returns an incident

Example request body:

{
    "source": 11,
    "start_time": "2011-11-11 11:11:11.11111",
    "end_time": null,
    "source_incident_id": "12345",
    "details_url": "https://uninett.no/api/alerts/12345/",
    "description": "Netbox 11 <12345> down.",
    "ticket_url": "https://tickettracker.com/tickets/987654/",
    "tags": [
        {"tag": "object=Netbox 4"},
        {"tag": "problem_type=boxDown"}
    ]
}

Refer to the "Explanation of terms" section for an explanation of the fields.

/api/v1/incidents/<int:pk>/:
- GET: returns an incident by PK
- PATCH: modifies parts of an incident and returns it
  Example request body:
```
{
    "ticket_url": "https://tickettracker.com/tickets/987654/",
    "tags": [
        {"tag": "object=Netbox 4"},
        {"tag": "problem_type=boxDown"}
    ]
}
```
  The fields allowed to be modified are:
  - details_url
  - ticket_url
  - tags
/api/v1/incidents/<int:pk>/ticket_url/:
- PUT: modifies just the ticket url of an incident and returns it
  Example request body:
```
{
    "ticket_url": "https://tickettracker.com/tickets/987654/"
}
```
  Only ticket_url may be modified.

/api/v1/incidents/<int:pk>/events/:

GET: returns all events related to the specified incident

Example response body:

[
    {
        "pk": 1,
        "incident": 10101,
        "actor": {
            "pk": 12,
            "username": "nav.oslo.uninett.no"
        },
        "timestamp": "2011-11-11T11:11:11+02:00",
        "received": "2011-11-11T11:12:11+02:00",
        "type": {
            "value": "STA",
            "display": "Incident start"
        },
        "description": ""
    },
    {
        "pk": 20,
        "incident": 10101,
        "actor": {
            "pk": 12,
            "username": "nav.oslo.uninett.no"
        },
        "timestamp": "2011-11-11T11:11:12+02:00",
        "received": "2011-11-11T11:11:13+02:00",
        "type": {
            "value": "END",
            "display": "Incident end"
        },
        "description": ""
    }
]

Note that `received` is set by Argus on reception of an event. Normally,
this should be the same as, or a little later than, `timestamp`. If there
is a large gap (in minutes), or `received` is earlier than `timestamp`,
something is likely wrong with the internal clock on either the Argus
server or the event source.
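
A client could flag such events with a check along these lines. The function name and the two-minute threshold are assumptions made for this sketch; the text above only says "minutes".

```python
from datetime import datetime, timedelta


def clock_skew_suspected(event, max_gap=timedelta(minutes=2)):
    """Return True if `received` precedes `timestamp` or trails it by more than max_gap."""
    timestamp = datetime.fromisoformat(event["timestamp"])
    received = datetime.fromisoformat(event["received"])
    gap = received - timestamp
    return gap < timedelta(0) or gap > max_gap


# The second event from the example response: received one second after it happened.
normal_event = {
    "timestamp": "2011-11-11T11:11:12+02:00",
    "received": "2011-11-11T11:11:13+02:00",
}
assert not clock_skew_suspected(normal_event)
```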

POST: creates and returns an event related to the specified incident
Example request body:
```
{
    "timestamp": "2020-02-20 20:02:20.202021",
    "type": "OTH",
    "description": "The investigation is still ongoing."
}
```
If posted by an end user (a user with no associated source system), the timestamp field is optional, and will be set to the time the server received it if omitted.

The valid types are:
- STA - Incident start
  - An incident automatically creates an event of this type when the incident is created, but cannot have more than one. In other words, it's never allowed to post an event of this type.
- END - Incident end
  - Only source systems can post an event of this type, which is the standard way of closing an incident. An incident cannot have more than one event of this type.
- CLO - Close
  - Only end users can post an event of this type, which manually closes the incident.
- REO - Reopen
  - Only end users can post an event of this type, which reopens the incident if it's been closed (either manually or by a source system).
- ACK - Acknowledge
  - Use the /api/v1/incidents/<int:pk>/acks/ endpoint.
- OTH - Other
  - Any other type of event, which simply provides information on something that happened related to an incident, without changing its state in any way.
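
The posting rules above can be summarised as a small lookup. This is a sketch of the rules as described, not Argus' own validation code: `may_post_event` is a hypothetical name, it assumes that OTH may be posted by both end users and source systems, it treats ACK as not postable here because the acks endpoint is to be used instead, and it does not model the "at most one END event" rule.

```python
END_USER_ONLY = {"CLO", "REO"}
SOURCE_SYSTEM_ONLY = {"END"}


def may_post_event(event_type, is_end_user):
    """Return True if this kind of user may post this event type to the events endpoint."""
    if event_type == "STA":
        # Created automatically with the incident, never posted.
        return False
    if event_type == "ACK":
        # Acknowledgements go through /api/v1/incidents/<int:pk>/acks/.
        return False
    if event_type in SOURCE_SYSTEM_ONLY:
        return not is_end_user
    if event_type in END_USER_ONLY:
        return is_end_user
    if event_type == "OTH":
        # Assumption: informational events are open to everyone.
        return True
    raise ValueError(f"Unknown event type: {event_type!r}")
```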

GET to /api/v1/incidents/<int:pk>/events/<int:pk>/: returns a specific event related to the specified incident

/api/v1/incidents/<int:pk>/acks/:

GET: returns all acknowledgements of the specified incident

Example response body:

[
    {
        "pk": 2,
        "event": {
            "pk": 2,
            "incident": 10101,
            "actor": {
                "pk": 140,
                "username": "jp@example.org"
            },
            "timestamp": "2011-11-11T11:11:11.235877+02:00",
            "received": "2011-11-11T11:11:11.235897+02:00",
            "type": {
                "value": "ACK",
                "display": "Acknowledge"
            },
            "description": "The incident is being investigated."
        },
        "expiration": "2011-11-13T12:00:00+02:00"
    },
    {
        "pk": 20,
        "event": {
            "pk": 20,
            "incident": 10101,
            "actor": {
                "pk": 130,
                "username": "ferrari.testarossa@example.com"
            },
            "timestamp": "2011-11-12T11:11:11+02:00",
            "received": "2011-11-12T11:11:11+02:00",
            "type": {
                "value": "ACK",
                "display": "Acknowledge"
            },
            "description": "The situation is under control!"
        },
        "expiration": null
    }
]

POST: creates and returns an acknowledgement of the specified incident
Example request body:
```
{
    "event": {
        "timestamp": "2011-11-11 11:11:11.235877",
        "description": "The incident is being investigated."
    },
    "expiration": "2011-11-13 12:00:00"
}
```
Only end users can post acknowledgements.

The timestamp field is optional, and will be set to the time the server received it if omitted.
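
As described under "Explanation of terms", an incident counts as acked while at least one of its acknowledgements has not expired, and a `null` expiration never expires. A minimal sketch of that check over a response from this endpoint follows; `is_acked` is a hypothetical name, and treating an expiration exactly equal to "now" as expired is an assumption.

```python
from datetime import datetime, timezone


def is_acked(acks, now=None):
    """Return True if any acknowledgement in the list is still in effect."""
    now = now or datetime.now(timezone.utc)
    for ack in acks:
        expiration = ack["expiration"]
        if expiration is None:
            # No expiration: the acknowledgement never expires.
            return True
        if datetime.fromisoformat(expiration) > now:
            return True
    return False
```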

GET to /api/v1/incidents/<int:pk>/acks/<int:pk>/: returns a specific acknowledgement of the specified incident
GET to /api/v1/incidents/mine/: behaves like /api/v1/incidents/ except only showing the incidents added by the logged-in user, and no filtering on source or source type is possible.
GET to /api/v1/incidents/open/: returns all open incidents
GET to /api/v1/incidents/open+unacked/: returns all open incidents that have not been acked
GET to /api/v1/incidents/metadata/: returns relevant metadata for all incidents

Notification profile endpoints

/api/v1/notificationprofiles/:
- GET: returns the logged in user's notification profiles
- POST: creates and returns a notification profile which is then connected to the logged in user
  Example request body:
```
{
    "timeslot": 1,
    "filters": [
        1,
        2
    ],
    "media": [
        "EM",
        "SM"
    ],
    "phone_number": 1,
    "active": true
}
```
  The phone number field is optional and may also be null.
/api/v1/notificationprofiles/<int:pk>/:
- GET: returns one of the logged in user's notification profiles by PK
- PUT: updates and returns one of the logged in user's notification profiles by PK
  - Note that if timeslot is changed, the notification profile's PK will also change. This consequently means that the URL containing the previous PK will return a 404 Not Found status code.
  - Example request body: same as POST to /api/v1/notificationprofiles/
- DELETE: deletes one of the logged in user's notification profiles by PK
GET to /api/v1/notificationprofiles/<int:pk>/incidents/: returns all incidents - both open and historic - filtered by one of the logged in user's notification profiles by PK

/api/v1/notificationprofiles/timeslots/:

GET: returns the logged in user's time slots

POST: creates and returns a time slot which is then connected to the logged in user

Example request body:

{
    "name": "Weekdays",
    "time_recurrences": [
        {
            "days": [1, 2, 3, 4, 5],
            "start": "08:00:00",
            "end": "12:00:00"
        },
        {
            "days": [1, 2, 3, 4, 5],
            "start": "12:30:00",
            "end": "16:00:00"
        }
    ]
}

The optional key "all_day" indicates that Argus should use Time.min and Time.max as "start" and "end" respectively. This also overrides any provided values for "start" and "end". An example request body:

{
    "name": "All the time",
    "time_recurrences": [
        {
            "days": [1, 2, 3, 4, 5, 6, 7],
            "all_day": true
        }
    ]
}

which would yield the response:

{
    "pk": 2,
    "name": "All the time",
    "time_recurrences": [
        {
            "days": [1, 2, 3, 4, 5, 6, 7],
            "start": "00:00:00",
            "end": "23:59:59.999999",
            "all_day": true
        }
    ]
}
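
The effect of "all_day" can be reproduced with Python's `datetime.time.min` and `datetime.time.max`. The function below is an illustrative sketch of that expansion, not the code Argus runs; `expand_all_day` is a hypothetical name.

```python
from datetime import time


def expand_all_day(recurrence):
    """Return a copy of a time recurrence with start/end filled in when all_day is set."""
    expanded = dict(recurrence)
    if expanded.get("all_day"):
        # all_day overrides any provided start and end values.
        expanded["start"] = time.min.isoformat()
        expanded["end"] = time.max.isoformat()
    return expanded


expanded = expand_all_day({"days": [1, 2, 3, 4, 5, 6, 7], "all_day": True})
```

Here `expanded` carries the same "start" and "end" values as the response shown above.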

/api/v1/notificationprofiles/timeslots/<int:pk>/:
- GET: returns one of the logged in user's time slots by PK
- PUT: updates and returns one of the logged in user's time slots by PK
  - Example request body: same as POST to /api/v1/notificationprofiles/timeslots/
- DELETE: deletes one of the logged in user's time slots by PK
/api/v1/notificationprofiles/filters/:
- GET: returns the logged in user's filters
- POST: creates and returns a filter which is then connected to the logged in user
  Example request body:
```
{
    "name": "Critical incidents",
    "filter_string": "{\"sourceSystemIds\": [<SourceSystem.pk>, ...], \"tags\": [\"key1=value1\", ...]}"
}
```
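
Note that "filter_string" is itself a JSON document stored as a string, so it ends up encoded twice. A sketch of building such a request body in Python follows; `filter_payload` is a hypothetical helper, and the source system ids and tag used are example values.

```python
import json


def filter_payload(name, source_system_ids=(), tags=()):
    """Build the request body for creating a filter, encoding filter_string separately."""
    filter_string = json.dumps(
        {"sourceSystemIds": list(source_system_ids), "tags": list(tags)}
    )
    # The outer dumps escapes the quotes inside filter_string, as in the example above.
    return json.dumps({"name": name, "filter_string": filter_string})


body = filter_payload("Critical incidents", [1, 2], ["problem_type=boxDown"])
```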
/api/v1/notificationprofiles/filters/<int:pk>/:
- GET: returns one of the logged in user's filters by PK
- PUT: updates and returns one of the logged in user's filters by PK
  - Example request body: same as POST to /api/v1/notificationprofiles/filters/
- DELETE: deletes one of the logged in user's filters by PK
POST to /api/v1/notificationprofiles/filterpreview/: returns all incidents - both open and historic - filtered by the values in the body
Example request body:
```
{
    "sourceSystemIds": [<SourceSystem.pk>, ...]
}
```

Models

Explanation of terms

incident: an unplanned interruption in the source system.
event: something that happened related to an incident.
acknowledgement: an acknowledgement of an incident by a user, which hides the incident from the other open incidents.
- If expiration is an instance of datetime, the incident will be shown again after the expiration time.
- If expiration is null, the acknowledgement will never expire.
- An incident is considered "acked" if it has one or more acknowledgements that have not expired.
start_time: the time the incident was created.
end_time: the time the incident was resolved or closed.
- If null: the incident is stateless.
- If "infinity": the incident is stateful, but has not yet been resolved or closed - i.e. open.
- If an instance of datetime: the incident is stateful, and was resolved or closed at the given time; if it's in the future, the incident is also considered open.
source: the source system that the incident originated in.
object: the most specific object that the incident is about.
parent_object: an object that the object is possibly a part of.
problem_type: the type of problem that the incident is about.
tag: a key-value pair separated by an equality sign (=), in the shape of a string.
- The key can consist of lowercase letters, numbers and underscores.
- The value can consist of any length of any characters.
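
The tag format can be checked with a short regular expression. This is a sketch of the rules as stated here, not Argus' own validator: `parse_tag` is a hypothetical name, and it assumes that an empty value is allowed and that only the first equality sign separates key from value.

```python
import re

# Key: lowercase letters, numbers and underscores. Value: anything, including "=".
TAG_PATTERN = re.compile(r"(?P<key>[a-z0-9_]+)=(?P<value>.*)", re.DOTALL)


def parse_tag(tag):
    """Split a "key=value" tag into (key, value), raising ValueError if malformed."""
    match = TAG_PATTERN.fullmatch(tag)
    if match is None:
        raise ValueError(f"Not a valid tag: {tag!r}")
    return match["key"], match["value"]


key, value = parse_tag("object=Netbox 4")
```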

ER diagram

Notifications and notification plugins

A notification plugin is a class that inherits from argus.notificationprofile.media.base.NotificationMedium. It has a send(incident, user, **kwargs) static method that does the actual sending.

The included argus.notificationprofile.media.email.EmailNotification needs only incident and user, while an SMS medium in addition needs a phone_number. A phone_number is a string that includes the international calling code, see for instance Wikipedia: List of mobile telephone prefixes by country.
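
The shape of a plugin might look like the sketch below. To keep the snippet runnable on its own, it defines a stand-in base class; a real plugin would instead import NotificationMedium from argus.notificationprofile.media.base. The class name `ConsoleSMSNotification`, the dict-based incident and the message format are all invented for illustration, and nothing is actually sent.

```python
class NotificationMedium:
    """Stand-in for argus.notificationprofile.media.base.NotificationMedium."""

    @staticmethod
    def send(incident, user, **kwargs):
        raise NotImplementedError


class ConsoleSMSNotification(NotificationMedium):
    """Hypothetical SMS medium that prints the message instead of sending it."""

    @staticmethod
    def send(incident, user, **kwargs):
        # An SMS medium needs a phone_number in addition to incident and user.
        phone_number = kwargs.get("phone_number")
        if not phone_number:
            raise ValueError("phone_number is required for SMS notifications")
        # A plain dict stands in for the incident here.
        message = f"To {phone_number}: {incident['description']}"
        print(message)
        return message
```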
