/octopus

The Octopus is part of our network automation pipeline. Its job is to aggregate data from different authoritative data sources into one Enriched Topology to provide a full picture to downstream systems.

Primary language: Go. License: Apache License 2.0 (Apache-2.0).

Octopus

The enriched topology is a graph of the entire network, including all devices as well as connections such as cables and circuits. The nodes of the graph represent the devices (routers, switches, optical equipment, servers, etc.), including their metadata such as status, device type, platform, role, and any custom fields or tags applied in NetBox, for example. The edges of the graph are direct cables (DACs, copper/fiber patches), cable paths (e.g. through patch panels or optical equipment), circuits (e.g. dark fibers, wavelengths, transports), or even serial connections.
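To make the graph model more concrete, here is a minimal sketch of how nodes and edges could be represented in Go. All type, field, and device names below are assumptions made for illustration; they are not the types the Octopus actually uses.

```go
package main

import "fmt"

// EdgeKind distinguishes the connection types described above (hypothetical names).
type EdgeKind string

const (
	Cable     EdgeKind = "cable"
	CablePath EdgeKind = "cable_path"
	Circuit   EdgeKind = "circuit"
	Serial    EdgeKind = "serial"
)

// Device is a node of the graph together with some of its metadata.
type Device struct {
	Name     string
	Status   string
	Role     string
	Platform string
	Tags     []string
}

// Edge connects an interface on device A to an interface on device Z.
type Edge struct {
	Kind         EdgeKind
	ADevice, AIf string
	ZDevice, ZIf string
}

// Graph holds all devices by name plus all edges between them.
type Graph struct {
	Devices map[string]*Device
	Edges   []Edge
}

func NewGraph() *Graph { return &Graph{Devices: map[string]*Device{}} }

func (g *Graph) AddDevice(d *Device) { g.Devices[d.Name] = d }

func (g *Graph) Connect(e Edge) { g.Edges = append(g.Edges, e) }

// Neighbors returns the names of devices that share an edge with the given device.
func (g *Graph) Neighbors(name string) []string {
	var out []string
	for _, e := range g.Edges {
		switch name {
		case e.ADevice:
			out = append(out, e.ZDevice)
		case e.ZDevice:
			out = append(out, e.ADevice)
		}
	}
	return out
}

func main() {
	g := NewGraph()
	g.AddDevice(&Device{Name: "router01", Role: "edge", Status: "active"})
	g.AddDevice(&Device{Name: "switch01", Role: "tor", Status: "active"})
	g.Connect(Edge{Kind: Cable, ADevice: "router01", AIf: "eth0", ZDevice: "switch01", ZIf: "eth48"})
	fmt.Println(g.Neighbors("router01"))
}
```

Keeping edges in a flat list is the simplest choice for a sketch; a real implementation would likely index them per device or per interface.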

Connectors

Each connector (think tentacle) of the Octopus taps into one of our data sources and consumes the bits we are interested in. It is responsible for querying the data, caching it locally, refreshing it at an interval that is meaningful for the data source, and obtaining useful triggers, if any.
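The query, cache, and refresh responsibilities could be sketched roughly as follows. This is not the Octopus connector code: CachingConnector, Fetcher, defaultInterval, and ErrSourceDown are hypothetical names, and the data is reduced to a list of strings to keep the example short.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
	"time"
)

// defaultInterval is a hypothetical refresh interval; a real connector would
// pick one that suits its data source.
const defaultInterval = 5 * time.Minute

// ErrSourceDown is a hypothetical error used to simulate a failing data source.
var ErrSourceDown = errors.New("data source unavailable")

// Fetcher stands in for the query against the data source.
type Fetcher func() ([]string, error)

// CachingConnector keeps the last successfully fetched result in memory.
type CachingConnector struct {
	name     string
	interval time.Duration
	fetch    Fetcher

	mu       sync.RWMutex
	cache    []string
	loadedAt time.Time
}

func NewCachingConnector(name string, interval time.Duration, fetch Fetcher) *CachingConnector {
	return &CachingConnector{name: name, interval: interval, fetch: fetch}
}

// Refresh queries the source and replaces the cache. On error the previous
// cache is kept, so consumers still see the last known good data.
func (c *CachingConnector) Refresh() error {
	data, err := c.fetch()
	if err != nil {
		return fmt.Errorf("connector %s: %w", c.name, err)
	}
	c.mu.Lock()
	defer c.mu.Unlock()
	c.cache, c.loadedAt = data, time.Now()
	return nil
}

// Stale reports whether the data was never loaded or is older than the interval.
func (c *CachingConnector) Stale() bool {
	c.mu.RLock()
	defer c.mu.RUnlock()
	return c.loadedAt.IsZero() || time.Since(c.loadedAt) >= c.interval
}

// Data returns a copy of the cached records.
func (c *CachingConnector) Data() []string {
	c.mu.RLock()
	defer c.mu.RUnlock()
	return append([]string(nil), c.cache...)
}

func main() {
	healthy := true
	c := NewCachingConnector("inventory", defaultInterval, func() ([]string, error) {
		if !healthy {
			return nil, ErrSourceDown
		}
		return []string{"router01", "switch01"}, nil
	})
	fmt.Println("stale before first load:", c.Stale())
	if err := c.Refresh(); err != nil {
		fmt.Println("refresh failed:", err)
	}
	healthy = false // simulate an outage of the data source
	fmt.Println("second refresh error:", c.Refresh())
	fmt.Println("cached data still served:", c.Data(), "stale:", c.Stale())
}
```

A periodic refresh would call Refresh from a time.Ticker loop whenever Stale returns true; a trigger (for example a webhook from the data source) could call Refresh directly.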

Topology generation

The Octopus holds the global Topology.

To gather data from all Connectors, it passes a pointer to a (single) new Topology object into each Connector, which adds its insight to the relevant parts of the Topology. If devices, interfaces of devices, or other attributes are missing in the Topology, it is the Connector's responsibility to add them.
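The flow described above might look roughly like this. The Connector interface, the Enrich method, the EnsureDevice/EnsureInterface helpers, and both example connectors are assumptions for illustration, not the actual Octopus API.

```go
package main

import "fmt"

type Interface struct {
	Name     string
	Neighbor string // remote "device:interface", filled in by a cabling source
}

type Device struct {
	Name       string
	Role       string
	Interfaces map[string]*Interface
}

type Topology struct {
	Devices map[string]*Device
}

func NewTopology() *Topology { return &Topology{Devices: map[string]*Device{}} }

// EnsureDevice returns the named device, creating it if it is missing.
func (t *Topology) EnsureDevice(name string) *Device {
	d, ok := t.Devices[name]
	if !ok {
		d = &Device{Name: name, Interfaces: map[string]*Interface{}}
		t.Devices[name] = d
	}
	return d
}

// EnsureInterface returns the named interface, creating it if it is missing.
func (d *Device) EnsureInterface(name string) *Interface {
	i, ok := d.Interfaces[name]
	if !ok {
		i = &Interface{Name: name}
		d.Interfaces[name] = i
	}
	return i
}

// Connector adds what it knows to the shared Topology (hypothetical interface).
type Connector interface {
	Name() string
	Enrich(t *Topology) error
}

type inventoryConnector struct{ roles map[string]string }

func (inventoryConnector) Name() string { return "inventory" }
func (c inventoryConnector) Enrich(t *Topology) error {
	for name, role := range c.roles {
		t.EnsureDevice(name).Role = role
	}
	return nil
}

type link struct{ aDev, aIf, zDev, zIf string }
type cablingConnector struct{ links []link }

func (cablingConnector) Name() string { return "cabling" }
func (c cablingConnector) Enrich(t *Topology) error {
	for _, l := range c.links {
		// Devices and interfaces are created on demand if no other connector added them yet.
		t.EnsureDevice(l.aDev).EnsureInterface(l.aIf).Neighbor = l.zDev + ":" + l.zIf
		t.EnsureDevice(l.zDev).EnsureInterface(l.zIf).Neighbor = l.aDev + ":" + l.aIf
	}
	return nil
}

// BuildTopology passes one fresh Topology to every connector in turn.
func BuildTopology(connectors []Connector) (*Topology, error) {
	t := NewTopology()
	for _, c := range connectors {
		if err := c.Enrich(t); err != nil {
			return nil, fmt.Errorf("connector %s: %w", c.Name(), err)
		}
	}
	return t, nil
}

func main() {
	t, err := BuildTopology([]Connector{
		cablingConnector{links: []link{{"router01", "eth0", "switch01", "eth48"}}},
		inventoryConnector{roles: map[string]string{"router01": "edge"}},
	})
	if err != nil {
		fmt.Println("build failed:", err)
		return
	}
	r := t.Devices["router01"]
	fmt.Println(r.Name, r.Role, r.Interfaces["eth0"].Neighbor)
}
```

Because every connector uses get-or-create helpers, the result does not depend on the order in which the connectors run.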

Open questions

Should we just regenerate the Topology on every run (time based, trigger based, or both?), or should each Connector know (and therefore have the responsibility to figure out) whether it has new data since the last run, so that the Octopus can query all Connectors and the Topology only needs to be updated if at least one Connector has new data?
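To illustrate the second option only (this is not a decision, and not existing code), one possible shape is a version counter per Connector that the Octopus compares against the value it saw during the last run. The Version method, the Octopus struct, and RebuildIfChanged are hypothetical names.

```go
package main

import "fmt"

// Connector exposes a counter that increases whenever its cached data changes.
type Connector interface {
	Name() string
	Version() uint64
}

// fakeConnector is a stand-in whose version is bumped by hand.
type fakeConnector struct {
	name    string
	version uint64
}

func (f *fakeConnector) Name() string    { return f.name }
func (f *fakeConnector) Version() uint64 { return f.version }

// Octopus remembers which connector versions went into the last build.
type Octopus struct {
	connectors []Connector
	seen       map[string]uint64
	builds     int
}

func NewOctopus(cs ...Connector) *Octopus {
	return &Octopus{connectors: cs, seen: map[string]uint64{}}
}

// RebuildIfChanged rebuilds on the first call, and afterwards only when at
// least one connector reports a version it has not seen before.
func (o *Octopus) RebuildIfChanged() bool {
	changed := o.builds == 0
	for _, c := range o.connectors {
		if v := c.Version(); v != o.seen[c.Name()] {
			o.seen[c.Name()] = v
			changed = true
		}
	}
	if changed {
		o.builds++ // the actual Topology generation would happen here
	}
	return changed
}

func main() {
	inv := &fakeConnector{name: "inventory"}
	o := NewOctopus(inv)
	fmt.Println(o.RebuildIfChanged()) // first run always builds
	fmt.Println(o.RebuildIfChanged()) // nothing changed
	inv.version++
	fmt.Println(o.RebuildIfChanged()) // connector has new data
}
```

The trade-off is visible even in this small sketch: the Octopus loop stays simple, but every Connector has to track its own changes reliably.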

Observability

The Octopus exposes a number of metrics via an HTTP endpoint ready to be scraped by Prometheus.

  • octopus_topology_update_duration - Time it took to build the topology (milliseconds)
  • octopus_topology_build_time - Timestamp (epoch) when the current topology was built
  • octopus_topology_item_count - The number of instances per item (broken out by label item_type)
  • octopus_connector_health - Connector health indicator (0/1) (broken out by label connector)
  • octopus_connector_load_duraton - Time it took to fetch data (milliseconds) (broken out by label connector)
  • octopus_connector_load_time - Timestamp (epoch) when the current connector data was fetched (broken out by label connector)
  • octopus_connector_update_error_count - The number of times the refresh of connector data has failed (broken out by label connector)
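For readers unfamiliar with what Prometheus scrapes, the sketch below renders one of the metrics above in the Prometheus text exposition format using only the Go standard library. It is an illustration, not how the Octopus implements its metrics (a real service would normally use a Prometheus client library); the /metrics path, the renderGauge helper, and the connector names are assumptions.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"sort"
	"strings"
)

// renderGauge renders one gauge with a single label in the Prometheus text
// exposition format. Label values are quoted with %q, which is close to, but
// not a full implementation of, the format's escaping rules.
func renderGauge(name, help, label string, samples map[string]float64) string {
	keys := make([]string, 0, len(samples))
	for k := range samples {
		keys = append(keys, k)
	}
	sort.Strings(keys) // stable output regardless of map iteration order

	var b strings.Builder
	fmt.Fprintf(&b, "# HELP %s %s\n# TYPE %s gauge\n", name, help, name)
	for _, k := range keys {
		fmt.Fprintf(&b, "%s{%s=%q} %g\n", name, label, k, samples[k])
	}
	return b.String()
}

// connectorHealth holds hypothetical sample values (1 = healthy, 0 = unhealthy).
var connectorHealth = map[string]float64{"inventory": 1, "cabling": 0}

// metricsHandler serves the rendered metrics over HTTP.
func metricsHandler(w http.ResponseWriter, _ *http.Request) {
	w.Header().Set("Content-Type", "text/plain; version=0.0.4")
	fmt.Fprint(w, renderGauge("octopus_connector_health",
		"Connector health indicator (0/1)", "connector", connectorHealth))
}

func main() {
	// Exercise the handler in-process instead of opening a listening socket.
	rec := httptest.NewRecorder()
	metricsHandler(rec, httptest.NewRequest(http.MethodGet, "/metrics", nil))
	fmt.Print(rec.Body.String())
}
```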

Querying data

The Octopus exposes a gRPC API to query the enriched topology data.

You can manually query the enriched topology from the gRPC endpoint using grpcurl. Be aware that you need to increase the maximum message size if the topology is larger than the default of 4 MB. A call could look like this, querying the bond0 interface of ccr01.pad01:

grpcurl -max-msg-sz=100000000 octopus-production.example.com:443 cloudflare.net.octopus.OctopusService.GetTopology | jq '.topology.devices[] | select(.name=="ccr01.pad01") | .interfaces[] | select(.name=="bond0")'
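If you would rather post-process the JSON in Go than in jq, the same selection can be sketched as below. It relies only on the field names that appear in the jq filter (topology, devices, interfaces, name); the sample document is made up, and the findInterface, values, and field helpers are hypothetical. Like jq's `.[]`, the values helper accepts both JSON arrays and JSON objects, since the exact shape of the response is not specified here.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// values mimics jq's ".[]": it yields the elements of an array or the values of an object.
func values(v any) []any {
	switch t := v.(type) {
	case []any:
		return t
	case map[string]any:
		out := make([]any, 0, len(t))
		for _, e := range t {
			out = append(out, e)
		}
		return out
	}
	return nil
}

// field returns v[key] if v is a JSON object, and nil otherwise.
func field(v any, key string) any {
	m, _ := v.(map[string]any)
	return m[key]
}

// findInterface returns the first interface with the given name on the given device.
func findInterface(raw []byte, deviceName, ifName string) (map[string]any, error) {
	var doc any
	if err := json.Unmarshal(raw, &doc); err != nil {
		return nil, fmt.Errorf("decode topology: %w", err)
	}
	for _, d := range values(field(field(doc, "topology"), "devices")) {
		if field(d, "name") != deviceName {
			continue
		}
		for _, i := range values(field(d, "interfaces")) {
			if field(i, "name") == ifName {
				return i.(map[string]any), nil
			}
		}
	}
	return nil, fmt.Errorf("interface %s not found on device %s", ifName, deviceName)
}

// sample is a made-up response that contains only the fields used by the jq filter.
const sample = `{"topology":{"devices":[
  {"name":"ccr01.pad01","interfaces":[{"name":"eth0"},{"name":"bond0"}]}
]}}`

func main() {
	iface, err := findInterface([]byte(sample), "ccr01.pad01", "bond0")
	if err != nil {
		fmt.Println("lookup failed:", err)
		return
	}
	out, _ := json.MarshalIndent(iface, "", "  ")
	fmt.Println(string(out))
}
```

In real use, the bytes passed to findInterface would be the output of the grpcurl call above instead of the embedded sample.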