ZenPacks.zenoss.Layer2

Layer 2 networking infrastructure.


This ZenPack provides support for modeling OSI Layer 2 (data link layer) topology. That topology information is then used to suppress events from devices that become unreachable because a device they connect through has failed. Data collection is performed using SNMP.


Gallery

<gallery widths="250px" heights="127px"> layer2_network_map.png layer2_client_mac_addresses.png layer2_neighbor_switches.png layer2_modeler_plugins.png layer2_configuration_properties.png </gallery>

Features

The features added by this ZenPack can be summarized as follows. They are each detailed further below.

  • Discovery and periodic remodeling of Neighbor Switches using CDP/LLDP.
  • Discovery and periodic remodeling of MAC address or forwarding tables.
  • Event suppression based on discovered forwarding table information.

Discovered Components

Assigning the zenoss.snmp.CDPLLDPDiscover modeler plugin to device(s) will result in SNMP discovery of neighbor switches using a combination of CISCO-CDP-MIB and LLDP-MIB. The discovered neighbor switch information will be shown as Neighbor Switches in the device's component list.

Assigning the zenoss.snmp.ClientMACs modeler plugin to device(s) will result in SNMP discovery of the device's forwarding tables using BRIDGE-MIB. This information will be stored on existing Network Interfaces, and won't result in any new components being created.
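
Both plugins can also be bound from zendmd. A minimal sketch, assuming a hypothetical device id of "switch-1":

<syntaxhighlight lang="python">
# Run inside zendmd. "switch-1" is a hypothetical device id.
device = dmd.Devices.findDevice("switch-1")

# Append the Layer2 modeler plugins to the device's existing plugin list.
plugins = list(device.zCollectorPlugins)
for plugin in ("zenoss.snmp.CDPLLDPDiscover", "zenoss.snmp.ClientMACs"):
    if plugin not in plugins:
        plugins.append(plugin)

device.setZenProperty("zCollectorPlugins", plugins)
commit()
</syntaxhighlight>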

Monitoring

This ZenPack performs no monitoring.

Event Suppression

This ZenPack supports two types of event suppression.

  • Suppression of ping failures when one or more upstream ping failures can be identified as the reason for the failure.
  • Suppression of non-ping events on devices with open ping failure events.
We will use the term symptomatic event to refer to events that are a symptom of a problem, but not the root cause.

Ping Event Suppression

Suppression of ping events can be enabled on a per-device or device class basis by enabling the zL2SuppressIfPathsDown configuration property. This mode of suppression requires that the zenoss.snmp.ClientMACs modeler plugin be enabled and successfully functioning on all network devices such as switches and routers that you believe could be a root cause of other monitored devices becoming unreachable.

There are two ways symptomatic ping events can be suppressed: by manually configuring the ultimate gateway(s) of the device(s) using the zL2Gateways property, or by leaving the zL2Gateways property empty and setting the zL2PotentialRootCause property appropriately so that the gateway(s) can be discovered automatically.

Data Center Topology Diagram

The diagram above depicts a common data center network topology where each rack has a redundant pair of access switches, sometimes referred to as top-of-rack switches. Each of those top-of-rack switches connects to a redundant pair of end-of-row switches. Each of those end-of-row switches connects to a redundant pair of core switches for the data center. The pair of core switches might then connect to a pair of gateway routers that link the data center to the Internet or to other data centers over private links. In this kind of topology the core switches often serve as the layer 3 gateway for hosts.

In this type of topology the gateways for host-1-1-1 can be automatically discovered to be rack-1-1a and rack-1-1b if zL2PotentialRootCause is enabled for the switches and disabled for the hosts. zL2PotentialRootCause should be enabled for devices that could potentially be a root cause of other devices becoming unreachable, and disabled for devices that cannot be a root cause. This property is important to prevent root-cause events from incorrectly being suppressed.

By relying on this automatic discovery of gateways we can only achieve suppression of events from the hosts. In the case of an entire data center outage all of the host events would be suppressed, but all of the rack, row, core, and gateway events would remain unsuppressed, and identifying the gateways as the root cause would be left as a manual task.

To achieve multi-hop suppression the zL2Gateways property must be configured. Despite the name of the property containing "L2", the configured gateways need not be restricted to the layer 2 gateways. In the example topology above, the best value for zL2Gateways would likely be gw-a and gw-b (one per line). It's important to use the Zenoss device id(s) for the gateways, and to enter one per line in zL2Gateways. There's no limit to the number of gateways, but more than two probably isn't realistic.

With zL2Gateways set to gw-a and gw-b in the above topology, a complete failure of the data center would result in all events being suppressed except for two: a ping failure on each of gw-a and gw-b. This assumes that zL2SuppressIfDeviceDown is enabled. See "Non-Ping Event Suppression" below for more information on zL2SuppressIfDeviceDown.
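
As a rough sketch, this configuration could be applied from zendmd. The /Server device class and the gw-a/gw-b device ids follow the example topology above and are hypothetical:

<syntaxhighlight lang="python">
# Run inside zendmd.
hosts = dmd.Devices.getOrganizer("/Server")

# Hosts can't be a root cause, but should have ping suppression enabled.
hosts.setZenProperty("zL2SuppressIfPathsDown", True)
hosts.setZenProperty("zL2PotentialRootCause", False)

# Multi-hop suppression: Zenoss device ids of the ultimate gateways.
# One per line in the GUI; a list of ids here.
hosts.setZenProperty("zL2Gateways", ["gw-a", "gw-b"])
commit()
</syntaxhighlight>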

Non-Ping Event Suppression

Suppression of non-ping events can be enabled on a per-device or device class basis by enabling the zL2SuppressIfDeviceDown configuration property. No other configuration or modeling is necessary. Events will only be suppressed for a device with this property enabled when the device has a new, acknowledged, or suppressed critical event in the /Status/Ping event class. This suppression is effective at reducing the potential clutter of symptomatic events when a device is no longer reachable on the network, either because it has failed or because the Zenoss collector can no longer reach it.

This suppression can be used together with ping event suppression for the most complete reduction of symptomatic event clutter.

Event Suppression Performance

All forms of event suppression as described above have a cost in terms of event processing performance. When zL2SuppressIfDeviceDown is enabled, there is a small additional overhead for processing all events. When zL2SuppressIfPathsDown is enabled and first-hop suppression is performed using either automatic gateway discovery or manual gateway configuration, there is another small overhead for processing ping failure events.

In worst-case scenario testing, the effective processing rate for non-ping events with zL2SuppressIfDeviceDown enabled is approximately 80% of the baseline rate. The rate for processing ping failure events is approximately 75% of baseline in the case of a first-hop switch failure, and 70% in the case of a third-hop gateway failure.

All suppression is performed by an event plugin executed within zeneventd processes. Given that zeneventd can be scaled by adding more workers/instances, this additional event processing overhead can be offset by running more zeneventd instances as event processing throughput needs require.

In order to achieve acceptable event processing performance, a variety of caches are used within zeneventd processes. These caches can lead to events not being suppressed in cases where the configuration, model, or status of devices comes from stale cache information. The following types of caches are used, with different timeouts.

Caches
  • Device status changes can take up to 50 seconds to affect suppression.
  • Configuration changes can take up to 10 minutes to affect suppression.
  • Modeling changes can take up to 55 minutes to affect suppression.

Network Map


The network map can be used to see connections between devices. It can be found in two places: under Infrastructure -> Network Map, where you can manually select the device from which to draw the network map, or on individual devices by clicking Network Map in the device's left navigation pane. The latter presents a network map centered on the current device.

Filtering

There are several controls that can be used to filter and otherwise control what you see on the network map. You must click the "Apply" button after adjusting any of these controls to see the resulting network map.

  • Root device or component
  • Maximum hops from root
  • Show MAC addresses
  • Show dangling connections
  • Layers
The network map must start with a node from which connections can be followed. Setting the "Root device or component" is what allows that starting node to be chosen.

The maximum number of hops controls how many hops outward from the root node will be followed. This is the primary mechanism to reduce the size of the resulting network map.

The "Show MAC addresses" option allows more detail to be seen about layer2 connections at the expense of a much busier map. When "Show MAC addresses" is not selected, the map will attempt to consolidate bridge domains into a single cloud node that connects all nodes in the bridge domain. This emulates what you see with layer3 networks. When "Show MAC addresses" is selected, the individual MAC address nodes used to make connections from node to node will be shown. These MAC addresses can often be clicked to link directly to the network interface associated with the MAC address.

The "Show dangling connections" option allows connector-type nodes such as MAC addresses and IP networks that don't connect other nodes to be displayed. By default these are filtered out to prevent the network map from being cluttered by MAC addresses and IP networks that are only connected to a single device.

The network map will only display a maximum of 1,000 nodes to avoid performance issues both on the Zenoss server and in the web browser. If you attempt to view a network map with more than 1,000 nodes, an error message will inform you that the map contains too many nodes and that you should adjust the filters.

Layers

The network map can be filtered by layers. Layers are tags that Zenoss automatically adds to each link between devices and components. For example, when Zenoss identifies that a host is connected to a switch, it will create nodes and links such as the following.

    (host) -> (host MAC address) -> (switch MAC address) -> (switch)

Each of the arrows above represents a link, and in this case each of those links will have the "layer2" tag.

In the same way, if Zenoss identifies that a host is on the same IP network as a router that's its default gateway, it will create nodes and links such as the following.

    (host) -> (192.0.2.0/24) -> (router)

Each of the arrows above represents a link, and in this case each of those links will have the "layer3" tag.

These layers can be used to filter the network map to just the kind of links you're interested in.

The VLAN and VXLAN layers have special handling. If any VLAN or VXLAN layer is selected, the layer2 layer will automatically be included. This is done because you likely wouldn't see the VLAN or VXLAN layer(s) chosen without also following layer2 links.

The selected layers operate as an "OR" filter on the map. Choosing the layer2 and layer3 layers will display all nodes that have at least one of the selected layers. There is currently no support for "AND" filters or negations.

Colors and Shapes

Different colors and shapes are used on the network map to convey information about the nodes and links on the map.

The fill color of each node's circle depends on the highest-severity event currently open on the node. The colors differ from Zenoss' normal event colors only for info, debug, and clear severities, for higher clarity on the map.

Node Colors
  • Critical = Red
  • Error = Orange
  • Warning = Yellow
  • Info = Bright Green
  • Debug = Dark Green
  • Clear = White
The map's current root node will be circled with a purple band.

The links between nodes each have a color and a shape.

Link Color
  • Blue = layer3
  • Green = layer2
  • Yellow = VLAN
  • Gray = Default
Link Shape
  • Circle = Default
  • Diamond = VLAN

Interaction

You can interact with the map using your pointer in a number of ways.

  • Clicking and Dragging
  • Scrolling
  • Left-Clicking
  • Right-Clicking
The map can be panned by clicking and dragging on the map's background. Each node can be moved by clicking and dragging the node. Panning the map won't cause nodes to reorganize, but moving nodes will.

Scrolling, pinching, or mouse-wheeling can all be used to zoom in and out.

Left-clicking on a node will navigate to that node's default page in Zenoss. This only works for nodes that have a page in Zenoss such as devices, components, IP networks, and some MAC addresses. Nothing will happen if a node with no default page is left-clicked.

Right-clicking a node will open its context menu. See below for node context menu details.

Context Menu

Each node on the network map can be right-clicked to open its context menu. Some of the following options may be available depending on the node.

  • Pin Down
  • Put Map Root Here
  • Device Info
  • Open Node in New Tab
The "Pin Down" option freezes the selected node in place on the network map. It will stay wherever you place it, and any unpinned nodes will reorganize around it.

Choosing "Put Map Root Here" is equivalent to changing the "Root device or component" option, but saves typing when you can see the node you want to be the center of the map. Some types of nodes, such as MAC addresses, can't be the root.

The "Device Info" option opens a small pop-up over the network map with more information about the selected node. This option is only available for device and component nodes.

The "Open Node in New Tab" option will open another browser tab showing the default Zenoss page for the selected device, component, or IP network. Some types of nodes, such as MAC addresses, can't be opened in a new tab.

zenmapper daemon

The zenmapper daemon updates the catalog of connections used by the network map. It runs every 5 minutes by default, but this can be changed by passing the desired number of seconds to the --cycletime argument.

By default zenmapper is configured to start 2 workers. This may be changed in the config file by setting the "workers" option. Consider using more than 2 workers if you have more than 1,000 devices monitored in your Zenoss system. In a small or test environment the workers may be disabled by setting the value to 0. This setting affects the memory used by zenmapper as well as the speed of indexing L2 connections.
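
For example, a 10-minute cycle with 4 workers could be configured in /opt/zenoss/etc/zenmapper.conf (option names assumed to match the command-line arguments, as is usual for Zenoss daemons):

    cycletime 600
    workers 4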

zenmapper connects to the ZODB and indexes all the connections supplied by connection providers into the ZODB catalog. On 4.2.x Resource Manager, running zenmapper on remote collectors will do nothing because zenmapper runs against the hub. If desired, the additional zenmapper can be disabled by updating /opt/zenoss/etc/daemon.txt on the remote collector.

Writing your own connection provider

Imagine, for example, that we want to display connections of VMware NSX components on the network map. These components are modeled by the NSX ZenPack.

We need to create a new class, called for example NSXConnectionsProvider, which inherits from BaseConnectionsProvider, like this:

<syntaxhighlight lang="python">
# our provider will inherit from this:
from ZenPacks.zenoss.Layer2.connections_provider import BaseConnectionsProvider
# and will yield this:
from ZenPacks.zenoss.Layer2.connections_provider import Connection


class NSXConnectionsProvider(BaseConnectionsProvider):
    def get_connections(self):
        # self.context is the entity for which we provide connections.
        # Our device is called NSXManager, and it has switches.
        for switch in self.context.nsxvirtualSwitchs():
            # Yield connections to the switches.
            yield Connection(self.context, (switch,), ('layer3', 'nsx'))
            # Each switch has interfaces:
            for i in switch.nsxinterfaces():
                # Yield a connection to each interface.
                yield Connection(switch, (i,), ['layer3'])
                # And each interface has a many-to-one connection to an edge:
                yield Connection(i, (i.nsxedge(),), ['layer3'])
</syntaxhighlight>

We've described how to get connections; now we need to tell Zenoss that this is the connections provider for any NSXManager device. We do that by registering an adapter in our ZenPack's configure.zcml:

<syntaxhighlight lang="xml">
<configure zcml:condition="installed ZenPacks.zenoss.Layer2.connections_provider">

    <adapter factory=".connections_provider.NSXConnectionsProvider"
             for="ZenPacks.zenoss.NSX.NSXManager.NSXManager"
             provides="ZenPacks.zenoss.Layer2.connections_provider.IConnectionsProvider"
             />

</configure>
</syntaxhighlight>

Another way to include adapters is to put them in a separate file, called for example layer2.zcml:

<syntaxhighlight lang="xml">
<?xml version="1.0" encoding="utf-8"?>
<configure xmlns="http://namespaces.zope.org/zope"
           xmlns:zcml="http://namespaces.zope.org/zcml">

    <adapter factory=".connections_provider.DeviceConnectionsProvider"
             for=".HyperVVSMS.HyperVVSMS"
             provides="ZenPacks.zenoss.Layer2.connections_provider.IConnectionsProvider"
             />

</configure>
</syntaxhighlight>

and then include that file conditionally:

<syntaxhighlight lang="xml">
<include file="layer2.zcml"
         xmlns:zcml="http://namespaces.zope.org/zcml"
         zcml:condition="installed ZenPacks.zenoss.Layer2.connections_provider"
         />
</syntaxhighlight>

To test the connections that your provider yields, you can run:

    zenmapper run -v10 -d <name>

and then look them up on the network map.

Usage

This ZenPack has two separate capabilities. The first is to collect the clients connected to switch ports so that event suppression can be performed when the switch fails. The second is to discover neighbor relationships between network devices using CDP (Cisco Discovery Protocol) and LLDP (Link Layer Discovery Protocol).

Collecting Switch Port Clients

To enable discovery of clients connected to switch ports you must enable the zenoss.snmp.ClientMACs modeler plugin for the switch devices. There is no need to enable this plugin for hosts, servers, or other endpoint devices. It is recommended to only assign the modeler plugin to the access switches to which monitored servers are connected.

The discovery is done using BRIDGE-MIB forwarding tables, so it's a prerequisite that the switch supports BRIDGE-MIB.
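
One way to verify that prerequisite is to walk the forwarding table's dot1dTpFdbAddress column (1.3.6.1.2.1.17.4.3.1.1) directly, with placeholders for the community string and host:

    snmpwalk -v2c -c <community_string> <host> 1.3.6.1.2.1.17.4.3.1.1

If the walk returns no MAC addresses, the switch likely doesn't support BRIDGE-MIB or its forwarding table is currently empty.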

Collecting Network Device Neighbors

To collect neighbor information from network devices that support CDP or LLDP, you must enable the zenoss.snmp.CDPLLDPDiscover modeler plugin for the devices.

Requirements

This ZenPack has the following requirements.

PythonCollector ZenPack
This ZenPack depends on PythonCollector being installed, and having the associated zenpython collector process running.

Service Impact

When combined with the Zenoss Service Dynamics product, this ZenPack adds built-in service impact capability based on Layer 2 data. The following service impact relationships are automatically added. These will be included in any services that contain one or more of the explicitly mentioned entities.

Service Impact Relationships
  • Device impacted by upstream switch device.

Troubleshooting

Europa

If you are re-installing or updating this ZenPack on Europa, you should first check in Control Center that the zenmapper daemon is stopped, and stop it if it isn't. It should be stopped automatically, but until this issue is fixed you should do so by hand.

Open vSwitch ZenPack

Versions of the Open vSwitch ZenPack prior to 1.1.1 should be updated or removed before installing the Layer2 ZenPack.

Empty map/links for device

If the index for a certain device is broken, you can force zenmapper to reindex that specific device by running the daemon with the --force option.
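
For example, combined with the -d option shown earlier (the device id is a placeholder):

    zenmapper run --force -d <name>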

Layer2 forwarding table

Let's discuss Layer2 connections in particular.

The essential mechanism that distinguishes network switches from network hubs is the MAC forwarding table. Instead of broadcasting incoming link layer frames to all of its interfaces, as a hub does, a switch looks in the forwarding table to find which particular interface is connected to the destination device. The switch learns which devices are connected to which interface by looking at the source MAC address of incoming frames. Those MAC addresses are called "client MAC addresses".

For Zenoss to discover a Layer 2 connection between two devices, the MAC address of some interface on one device must match a client MAC address on some interface of the other device. You can check whether client MAC addresses are modeled for an interface by looking at its "Clients MAC addresses" display. If there are none, check that the zenoss.snmp.ClientMACs modeler plugin is bound to the device, and remodel the device.
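
The modeled client MAC addresses can also be inspected from zendmd. A minimal sketch, assuming the plugin stores them in a clientmacs attribute on each interface and a hypothetical device id of "switch-1":

<syntaxhighlight lang="python">
# Run inside zendmd. "switch-1" is a hypothetical device id.
device = dmd.Devices.findDevice("switch-1")
for iface in device.os.interfaces():
    # Print each interface id and any client MACs modeled on it.
    print iface.id, getattr(iface, "clientmacs", [])
</syntaxhighlight>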

It is also possible that the MAC address required to discover the connection is absent from the forwarding table. To check, you can run the bridge_snmp.py debug utility:

<syntaxhighlight lang="bash">
python bridge_snmp.py clientmacs -c <community_string> <host>
</syntaxhighlight>

and see whether the client MAC address is visible on the switch at all.

Records in the forwarding table age out fairly quickly, by default in 5 minutes. So when there has been no network activity on a connection for more than 5 minutes, the entry will be removed from the switch's forwarding table. You can check the dot1dTpAgingTime object to learn the exact timeout period in seconds:

<syntaxhighlight lang="bash">
$ snmpget -v2c -c <community_string> <host> 1.3.6.1.2.1.17.4.2.0
SNMPv2-SMI::mib-2.17.4.2.0 = INTEGER: 300
</syntaxhighlight>

Impact

This ZenPack also adds an impact relationship for layer2 connections: switches impact the devices connected to them. This only works when the connection is present on the network map (see the two previous sections for troubleshooting guidance).

If the connection appears on the network map but there is still no impact relationship, the impact relationships were probably not rebuilt. You can trigger a rebuild by reindexing the device, for example by changing some field on its overview and saving it, or by modeling the device again.

Limitations

There is no client MACs data on interfaces modeled for the first time. This happens because the zenoss.snmp.ClientMACs plugin runs before the interfaces are modeled by another network modeler plugin (for example cisco.snmp.Interfaces or zenoss.snmp.InterfaceMap), so there are no entities on which to save this attribute. It is currently not possible to define the order of modeler execution, so this remains a limitation.

A possible workaround is to wait for the next model cycle, or simply model the device again manually.
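
For example, remodeling can be triggered from the command line on the device's collector (the device name is a placeholder):

    zenmodeler run --device=<name>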

More information

If you cannot find the answer in the documentation, then Resource Manager (Service Dynamics) users should contact Zenoss Customer Support. Core users can use the #zenoss IRC channel or the community.zenoss.org forums.

Installed Items

Installing this ZenPack will add the following items to your Zenoss system.

Modeler Plugins

  • zenoss.snmp.CDPLLDPDiscover
  • zenoss.snmp.ClientMACs
zProperties
  • zL2Gateways (default: [])
  • zL2PotentialRootCause (default: True)
  • zL2SuppressIfDeviceDown (default: False)
  • zL2SuppressIfPathsDown (default: False)
  • zLocalMacAddresses (default: ["00:00:00:00:00:00"])
  • zZenossGateway (deprecated by zL2Gateways)
Daemons
  • zenmapper

Changes

1.3.0
  • Add "Show MAC addresses" and "Show dangling connectors" to network map.
  • VLAN and VXLAN layers no longer selected by default on network map.
  • Support for multiple gateways per device or device class. (ZEN-24767)
  • Add zL2Gateways property. (ZEN-24767)
  • Deprecate zZenossGateway property. (ZEN-24767)
  • Add zL2PotentialRootCause to allow automatic gateways discovery.
  • Add zL2SuppressIfPathsDown to toggle ping event suppression.
  • Add zL2SuppressIfDeviceDown to toggle non-ping event suppression.
  • Add rootCauses event field for suppressed events.
  • Improve event suppression performance and reliability.
  • Add zLocalMacAddresses to remove unwanted interfaces in maps. (ZEN-23182)
  • Add client discovery support using Q-BRIDGE MIB. (ZEN-25336)
  • Fix "NeighborSwitch" errors after removing the ZenPack. (ZEN-26189)
1.2.2
  • Fix potential 2 minute modeling delay in Zenoss 4.
  • Fix "Connection refused" when Redis not available.
1.2.1
  • Added "workers" option to zenmapper daemon.
  • Refactored connection catalog to use Redis as storage. This prevents cases where the ZODB grows over time. (ZEN-22834)
  • The Layer2 index no longer touches or modifies ZODB storage in any case.
  • Devices are added to the index when they change. The zenmapper daemon indexes only differences, i.e. indexing is now incremental.
  • When the ZenPack is installed or upgraded, the zenmapper daemon creates the initial index. This occurs only on the first run and may take several minutes depending on the number of devices.

1.1.1
  • Fix page help code in Layer2 ZP conflict with other ZenPacks (ZEN-21264)
1.1.0
  • When filtering by VLAN, also show layer2 links that are VLAN-unaware. (ZEN-20946)
  • Add checkbox that allows showing the full map.
  • Fix Cisco community string indexing in ClientMACs modeler plugin.
  • Fix issue getting client MAC address from labeled VLAN interfaces. (ZEN-19874)
  • Fix Network Map - Missing link from Cisco device to subnet on depth 2,3,4 (ZEN-18603)
  • Make Impact use new connections catalog instead of macs catalog (ZEN-18636)
  • Fix Broken link for Subnet node in Network map (ZEN-20749)
1.0.3
  • Remove macs_catalog when removing the ZenPack. (ZEN-17967)
  • Replace Layer2Info template with ClientMACs modeler plugin.
1.0.2
  • Fix modeling of CDP neighbor switches with IPv6 addresses. (ZEN-17248)
  • Avoid community@VLAN context querying for non-Cisco switches. (ZEN-17258)
  • Change default cycletime for Layer2Info from 30 minutes to 12 hours. (ZEN-17031)
1.0.1
  • Fix device overview links error. (ZEN-14063)
  • Remove add/remove from catalog logging. (ZEN-15465)
  • Fix usage of incorrect community VLAN suffixes on BRIDGE-MIB queries. (ZEN-16951)
  • Fix looping of impact relationships between switches. (ZEN-17020)
  • Fix incorrect modeling of neighbor switches and improve modeling time. (ZEN-17023)
  • Stop binding Layer2Info template to /Network by default. (ZEN-17035)
1.0.0
  • Initial release