/nap-azure-vmss

NGINX App Protect on Azure VMSS

Primary LanguageShell

NGINX Controller makes Life Cycle Management simple

NGINX Controller offers a simplified Life Cycle Management of your NGINX instances across all of your environment:

  • Auto Scaling
    during Scale In / Scale Out, the instance register / unregister to NGINX Controller
  • Up to date OS, software and Security signatures
    use the native feature reimage or rolling upgrade of your Cloud Service Provider
  • Source Of Truth
    a NGINX instance is bootstrapped from a standard Linux VM image, all of configurations are pushed from NGINX Controller
  • Cloud agnostic
    same principles and onboarding scripts are reusable on any Cloud (Private, Public)
  • Native Scaling Policy
    InfraOps are free to use the scaling policy offered by their Cloud Service Provider (CSP).

This repo provides a demo of a scaling group implementation using NGINX App Protect instances managed by NGINX Controller.

Last chapter warns you to take in consideration resiliency and DNS LB in order to enhance User Experience.


Demo done on Azure using a VM Scale Set.

VMSS + NGINX Controller | Scale Out VMSS + NGINX Controller | Scale In VMSS + NGINX Controller | Reimage

When an new VM instance is bootstraped, this instance is up to date:

  • OS packages
  • Software: NGINX Plus, NGINX App Protect and NGINX Controller agent
  • Security signatures

Then this instance retrieve his configuration from NGINX Controller. Therefore NGINX Controller is the Source of Truth for all of your managed NGINX instances.

In order to upgrade your cluster, the immutable approach is recommended: destroy & recreate your VM instances. As done on Kubernetes, the immutable approach is more simpler and safer than the mutable approach that will apply changes.

How to do that on Azure VM Scale Set?

If you select reimage, it will remove the VMSS instance and replace it with a brand new one. This can be used if you are having issues or need to upgrade per instance, that will delete it and redeploy it up to date.

Upgrading will apply any changes that were applied to the scaleset as a whole. So for example, if you apply a custom script extension to VMSS1 you need to update the VMSS instances in order for that custom script to actually be applied.

During a Scale In or reimage operation, an impact on User Experience exists if:

  • persistency is set on the downstream Load Balancer
  • no Global Load Balancing exists across regions or multi-cloud.

A user or a consumer have no access to the service during few seconds with no notification

A Web Browser opens up to 15 TCP sessions to a remote Domain service and keeps it them alive in order to re-use then to send further HTTP transactions. When a Scale In or reimage operation occurs, NGINX process received a SIG_TERM signal and all of NGINX workers are shutdown gracefully: current HTTP transactions are drained and then TCP sessions are closed.

The picture below shows a reimage operation that occurs at second #7.

NGINX drains transactions

However, the External Azure Load Balancer is configured with:

  • a Persistency
  • a health probe interval of 5s
  • a unhealthy threshold of 2

In that case, further TCP sessions initiated from the browser will be stuck up to 15s to the same VM instance... that is unavailable.

ALB persists

After 15s, External Azure Load Balancer chose another pool member. Then the service is up again for this user.

Source: full PCAP capture with no DNS LB here

Do not persist? No, it's not a solution, persistence is useful for :

  • Web Application Firewall security features that track user sessions (CSRF, DeviceID, JS injection, cookie...)
  • troubleshooting purpose

DNS Load Balancing

Use a DNS LB record to Load-Balance traffic across 2 regions or multi-cloud. 2 Public IPs are returned for your DNS domain. If a TCP session on one Public IP returns a RST, Web Browser will switch automatically to the other Public IP after 1s. Acceptable impact for a good User Experience, well done! :o)

  • F5 Cloud Services DNS LB hosts webhook.f5cloudbuilder.dev A record that returns 2 Public IPs:

    52.167.72.28 ; 52.179.176.230

DNS LB record

  • Web Browser resolves webhook.f5cloudbuilder.dev and keeps it in DNS Cache
    Because DNS Cache TTL in Chrome is 60s minimum (here), TTL of A record must be set at least to 60s
  • Web Browser makes TCP connexions only to IP 52.167.72.28

Connexion to IP 1

  • At second #13 in the picture bellow, NGINX VM is shutdown and NGINX closes all TCP connexions gracefully
  • At second #17 Web Browser opens new TCP connexions to the same Public IP 52.167.72.28 and Azure Load Balancer responds with TCP RST due to persistence linked to a NGINX VM shutdown
  • At second #18 Web Browser opens new TCP connexions to the second Public IP 52.179.176.230

Connexion to IP 2

Source: full PCAP capture with DNS LB here

  • Components:
    • A NGINX Controller hosted in a "Cross Management" / "Shared service" / "Out of Band" zone
    • A VM Scale Set of NGINX App Protect instances
    • An External Azure Load Balancer to publish a Public IP

Architecture

Once a VM is started, VM is onboarded using an Extension: Shell or Cloud Init commands that must includes to run the 2 scripts below. Example of Extension here in Jinja2 format for Ansible

install_managed_nap.sh install and run:

  • NGINX+: Application Load-Balancer
  • App Protect module: Web Application and API Protection
  • last WAF signature update
  • NGINX Controller agent: register VM instance and pull configuration

Input variables:

Variable Description
EXTRA_NGINX_CONTROLLER_IP NGINX Controller IP
EXTRA_NGINX_CONTROLLER_USERNAME NGINX Controller user account
EXTRA_NGINX_CONTROLLER_PASSWORD NGINX Controller user password
EXTRA_NGINX_PLUS_VERSION NGINX+ version to install
EXTRA_LOCATION Location name, same as created on NGINX Controller
EXTRA_VMSS_NAME VM Scale Set name, same as the Instance Group created on NGINX Controller

scale_in_monitor.sh monitors a Scale In or a reimage event. When a Scale In or a reimage event occurs, this script is responsible to unregister this instance from NGINX Controller

Input variables:

Variable Description
ENV_CONTROLLER_API_URL NGINX Controller IP and port
ENV_CONTROLLER_USERNAME NGINX Controller user account with less privilege on Instance Group
ENV_CONTROLLER_PASSWORD NGINX Controller user password
ENV_CONTROLLER_LOCATION Location name, same as created on NGINX Controller
ENV_CONTROLLER_INSTANCE_NAME Hostname of the NGINX instance