Azure Internal Load Balancer (ILB) hairpin

1. Introduction


As per the Azure documentation - https://docs.microsoft.com/en-us/azure/load-balancer/load-balancer-overview#limitations - the default behaviour of the Azure Internal Load Balancer is as follows:
"...if an outbound flow from a VM in the backend pool attempts a flow to the frontend of the internal Load Balancer in which pool it resides and is mapped back to itself, both legs of the flow don't match and the flow will fail."
So, what happens if your application design requires backend pool members to make calls to the private frontend of the same load balancer they are associated with?

ILB hairpin - single backend
In the above example, if VM-WE-02-Web01 initiates a connection to 10.2.1.100:80 (the ILB VIP), the connection is guaranteed to fail. If the backend pool contained other VMs (e.g. a backend pool with 2 instances), there would be a chance (50/50 in this case) that the frontend request is mapped, successfully, to the other backend member, as shown below:

ILB hairpin - multiple backend


1.1 Why is this not working?

From the same documentation link:
"When the flow maps back to itself the outbound flow appears to originate from the VM to the frontend and the corresponding inbound flow appears to originate from the VM to itself."
Let's take a look at the default NAT behaviour of the ILB to understand the problem in more detail.
  • The Azure ILB does not perform inbound Source-NAT (SNAT) and therefore the original source IP is preserved. 
  • When using the default LB rule setting of DSR (aka Floating IP) disabled, the ILB does perform Destination-NAT (DNAT).
ILB hairpin - NAT xlate
All of the above results in the following, again from the original documentation link:
"From the guest OS's point of view, the inbound and outbound parts of the same flow don't match inside the virtual machine. The TCP stack will not recognize these halves of the same flow as being part of the same flow as the source and destination don't match."
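The DSR/Floating IP setting referenced above is a per-rule property and can be inspected with the Azure CLI. A minimal sketch, with the resource group, load balancer and rule names assumed purely for illustration:

# show the LB rule; the enableFloatingIp field in the output indicates whether DSR/Floating IP is enabled (false is the default, giving the DNAT behaviour described above)
az network lb rule show --resource-group rg-we-lab --lb-name ilb-we-web --name rule-http-80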
We can confirm this flow-mismatch behaviour using Wireshark. First, a flow that does work, showing a successful TCP 3-way handshake. (FYI, this flow is sourced from the on-premises location; see the topology diagram in the next section.)
Wireshark - working flow

Now for a flow that does not work, showing a failure of the TCP handshake: we never get past the SYN stage. Because the Azure ILB performs DNAT on the hairpinned traffic (see frame number 7647 in the screenshot below for confirmation of this), the OS is unable to reconcile the inbound leg with the flow it originated, and we therefore never observe a TCP SYN ACK.
Wireshark - non working flow

2. Lab Topology

Now that we have detailed the behaviour, let's look at possible workarounds. To do this I will use the following lab environment.
  • Azure spoke Virtual Network (VNet) containing a Windows Server 2016 VM running IIS, hosting a simple web page
  • Azure spoke VNet containing the Azure ILB (see the CLI sketch after this list)
  • Azure hub VNet containing an ExpressRoute Gateway
  • VNet peering between the Hub and Spoke VNets
  • InterCloud ExpressRoute circuit providing connectivity to on-premises
  • On-premises DC with a test client Virtual Machine
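For reference, the spoke-side ILB used in this lab could be stood up with something like the Azure CLI below. This is a hedged sketch only: the resource group, VNet, subnet, NIC and rule names are assumptions; only the 10.2.1.100 frontend VIP and port 80 come from this post.

# internal load balancer with a static private frontend IP in the spoke web subnet
az network lb create --resource-group rg-we-lab --name ilb-we-web --sku Standard --vnet-name vnet-we-spoke --subnet subnet-web --frontend-ip-name fe-web --backend-pool-name bp-web --private-ip-address 10.2.1.100

# TCP health probe and load-balancing rule for port 80 (Floating IP/DSR left at its default of disabled)
az network lb probe create --resource-group rg-we-lab --lb-name ilb-we-web --name probe-http-80 --protocol Tcp --port 80
az network lb rule create --resource-group rg-we-lab --lb-name ilb-we-web --name rule-http-80 --protocol Tcp --frontend-port 80 --backend-port 80 --frontend-ip-name fe-web --backend-pool-name bp-web --probe-name probe-http-80

# add the web server's NIC ipconfig to the backend pool (NIC and ipconfig names are assumptions)
az network nic ip-config address-pool add --resource-group rg-we-lab --nic-name nic-web01-01 --ip-config-name ipconfig1 --lb-name ilb-we-web --address-pool bp-web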

2.1 Baseline

From the client VM (192.168.2.1) we are able to successfully load the web page via the ILB front end.
However, from the backend VM (10.2.1.4) we are only able to load the web page using the local VM IP address. Access via the frontend ILB VIP fails, due to the condition described in section 1.

// show single NIC
c:\pstools>ipconfig | findstr /i "ipv4"
   IPv4 Address. . . . . . . . . . . : 10.2.1.4

// show working connectivity using localhost address
c:\pstools>psping -n 3 -i 0 -q 10.2.1.4:80

PsPing v2.10 - PsPing - ping, latency, bandwidth measurement utility
Copyright (C) 2012-2016 Mark Russinovich
Sysinternals - www.sysinternals.com

TCP connect to 10.2.1.4:80:
4 iterations (warmup 1) ping test: 100%

TCP connect statistics for 10.2.1.4:80:
  Sent = 3, Received = 3, Lost = 0 (0% loss),
  Minimum = 0.09ms, Maximum = 0.13ms, Average = 0.11ms

// show baseline failure condition to front end of LB
c:\pstools>psping -n 3 -i 0 -q 10.2.1.100:80

PsPing v2.10 - PsPing - ping, latency, bandwidth measurement utility
Copyright (C) 2012-2016 Mark Russinovich
Sysinternals - www.sysinternals.com

TCP connect to 10.2.1.100:80:
4 iterations (warmup 1) ping test: 100%

TCP connect statistics for 10.2.1.100:80:
  Sent = 3, Received = 0, Lost = 3 (100% loss),
  Minimum = 0.00ms, Maximum = 0.00ms, Average = 0.00ms


3. Workarounds

3.1 Workaround Option [1] - Second NIC

  • Add a second NIC to the Virtual Machine (from within the Azure VM config) with a different IP address (we use .5 in the diagram above); an Azure CLI sketch for this step is included at the end of this section
  • Configure a local (OS-level) static route forcing traffic destined for the LB VIP out of the secondary NIC
This works because the packet from backend to frontend now has a different source (10.2.1.5) and destination IP address (10.2.1.100 > DNAT > 10.2.1.4). Verification as per below:


//command line from web server

// show multiple NIC
c:\pstools>ipconfig | findstr /i "ipv4"
   IPv4 Address. . . . . . . . . . . : 10.2.1.4
   IPv4 Address. . . . . . . . . . . : 10.2.1.5

// show baseline failure condition
c:\pstools>psping -n 3 -i 0 -q 10.2.1.100:80

PsPing v2.10 - PsPing - ping, latency, bandwidth measurement utility
Copyright (C) 2012-2016 Mark Russinovich
Sysinternals - www.sysinternals.com

TCP connect to 10.2.1.100:80:
4 iterations (warmup 1) ping test: 100%

TCP connect statistics for 10.2.1.100:80:
  Sent = 3, Received = 0, Lost = 3 (100% loss),
  Minimum = 0.00ms, Maximum = 0.00ms, Average = 0.00ms

// static route traffic destined to LB front end out of second NIC
c:\pstools>route add 10.2.1.100 mask 255.255.255.255 10.2.1.1 if 18
 OK!

// show working connectivity to LB front end
c:\pstools>psping -n 3 -i 0 -q 10.2.1.100:80

PsPing v2.10 - PsPing - ping, latency, bandwidth measurement utility
Copyright (C) 2012-2016 Mark Russinovich
Sysinternals - www.sysinternals.com

TCP connect to 10.2.1.100:80:
4 iterations (warmup 1) ping test: 100%

TCP connect statistics for 10.2.1.100:80:
  Sent = 3, Received = 3, Lost = 0 (0% loss),
  Minimum = 0.68ms, Maximum = 1.45ms, Average = 0.99ms
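For completeness, the Azure-side part of this workaround (adding the second NIC with the .5 address) could look something like the sketch below. Resource names are assumptions, and note that most VM sizes must be deallocated before a NIC can be attached.

# create the secondary NIC with a static 10.2.1.5 address in the same subnet
az network nic create --resource-group rg-we-lab --name nic-web01-02 --vnet-name vnet-we-spoke --subnet subnet-web --private-ip-address 10.2.1.5

# attach it to the VM (deallocate first), then start the VM again
az vm deallocate --resource-group rg-we-lab --name VM-WE-02-Web01
az vm nic add --resource-group rg-we-lab --vm-name VM-WE-02-Web01 --nics nic-web01-02
az vm start --resource-group rg-we-lab --name VM-WE-02-Web01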

3.2 Workaround Option [2] - Loopback VIP (+ DSR)

  • Configure a loopback interface on the backend VM with the same IP address as the ILB VIP (10.2.1.100)
  • Configure the backend VM application (IIS in our case) to listen on the additional IP address. NB. if using Windows Server 2016, enable weakhostsend on both the NIC and the loopback interface (see RFC 1122). A netsh sketch follows this list.
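The guest-side part of this workaround might look like the following on Windows Server 2016. This is a sketch only: it assumes the Microsoft KM-TEST Loopback Adapter has already been installed and renamed "loopback", and that the physical NIC is named "Ethernet"; adjust interface names to suit.

// assign the ILB VIP to the loopback adapter
c:\>netsh interface ipv4 add address "loopback" 10.2.1.100 255.255.255.255

// relax the strong host model so the VIP can be used across interfaces
c:\>netsh interface ipv4 set interface "loopback" weakhostsend=enabled
c:\>netsh interface ipv4 set interface "Ethernet" weakhostsend=enabled
// some DSR configurations also require weakhostreceive=enabled on these interfaces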


Connectivity is now working externally, from the on-premises VM, using the ILB VIP 10.2.1.100, with DSR enabled on the LB rule.
c:\pstools>ipconfig | findstr /i "ipv4"
   IPv4 Address. . . . . . . . . . . : 192.168.2.1

c:\pstools>psping -n 3 -i 0 -q 10.2.1.100:80

PsPing v2.10 - PsPing - ping, latency, bandwidth measurement utility
Copyright (C) 2012-2016 Mark Russinovich
Sysinternals - www.sysinternals.com

TCP connect to 10.2.1.100:80:
4 iterations (warmup 1) ping test: 100%

TCP connect statistics for 10.2.1.100:80:
  Sent = 3, Received = 3, Lost = 0 (0% loss),
  Minimum = 19.39ms, Maximum = 20.17ms, Average = 19.78ms

Connectivity from the web server itself is also working when accessing the service on 10.2.1.100, as this address now exists locally on the server, aka on-link.
c:\pstools>ipconfig | findstr /i "ipv4"
   IPv4 Address. . . . . . . . . . . : 10.2.1.4
   IPv4 Address. . . . . . . . . . . : 10.2.1.100

c:\pstools>route print 10.2.1.100
Active Routes:
Network Destination        Netmask          Gateway       Interface  Metric
       10.2.1.100  255.255.255.255         On-link        10.2.1.100    510

c:\pstools>psping -n 3 -i 0 -q 10.2.1.100:80

PsPing v2.10 - PsPing - ping, latency, bandwidth measurement utility
Copyright (C) 2012-2016 Mark Russinovich
Sysinternals - www.sysinternals.com

TCP connect to 10.2.1.100:80:
4 iterations (warmup 1) ping test: 100%

TCP connect statistics for 10.2.1.100:80:
  Sent = 3, Received = 3, Lost = 0 (0% loss),
  Minimum = 0.11ms, Maximum = 0.20ms, Average = 0.14ms


There are three key call-outs here:
  1. The backend call to the frontend VIP never leaves the backend VM. This may or may not suit your application requirements, as the request can only be served locally.
  2. DSR is optional, but allows the backend VM to listen on a common IP (the ILB VIP) for all connections, locally originated and remote (see the CLI sketch after this list).
  3. You must continue to listen on the physical primary NIC IP address for application connections, otherwise the LB health probes will fail.
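On the Azure side, DSR is simply the Floating IP setting on the load-balancing rule and can be toggled with the Azure CLI. A hedged sketch, with resource names assumed for illustration:

# enable Floating IP (DSR) on the existing rule; the health probe still answers on the backend NIC IP (call-out 3)
az network lb rule update --resource-group rg-we-lab --lb-name ilb-we-web --name rule-http-80 --floating-ip true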

3.3 Workaround Option [3] - Application Gateway / NVA

A simple option for HTTP/S traffic is to utilise Azure Application Gateway instead. Note that use of either APGW or an NVA has cost, performance and scale limitations, as these are fundamentally different products. These solutions are based on additional compute resources that sit inline with the datapath, whereas the Azure Load Balancer can be thought of more as a function of the Azure SDN.

Application Gateway
Application Gateway only supports HTTP/S frontend listeners; therefore, if a load-balancing solution for other TCP/UDP ports is required, an NVA (Network Virtual Appliance) is needed. NGINX is one such third-party NVA option.

NGINX
See https://github.com/jgmitter/nginx-vmss for a fast-start NGINX configuration on Azure, including an ARM template. Also see https://github.com/microsoft/PL-DNS-Proxy for a similar NGINX template with the ability to deploy into a custom VNet.
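If deploying from one of those repos, the ARM template could be pushed into an existing resource group with something like the following. The template URL below is a placeholder, and parameter requirements will depend on the chosen template.

# deploy the chosen ARM template (placeholder URL) into an existing resource group
az deployment group create --resource-group rg-we-lab --template-uri <raw URL of the template, e.g. azuredeploy.json, in the chosen repo>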

Two NGINX instances are used for high availability, each containing the same proxy/LB configuration. The instances are themselves fronted by an Azure Internal Load Balancer to provide a single frontend IP address for client access. This frontend IP also works for the backend members specified in the NGINX config file, as shown in the diagram above.

The simple NGINX proxy configuration is shown below.
upstream backend {
    server 10.2.1.4;
}

server {
    listen 80;

    location / {
        proxy_pass http://backend;
    }
}

The backend in my example is a single VM; in production there would be multiple nodes, and therefore additional server lines within the upstream block. E.g.
upstream backend {
    server 10.2.1.4;
    server 10.2.1.5;
    server 10.2.1.6;
}

server {
    listen 80;

    location / {
        proxy_pass http://backend;
    }
}
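Once the upstream block has been edited on each NGINX instance, the configuration can be validated and applied without restarting the service, for example:

# test the configuration file syntax, then signal the running master process to reload it
sudo nginx -t && sudo nginx -s reload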



