Friday, September 6, 2013

It's your problem... No! It's yours!

What do you do when your firewall is receiving TCP resets from the customer yet the customer insists that they are not generating the packets from their side?

This is one that has been the focus of a lot of attention this week. We could see that our VPN tunnel was being actively terminated from the customer side, yet the customer stated that they weren't seeing any issues and that there were no log entries on their side to suggest a problem.

In cases like these it's easy to assume that the customer is a) hiding something or b) an idiot. Thankfully, after a lot of consideration, we settled on c) we were both right.

The cause of this issue was revealed when the customer supplied a traceroute from their VPN firewall to our router on the edge of the network. Working systematically, we simply ran a ping to each hop on the way back to the customer and observed the results. The connection between us and our ISP was fine, with no packet loss, which confirmed to us that our side was good. We kept pinging each hop in turn until Bingo! Packet loss!
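For anyone who would rather script the hop-by-hop check than run it by hand, here is a minimal sketch in Python. It is not what we ran at the time. It assumes a Linux or macOS style `ping` that accepts `-c` for the packet count and prints a summary line containing "% packet loss". The function names and the addresses in the usage note are made up for illustration.

```python
import re
import subprocess

# Matches the summary line of Linux/macOS ping, e.g. "15% packet loss" or "0.0% packet loss".
LOSS_RE = re.compile(r"(\d+(?:\.\d+)?)% packet loss")


def parse_loss(ping_output):
    """Return the packet-loss percentage from ping's summary line, or None if absent."""
    match = LOSS_RE.search(ping_output)
    return float(match.group(1)) if match else None


def ping_loss(host, count=20):
    """Ping one host and return its loss percentage (None if the output is unparseable).

    Assumption: the local ping accepts -c <count>, as on Linux and macOS.
    """
    result = subprocess.run(
        ["ping", "-c", str(count), host],
        capture_output=True,
        text=True,
    )
    return parse_loss(result.stdout)


def first_lossy_hop(hops, measure=ping_loss, threshold=0.0):
    """Walk the hops in path order and return (hop, loss) for the first one above threshold."""
    for hop in hops:
        loss = measure(hop)
        if loss is not None and loss > threshold:
            return hop, loss
    return None
```

Feed it the hop addresses from the traceroute in path order, for example `first_lossy_hop(["192.0.2.1", "198.51.100.7", "203.0.113.9"])` (documentation addresses, not the real ones). Bear in mind that some routers rate-limit or ignore ICMP, so loss at a single hop is only a lead until the hops beyond it show the same thing.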

A quick search across the internet registry websites (RIPE, AfriNIC, LACNIC, ARIN, APNIC) showed that the offending network belonged to the customer's ISP: the peering between the ISP and the next hop, to be precise. Once this had been established, it was easy to find out that the ISP already had a case open to investigate the peering issues, and that the activity we had been observing was part of it.
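We did the lookup through the registry websites, but the same question can be asked over plain WHOIS (a TCP connection to port 43, send the query followed by CRLF, read until the server closes). The sketch below starts at `whois.iana.org` and assumes its reply carries a `refer:` line naming the responsible regional registry, which is how it normally answers. The function names are mine and the address in the usage note is a placeholder.

```python
import socket


def whois_query(query, server="whois.iana.org", port=43, timeout=10.0):
    """Send one WHOIS query (query text plus CRLF) and return the full text reply."""
    with socket.create_connection((server, port), timeout=timeout) as sock:
        sock.sendall(query.encode("ascii") + b"\r\n")
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closes the connection when it is done
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def referral_server(reply):
    """Return the server named on a 'refer:' line of a WHOIS reply, or None."""
    for line in reply.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "refer":
            return value.strip()
    return None


def lookup_owner(ip_address):
    """Ask IANA which registry holds the address, then ask that registry."""
    first_reply = whois_query(ip_address)
    registry = referral_server(first_reply)
    if registry is None:
        return first_reply  # no referral given, so return what IANA said
    return whois_query(ip_address, server=registry)
```

Calling `print(lookup_owner("203.0.113.9"))` with the lossy hop's real address in place of the placeholder should print the registry's record for the network, which is where the owning organisation is named.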

The result was that, because packet loss was occurring, our side of the VPN kept resending TCP packets while waiting for SYN-ACKs that, for the most part, were never going to arrive (some did). The customer would get some of the resent TCP packets, but because from their point of view the tunnel was up, their firewall was ignoring the SYNs. Our firewall would then try again. The buffers on the customer firewall would fill up, a TCP reset would be generated because of this, and the process would start over again. Note that the TCP reset was not related to the customer's active VPN tunnel, which is why they were stating that the TCP resets we were seeing were not being generated from their side of the VPN.

The moral of the story: this tale may well seem a tad dull, but I have a point (really). It's about listening. It's easy to dismiss the other party as failing to listen, incompetent, inexperienced, or what have you, and equally it's easy to assume that the issue is not on your network. But without thorough investigation and methodically working along the network path, issues can easily be missed and assumptions made. This in turn leads to misunderstanding and delay.

Listening. Another tool in the network engineer's toolkit.

Lesson learned...
