Diagnosing A Slow Network

Author: Stu Feeser

Diagnosing network problems is something that system administrators dread. More often than not, the inability to perform efficient network analysis comes down to a reluctance to learn and embrace network testing. Let’s change that by learning three tests that a system admin should routinely use: mtr, iperf, and MTU testing with ping.

MTR Testing

Routers are potential bottlenecks, so the first thing to determine is how many of them sit between your nodes and whether any of them are adding unacceptable latency (lag).

Step 1: Select two physical servers to use as test points in your network. In our environment, these are servers that are hosting virtual machines. If you start your testing from the virtual machine endpoints, you are skipping an important troubleshooting step. Stick with physical servers for this first round of testing so you can get the ground truth of the underlying networking. Begin by running the mtr command to find the “hop count” between your two test points; that count tells you how complex your network diagnostic task is going to be.

mtr stands for “My TraceRoute.” It runs a continuous traceroute and presents the results in a formatted window. The number of lines in the output indicates the number of hops, and therefore the number of routers, as follows:

  • One line = no routers
  • Two lines = 1 router
  • Three lines = 2 routers
  • And so on and so forth
+-------------------+                           +--------------------+
|                   |       +-------------+     |                    |
|                   +       |             |     +                    |
| node1             +-------+  Router(s)  +-----+            node2   |
|                   +       |             |     +                    |
|                   |       +-------------+     |                    |
+-------------------+                           +--------------------+
                                 How Many Routers?  

mtr node2 -c 5 --report
Start: 2020-03-28T14:22:36-0400
HOST: node1                     Loss%   Snt   Last   Avg  Best  Wrst StDev
1.|-- node2                    0.0%     5    0.2   0.2   0.1   0.2   0.0

In this example, there is only one hop, so there are zero routers. Record the number you see and move on. Keep in mind that each router is a potential MTU speed bump, so let mtr run without the count option (-c) for a few minutes and watch the live display. mtr will tell you which routers are offending. If a router within your network is adding more than a millisecond, you will need to fix it.
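As a minimal sketch, assuming the same node2 test point used above:

mtr node2                                  # live display; press q to quit
mtr --report --report-wide -c 100 node2    # or capture 100 probes for later review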

For illustrative purposes, run mtr cisco.com and check out the “Last” column. The lower the number, the better. This measures the round trip time from the Alta3 Research cloud to each router. The output below shows acceptable Verizon edge connectivity to Alta3 Research. Note the latency at hop 6: roughly 40 milliseconds from Hershey to Dallas and back again is acceptable, so let’s move on.

My traceroute  [v0.92]
node1 (a.b.c.d)                                                                                2020-03-29T20:15:20+0000
Keys:  Help   Display mode   Restart statistics   Order of fields   quit
                                                            Packets               Pings
Host                                                        Loss%   Snt   Last   Avg  Best  Wrst StDev
1. alta3router.localdomain                                  0.0%     5    0.3   0.5   0.3   1.0   0.3  300 microseconds is fine. What else would you expect from Alta3?
2. lo0-100.HRBGPA-VFTTP-305.verizon-gni.net                 0.0%     5    3.1   9.5   0.9  16.0   7.1
3. ae1305-21.PHLAPALO-MSE01-AA-IE1.verizon-gni.net          0.0%     5    4.9   5.4   4.2   6.2   0.8
4. 0.ae2.BR1.PHIL.ALTER.NET                                 0.0%     5    6.0   4.7   3.6   6.0   1.0
5. ae-14.bar4.Philadelphia1.Level3.net                      0.0%     5    7.4   6.9   3.7  12.4   3.4
6. ae-3-5.edge5.Dallas3.Level3.net                          0.0%     5   39.4  39.7  38.5  41.6   1.4 <-- look at that. 32 ms from Philly to Dallas.
7. CISCO-SYSTE.edge5.Dallas3.Level3.net                     0.0%     5   41.2  40.9  39.7  43.7   1.7
8. rcdn9-cd1-cbb-gw1-ten0-0-0-12.cisco.com                  0.0%     5   42.6  41.5  40.3  42.6   1.1
9. 72.163.0.98                                              0.0%     5   41.9  41.3  39.9  42.2   1.0
10. rcdn9-cd1-dmzdcc-gw1-por1.cisco.com                      0.0%     5   40.5  40.4  39.8  41.1   0.5
11. rcdn9-14b-dcz05n-gw1-por1.cisco.com                      0.0%     5   39.9  41.9  39.6  48.5   3.7
12. redirect-ns.cisco.com                                    0.0%     5   42.5  41.7  40.3  43.4   1.4

IPERF Testing

Step 2: Use iperf (IP Performance) to test actual end-to-end (E2E) performance. iperf must be run as a client on one end and as a server on the other end, like this:

+-------------------+                           +--------------------+
|                   |       +-------------+     |                    |
|            +------+       |             |     +-----+              |
| node1      |iperf +-------+  Router(s)  +-----+iperf|     node2    |
|            +------+       |             |     +-----+              |
|                   |       +-------------+     |                    |
+-------------------+                           +--------------------+
              Client                             Server


Install iperf on both nodes: sudo apt install iperf

node2> iperf -s
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size:  128 KByte (default)
------------------------------------------------------------
[  4] local 10.13.116.171 port 5001 connected with 10.0.0.56 port 37022
[ ID] Interval       Transfer     Bandwidth
[  4]  0.0-10.0 sec  2.15 GBytes  1.85 Gbits/sec


node1> iperf -c node2
------------------------------------------------------------
Client connecting to node2, TCP port 5001
TCP window size: 85.0 KByte (default)
------------------------------------------------------------
[  3] local 10.0.0.56 port 37022 connected with 10.13.116.171 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.0 sec  2.15 GBytes  1.85 Gbits/sec

In this example, the measured bandwidth is 1.85 Gbits/sec. On a 10 Gbps cloud network with a hop count of 1, you should expect something close to 9.8 Gbits/sec. If you see a number lower than you expect, as this example shows, you have work to do.
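Before blaming the network itself, it can help to rule out a single-TCP-stream limit. As a rough sketch using standard iperf options (node2 is still running iperf -s; restart the server as iperf -s -u for the UDP case):

iperf -c node2 -P 4 -t 30 -i 5     # four parallel TCP streams, 30 second test, report every 5 seconds
iperf -c node2 -u -b 500M -t 30    # UDP at a fixed rate to expose packet loss and jitter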

CHECK MTU

There is an old saying in our company: “it’s always DNS or MTU.” Checking the MTU is so easy that it is worth the effort to understand and perform this quick test. MTU stands for Maximum Transmission Unit. You can understand MTU with the following analogy.

You are loading 18-wheeler trucks driving from Pittsburgh, PA to Hershey, PA via the Pennsylvania Turnpike. This stretch of road has several tunnels through the Appalachian mountain range. If a truck’s cargo is loaded higher than a tunnel entrance, a collision will occur. On the other hand, if you dispatch the trucks with cargo LOWER than the tunnels, there will be no height collision. In this analogy, the tunnel is the ethernet connection, the mountain is the ethernet switch, and the height of the tunnel is the switch’s MTU setting. Unfortunately, the analogy breaks down at the collision: any reasonable person would expect a collision at the tunnel entrance to be reported back to the dispatcher so the payloads could be fragmented to fit the MTU. That is NEVER the case with ethernet switches, only with IP interfaces. Switches will trash oversized packets and NEVER report anything back to the source. Properly setting the MTU to match the layer 2 network is critical.

To detect MTU problems, the sys admin (that’s you) must test for this condition; you must own MTU problem diagnosis.

Test the MTU by forcing ping not to fragment the payload, using the -M do flag. This sets the DF (don’t fragment) bit in the IP header, which forces ping to send packets at exactly the size you specify. The -M option accepts three values:

  • do = set the DF bit (prohibit IP fragmentation). ← we want this one!
  • want = perform path MTU discovery and fragment locally when the packet is too large
  • dont = do not set the DF (don’t fragment) bit (allow fragmentation)

Jumbo frames normally carry 9000 bytes (9k) per frame, but the IP and ICMP headers add 28 bytes on top of the ping payload (the -s value). So to test whether a 9000 byte packet will fit, subtract 28 to determine the payload size to send:

Check 9000 byte jumbo frame performance: 9000 - 28 = 8972, therefore:

ping -M do -s 8972 node2  
PING node2 (10.13.116.171) 8972(9000) bytes of data.  
ping: local error: Message too long, mtu=1500  

Message too long - This error comes from the local IP stack: the sending interface’s MTU is set to 1500 bytes (as reported in the error) and you are trying to push a larger packet with fragmentation prohibited. If you are not using jumbo frames, this is normal.
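To confirm which interface is capped at 1500, you can read its MTU directly. The interface name eth0 below is an assumption; substitute the interface that carries your test traffic:

ip link show eth0    # look for "mtu 1500" in the first line of output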

ping -M do -s 8972 node2
PING node2 (10.13.116.171) 8972(9000) bytes of data.
(nothing) really bad!

Nothing at all - If you see no response, then the node interface MTU is set greater than the MTU of the ethernet switch in the path: the switch silently drops the jumbo frames and nothing ever comes back. Either enable jumbo frames on the ethernet switch, or set the IP interface MTU to match the switch. This is REALLY BAD. Fix it ASAP.

ping -M do -s 8720 node2
PING node2 (10.0.0.12) 8720(8748) bytes of data.
8728 bytes from node2 (10.0.0.12): icmp_seq=1 ttl=64 time=0.182 ms

xxxx bytes from node2 - Good! This is what you are looking for!

The traditional ethernet MTU is 1500 bytes per frame. This size is typical everywhere except data centers. MTU issues will still arise when you are tunneling, so MTU testing is important even on traditional ethernet networks.
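As a hedged illustration of the tunneling case: a VXLAN overlay adds roughly 50 bytes of encapsulation over IPv4, so a 1500-byte underlay leaves about 1450 bytes for the overlay, and the matching ping payload is 1450 - 28 = 1422. The hostname vm2 below is a hypothetical overlay endpoint, not one of the physical test nodes:

ping -M do -s 1422 vm2    # vm2 is hypothetical; 1422 + 28 = 1450, the usable MTU inside a VXLAN overlay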

Check 1500 byte frame performance: 1500 - 28 = 1472, therefore:

ping -M do -s 1472 node2
PING node2 (10.13.116.171) 1472(1500) bytes of data.
1480 bytes from node2 (10.13.116.171): icmp_seq=1 ttl=64 time=0.376 ms

xxxx bytes from node2 - Good! This is what you are looking for on traditional ethernet networks!
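If you would rather not guess at sizes, a small shell loop can sweep downward until a payload fits. This is only a sketch: the size list and the one-second timeout are arbitrary choices, and node2 is the same test point used above.

for size in 8972 8920 8720 1472 1452 1422; do
    if ping -M do -c 1 -W 1 -s "$size" node2 > /dev/null 2>&1; then
        echo "largest working payload: $size bytes (path MTU about $((size + 28)))"
        break
    fi
done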

Congratulations, you are now halfway there! If you are running a cloud network, test between virtual machines next.

Check the Virtual Networking

+-------------------+                           +--------------------+
|node1              |       +-------------+     |               node2|
|    +--+    +---+  |       |             |     |   +---+    +--+    |
|    |VM+----+BR0+----------+  Router(s)  +---------+BR0+----+VM|    |
|    +--+    +---+  |       |             |     |   +---+    +--+    |
|                   |       +-------------+     |                    |
+-------------------+                           +--------------------+

Now repeat the above steps from virtual machine to virtual machine.

If you are like us, your production services run as virtual machines, connected by virtual networking functions to top-of-rack switches. In the diagram above, “BR0” represents either a Linux bridge or an Open vSwitch layer 2 switching function. You must repeat the mtr, iperf, and MTU ping tests from VM to VM. Unless you are running Intel DPDK (Data Plane Development Kit) acceleration, you can expect network speeds roughly 4x slower than the performance you saw on the physical nodes themselves.
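A good first check at this layer is simply making sure every hop in the virtual path agrees on the MTU. A rough sketch, assuming interface names such as eth0 (physical NIC), br0 (bridge), and vnet0 (the VM’s tap device); substitute your own:

ip link show eth0 | grep mtu     # physical NIC MTU
ip link show br0 | grep mtu      # bridge MTU
ip link show vnet0 | grep mtu    # VM tap interface MTU

sudo ip link set dev br0 mtu 9000    # example: align the bridge with a jumbo-frame underlay

Then rerun mtr, iperf, and the ping MTU tests from inside the VMs to confirm.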