iperf UDP Packet Loss Analysis

Purpose

Analyze the UDP packet loss observed in an iperf test between two VPC-type VMs in the same region.

Description

A user created two VMs in separate VPCs in the same region, connected through Express Connect. Using the following commands, they observed packet loss once the bandwidth exceeded 50 Mbit/s, and the loss ratio grew as the bandwidth increased.

ECS A:  iperf -c <ECS_B_IP> -u -b <bandwidth>

ECS B:  iperf -s -u
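To make "packet loss" concrete: as far as we understand iperf 2, the server detects loss from gaps in a sequence number carried in each UDP datagram. The Python sketch below models that idea in simplified form. `count_loss` is a hypothetical helper, not iperf's actual code, and it does not handle duplicated datagrams.

```python
from typing import Iterable, Tuple


def count_loss(seqs: Iterable[int]) -> Tuple[int, int, int]:
    """Return (received, lost, out_of_order) for sequence numbers starting at 0.

    A forward jump counts the skipped numbers as lost; a late arrival
    (below the next expected number) is counted as out of order and
    removed from the lost count, since it did arrive after all.
    """
    received = lost = out_of_order = 0
    expected = 0  # next sequence number the receiver expects
    for seq in seqs:
        received += 1
        if seq == expected:
            expected += 1
        elif seq > expected:
            lost += seq - expected  # datagrams skipped over
            expected = seq + 1
        else:
            out_of_order += 1
            lost -= 1  # previously counted as lost, but only late
    return received, lost, out_of_order
```

For example, receiving `0, 1, 3, 4` yields one lost datagram, while `0, 2, 1, 3` yields no loss and one out-of-order datagram.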

Analysis

Data stream topology between ECS A and ECS B:

ECS A(192.168.104.235)-> NC1(100.105.59.3)-> vgw(10.141.166.253)-> NC2(100.105.59.9)-> ECS B(10.182.83.13)

Troubleshooting

At first, we treated it as a simple network packet loss issue and took the wrong approach to packet analysis: we assumed that the two NCs communicated directly, without any other networking component such as the xgw in between.

We engaged the VPC developers, who found that the data stream did pass through the xgw, which re-encapsulates each datagram with the target NC's IP address as the outer destination. The xgw trace below shows the same packet (ident 59824) at its input and output points:

[Time ] 17:32:07.130844   Point: input 
[ETHER] 24:4c:07:33:0e:02 -> 00:04:37:28:00:65, eth_type: 0x0800
[IPv4 ] 100.105.59.3 -> 10.141.166.253
proto: 17, ver: 04, ihl: 05, len: 1534, ident: 59824,R: 0, DF: 1, MF: 0, offset: 0, ttl: 60, chksum: 0xfe47
[UDP  ] sport: 46703, dport: 250, size: 1514, chksum: 0x0000
[VxLan] debug_flag: 0, vlan_tag: 0, payload_type: 0, version: 1, tunnel_id: 1878597, tos: 0, tof: 0
[IPv4 ] 192.168.104.235 -> 10.182.83.13
proto: 17, ver: 04, ihl: 05, len: 1498, ident: 55469,R: 0, DF: 1, MF: 0, offset: 0, ttl: 64, chksum: 0xd50e
[UDP  ] sport: 36687, dport: 5001, size: 1478, chksum: 0xa0aa

[Time ] 17:32:07.130854   Point: output
[ETHER] 24:4c:07:33:0e:02 -> 00:04:37:28:00:65, eth_type: 0x0800
[IPv4 ] 100.105.59.3 -> 100.105.59.9
proto: 17, ver: 04, ihl: 05, len: 1534, ident: 59824,R: 0, DF: 1, MF: 0, offset: 0, ttl: 60, chksum: 0x0000
[UDP  ] sport: 46703, dport: 250, size: 1514, chksum: 0x0000
[VxLan] debug_flag: 0, vlan_tag: 0, payload_type: 0, version: 1, tunnel_id: 2125861, tos: 0, tof: 0
[IPv4 ] 192.168.104.235 -> 10.182.83.13
proto: 17, ver: 04, ihl: 05, len: 1498, ident: 55469,R: 0, DF: 1, MF: 0, offset: 0, ttl: 64, chksum: 0xd50e
[UDP  ] sport: 36687, dport: 5001, size: 1478, chksum: 0xa0aa
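The two trace points can be checked with simple arithmetic. The sketch below uses only field values copied from the trace. It confirms that the tunnel adds 36 bytes per packet (outer IPv4 20 + outer UDP 8 + an 8-byte "VxLan" header; the last figure is inferred from the lengths, since the header format appears to be vendor-specific) and that, among the recorded fields, only the outer destination, the tunnel ID and the outer checksum differ between input and output.

```python
# Field values copied from the xgw trace (input vs. output point).
IPV4_HDR = 20   # ihl: 05 -> 5 * 4 bytes
UDP_HDR = 8
VXLAN_HDR = 8   # assumption: inferred from the lengths checked below

inner_udp_size = 1478   # inner [UDP] size
inner_ip_len = 1498     # inner [IPv4] len
outer_udp_size = 1514   # outer [UDP] size
outer_ip_len = 1534     # outer [IPv4] len

payload = inner_udp_size - UDP_HDR  # iperf payload bytes
assert inner_ip_len == IPV4_HDR + inner_udp_size
assert outer_udp_size == UDP_HDR + VXLAN_HDR + inner_ip_len
assert outer_ip_len == IPV4_HDR + outer_udp_size
overhead = outer_ip_len - inner_ip_len  # bytes added by the tunnel

# Outer header fields at the two xgw capture points.
xgw_input = {"outer_src": "100.105.59.3", "outer_dst": "10.141.166.253",
             "ident": 59824, "outer_chksum": 0xFE47, "tunnel_id": 1878597}
xgw_output = {"outer_src": "100.105.59.3", "outer_dst": "100.105.59.9",
              "ident": 59824, "outer_chksum": 0x0000, "tunnel_id": 2125861}
changed = sorted(k for k in xgw_input if xgw_input[k] != xgw_output[k])

print(payload, overhead, changed)
```

This prints `1470 36 ['outer_chksum', 'outer_dst', 'tunnel_id']`: the inner packet is untouched, and the xgw only rewrites the outer header toward NC2.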

We then included the xgw in the packet capture, using the following steps:

ECS A sends UDP packets: iperf -c 10.182.83.13 -u -b 600m
ECS B receives UDP packets: iperf -u -s
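As a sanity check on the counts in the statistics below, the expected number of datagrams can be derived from the iperf parameters. This sketch assumes the iperf 2 defaults of a 10-second test and a 1470-byte UDP payload (consistent with the 1478-byte inner UDP size in the xgw trace), and assumes the `-b` rate is paced on payload bytes. `expected_datagrams` is a hypothetical helper.

```python
def expected_datagrams(rate_bps: float, payload_bytes: int = 1470,
                       duration_s: float = 10.0) -> int:
    """Datagrams a paced UDP sender emits: rate * time / bits per datagram."""
    return int(rate_bps * duration_s / (payload_bytes * 8))


print(expected_datagrams(600e6))  # -b 600m over the default 10 s
```

`expected_datagrams(600e6)` gives 510204, the same count reported by NC1, the switches and NC2, which is consistent with every datagram iperf sent reaching ECS B's NIC.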

Packet capture inside the VMs:

ECS A:

sudo tcpdump -w ~/client.pcap -n -i eth0 src host 192.168.104.235 and src port 1234

ECS B:

sudo tcpdump -w ~/server.pcap -n -i eth0 src host 192.168.104.235 and src port 1234

Packet capture for eth0 of 2 NCs:

NC1: 

sudo houyi-tcpdump -w /apsara/i-6we6pnh19n2q7srkgomd.pcap -nnK -i eth0 udp and src inner_port 1234 and dst inner_host 10.182.83.13

NC2:

sudo houyi-tcpdump -B 4096 -w /apsara/i-6we53i9h3ducbju5rmuw.pcap -nn -i eth0 udp -K and src inner_host 192.168.104.235 and src inner_port 1234
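To cross-check the capture files offline, the packet records in each file can be counted. A minimal Python sketch, assuming the files are in classic libpcap format (what `tcpdump -w` writes; we assume `houyi-tcpdump`, an internal tool, does the same) rather than pcapng. `count_pcap_records` is a hypothetical helper. A file only contains the packets tcpdump actually captured, not those it reports as dropped by the kernel.

```python
import struct
from typing import BinaryIO

# Classic libpcap magic numbers as they appear on disk
# (microsecond and nanosecond timestamp variants).
_MAGIC = {
    b"\xd4\xc3\xb2\xa1": "<", b"\x4d\x3c\xb2\xa1": "<",  # little-endian files
    b"\xa1\xb2\xc3\xd4": ">", b"\xa1\xb2\x3c\x4d": ">",  # big-endian files
}


def count_pcap_records(f: BinaryIO) -> int:
    """Count packet records in a classic (non-pcapng) pcap stream."""
    magic = f.read(4)
    if magic not in _MAGIC:
        raise ValueError("not a classic pcap file (pcapng is not handled)")
    endian = _MAGIC[magic]
    f.read(20)  # skip the rest of the 24-byte global header
    count = 0
    while True:
        hdr = f.read(16)  # ts_sec, ts_frac, incl_len, orig_len
        if len(hdr) < 16:
            break
        _, _, incl_len, _ = struct.unpack(endian + "IIII", hdr)
        f.seek(incl_len, 1)  # skip the captured packet bytes
        count += 1
    return count
```

Usage: open each file in binary mode, e.g. `with open("server.pcap", "rb") as f: print(count_pcap_records(f))`.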

Packet capture on the ASW and LSW:

100.105.59.3:46728 -> 10.141.166.253:250

Because the datagram is encapsulated with the target NC's IP address, the correct outer header format is as follows:

100.105.59.3:46728 -> 100.105.59.9:250

Packet capture statistics

ECS A packets lost/packets sent: 171/510203

NC1 eth0 packets sent: 510204

ASW/LSW packets: 510204

NC2 eth0 packets received: 510204

ECS B packets received: 510204, captured: 507442, dropped by kernel: 2162
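The per-hop comparison above can be expressed as a small script that walks the path in order and reports the first hop whose count drops. This is an illustrative sketch with a hypothetical helper `first_lossy_hop`; the numbers are the ones listed above, using the ECS B "received" count as the NIC-level figure.

```python
from typing import List, Optional, Tuple


def first_lossy_hop(counts: List[Tuple[str, int]]) -> Optional[str]:
    """Return the first hop that saw fewer packets than the hop before it."""
    for (_, prev), (name, cur) in zip(counts, counts[1:]):
        if cur < prev:
            return name
    return None


# Wire-level counts from the statistics above.
hops = [
    ("NC1 eth0", 510204),
    ("ASW/LSW", 510204),
    ("NC2 eth0", 510204),
    ("ECS B eth0", 510204),
]
print(first_lossy_hop(hops))                    # prints None
print(f"iperf loss ratio: {171 / 510203:.4%}")  # prints iperf loss ratio: 0.0335%
```

No hop on the path loses packets, so the 171 datagrams iperf reports as lost must disappear after ECS B's NIC, inside the guest.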

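The statistics above confirm that the packets were lost inside the VM, i.e. dropped by the protocol stack: every hop up to and including ECS B's eth0 saw all 510204 packets. We therefore increased the UDP receive buffer size of the target VM from 212992 (208 KB) to 2097152 (2 MB) by setting: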

/proc/sys/net/core/rmem_default 
/proc/sys/net/core/rmem_max
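To see why a larger buffer helps, the sketch below estimates how much of the 600 Mbit/s stream each buffer size can hold while the receiving application is not reading. It is an idealized upper bound that counts payload bytes only; Linux charges each queued packet's full skb size (payload plus metadata) against the buffer, so the real headroom is smaller. `buffer_fill_ms` is a hypothetical helper.

```python
def buffer_fill_ms(buf_bytes: int, rate_bps: float) -> float:
    """Milliseconds of traffic at rate_bps that fit into buf_bytes."""
    return buf_bytes * 8 / rate_bps * 1000


for size in (212992, 2097152):
    print(f"{size} bytes -> {buffer_fill_ms(size, 600e6):.1f} ms at 600 Mbit/s")
```

This gives roughly 2.8 ms versus 28 ms of headroom, a tenfold increase in how long the receiver can stall before datagrams are dropped. `rmem_default` applies to sockets that do not set `SO_RCVBUF` themselves, while `rmem_max` caps what an application may request.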

After the modification, no packet loss was detected.
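Instead of, or in addition to, raising the system default, an application can request a larger receive buffer for its own socket through the standard `SO_RCVBUF` option. A minimal Python sketch; `udp_socket_with_rcvbuf` is a hypothetical helper. As documented in socket(7), Linux doubles the requested value for bookkeeping overhead and caps the request at `net.core.rmem_max`, so `rmem_max` still has to be raised for requests above the default limit.

```python
import socket


def udp_socket_with_rcvbuf(requested: int) -> socket.socket:
    """Create a UDP socket and request a receive buffer of `requested` bytes."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, requested)
    return sock


sock = udp_socket_with_rcvbuf(2 * 1024 * 1024)
# On Linux the value read back is 2 * min(requested, rmem_max),
# so it depends on the net.core.rmem_max setting of the host.
effective = sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
print(effective)
sock.close()
```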

End

The issue above is one scenario of protocol stack drops. The traditional way to troubleshoot it is packet capture with tcpdump or Wireshark; here I would like to introduce Dropwatch, a diagnostic tool recommended by the genius @褚霸. Special thanks to my boss @铁竹, who shared the following link with me.

https://blog.yufeng.info/archives/2497
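Besides packet capture and Dropwatch, a lightweight way to confirm receive-buffer drops on Linux is to read the UDP counters in `/proc/net/snmp` (also summarized by `netstat -su`): `RcvbufErrors` increases when a datagram is dropped because the socket receive buffer is full. This check is our own addition, not the method from the linked article; `udp_counters` is a hypothetical helper.

```python
from typing import Dict


def udp_counters(snmp_text: str) -> Dict[str, int]:
    """Parse the two 'Udp:' lines (names, then values) of /proc/net/snmp."""
    udp_lines = [line.split()[1:] for line in snmp_text.splitlines()
                 if line.startswith("Udp:")]  # 'UdpLite:' lines do not match
    names, values = udp_lines[0], udp_lines[1]
    return dict(zip(names, map(int, values)))
```

Usage: pass in `open("/proc/net/snmp").read()` before and after an iperf run and compare `RcvbufErrors`; a rising value points to the same protocol stack drops described above.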

My handsome colleague @砺辛 developed a new tool for diagnosing networking issues. It covers the UDP packet loss scenario and helps identify the root cause quickly when the issue is reproduced.

https://www.atatech.org/articles/85295

Last updated: 2017-08-24 12:03:23
