iperf UDP Packet Loss Analysis
Purpose
Analyze the UDP packet loss observed in an iperf test between two VMs that use VPC networking in the same region.
Description
A user created two VMs in separate VPCs in the same region, connected by Express Connect. Using the following commands, they observed packet loss once the bandwidth exceeded 50M, and the packet loss ratio increased as the bandwidth went up.
ECS A: iperf -c <ECS_B_IP> -u -b <bandwidth>
ECS B: iperf -s -u
Analysis
Data stream topology between ECS A and ECS B:
ECS A(192.168.104.235)-> NC1(100.105.59.3)-> vgw(10.141.166.253)-> NC2(100.105.59.9)-> ECS B(10.182.83.13)
Troubleshooting
- At first we treated this as a simple network packet loss issue and analyzed the packets the wrong way: we assumed the two NCs communicated directly, with no other networking component such as xgw in between.
- We engaged the VPC developers, who identified that the data stream does go through the xgw switch, and that the datagram is encapsulated with the IP address of the target NC.
[Time ] 17:32:07.130844 Point: input
[ETHER] 24:4c:07:33:0e:02 -> 00:04:37:28:00:65, eth_type: 0x0800
[IPv4 ] 100.105.59.3 -> 10.141.166.253
proto: 17, ver: 04, ihl: 05, len: 1534, ident: 59824,R: 0, DF: 1, MF: 0, offset: 0, ttl: 60, chksum: 0xfe47
[UDP ] sport: 46703, dport: 250, size: 1514, chksum: 0x0000
[VxLan] debug_flag: 0, vlan_tag: 0, payload_type: 0, version: 1, tunnel_id: 1878597, tos: 0, tof: 0
[IPv4 ] 192.168.104.235 -> 10.182.83.13
proto: 17, ver: 04, ihl: 05, len: 1498, ident: 55469,R: 0, DF: 1, MF: 0, offset: 0, ttl: 64, chksum: 0xd50e
[UDP ] sport: 36687, dport: 5001, size: 1478, chksum: 0xa0aa
[Time ] 17:32:07.130854 Point: output
[ETHER] 24:4c:07:33:0e:02 -> 00:04:37:28:00:65, eth_type: 0x0800
[IPv4 ] 100.105.59.3 -> 100.105.59.9
proto: 17, ver: 04, ihl: 05, len: 1534, ident: 59824,R: 0, DF: 1, MF: 0, offset: 0, ttl: 60, chksum: 0x0000
[UDP ] sport: 46703, dport: 250, size: 1514, chksum: 0x0000
[VxLan] debug_flag: 0, vlan_tag: 0, payload_type: 0, version: 1, tunnel_id: 2125861, tos: 0, tof: 0
[IPv4 ] 192.168.104.235 -> 10.182.83.13
proto: 17, ver: 04, ihl: 05, len: 1498, ident: 55469,R: 0, DF: 1, MF: 0, offset: 0, ttl: 64, chksum: 0xd50e
[UDP ] sport: 36687, dport: 5001, size: 1478, chksum: 0xa0aa
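The length fields in the capture above can be cross-checked with a little arithmetic. The sketch below is only an illustration: the function names are hypothetical, and the 8-byte tunnel header is an assumption (the size of a standard VXLAN header) that happens to be consistent with the captured lengths. Under that assumption, an outer IPv4 length of 1534 leaves 1498 bytes for the inner IPv4 packet, which in turn carries a 1470-byte UDP payload, the default iperf UDP datagram size.

```shell
#!/bin/sh
# Header sizes in bytes.
IP_HDR=20       # ihl: 05 in the dump means 5 * 4 = 20 bytes, no IP options
UDP_HDR=8
TUNNEL_HDR=8    # ASSUMPTION: same size as a standard VXLAN header

# inner_ip_len <outer_ip_len>
# Inner IPv4 total length expected for a given outer IPv4 total length.
inner_ip_len() {
    echo $(( $1 - IP_HDR - UDP_HDR - TUNNEL_HDR ))
}

# udp_payload <ip_len>
# UDP payload size carried by an IPv4 packet of the given total length.
udp_payload() {
    echo $(( $1 - IP_HDR - UDP_HDR ))
}

inner_ip_len 1534   # outer "len: 1534" from the dump
udp_payload 1498    # inner "len: 1498" from the dump
```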
- We then included xgw in the packet capture. The steps were as follows:
ECS A sends UDP packets: iperf -c 10.182.83.13 -u -b 600m
ECS B receives UDP packets: iperf -u -s
Packet capture inside the VMs:
ECS A:
sudo tcpdump -w ~/client.pcap -n -i eth0 src host 192.168.104.235 and src port 1234
ECS B:
sudo tcpdump -w ~/server.pcap -n -i eth0 src host 192.168.104.235 and src port 1234
Packet capture on eth0 of the two NCs:
NC1:
sudo houyi-tcpdump -w /apsara/i-6we6pnh19n2q7srkgomd.pcap -nnK -i eth0 udp and src inner_port 1234 and dst inner_host 10.182.83.13
NC2:
sudo houyi-tcpdump -B 4096 -w /apsara/i-6we53i9h3ducbju5rmuw.pcap -nn -i eth0 udp -K and src inner_host 192.168.104.235 and src inner_port 1234
Packet capture on the ASW and LSW:
100.105.59.3:46728 -> 10.141.166.253:250
Because the datagram is encapsulated with the IP address of the target NC, the correct flow to capture is:
100.105.59.3:46728 -> 100.105.59.9:250
- Packet capture statistics
ECS A packet loss/packets sent: 171/510203
NC1 eth0 packets sent: 510204
ASW/LSW packets: 510204
NC2 eth0 packets received: 510204
ECS B packets received: 510204, captured 507442, dropped by kernel 2162
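The comparison behind these statistics is simple: list the packet count at each capture point and look at the difference between consecutive points. The sketch below illustrates that with the counts above. The helper names `hop_deltas` and `loss_pct` and the capture-point labels are made up for this example.

```shell
#!/bin/sh
# hop_deltas: read "name count" lines on stdin and print how many packets
# are missing at each capture point relative to the previous one.
hop_deltas() {
    awk 'NR > 1 { printf "%s -> %s: %d\n", prev_name, $1, prev_count - $2 }
         { prev_name = $1; prev_count = $2 }'
}

# loss_pct <lost> <total>: lost packets as a percentage, four decimal places.
loss_pct() {
    awk -v lost="$1" -v total="$2" \
        'BEGIN { printf "%.4f\n", (total > 0) ? 100 * lost / total : 0 }'
}

# Counts taken from the statistics above; the labels are illustrative.
hop_deltas <<'EOF'
NC1_eth0 510204
ASW_LSW 510204
NC2_eth0 510204
ECS_B 510204
EOF

# Share of packets reported as "dropped by kernel" on ECS B.
loss_pct 2162 510204
```

Every delta on the path from NC1 to ECS B is zero, which is why the network path can be ruled out and attention moves to the receiving VM.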
- These statistics confirm that the packets are lost inside the VM, in the protocol stack. On the target VM we increased the UDP receive buffer size from 212992 (208 KB) to 2097152 (2 MB) in:
/proc/sys/net/core/rmem_default
/proc/sys/net/core/rmem_max
- After the modification, no packet loss was detected.
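A minimal sketch of how the buffer change could be applied and checked on the receiving VM follows. It assumes a Linux guest where /proc/net/snmp exposes the Udp counters and where sysctl is available. The function names are hypothetical, and `raise_udp_rcvbuf` needs root and changes only the running kernel (the setting does not survive a reboot).

```shell
#!/bin/sh
# udp_counter <name> [snmp_file]
# Print one counter (e.g. RcvbufErrors) from the "Udp:" rows of /proc/net/snmp.
# The first "Udp:" row holds the column names, the second the values.
udp_counter() {
    awk -v want="$1" '
        $1 == "Udp:" && !seen { for (i = 2; i <= NF; i++) col[$i] = i; seen = 1; next }
        $1 == "Udp:" { if (want in col) print $(col[want]); exit }
    ' "${2:-/proc/net/snmp}"
}

# raise_udp_rcvbuf <bytes>
# Set both the default and the maximum socket receive buffer size.
# Requires root; not called automatically by this script.
raise_udp_rcvbuf() {
    sysctl -w net.core.rmem_default="$1"
    sysctl -w net.core.rmem_max="$1"
}
```

One possible workflow is to read `udp_counter RcvbufErrors` before and after an iperf run: if the counter grows during the run, datagrams are being dropped because the socket receive buffer is full. Then run `raise_udp_rcvbuf 2097152` as root, restart the iperf server so that its socket picks up the new default, and repeat the test.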
End
- This issue is one scenario of protocol stack drops. The traditional way to troubleshoot it is packet capture with tcpdump or Wireshark. Here I want to introduce Dropwatch for diagnosing this kind of issue, recommended by the genius @褚霸. Special thanks to my boss @铁竹, who shared the following link with me.
https://blog.yufeng.info/archives/2497
- My handsome colleague @砺辛 developed a new tool for diagnosing networking issues. It covers the UDP packet loss scenario and helps identify the root cause quickly when the issue is reproduced.
https://www.atatech.org/articles/85295
Last updated: 2017-08-24 12:03:23