

iperf UDP Packet Loss Analysis

Purpose

Analyze the UDP packet loss observed in iperf tests between two VPC-network VMs in the same region.

Description

The user created two VMs in separate VPCs, connected by Express Connect within the same region, and used the following commands to test. Packet loss appears once the bandwidth exceeds 50 Mbit/s, and the loss ratio increases with higher bandwidth.

ECS A:  iperf -c <ECS_B_IP> -u -b <bandwidth>

ECS B:  iperf -s -u

(iperf version 2 is used.)
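As a sanity check on the packet counts that appear later in the captures, the expected datagram count for a run can be estimated from the offered bandwidth. A minimal sketch, assuming iperf2 defaults (1470-byte UDP payload, 10-second test), which the article does not state explicitly:

```python
# Estimate how many UDP datagrams iperf2 sends for a given -b value.
# Assumptions (not stated in the article): default 1470-byte payload (-l)
# and default 10-second duration (-t).
PAYLOAD_BYTES = 1470           # iperf2 default UDP datagram size
BANDWIDTH_BPS = 600_000_000    # iperf -c <ip> -u -b 600m
DURATION_S = 10                # iperf2 default test length

pps = BANDWIDTH_BPS / (PAYLOAD_BYTES * 8)   # datagrams per second
total = round(pps * DURATION_S)             # datagrams for the whole run
print(round(pps), total)                    # → 51020 510204
```

The estimate (about 510,204 datagrams) matches the per-hop capture counts reported later in this article.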

Analysis

The network topology between ECS A and ECS B:

[figure: topology diagram between ECS A and ECS B]

Data stream path:

ECS A(192.168.104.235)-> NC1(100.105.59.3)-> vgw(10.141.166.253)-> NC2(100.105.59.9)-> ECS B(10.182.83.13)

Troubleshooting

  • At first we assumed this was a simple network packet loss issue, and took a wrong turn in the packet analysis: we only verified the communication directly between the two NCs, without considering other networking components such as the xgw.

[figure: packet analysis between the two NCs]

  • We engaged the VPC developers for troubleshooting. They identified that the data stream does go through the xgw switch, with the datagram encapsulated using the IP address of the target NC. In the trace below, the input point still shows the vgw address as the outer destination, while the output point has rewritten it to NC2:
[Time ] 17:32:07.130844   Point: input 
[ETHER] 24:4c:07:33:0e:02 -> 00:04:37:28:00:65, eth_type: 0x0800
[IPv4 ] 100.105.59.3 -> 10.141.166.253
proto: 17, ver: 04, ihl: 05, len: 1534, ident: 59824,R: 0, DF: 1, MF: 0, offset: 0, ttl: 60, chksum: 0xfe47
[UDP  ] sport: 46703, dport: 250, size: 1514, chksum: 0x0000
[VxLan] debug_flag: 0, vlan_tag: 0, payload_type: 0, version: 1, tunnel_id: 1878597, tos: 0, tof: 0
[IPv4 ] 192.168.104.235 -> 10.182.83.13
proto: 17, ver: 04, ihl: 05, len: 1498, ident: 55469,R: 0, DF: 1, MF: 0, offset: 0, ttl: 64, chksum: 0xd50e
[UDP  ] sport: 36687, dport: 5001, size: 1478, chksum: 0xa0aa
[Time ] 17:32:07.130854   Point: output
[ETHER] 24:4c:07:33:0e:02 -> 00:04:37:28:00:65, eth_type: 0x0800
[IPv4 ] 100.105.59.3 -> 100.105.59.9
proto: 17, ver: 04, ihl: 05, len: 1534, ident: 59824,R: 0, DF: 1, MF: 0, offset: 0, ttl: 60, chksum: 0x0000
[UDP  ] sport: 46703, dport: 250, size: 1514, chksum: 0x0000
[VxLan] debug_flag: 0, vlan_tag: 0, payload_type: 0, version: 1, tunnel_id: 2125861, tos: 0, tof: 0
[IPv4 ] 192.168.104.235 -> 10.182.83.13
proto: 17, ver: 04, ihl: 05, len: 1498, ident: 55469,R: 0, DF: 1, MF: 0, offset: 0, ttl: 64, chksum: 0xd50e
[UDP  ] sport: 36687, dport: 5001, size: 1478, chksum: 0xa0aa
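The lengths printed in the trace above are internally consistent. A small sketch, assuming an 8-byte tunnel header with no inner Ethernet frame (inferred from the arithmetic, not from documentation of the tunnel format):

```python
# Verify the nesting of lengths printed by houyi-tcpdump above.
IP_HDR = 20       # IPv4 header, ihl = 05
UDP_HDR = 8
TUNNEL_HDR = 8    # the [VxLan] header; 8 bytes is inferred from the sizes

inner_payload = 1470                          # iperf2 default datagram size
inner_udp = UDP_HDR + inner_payload           # 1478 → inner [UDP] size
inner_ip = IP_HDR + inner_udp                 # 1498 → inner [IPv4] len
outer_udp = UDP_HDR + TUNNEL_HDR + inner_ip   # 1514 → outer [UDP] size
outer_ip = IP_HDR + outer_udp                 # 1534 → outer [IPv4] len
print(inner_udp, inner_ip, outer_udp, outer_ip)  # → 1478 1498 1514 1534
```

Note that the outer packet (1534 bytes) exceeds a standard 1500-byte MTU, so the underlay must carry larger frames.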

  • Included the xgw in the packet capture, with the steps below.

ECS A sends UDP packets: iperf -c 10.182.83.13 -u -b 600m
ECS B receives UDP packets: iperf -u -s

Packet capture inside the VMs:


ECS A:

sudo tcpdump -w ~/client.pcap -n -i eth0 src host 192.168.104.235 and src port 1234
ECS B:

sudo tcpdump -w ~/server.pcap -n -i eth0 src host 192.168.104.235 and src port 1234

Packet capture on eth0 of the two NCs:


NC1: 

sudo houyi-tcpdump -w /apsara/i-6we6pnh19n2q7srkgomd.pcap -nnK -i eth0 udp and src inner_port 1234 and dst inner_host 10.182.83.13

NC2:

sudo houyi-tcpdump -B 4096 -w /apsara/i-6we53i9h3ducbju5rmuw.pcap -nnK -i eth0 udp and src inner_host 192.168.104.235 and src inner_port 1234



Packet capture for ASW and LSW:

A naive filter based on the input trace would match the vgw address:

100.105.59.3:46728 -> 10.141.166.253:250

However, since the datagram is encapsulated with the IP address of the target NC, the correct datagram to look for on the wire is:

100.105.59.3:46728 -> 100.105.59.9:250
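The per-hop counts in the statistics below come from counting records in each capture file. A minimal sketch in pure Python (standard library only; in practice `capinfos` or `tcpdump -r file.pcap | wc -l` would more likely be used):

```python
import struct

def count_packets(path):
    """Count packet records in a classic libpcap capture file."""
    with open(path, "rb") as f:
        magic = struct.unpack("<I", f.read(4))[0]
        # 0xA1B2C3D4 / 0xA1B23C4D read little-endian mean an LE writer.
        endian = "<" if magic in (0xA1B2C3D4, 0xA1B23C4D) else ">"
        f.read(20)                      # skip the rest of the global header
        count = 0
        while True:
            rec = f.read(16)            # per-record header
            if len(rec) < 16:
                break
            incl_len = struct.unpack(endian + "IIII", rec)[2]
            f.seek(incl_len, 1)         # skip the captured bytes
            count += 1
        return count
```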


  • Packet capture statistics

ECS A packet loss / packets sent: 171 / 510203

NC1 eth0 packets sent: 510204

ASW/LSW packets: 510204

NC2 eth0 packets received: 510204

ECS B packets received: 510204 (captured 507442, dropped by kernel 2162)
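The reasoning can be made explicit: every hop outside the VM saw the same packet count, so the loss must occur inside ECS B. A sketch (the off-by-one between 510203 and 510204 is presumably iperf's final summary datagram, which is my assumption, not stated in the article):

```python
# Per-hop counts from the captures; the iperf server reported 171/510203 lost.
sent_report = 510203
hop_counts = {
    "NC1 eth0 (sent)":       510204,
    "ASW/LSW":               510204,
    "NC2 eth0 (received)":   510204,
    "ECS B eth0 (received)": 510204,
}
# No hop lost anything on the wire...
assert len(set(hop_counts.values())) == 1
# ...so the 171 missing datagrams were dropped inside the VM's stack.
loss_ratio = 171 / sent_report
print(f"{loss_ratio:.4%}")   # → 0.0335%
```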

  • Per the statistics above, we confirmed that the packets are lost inside the VM, i.e. dropped by the protocol stack. We enlarged the UDP receive buffer on the target VM from 212992 (208 KB) to 2097152 (2 MB) via:
/proc/sys/net/core/rmem_default 
/proc/sys/net/core/rmem_max
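Besides the system-wide change above, an application can request a larger buffer per socket. A minimal sketch, noting that the kernel caps the request at net.core.rmem_max, and that Linux doubles the stored value to account for bookkeeping overhead:

```python
import socket

# Request a 2 MB receive buffer on a UDP socket, then read back what the
# kernel actually granted (capped by net.core.rmem_max).
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, 2 * 1024 * 1024)
effective = sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
print(effective)   # on Linux, double the granted buffer size
sock.close()
```

iperf2 exposes a similar knob through its -w option, which sets the socket buffer size.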

  • After the modification, no packet loss was detected.


End

  • The above issue is one scenario of protocol stack packet drop. The traditional troubleshooting approach is to capture packets with tcpdump or wireshark; here I want to introduce Dropwatch for diagnosing such issues, recommended by the brilliant @褚霸. Special thanks to my boss @铁竹, who shared the following link with me.

https://blog.yufeng.info/archives/2497

  • My handsome colleague @砺辛 developed a new tool for diagnosing networking issues, which covers the UDP packet loss scenario; it helps identify the root cause quickly when the issue is reproduced.

https://www.atatech.org/articles/85295

Last updated: 2017-08-24 12:03:23
