Tuning packet size? High PPS/bw from ACKs over VPN bond link


Viewing 4 posts - 16 through 19 (of 19 total)
  • #50967
    houkouonchi
    Member

    Ok and I see:

    http://www.zeroshell.net/eng/forum/viewtopic.php?t=821

    which seems to verify my suspicion that ACKs are being sent over both lines and thus duplicated. Using some of the things listed in that thread did help with my bandwidth usage (upload was only around 3 Mbps instead of 4 Mbps), and now iperf shows better speeds, at least on the downstream side:

    root@zeroshell root> ./iperf -c 10.99.98.2


    Client connecting to 10.99.98.2, TCP port 5001
    TCP window size: 16.0 KByte (default)


    [ 3] local 10.99.98.1 port 48910 connected with 10.99.98.2 port 5001
    [ ID] Interval Transfer Bandwidth
    [ 3] 0.0-10.1 sec 83.1 MBytes 69.3 Mbits/sec

    Upstream was not so good:

    root@zeroshell root> ./iperf -s


    Server listening on TCP port 5001
    TCP window size: 85.3 KByte (default)


    [ 4] local 10.99.98.1 port 5001 connected with 10.99.98.2 port 50552
    [ ID] Interval Transfer Bandwidth
    [ 4] 0.0-10.3 sec 29.6 MBytes 24.1 Mbits/sec

    So I think my problem is just the duplicate ACKs. Is there any way I can stop them? Running this web100 NDT test to my server (a server that is not a ZS box), I get a ton of duplicate ACKs:

    admin@zeroshell: 06:48 AM :/proc# web100clt -l -n 208.97.141.21
    Testing network path for configuration and performance problems -- Using IPv4 address
    Checking for Middleboxes . . . . . . . . . . . . . . . . . . Done
    checking for firewalls . . . . . . . . . . . . . . . . . . . Done
    running 10s outbound test (client to server) . . . . . 64.47 Mb/s
    running 10s inbound test (server to client) . . . . . . 64.43 Mb/s
    The slowest link in the end-to-end path is a 10 Gbps 10 Gigabit Ethernet/OC-192 subnet
    Information: Other network traffic is congesting the link
    Information: The receive buffer should be 20593 kbytes to maximize throughput
    Server '208.97.141.21' is not behind a firewall. [Connection to the ephemeral port was successful]
    Client is probably behind a firewall. [Connection to the ephemeral port failed]


    Web100 Detailed Analysis



    Web100 reports the Round trip time = 16.87 msec; the Packet size = 1350 Bytes; and
    There were 31 packets retransmitted, 11327 duplicate acks received, and 57519 SACK blocks received
    Packets arrived out-of-order 18.92% of the time.
    This connection is sender limited 30.29% of the time.
    Increasing the current send buffer (256.00 KB) will improve performance
    This connection is network limited 69.71% of the time.

    When running the same test from my desktop here at work I get way less duplicate ACKs even though way more packets/data was sent/received due to the much faster link speed:

    root@sigito: 06:44 AM :~# web100clt -l -n 208.97.141.21
    Testing network path for configuration and performance problems -- Using IPv4 address
    Checking for Middleboxes . . . . . . . . . . . . . . . . . . Done
    checking for firewalls . . . . . . . . . . . . . . . . . . . Done
    running 10s outbound test (client to server) . . . . . 553.38 Mb/s
    running 10s inbound test (server to client) . . . . . . 565.91 Mb/s
    The slowest link in the end-to-end path is a 622 Mbps OC-12 subnet
    Information: Other network traffic is congesting the link
    Server '208.97.141.21' is not behind a firewall. [Connection to the ephemeral port was successful]
    Client is probably behind a firewall. [Connection to the ephemeral port failed]


    Web100 Detailed Analysis



    Web100 reports the Round trip time = 3.64 msec; the Packet size = 1368 Bytes; and
    There were 2 packets retransmitted, 392 duplicate acks received, and 393 SACK blocks received
    Packets arrived out-of-order 0.15% of the time.
    This connection is sender limited 97.60% of the time.
    This connection is network limited 2.40% of the time.

    #50968
    ppalias
    Member

    If ACKs are your problem, you can verify it by using UDP as the transport in iperf. Just add the “-u” option.
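
    A minimal sketch of that UDP check, reusing the tunnel addresses from the iperf runs earlier in the thread (your IPs and rates will differ):

```shell
# On the far end (10.99.98.2), start iperf as a UDP server:
iperf -s -u

# On the ZS box, run the UDP test; unlike TCP, UDP won't ramp up on
# its own, so -b must set a target rate near the expected capacity:
iperf -c 10.99.98.2 -u -b 50M

# UDP sends no TCP ACKs back, so if throughput looks fine here but is
# poor over TCP, the duplicated ACK traffic on the bond is the likely culprit.
```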

    #50969
    LastwagenMann
    Member

    Asymmetric Routing – It May Be The Solution
    =================================
    We were experiencing highly similar issues as the original poster, so I thought I would share my solution.

    This method works well and is intended for users focused on increasing upload bandwidth only: the goal is to use the bonded channel for upstream traffic only and an unbonded VPN channel for the downstream. Thus 2 x (25Mbps/5Mbps) gives you (25Mbps/10Mbps), which is a better ratio altogether if upstream is your concern.

    We are bonding two VDSL2 modems from the same provider, each 25/5; however, the upload was actually capped at 7Mbps, as shown on the modem diagnostics page.

    We had the same problem with excessive ACKs, except the problem was always with traffic from the DC to the Bonded Site, no matter what the source was. Traffic going from the Bonded Site to the DC is excellent! We are getting a consistent 10.5-11.0Mbps of upload on TCP, and close to 12Mbps on UDP (tested using iperf between the bonded site and the DC).

    Downstream is horrible: any single connection averaged 1.7Mbps, 2Mbps at best. Specifying the -P 10 parameter to push 10 parallel TCP connections maxed out at 8.5Mbps, with Wireshark showing excessive Duplicate ACKs, Fast Retransmissions, Previous Segment Not Captured, and Packets Out of Order.

    As for the reason why, it likely has to do with the unpredictable nature of the internet, the QoS of your connection, congestion, the underlying architecture of the internet backbone, and the peering capacity between your two endpoints.

    In our case, the peering capacity between the VDSL provider and the datacenter provider isn’t the greatest. Download consistency is better than upload at the DC end, which is on a 100Mbps connection; but the CIR (Committed Information Rate) we purchased was 10Mbps, so the connection is burstable rather than a dedicated 100Mbps.

    The algorithm used to police the connection at times of high congestion works by, for example, allowing a 10ms burst of traffic at 100Mbps followed by a 90ms pause, instead of a consistent 10Mbps stream, which really messes up TCP.
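
    The arithmetic behind that burst-then-pause shaping, using the numbers from the paragraph above, shows why it averages out to exactly the purchased CIR:

```python
# Token-bucket-style policing: a 10 ms burst at line rate, then a 90 ms pause.
line_rate_mbps = 100.0   # physical port speed at the DC
burst_ms = 10.0          # transmit window per cycle
pause_ms = 90.0          # enforced idle time per cycle

# Average rate over one full cycle:
avg_mbps = line_rate_mbps * burst_ms / (burst_ms + pause_ms)
print(avg_mbps)  # 10.0 -- the purchased CIR, but delivered as bursts
                 # separated by 90 ms stalls that TCP's RTT and loss
                 # detection interpret as congestion
```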

    The bonded DSL lines, on the other hand, allow constant streaming up to their capped capacity. Because the modem is capped at 7Mbps, it never gets the opportunity to burst way beyond that, pause, and burst again to average 7Mbps.

    To help troubleshoot, it is a good idea to measure the upstream/downstream consistency of the underlying connections. Visualware offers a free online speed test that graphs transmission speeds, and TCP forced delays.

    So the connection ended up being 1.7/10 instead of 50/10 (minus the overhead). However, our goal is to increase the upload, and the download is fine as-is at 25Mbps.

    The solution was asymmetric routing:

    66.119.x.y [ZeroShell Bonded]========[ZeroShell DC] 66.119.x.x

    At ZeroShell DC:


    bond0 = 10.32.161.10 (Slaves VPN00, VPN01)
    vpn02 = 10.32.162.10

    Static Route:
    –>> ip route 66.119.x.y 255.255.255.248 10.32.162.20 [vpn02]

    At ZeroShell Bonded Site:


    bond0 = 10.32.161.20
    vpn02 = 10.32.162.20

    Static Route:
    –>> ip route 0.0.0.0 0.0.0.0 10.32.161.20 [bond0]

    Thus we route only upstream traffic through the bonded interfaces and all downstream traffic through vpn02 (unbonded). After doing that, we effectively got very close to 25/14Mbps, minus overhead and the occasional retransmission and out-of-order packet (true of any network, bonded, VPN or not), so actual performance was 24/12Mbps.
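
    For reference, the same asymmetric setup expressed as plain Linux iproute2 commands, not the exact ZeroShell syntax. The interface names and the /29 (255.255.255.248) mask come from the post; the default route's next hop is assumed to be the DC end of the bond (10.32.161.10), since a next hop must be the remote side of the tunnel:

```shell
# --- At the DC end ---
# Send traffic destined for the bonded site's public /29 down the
# unbonded tunnel (vpn02), so all downstream bypasses the bond:
ip route add 66.119.x.y/29 via 10.32.162.20 dev VPN02

# --- At the bonded site ---
# Point the default route up the bonded tunnel, so all upstream
# traffic is spread across both DSL lines:
ip route add default via 10.32.161.10 dev bond0
```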

    The underlying OpenVPN links were UDP. Just FYI, doing TCP over TCP doesn’t make sense; as proof, an experiment with it resulted in transfer speeds under 2Mbps/2Mbps and a whole array of serious TCP errors in Wireshark, on upstream as well as downstream.

    Hope this helps someone solve their frustrations! 😀

    #50970
    m_elias
    Member

    Thank you LastwagenMann for that write-up! A while back, I tried load balancing two 7/0.4 Mbps DSL connections with two bonded VPN tunnels. Back then, I found the ACKs for both tunnels were sent up both tunnels, essentially doubling the upload traffic, which severely limited our downloads.

    I have been trying to figure out how I can increase our uploads, as uploading pictures and videos is becoming more popular. One option is to just add a 3rd DSL connection, but I would like to increase our uploads by a factor of 5 or greater, not 1.5. A new WISP in the area might be able to provide me with a custom connection, something like 2/5 Mbps, at which point I would like to use the WISP for uploads and the DSL for downloads. It looks like your routing solution might be the key to making this work, but I also need to figure out why the uploads were being doubled.
