Disable TCP & UDP GRO if the probe fails.
torvalds/linux@e269d79c7d broke virtio_net TCP & UDP GRO, causing GRO
writes to return EINVAL. The bug was later resolved in
torvalds/linux@89add40066. The offending commit was pulled into various
LTS releases.
Updates #13041
Signed-off-by: Jordan Whited <jordan@tailscale.com>
This commit adds a new usermetric package and wires
up metrics across the tailscale client.
Updates tailscale/corp#22075
Co-authored-by: Anton Tolchanov <anton@tailscale.com>
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
In 2f27319baf we disabled GRO due to a
data race around concurrent calls to tstun.Wrapper.Write(). This commit
refactors GRO to be thread-safe, and re-enables it on Linux.
This refactor now carries a GRO type across tstun and netstack APIs
with a lifetime that is scoped to a single tstun.Wrapper.Write() call.
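The per-call lifetime can be illustrated with a small sketch. The `gro` and `wrapper` types below are hypothetical stand-ins, not the real tstun or netstack types; the point is only that each Write() creates its own batch, so concurrent callers share no GRO state:

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// gro is a hypothetical per-call batch. It needs no locking because it is
// never shared between goroutines.
type gro struct {
	pending [][]byte
}

func (g *gro) enqueue(pkt []byte) { g.pending = append(g.pending, pkt) }

// flush hands every queued packet to deliver and returns how many were sent.
func (g *gro) flush(deliver func([]byte)) int {
	n := len(g.pending)
	for _, p := range g.pending {
		deliver(p)
	}
	g.pending = nil
	return n
}

// wrapper stands in for the TUN wrapper; delivered counts packets handed on.
type wrapper struct {
	delivered atomic.Int64
}

// Write creates a fresh gro for this call, so concurrent callers never
// touch the same batch.
func (w *wrapper) Write(pkts [][]byte) int {
	g := &gro{}
	for _, p := range pkts {
		g.enqueue(p)
	}
	return g.flush(func([]byte) { w.delivered.Add(1) })
}

// concurrentWrites runs the given number of goroutines, each writing
// perCall packets, and returns the total number delivered.
func concurrentWrites(writers, perCall int) int64 {
	var w wrapper
	var wg sync.WaitGroup
	for i := 0; i < writers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			w.Write(make([][]byte, perCall))
		}()
	}
	wg.Wait()
	return w.delivered.Load()
}

func main() {
	fmt.Println(concurrentWrites(8, 16))
}
```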
In 25f0a3fc8f we used build tags to
prevent importation of gVisor's GRO package on iOS as at the time we
believed it was contributing to additional memory usage on that
platform. It wasn't, so this commit simplifies and removes those
build tags.
Updates tailscale/corp#22353
Updates tailscale/corp#22125
Updates #6816
Signed-off-by: Jordan Whited <jordan@tailscale.com>
This commit implements TCP GRO for packets being written to gVisor on
Linux. Windows support will follow later. The wireguard-go dependency is
updated in order to make use of newly exported IP checksum functions.
gVisor is updated in order to make use of newly exported
stack.PacketBuffer GRO logic.
TCP throughput towards gVisor, i.e. TUN write direction, is dramatically
improved as a result of this commit. Benchmarks show substantial
improvement, sometimes as high as 2x. High bandwidth-delay product
paths remain receive window limited, bottlenecked by gVisor's default
TCP receive socket buffer size. This will be addressed in a follow-on
commit.
The iperf3 results below demonstrate the effect of this commit between
two Linux computers with i5-12400 CPUs. There is roughly 13us of round
trip latency between them.
The first result is from commit 57856fc without TCP GRO.
Starting Test: protocol: TCP, 1 streams, 131072 byte blocks
- - - - - - - - - - - - - - - - - - - - - - - - -
Test Complete. Summary Results:
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-10.00 sec 4.77 GBytes 4.10 Gbits/sec 20 sender
[ 5] 0.00-10.00 sec 4.77 GBytes 4.10 Gbits/sec receiver
The second result is from this commit with TCP GRO.
Starting Test: protocol: TCP, 1 streams, 131072 byte blocks
- - - - - - - - - - - - - - - - - - - - - - - - -
Test Complete. Summary Results:
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-10.00 sec 10.6 GBytes 9.14 Gbits/sec 20 sender
[ 5] 0.00-10.00 sec 10.6 GBytes 9.14 Gbits/sec receiver
Updates #6816
Signed-off-by: Jordan Whited <jordan@tailscale.com>
This commit implements TCP GSO for packets being read from gVisor on
Linux. Windows support will follow later. The wireguard-go dependency is
updated in order to make use of newly exported GSO logic from its tun
package.
A new gVisor stack.LinkEndpoint implementation has been established
(linkEndpoint) that is loosely modeled after its predecessor
(channel.Endpoint). This new implementation supports GSO of monster TCP
segments up to 64K in size, whereas channel.Endpoint only supports up to
32K. linkEndpoint will also be required for GRO, which will be
implemented in a follow-on commit.
TCP throughput from gVisor, i.e. TUN read direction, is dramatically
improved as a result of this commit. Benchmarks show substantial
improvement through a wide range of RTT and loss conditions, sometimes
as high as 5x.
The iperf3 results below demonstrate the effect of this commit between
two Linux computers with i5-12400 CPUs. There is roughly 13us of round
trip latency between them.
The first result is from commit 57856fc without TCP GSO.
Starting Test: protocol: TCP, 1 streams, 131072 byte blocks
- - - - - - - - - - - - - - - - - - - - - - - - -
Test Complete. Summary Results:
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-10.00 sec 2.51 GBytes 2.15 Gbits/sec 154 sender
[ 5] 0.00-10.00 sec 2.49 GBytes 2.14 Gbits/sec receiver
The second result is from this commit with TCP GSO.
Starting Test: protocol: TCP, 1 streams, 131072 byte blocks
- - - - - - - - - - - - - - - - - - - - - - - - -
Test Complete. Summary Results:
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-10.00 sec 12.6 GBytes 10.8 Gbits/sec 6 sender
[ 5] 0.00-10.00 sec 12.6 GBytes 10.8 Gbits/sec receiver
Updates #6816
Signed-off-by: Jordan Whited <jordan@tailscale.com>
I noticed we were allocating these every time when they could just
share the same memory. Rather than document ownership, just lock it
down with a view.
I was considering doing all of the fields but decided to just do this
one first as a test to see how infectious it became. Conclusion: not
very.
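For illustration, a read-only view over a shared slice might look like the sketch below. `SliceView` and `ViewOf` are hypothetical names, and this is not a copy of the actual views package; it only shows how a view lets several holders share one backing array without any of them being able to mutate it:

```go
package main

import "fmt"

// SliceView is a hypothetical read-only wrapper around a slice. It exposes
// no method that writes to or returns the underlying slice.
type SliceView[T any] struct {
	s []T
}

// ViewOf wraps s without copying it.
func ViewOf[T any](s []T) SliceView[T] { return SliceView[T]{s: s} }

// Len reports the number of elements in the view.
func (v SliceView[T]) Len() int { return len(v.s) }

// At returns the i'th element.
func (v SliceView[T]) At(i int) T { return v.s[i] }

// config shares one backing slice among all readers through a view.
type config struct {
	addrs SliceView[string]
}

func main() {
	shared := []string{"100.64.0.1", "100.64.0.2"}
	a := config{addrs: ViewOf(shared)}
	b := config{addrs: ViewOf(shared)} // no second allocation
	fmt.Println(a.addrs.Len(), b.addrs.At(1))
}
```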
Updates #cleanup (while working towards tailscale/corp#20514)
Change-Id: I8ce08519de0c9a53f20292adfbecd970fe362de0
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
In a configuration where the local node (ip1) has a different IP (ip2)
that it uses to communicate with a peer (ip3) we would do UDP flow
tracking on the `ip2->ip3` tuple. When we receive the response from
the peer `ip3->ip2` we would dnat it back to `ip3->ip1` which would
then not match the flow track state and the packet would get dropped.
To fix this, we should do flow tracking on the `ip1->ip3` tuple instead
of `ip2->ip3`, which requires doing SNAT after running
filterPacketOutboundToWireGuard.
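The mismatch can be reproduced with a toy flow table. The `flow` and `tracker` types and the addresses below are made up for illustration and do not reflect the real filter code; the sketch only contrasts recording the flow before SNAT with recording it after:

```go
package main

import (
	"fmt"
	"net/netip"
)

// flow is a hypothetical src/dst tuple used as the flow-tracking key.
type flow struct{ src, dst netip.Addr }

// tracker is a hypothetical, minimal UDP flow table.
type tracker struct{ seen map[flow]bool }

// recordOutbound remembers an outbound src->dst flow.
func (t *tracker) recordOutbound(src, dst netip.Addr) { t.seen[flow{src, dst}] = true }

// allowInbound accepts a packet only if it is the reverse of a recorded flow.
func (t *tracker) allowInbound(src, dst netip.Addr) bool { return t.seen[flow{dst, src}] }

// roundTrip simulates one request and its reply. When trackBeforeSNAT is
// true the flow is recorded with the native address (ip1); otherwise it is
// recorded with the masquerade address (ip2), as in the buggy ordering.
func roundTrip(trackBeforeSNAT bool) bool {
	ip1 := netip.MustParseAddr("100.64.0.1") // local node, native address
	ip2 := netip.MustParseAddr("10.0.0.1")   // address presented to the peer
	ip3 := netip.MustParseAddr("100.64.0.3") // peer

	t := &tracker{seen: map[flow]bool{}}
	if trackBeforeSNAT {
		t.recordOutbound(ip1, ip3) // filter runs first, SNAT afterwards
	} else {
		t.recordOutbound(ip2, ip3) // SNAT already applied when the filter runs
	}
	// The reply arrives as ip3->ip2 and is DNATed back to ip3->ip1.
	return t.allowInbound(ip3, ip1)
}

func main() {
	fmt.Println("track after SNAT: ", roundTrip(false))
	fmt.Println("track before SNAT:", roundTrip(true))
}
```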
Updates tailscale/corp#19971, tailscale/corp#8020
Signed-off-by: Maisem Ali <maisem@tailscale.com>
This adds a new bool that can be sent down from control
to do jailing on the client side. Previously this would
only be done from control by modifying the packet filter
we sent down to clients. That resulted in a lot of
additional work/CPU on control; we can instead just
do this on the client. This has always been a TODO that
we keep putting off, so we might as well do it now.
Updates tailscale/corp#19623
Signed-off-by: Maisem Ali <maisem@tailscale.com>
This plumbs a packet filter for jailed nodes through to the
tstun.Wrapper; the filter for a jailed node is equivalent to a "shields
up" filter. Currently a no-op as there is no way for control to
tell the client whether a peer is jailed.
Updates tailscale/corp#19623
Co-authored-by: Andrew Dunham <andrew@du.nham.ca>
Signed-off-by: Maisem Ali <maisem@tailscale.com>
Change-Id: I5ccc5f00e197fde15dd567485b2a99d8254391ad
This refactors the peerConfig struct to allow storing more
details about a peer and not just the masq addresses. To be
used in a follow up change.
As a side effect, this also makes the DNAT logic on the inbound
packet stricter. Previously it would only match against the packet's
dst IP; now it also takes the src IP into consideration. The behavior
is at parity with the SNAT case.
Updates tailscale/corp#19623
Co-authored-by: Andrew Dunham <andrew@du.nham.ca>
Signed-off-by: Maisem Ali <maisem@tailscale.com>
Change-Id: I5f40802bebbf0f055436eb8824e4511d0052772d
So that we can use this for additional, non-NAT configuration without it
being confusing.
Updates #cleanup
Signed-off-by: Andrew Dunham <andrew@du.nham.ca>
Change-Id: I1658d59c9824217917a94ee76d2d08f0a682986f
This was a holdover from the older, pre-BART days and is no longer
necessary.
Updates #cleanup
Signed-off-by: Andrew Dunham <andrew@du.nham.ca>
Change-Id: I71b892bab1898077767b9ff51cef33d59c08faf8
Certain device drivers (e.g. vxlan, geneve) do not properly handle
coalesced UDP packets later in the stack, resulting in packet loss.
Updates #11026
Signed-off-by: Jordan Whited <jordan@tailscale.com>
At least in userspace-networking mode.
Fixes #11361
Change-Id: I78d33f0f7e05fe9e9ee95b97c99b593f8fe498f2
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
This implementation uses less memory than tempfork/device,
which helps avoid OOM conditions in the iOS VPN extension when
switching to a Tailnet with ExitNode routing enabled.
Updates tailscale/corp#18514
Signed-off-by: Percy Wegmann <percy@tailscale.com>
The `stack.PacketBufferPtr` type no longer exists; replace it with
`*stack.PacketBuffer` instead.
Updates #8043
Signed-off-by: Andrew Dunham <andrew@du.nham.ca>
Change-Id: Ib56ceff09166a042aa3d9b80f50b2aa2d34b3683
Run `staticcheck` with `U1000` to find unused code. This cleans up about
half of it. I'll do the other half separately to keep PRs manageable.
Updates #cleanup
Signed-off-by: Andrew Lytvynov <awly@tailscale.com>
The current structure meant that we were embedding netstack in
the tailscale CLI and in the GUIs. This removes that by isolating
the checksum munging to a different pkg which is only called from
`net/tstun`.
Fixes #9756
Signed-off-by: Maisem Ali <maisem@tailscale.com>
Automatically probe the path MTU to a peer when peer MTU is enabled, but do not
use the MTU information for anything yet.
Updates #311
Signed-off-by: Val <valerie@tailscale.com>
Now that corp is updated, remove the shim code to bridge the rename from
DefaultMTU() to DefaultTUNMTU.
Updates #311
Signed-off-by: Val <valerie@tailscale.com>
Prepare for path MTU discovery by splitting up the concept of
DefaultMTU() into the concepts of the Tailscale TUN MTU, MTUs of
underlying network interfaces, minimum "safe" TUN MTU, user configured
TUN MTU, probed path MTU to a peer, and maximum probed MTU. Add a set
of likely MTUs to probe.
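One way to keep these concepts from being mixed up is to give them distinct types. The sketch below is illustrative only: the `TUNMTU` and `WireMTU` names and the conversion helpers are assumptions, and the overhead constant assumes the worst case of an IPv6 outer header (40 bytes) plus UDP (8 bytes) plus WireGuard framing (32 bytes):

```go
package main

import "fmt"

// TUNMTU is the MTU seen by packets inside the tunnel (hypothetical type).
type TUNMTU uint32

// WireMTU is the MTU of an underlying network interface (hypothetical type).
type WireMTU uint32

// wgOverhead is the assumed per-packet encapsulation overhead:
// IPv6 header + UDP header + WireGuard framing.
const wgOverhead = 40 + 8 + 32

// WireToTUN returns the largest tunnel packet that fits in a wire MTU,
// or 0 if the wire MTU cannot even hold the encapsulation overhead.
func WireToTUN(w WireMTU) TUNMTU {
	if w < wgOverhead {
		return 0
	}
	return TUNMTU(w - wgOverhead)
}

// TUNToWire returns the wire MTU needed to carry a tunnel packet of size t.
func TUNToWire(t TUNMTU) WireMTU { return WireMTU(t + wgOverhead) }

func main() {
	fmt.Println(WireToTUN(1500), TUNToWire(1280))
}
```

Because the two types are distinct, passing a wire MTU where a TUN MTU is expected fails at compile time rather than producing an off-by-overhead bug.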
Updates #311
Signed-off-by: Val <valerie@tailscale.com>
It might as well have been spewing out gibberish. This adds
a nicer output format for us to be able to read and identify
what's going on.
Sample output
```
natV4Config{nativeAddr: 100.83.114.95, listenAddrs: [10.32.80.33], dstMasqAddrs: [10.32.80.33: 407 peers]}
```
Fixes tailscale/corp#14650
Signed-off-by: Maisem Ali <maisem@tailscale.com>
This PR plumbs through awareness of an IPv6 SNAT/masquerade address from the wire protocol
through to the low-level packages (tstun / wgengine). This PR is the first of two PRs for
implementing IPv6 NAT support to/from peers.
A subsequent PR will implement the data-plane changes for IPv6 NAT; this one is just plumbing.
Signed-off-by: Tom DNetto <tom@tailscale.com>
Updates ENG-991
I didn't clean up the more idiomatic map[T]bool with true values, at
least yet. I just converted the relatively awkward struct{}-valued
maps.
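For context, the kind of conversion described here might look like the sketch below. The `Set` type is a hypothetical stand-in for whatever set type the change adopted, shown only to contrast it with a raw struct{}-valued map:

```go
package main

import "fmt"

// Set is a hypothetical generic set built on a struct{}-valued map, hiding
// the awkward struct{}{} literal and the two-value lookup from callers.
type Set[T comparable] map[T]struct{}

// Add inserts v into the set.
func (s Set[T]) Add(v T) { s[v] = struct{}{} }

// Contains reports whether v is in the set.
func (s Set[T]) Contains(v T) bool {
	_, ok := s[v]
	return ok
}

func main() {
	// Before: a raw struct{}-valued map.
	raw := map[string]struct{}{}
	raw["eth0"] = struct{}{}
	_, ok := raw["eth0"]

	// After: the same data through the set type.
	s := Set[string]{}
	s.Add("eth0")

	fmt.Println(ok, s.Contains("eth0"), s.Contains("wlan0"))
}
```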
Updates #cleanup
Change-Id: I758abebd2bb1f64bc7a9d0f25c32298f4679c14f
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
I'm not saying it works, but it compiles.
Updates #5794
Change-Id: I2f3c99732e67fe57a05edb25b758d083417f083e
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Also fix a js/wasm issue with tsnet in the process. (same issue as WASI)
Updates #8320
Fixes #8315
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
This is part of an effort to clean up tailscaled initialization between
tailscaled, tailscaled Windows service, tsnet, and the mac GUI.
Updates #8036
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
In the case where the exit node requires SNAT, we would SNAT all traffic, not just the
traffic meant to go through the exit node. This was a result of the default route being
added to the routing table, which would match basically everything.
In this case, we need to account for all peers in the routing table, not just the ones
that require NAT.
Fix and add a test.
Updates tailscale/corp#8020
Signed-off-by: Maisem Ali <maisem@tailscale.com>