When the TS_DEBUG_NETSTACK_LOOPBACK_PORT environment variable is set,
netstack will loop back (dnat to addressFamilyLoopback:loopbackPort)
TCP & UDP flows originally destined to localServicesIP:loopbackPort.
localServicesIP is quad-100 or the IPv6 equivalent.
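For illustration, the rewrite is effectively a DNAT keyed on the destination
port; roughly (the helper below is hypothetical, not the actual netstack.go
code):

    import "net/netip"

    // loopbackDNAT is a sketch of the flow rewrite described above: traffic
    // to localServicesIP:loopbackPort is redirected to the matching
    // address-family loopback address on the same port.
    func loopbackDNAT(dst netip.AddrPort, loopbackPort uint16) netip.AddrPort {
        if dst.Port() != loopbackPort {
            return dst
        }
        switch dst.Addr() {
        case netip.MustParseAddr("100.100.100.100"):
            return netip.AddrPortFrom(netip.MustParseAddr("127.0.0.1"), loopbackPort)
        case netip.MustParseAddr("fd7a:115c:a1e0::53"):
            return netip.AddrPortFrom(netip.IPv6Loopback(), loopbackPort)
        }
        return dst
    }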
Updates tailscale/corp#22713
Signed-off-by: Jordan Whited <jordan@tailscale.com>
This was previously disabled in 8e42510 due to missing GSO-awareness in
tstun, which was resolved in d097096.
Updates tailscale/corp#22511
Signed-off-by: Jordan Whited <jordan@tailscale.com>
net/tstun.Wrapper.InjectInboundPacketBuffer is not GSO-aware, which can
break quad-100 TCP streams. Linux is the only platform where
gVisor GSO was previously enabled.
Updates tailscale/corp#22511
Updates #13211
Signed-off-by: Jordan Whited <jordan@tailscale.com>
In 2f27319baf we disabled GRO due to a
data race around concurrent calls to tstun.Wrapper.Write(). This commit
refactors GRO to be thread-safe, and re-enables it on Linux.
This refactor now carries a GRO type across tstun and netstack APIs
with a lifetime that is scoped to a single tstun.Wrapper.Write() call.
In 25f0a3fc8f we used build tags to
prevent importing gVisor's GRO package on iOS because, at the time, we
believed it was contributing to additional memory usage on that
platform. It wasn't, so this commit simplifies and removes those
build tags.
Updates tailscale/corp#22353
Updates tailscale/corp#22125
Updates #6816
Signed-off-by: Jordan Whited <jordan@tailscale.com>
A SIGSEGV was observed around packet merging logic in gVisor's GRO
package.
Updates tailscale/corp#22353
Signed-off-by: Jordan Whited <jordan@tailscale.com>
This commit increases gVisor's TCP max send (4->6MiB) and receive
(4->8MiB) buffer sizes on all platforms except iOS. These values are
biased towards higher throughput on high bandwidth-delay product paths.
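For reference, gVisor exposes these limits as transport protocol options; a
sketch of setting them (mirroring the maximums above; the Min/Default values
and the function itself are illustrative, not the exact code):

    import (
        "gvisor.dev/gvisor/pkg/tcpip"
        "gvisor.dev/gvisor/pkg/tcpip/stack"
        "gvisor.dev/gvisor/pkg/tcpip/transport/tcp"
    )

    // setTCPBufferSizes raises the TCP buffer size limits on a gVisor stack.
    func setTCPBufferSizes(s *stack.Stack) tcpip.Error {
        snd := tcpip.TCPSendBufferSizeRangeOption{Min: 4 << 10, Default: 1 << 20, Max: 6 << 20}
        if err := s.SetTransportProtocolOption(tcp.ProtocolNumber, &snd); err != nil {
            return err
        }
        rcv := tcpip.TCPReceiveBufferSizeRangeOption{Min: 4 << 10, Default: 1 << 20, Max: 8 << 20}
        return s.SetTransportProtocolOption(tcp.ProtocolNumber, &rcv)
    }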
The iperf3 results below demonstrate the effect of this commit between
two Linux computers with i5-12400 CPUs. 100ms of RTT latency is
introduced via Linux's traffic control network emulator queue
discipline.
The first set of results is from commit f0230ce, prior to TCP buffer
resizing.
gVisor write direction:
Test Complete. Summary Results:
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-10.00 sec 180 MBytes 151 Mbits/sec 0 sender
[ 5] 0.00-10.10 sec 179 MBytes 149 Mbits/sec receiver
gVisor read direction:
Test Complete. Summary Results:
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-10.10 sec 337 MBytes 280 Mbits/sec 20 sender
[ 5] 0.00-10.00 sec 323 MBytes 271 Mbits/sec receiver
The second set of results is from this commit, with increased TCP
buffer sizes.
gVisor write direction:
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-10.00 sec 297 MBytes 249 Mbits/sec 0 sender
[ 5] 0.00-10.10 sec 297 MBytes 247 Mbits/sec receiver
gVisor read direction:
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-10.10 sec 501 MBytes 416 Mbits/sec 17 sender
[ 5] 0.00-10.00 sec 485 MBytes 407 Mbits/sec receiver
Updates #9707
Updates tailscale/corp#22119
Signed-off-by: Jordan Whited <jordan@tailscale.com>
This commit implements TCP GRO for packets being written to gVisor on
Linux. Windows support will follow later. The wireguard-go dependency is
updated in order to make use of newly exported IP checksum functions.
gVisor is updated in order to make use of newly exported
stack.PacketBuffer GRO logic.
TCP throughput towards gVisor, i.e. TUN write direction, is dramatically
improved as a result of this commit. Benchmarks show substantial
improvement, sometimes as high as 2x. High bandwidth-delay product
paths remain receive window limited, bottlenecked by gVisor's default
TCP receive socket buffer size. This will be addressed in a follow-on
commit.
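The essence of GRO, stripped of the stack.PacketBuffer details (the types
below are simplified stand-ins, not the real API): coalesce contiguous,
in-order segments of the same TCP flow into one larger segment before
injecting it into gVisor, so per-packet stack costs are paid once per
coalesced segment rather than once per wire packet:

    // segment is a simplified stand-in for a parsed TCP segment.
    type segment struct {
        flow    string // 4-tuple key (addresses and ports), simplified
        seq     uint32 // TCP sequence number
        payload []byte
    }

    // coalesce merges runs of contiguous same-flow segments, mimicking what
    // GRO does on the TUN write path before handing packets to netstack.
    func coalesce(segs []segment) []segment {
        var out []segment
        for _, s := range segs {
            if n := len(out); n > 0 && out[n-1].flow == s.flow &&
                out[n-1].seq+uint32(len(out[n-1].payload)) == s.seq {
                out[n-1].payload = append(out[n-1].payload, s.payload...)
                continue
            }
            out = append(out, s)
        }
        return out
    }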
The iperf3 results below demonstrate the effect of this commit between
two Linux computers with i5-12400 CPUs. There is roughly 13us of
round-trip latency between them.
The first result is from commit 57856fc without TCP GRO.
Starting Test: protocol: TCP, 1 streams, 131072 byte blocks
- - - - - - - - - - - - - - - - - - - - - - - - -
Test Complete. Summary Results:
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-10.00 sec 4.77 GBytes 4.10 Gbits/sec 20 sender
[ 5] 0.00-10.00 sec 4.77 GBytes 4.10 Gbits/sec receiver
The second result is from this commit with TCP GRO.
Starting Test: protocol: TCP, 1 streams, 131072 byte blocks
- - - - - - - - - - - - - - - - - - - - - - - - -
Test Complete. Summary Results:
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-10.00 sec 10.6 GBytes 9.14 Gbits/sec 20 sender
[ 5] 0.00-10.00 sec 10.6 GBytes 9.14 Gbits/sec receiver
Updates #6816
Signed-off-by: Jordan Whited <jordan@tailscale.com>
This commit implements TCP GSO for packets being read from gVisor on
Linux. Windows support will follow later. The wireguard-go dependency is
updated in order to make use of newly exported GSO logic from its tun
package.
A new gVisor stack.LinkEndpoint implementation has been established
(linkEndpoint) that is loosely modeled after its predecessor
(channel.Endpoint). This new implementation supports GSO of monster TCP
segments up to 64K in size, whereas channel.Endpoint only supports up to
32K. linkEndpoint will also be required for GRO, which will be
implemented in a follow-on commit.
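Conceptually, GSO lets gVisor emit one oversized TCP payload that is only
split into wire-sized segments at the TUN boundary. Ignoring headers,
checksums, and the virtio GSO metadata the real path carries, the splitting
step looks roughly like this (a sketch, not the actual code):

    // splitGSO splits one oversized payload (up to 64K with linkEndpoint)
    // into MSS-sized segments. The real code rewrites TCP/IP headers for each
    // segment via wireguard-go's tun package; this shows only the shape.
    func splitGSO(payload []byte, mss int) [][]byte {
        segs := make([][]byte, 0, (len(payload)+mss-1)/mss)
        for len(payload) > 0 {
            n := mss
            if n > len(payload) {
                n = len(payload)
            }
            segs = append(segs, payload[:n:n])
            payload = payload[n:]
        }
        return segs
    }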
TCP throughput from gVisor, i.e. TUN read direction, is dramatically
improved as a result of this commit. Benchmarks show substantial
improvement through a wide range of RTT and loss conditions, sometimes
as high as 5x.
The iperf3 results below demonstrate the effect of this commit between
two Linux computers with i5-12400 CPUs. There is roughly 13us of
round-trip latency between them.
The first result is from commit 57856fc without TCP GSO.
Starting Test: protocol: TCP, 1 streams, 131072 byte blocks
- - - - - - - - - - - - - - - - - - - - - - - - -
Test Complete. Summary Results:
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-10.00 sec 2.51 GBytes 2.15 Gbits/sec 154 sender
[ 5] 0.00-10.00 sec 2.49 GBytes 2.14 Gbits/sec receiver
The second result is from this commit with TCP GSO.
Starting Test: protocol: TCP, 1 streams, 131072 byte blocks
- - - - - - - - - - - - - - - - - - - - - - - - -
Test Complete. Summary Results:
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-10.00 sec 12.6 GBytes 10.8 Gbits/sec 6 sender
[ 5] 0.00-10.00 sec 12.6 GBytes 10.8 Gbits/sec receiver
Updates #6816
Signed-off-by: Jordan Whited <jordan@tailscale.com>
Previously, we were registering TCP and UDP connections in the same map,
which could result in erroneously removing a mapping if one of the two
connections completes while the other one is still active.
Add a "proto string" argument to these functions to avoid this.
Additionally, take the "proto" argument in LocalAPI, and plumb that
through from the CLI and add a new LocalClient method.
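A minimal sketch of the idea (the map and types are illustrative, not the
actual tsdial code): keying by protocol as well as address keeps a completing
TCP flow from removing the mapping of a still-active UDP flow to the same
address:

    import "net/netip"

    // connKey distinguishes flows by protocol, so TCP and UDP connections to
    // the same addr:port no longer share (and clobber) one map entry.
    type connKey struct {
        proto string // "tcp" or "udp"
        dst   netip.AddrPort
    }

    var conns = map[connKey]struct{}{}

    func register(proto string, dst netip.AddrPort)   { conns[connKey{proto, dst}] = struct{}{} }
    func unregister(proto string, dst netip.AddrPort) { delete(conns, connKey{proto, dst}) }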
Updates tailscale/corp#20600
Signed-off-by: Andrew Dunham <andrew@du.nham.ca>
Change-Id: I35d5efaefdfbf4721e315b8ca123f0c8af9125fb
This moves NewContainsIPFunc from tsaddr to new ipset package.
And wgengine/filter types gets split into wgengine/filter/filtertype,
so netmap (and thus the CLI, etc) doesn't need to bring in ipset,
bart, etc.
Then add a test making sure the CLI deps don't regress.
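Call sites keep the same shape after the move; roughly (assuming the new
import path is tailscale.com/net/ipset and the views-based signature is
unchanged):

    import (
        "net/netip"

        "tailscale.com/net/ipset"
        "tailscale.com/types/views"
    )

    func example() bool {
        pfxs := views.SliceOf([]netip.Prefix{netip.MustParsePrefix("100.64.0.0/10")})
        contains := ipset.NewContainsIPFunc(pfxs) // func(netip.Addr) bool
        return contains(netip.MustParseAddr("100.100.100.100"))
    }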
Updates #1278
Change-Id: Ia246d6d9502bbefbdeacc4aef1bed9c8b24f54d5
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
This refactors the logic for determining whether a packet should be sent
to the host into its own function, and adds tests for it.
Updates #11304
Updates #12448
Signed-off-by: Andrew Dunham <andrew@du.nham.ca>
Change-Id: Ief9afa98eaffae00e21ceb7db073c61b170355e5
Fix a bug where, for a subnet router that advertises
a 4via6 route, all packets with a source IP matching
the 4via6 address were being sent to the host itself.
Instead, only send to the host packets whose destination
address is the host's local address.
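In simplified form (a stand-in, not the actual handleLocalPackets code), the
check is now about the packet's destination rather than its source:

    import "net/netip"

    // sendToHost reports whether a packet should be delivered to the host
    // itself: only when its destination is one of the node's own addresses,
    // not merely because its source falls inside an advertised 4via6 route.
    func sendToHost(dst netip.Addr, selfAddrs []netip.Addr) bool {
        for _, a := range selfAddrs {
            if dst == a {
                return true
            }
        }
        return false
    }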
Fixes tailscale/tailscale#12448
Signed-off-by: Irbe Krumina <irbe@tailscale.com>
Co-authored-by: Andrew Dunham <andrew@du.nham.ca>
This adds a new ListenPacket function on tsnet.Server
which acts mostly like `net.ListenPacket`.
Unlike `Server.Listen`, this requires listening on a
specific IP and does not automatically listen on both
V4 and V6 addresses of the Server when the IP is unspecified.
To test this, it also adds UDP support to tsdial.Dialer.UserDial
and plumbs it through the localapi, with an associated test
to make sure the UDP functionality works from both sides.
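For example (the address below is illustrative), a tsnet app can now receive
UDP directly on one of its Tailscale IPs:

    import (
        "log"

        "tailscale.com/tsnet"
    )

    func listenUDP(srv *tsnet.Server) {
        // Unlike srv.Listen, the IP must be specified explicitly.
        pc, err := srv.ListenPacket("udp", "100.64.0.1:5353")
        if err != nil {
            log.Fatal(err)
        }
        defer pc.Close()
        buf := make([]byte, 1500)
        n, from, err := pc.ReadFrom(buf)
        if err != nil {
            log.Fatal(err)
        }
        log.Printf("got %d bytes from %v", n, from)
    }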
Updates #12182
Signed-off-by: Maisem Ali <maisem@tailscale.com>
Fixes tailscale/tailscale#10393
Fixes tailscale/corp#15412
Fixes tailscale/corp#19808
On Apple platforms, exit nodes and subnet routers have been unable to relay pings from Tailscale devices to non-Tailscale devices due to sandbox restrictions imposed on our network extensions by Apple. The sandbox prevented the code in netstack.go from spawning the `ping` process which we were using.
Replace that exec call with logic to send an ICMP echo request directly, which appears to work in userspace and does not trigger a sandbox violation in the syslog.
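A minimal sketch of the general technique using golang.org/x/net/icmp (not a
copy of the netstack.go change; the unprivileged datagram ICMP socket is what
avoids exec'ing a ping binary):

    import (
        "log"
        "net"
        "os"

        "golang.org/x/net/icmp"
        "golang.org/x/net/ipv4"
    )

    func pingOnce(dst string) {
        c, err := icmp.ListenPacket("udp4", "0.0.0.0") // unprivileged ICMP socket
        if err != nil {
            log.Fatal(err)
        }
        defer c.Close()
        msg := icmp.Message{
            Type: ipv4.ICMPTypeEcho,
            Body: &icmp.Echo{ID: os.Getpid() & 0xffff, Seq: 1, Data: []byte("tailscale ping")},
        }
        b, err := msg.Marshal(nil)
        if err != nil {
            log.Fatal(err)
        }
        if _, err := c.WriteTo(b, &net.UDPAddr{IP: net.ParseIP(dst)}); err != nil {
            log.Fatal(err)
        }
    }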
Signed-off-by: Andrea Gottardo <andrea@gottardo.me>
Previously, a node that was advertising a 4via6 route wouldn't be able
to make use of that same route; the packet would be delivered to
Tailscale, but since we weren't accepting it in handleLocalPackets, the
packet wouldn't be delivered to netstack and would never hit the 4via6
logic. Let's add that support so that usage of 4via6 is consistent
regardless of where the connection is initiated from.
Updates #11304
Signed-off-by: Andrew Dunham <andrew@du.nham.ca>
Change-Id: Ic28dc2e58080d76100d73b93360f4698605af7cb
This change updates all tailfs functions and the majority of the tailfs
variables to use the new drive naming.
Updates tailscale/corp#16827
Signed-off-by: Charlotte Brandhorst-Satzkorn <charlotte@tailscale.com>
This change updates the tailfs file and package names to their new
naming convention.
Updates tailscale/corp#16827
Signed-off-by: Charlotte Brandhorst-Satzkorn <charlotte@tailscale.com>
This fixes a bug that was introduced in #11258 where the handling of the
per-client limit didn't properly account for the fact that the gVisor
TCP forwarder will return 'true' to indicate that it's handled a
duplicate SYN packet, but not launch the handler goroutine.
In such a case, we neither decremented our per-client limit in the
wrapper function, nor did we do so in the handler function, leading to
our per-client limit table slowly filling up without bound.
Fix this by doing the same duplicate-tracking logic that the TCP
forwarder does so we can detect such cases and appropriately decrement
our in-flight counter.
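In sketch form (the callbacks are stand-ins, not the literal netstack.go
code), every path that does not launch a handler goroutine must give the
in-flight slot back:

    // onSYN models the wrapper around gVisor's TCP forwarder callback. The
    // forwarder reports a duplicate SYN for an already-handled connection as
    // "handled" without starting a new handler, so the slot taken for it must
    // be released immediately instead of leaking.
    func onSYN(acquired, isDuplicateSYN bool, release func(), startHandler func(onDone func())) {
        if !acquired {
            return // over the per-client limit; the attempt was aborted
        }
        if isDuplicateSYN {
            release() // previously missing: this slot leaked forever
            return
        }
        startHandler(release) // handler releases the slot when it finishes
    }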
Updates tailscale/corp#12184
Signed-off-by: Andrew Dunham <andrew@du.nham.ca>
Change-Id: Ib6011a71d382a10d68c0802593f34b8153d06892
The `stack.PacketBufferPtr` type no longer exists; replace it with
`*stack.PacketBuffer` instead.
Updates #8043
Signed-off-by: Andrew Dunham <andrew@du.nham.ca>
Change-Id: Ib56ceff09166a042aa3d9b80f50b2aa2d34b3683
This fixes a regression introduced with 993acf4 and released in
v1.60.0.
The regression caused us to intercept all userspace traffic to port
8080 which prevented users from exposing their own services to their
tailnet at port 8080.
Now, we only intercept traffic to port 8080 if it's bound for
100.100.100.100 or fd7a:115c:a1e0::53.
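In effect, the interception condition tightened from "destination port is
8080" to roughly the following (a sketch, not the literal code):

    import "net/netip"

    var (
        serviceIPv4 = netip.MustParseAddr("100.100.100.100")
        serviceIPv6 = netip.MustParseAddr("fd7a:115c:a1e0::53")
    )

    // interceptPort8080 reports whether a flow should be handled by the local
    // service listener rather than forwarded to a user's own service on 8080.
    func interceptPort8080(dst netip.AddrPort) bool {
        return dst.Port() == 8080 &&
            (dst.Addr() == serviceIPv4 || dst.Addr() == serviceIPv6)
    }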
Fixes #11283
Signed-off-by: Percy Wegmann <percy@tailscale.com>
(cherry picked from commit 17cd0626f3)
This is a fun one. Right now, when a client is connecting through a
subnet router, here's roughly what happens:
1. The client initiates a connection to an IP address behind a subnet
router, and sends a TCP SYN
2. The subnet router gets the SYN packet from netstack, and after
running through acceptTCP, starts DialContext-ing the destination IP,
without accepting the connection¹
3. The client retransmits the SYN packet a few times while the dial is
in progress, until either...
4. The subnet router successfully establishes a connection to the
destination IP and sends the SYN-ACK back to the client, or...
5. The subnet router times out and sends a RST to the client.
6. If the connection was successful, the client ACKs the SYN-ACK it
received, and traffic starts flowing
As a result, the notification code in forwardTCP never notices when a
new connection attempt is aborted, and it will wait until either the
connection is established, or until the OS-level connection timeout is
reached and it aborts.
To mitigate this, add a per-client limit on how many in-flight TCP
forwarding connections can be in-progress; after this, clients will see
a similar behaviour to the global limit, where new connection attempts
are aborted instead of waiting. This prevents a single misbehaving
client from blocking all other clients of a subnet router by ensuring
that it doesn't starve the global limiter.
Also, bump the global limit again to a higher value.
¹ We can't accept the connection before establishing a connection to the
remote server since otherwise we'd be opening the connection and then
immediately closing it, which breaks a bunch of stuff; see #5503 for
more details.
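A rough illustration of the layering described above (names and limits are
illustrative, not the actual values): a small per-client cap is checked
alongside the global one, so a single stuck client cannot occupy every
global slot:

    import (
        "net/netip"
        "sync"
    )

    const (
        maxPerClient = 64   // illustrative per-client in-flight cap
        maxGlobal    = 4096 // illustrative global in-flight cap
    )

    var (
        mu        sync.Mutex
        perClient = map[netip.Addr]int{}
        inFlight  int
    )

    // tryStartForward reports whether a new TCP forwarding attempt from
    // client may begin; callers must call endForward when the attempt ends.
    func tryStartForward(client netip.Addr) bool {
        mu.Lock()
        defer mu.Unlock()
        if perClient[client] >= maxPerClient || inFlight >= maxGlobal {
            return false // abort the SYN rather than queueing behind stuck dials
        }
        perClient[client]++
        inFlight++
        return true
    }

    func endForward(client netip.Addr) {
        mu.Lock()
        defer mu.Unlock()
        if perClient[client]--; perClient[client] <= 0 {
            delete(perClient, client)
        }
        inFlight--
    }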
Updates tailscale/corp#12184
Signed-off-by: Andrew Dunham <andrew@du.nham.ca>
Change-Id: I76e7008ddd497303d75d473f534e32309c8a5144
- add a clientmetric with a counter of TCP forwarder drops due to
hitting the max-attempts limit (see the sketch below);
- fix varz metric types, as they are all counters.
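The counter is a clientmetric along these lines (the metric name here is
illustrative, not necessarily the exact one added):

    import "tailscale.com/util/clientmetric"

    // Incremented when a TCP forwarding attempt is dropped because the
    // maximum number of in-flight attempts has been reached.
    var metricTCPForwardDrops = clientmetric.NewCounter("netstack_tcp_forward_dropped_attempts")

    // At the drop site:
    //   metricTCPForwardDrops.Add(1)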
Updates #8210
Signed-off-by: Anton Tolchanov <anton@tailscale.com>
FileSystemForLocal was listening on the node's Tailscale address,
which potentially exposes the user's view of TailFS shares to other
Tailnet users. Remote nodes should connect to exported shares via
the peerapi.
This removes that code so that FileSystemForLocal is only available
on 100.100.100.100:8080.
Updates tailscale/corp#16827
Signed-off-by: Percy Wegmann <percy@tailscale.com>
Adds support for node attribute tailfs:access. If this attribute is
not present, Tailscale will not accept connections to the local TailFS
server at 100.100.100.100:8080.
Updates tailscale/corp#16827
Signed-off-by: Percy Wegmann <percy@tailscale.com>
Add a WebDAV-based folder sharing mechanism that is exposed to local clients at
100.100.100.100:8080 and to remote peers via a new peerapi endpoint at
/v0/tailfs.
Add the ability to manage folder sharing via the new 'share' CLI sub-command.
Updates tailscale/corp#16827
Signed-off-by: Percy Wegmann <percy@tailscale.com>
When tailscaled is run with "-debug 127.0.0.1:12345", these metrics are
available at:
http://localhost:12345/debug/metrics
Updates #8210
Signed-off-by: Andrew Dunham <andrew@du.nham.ca>
Change-Id: I19db6c445ac1f8344df2bc1066a3d9c9030606f8
Prior to an earlier netstack bump, this code used a string conversion
path to cover several cases of behavior that seemingly checked for
unspecified addresses, adding unspecified addresses to v6. That behavior
now crashes in netstack, which enforces address length in various
areas of its API, address removal in particular.
As netstack addresses are now protocol specific, we must not create
invalid protocol addresses: an address is either v4 or v6, and the
address value contained inside must match. If a control path attempts
otherwise, it is now logged and skipped rather than adding incorrect
addressing.
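The invariant, roughly (a sketch assuming a current gVisor tcpip API with
AddrFrom4/AddrFrom16; not the literal netstack.go code): the protocol number
and the family of the contained address must agree, and anything else is
skipped rather than installed:

    import (
        "net/netip"

        "gvisor.dev/gvisor/pkg/tcpip"
        "gvisor.dev/gvisor/pkg/tcpip/network/ipv4"
        "gvisor.dev/gvisor/pkg/tcpip/network/ipv6"
    )

    // protocolAddr builds a protocol address whose Protocol matches the
    // family of ip; the second return value is false for anything invalid,
    // which callers log and skip instead of installing.
    func protocolAddr(ip netip.Addr) (tcpip.ProtocolAddress, bool) {
        switch {
        case ip.Is4():
            return tcpip.ProtocolAddress{
                Protocol:          ipv4.ProtocolNumber,
                AddressWithPrefix: tcpip.AddrFrom4(ip.As4()).WithPrefix(),
            }, true
        case ip.Is6():
            return tcpip.ProtocolAddress{
                Protocol:          ipv6.ProtocolNumber,
                AddressWithPrefix: tcpip.AddrFrom16(ip.As16()).WithPrefix(),
            }, true
        }
        return tcpip.ProtocolAddress{}, false
    }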
Fixes tailscale/corp#15377
Signed-off-by: James Tucker <james@tailscale.com>
Prepare for path MTU discovery by splitting up the concept of
DefaultMTU() into the concepts of the Tailscale TUN MTU, MTUs of
underlying network interfaces, minimum "safe" TUN MTU, user configured
TUN MTU, probed path MTU to a peer, and maximum probed MTU. Add a set
of likely MTUs to probe.
Updates #311
Signed-off-by: Val <valerie@tailscale.com>
Use buffer pools for UDP packet forwarding to prepare for increasing the
forwarded UDP packet size for peer path MTU discovery.
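The usual shape of such pooling in Go (a generic sketch, not the specific
buffers or sizes the code uses):

    import "sync"

    // udpBufPool recycles forwarding buffers so each forwarded UDP packet
    // doesn't allocate, which matters more as the maximum packet size grows.
    var udpBufPool = sync.Pool{
        New: func() any { b := make([]byte, 65535); return &b },
    }

    func forwardOneUDP(read func([]byte) (int, error), write func([]byte) error) error {
        bp := udpBufPool.Get().(*[]byte)
        defer udpBufPool.Put(bp)
        n, err := read(*bp)
        if err != nil {
            return err
        }
        return write((*bp)[:n])
    }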
Updates #311
Co-authored-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Signed-off-by: Val <valerie@tailscale.com>
We weren't correctly retrying over TCP when an upstream DNS server
returned a truncated response. Instead, we'd return the truncated
response to the user, even if the user was querying us over TCP and
was thus able to handle a large response.
Also, add an envknob and controlknob to allow users/us to disable this
behaviour if it turns out to be buggy (✨ DNS ✨).
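The shape of the retry (a sketch using golang.org/x/net/dns/dnsmessage, not
the literal resolver code): when a UDP response comes back with the TC bit
set and the client can take a large answer, re-issue the query to the
upstream over TCP, which frames messages with a 2-byte length prefix:

    import (
        "encoding/binary"
        "io"
        "net"

        "golang.org/x/net/dns/dnsmessage"
    )

    // isTruncated reports whether a DNS response has the TC bit set.
    func isTruncated(resp []byte) bool {
        var p dnsmessage.Parser
        h, err := p.Start(resp)
        return err == nil && h.Truncated
    }

    // retryOverTCP re-sends the raw query to the upstream over TCP, using the
    // standard 2-byte big-endian length prefix in both directions.
    func retryOverTCP(upstream string, query []byte) ([]byte, error) {
        c, err := net.Dial("tcp", upstream)
        if err != nil {
            return nil, err
        }
        defer c.Close()
        msg := make([]byte, 2+len(query))
        binary.BigEndian.PutUint16(msg, uint16(len(query)))
        copy(msg[2:], query)
        if _, err := c.Write(msg); err != nil {
            return nil, err
        }
        var lenBuf [2]byte
        if _, err := io.ReadFull(c, lenBuf[:]); err != nil {
            return nil, err
        }
        resp := make([]byte, binary.BigEndian.Uint16(lenBuf[:]))
        if _, err := io.ReadFull(c, resp); err != nil {
            return nil, err
        }
        return resp, nil
    }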
Updates #9264
Signed-off-by: Andrew Dunham <andrew@du.nham.ca>
Change-Id: Ifb04b563839a9614c0ba03e9c564e8924c1a2bfd
Prepare for path MTU discovery by splitting up the concept of
DefaultMTU() into the concepts of the Tailscale TUN MTU, MTUs of
underlying network interfaces, minimum "safe" TUN MTU, user configured
TUN MTU, probed path MTU to a peer, and maximum probed MTU. Add a set
of likely MTUs to probe.
Updates #311
Signed-off-by: Val <valerie@tailscale.com>
Use buffer pools for UDP packet forwarding to prepare for increasing the
forwarded UDP packet size for peer path MTU discovery.
Updates #311
Co-authored-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Signed-off-by: Val <valerie@tailscale.com>
And convert all callers over to the methods that check SelfNode.
Now we don't have multiple ways to express things in tests (setting
fields on SelfNode vs NetworkMap, sometimes inconsistently) and don't
have multiple ways to check those two fields (often only checking one
or the other).
Updates #9443
Change-Id: I2d7ba1cf6556142d219fae2be6f484f528756e3c
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>