Most of the magicsock tests fake the network, simulating packets going
out and coming in. There's no reason to actually hit your router to do
UPnP/NAT-PMP/PCP during in tests. But while debugging thousands of
iterations of tests to deflake some things, I saw it slamming my
router. This stops that.
Updates #11762
Change-Id: I59b9f48f8f5aff1fa16b4935753d786342e87744
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Seems to deflake tstest/integration tests. I can't reproduce it
anymore on one of my VMs that was consistently flaking after a dozen
runs before. Now I can run hundreds of times.
Updates #11649Fixes#7036
Change-Id: I2f7d4ae97500d507bdd78af9e92cd1242e8e44b8
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
I'm on a mission to simplify LocalBackend.Start and its locking
and deflake some tests.
I noticed this hasn't been used since March 2023 when it was removed
from the Windows client in corp 66be796d33c.
So, delete.
Updates #11649
Change-Id: I40f2cb75fb3f43baf23558007655f65a8ec5e1b2
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
We have seen in macOS client logs that the "operation not permitted", a
syscall.EPERM error, is being returned when traffic is attempted to be
sent. This may be caused by security software on the client.
This change will perform a rebind and restun if we receive a
syscall.EPERM error on clients running darwin. Rebinds will only be
called if we haven't performed one specifically for an EPERM error in
the past 5 seconds.
Updates #11710
Signed-off-by: Charlotte Brandhorst-Satzkorn <charlotte@tailscale.com>
Trying to run iptables/nftables on Synology pauses for minutes with
lots of errors and ultimately does nothing as it's not used and we
lack permissions.
This fixes a regression from db760d0bac (#11601) that landed
between Synology testing on unstable 1.63.110 and 1.64.0 being cut.
Fixes#11737
Change-Id: Iaf9563363b8e45319a9b6fe94c8d5ffaecc9ccef
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Just because we don't have known endpoints for a peer does not mean that
the peer should become unreachable. If we know the peers key, it should
be able to call us, then we can talk back via whatever path it called us
on. First step - don't drop the packet in this context.
Updates tailscale/corp#19106
Signed-off-by: James Tucker <james@tailscale.com>
This removes a potentially increased boot delay for certain boot
topologies where they block on ExecStartPre that may have socket
activation dependencies on other system services (such as
systemd-resolved and NetworkManager).
Also rename cleanup to clean up in affected/immediately nearby places
per code review commentary.
Fixes#11599
Signed-off-by: James Tucker <james@tailscale.com>
It was used when we only supported subnet routers on linux
and would nil out the SubnetRoutes slice as no other router
worked with it, but now we support subnet routers on ~all platforms.
The field it was setting to nil is now only used for network logging
and nowhere else, so keep the field but drop the SubnetRouterWrapper
as it's not useful.
Updates #cleanup
Change-Id: Id03f9b6ec33e47ad643e7b66e07911945f25db79
Signed-off-by: Maisem Ali <maisem@tailscale.com>
The netcheck package and the magicksock package coordinate via the
health package, but both sides have time based heuristics through
indirect dependencies. These were misaligned, so the implemented
heuristic aimed at reducing DERP moves while there is active traffic
were non-operational about 3/5ths of the time.
It is problematic to setup a good test for this integration presently,
so instead I added comment breadcrumbs along with the initial fix.
Updates #8603
Signed-off-by: James Tucker <james@tailscale.com>
Only on Gokrazy, set sysctls to enable IP forwarding so subnet routing
and advertised exit node works.
Fixes#11405
Signed-off-by: Joonas Kuorilehto <joneskoo@derbian.fi>
This allows clients to avoid establishing their VPN multiple times when
both routes and DNS are changing in rapid succession.
Updates tailscale/corp#18928
Signed-off-by: Percy Wegmann <percy@tailscale.com>
This change updates all tailfs functions and the majority of the tailfs
variables to use the new drive naming.
Updates tailscale/corp#16827
Signed-off-by: Charlotte Brandhorst-Satzkorn <charlotte@tailscale.com>
This change updates the tailfs file and package names to their new
naming convention.
Updates #tailscale/corp#16827
Signed-off-by: Charlotte Brandhorst-Satzkorn <charlotte@tailscale.com>
We have hosts that support IPv6, but not IPv6 firewall configuration
in iptables mode.
We also have hosts that have some support for IPv6 firewall
configuration in iptables mode, but do not have iptables filter table.
We should:
- configure ip rules for all hosts that support IPv6
- only configure firewall rules in iptables mode if the host
has iptables filter table.
Updates tailscale/tailscale#11540
Signed-off-by: Irbe Krumina <irbe@tailscale.com>
Use the zstdframe package where sensible instead of plumbing
around our own zstd.Encoder just for stateless operations.
This causes logtail to have a dependency on zstd,
but that's arguably okay since zstd support is implicit
to the protocol between a client and the logging service.
Also, virtually every caller to logger.NewLogger was
manually setting up a zstd.Encoder anyways,
meaning that zstd was functionally always a dependency.
Updates #cleanup
Updates tailscale/corp#18514
Signed-off-by: Joe Tsai <joetsai@digital-static.net>
This fixes a bug that was introduced in #11258 where the handling of the
per-client limit didn't properly account for the fact that the gVisor
TCP forwarder will return 'true' to indicate that it's handled a
duplicate SYN packet, but not launch the handler goroutine.
In such a case, we neither decremented our per-client limit in the
wrapper function, nor did we do so in the handler function, leading to
our per-client limit table slowly filling up without bound.
Fix this by doing the same duplicate-tracking logic that the TCP
forwarder does so we can detect such cases and appropriately decrement
our in-flight counter.
Updates tailscale/corp#12184
Signed-off-by: Andrew Dunham <andrew@du.nham.ca>
Change-Id: Ib6011a71d382a10d68c0802593f34b8153d06892
This pretty much always results in an outage because peers won't
discover our new home region and thus won't be able to establish
connectivity.
Updates tailscale/corp#18095
Signed-off-by: Andrew Dunham <andrew@du.nham.ca>
Change-Id: Ic0d09133f198b528dd40c6383b16d7663d9d37a7
The `stack.PacketBufferPtr` type no longer exists; replace it with
`*stack.PacketBuffer` instead.
Updates #8043
Signed-off-by: Andrew Dunham <andrew@du.nham.ca>
Change-Id: Ib56ceff09166a042aa3d9b80f50b2aa2d34b3683
Since link-local addresses are definitionally more likely to be a direct
(lower-latency, more reliable) connection than a non-link-local private
address, give those a bit of a boost when selecting endpoints.
Updates #8097
Signed-off-by: Andrew Dunham <andrew@du.nham.ca>
Change-Id: I93fdeb07de55ba39ba5fcee0834b579ca05c2a4e
This was just added in 69f4b459 which doesn't yet use it. This still
doesn't yet use it. It just pushes it down deeper into magicsock where
it'll used later.
Updates #7617
Change-Id: If2f8fd380af150ffc763489e1ff4f8ca2899fac6
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
This fixes a regression introduced with 993acf4 and released in
v1.60.0.
The regression caused us to intercept all userspace traffic to port
8080 which prevented users from exposing their own services to their
tailnet at port 8080.
Now, we only intercept traffic to port 8080 if it's bound for
100.100.100.100 or fd7a:115c:a1e0::53.
Fixes#11283
Signed-off-by: Percy Wegmann <percy@tailscale.com>
(cherry picked from commit 17cd0626f3)
This adds a method to wgengine.Engine and plumbed down into magicsock
to add a way to get a type-safe Tailscale-safe wrapper around a
wireguard-go device.Peer that only exposes methods that are safe for
Tailscale to use internally.
It also removes HandshakeAttempts from PeerStatusLite that was just
added as it wasn't needed yet and is now accessible ala cart as needed
from the Peer type accessor.
None of this is used yet.
Updates #7617
Change-Id: I07be0c4e6679883e6eeddf8dbed7394c9e79c5f4
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
... rather than 1970. Code was using IsZero against the 1970 team
(which isn't a zero value), but fortunately not anywhere that seems to
have mattered.
Updates #cleanup
Change-Id: I708a3f2a9398aaaedc9503678b4a8a311e0e019e
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
This is a fun one. Right now, when a client is connecting through a
subnet router, here's roughly what happens:
1. The client initiates a connection to an IP address behind a subnet
router, and sends a TCP SYN
2. The subnet router gets the SYN packet from netstack, and after
running through acceptTCP, starts DialContext-ing the destination IP,
without accepting the connection¹
3. The client retransmits the SYN packet a few times while the dial is
in progress, until either...
4. The subnet router successfully establishes a connection to the
destination IP and sends the SYN-ACK back to the client, or...
5. The subnet router times out and sends a RST to the client.
6. If the connection was successful, the client ACKs the SYN-ACK it
received, and traffic starts flowing
As a result, the notification code in forwardTCP never notices when a
new connection attempt is aborted, and it will wait until either the
connection is established, or until the OS-level connection timeout is
reached and it aborts.
To mitigate this, add a per-client limit on how many in-flight TCP
forwarding connections can be in-progress; after this, clients will see
a similar behaviour to the global limit, where new connection attempts
are aborted instead of waiting. This prevents a single misbehaving
client from blocking all other clients of a subnet router by ensuring
that it doesn't starve the global limiter.
Also, bump the global limit again to a higher value.
¹ We can't accept the connection before establishing a connection to the
remote server since otherwise we'd be opening the connection and then
immediately closing it, which breaks a bunch of stuff; see #5503 for
more details.
Updates tailscale/corp#12184
Signed-off-by: Andrew Dunham <andrew@du.nham.ca>
Change-Id: I76e7008ddd497303d75d473f534e32309c8a5144
Not yet used. This is being made available so magicsock/wgengine can
use it to ignore certain sends (UDP + DERP) later on at least mobile,
letting wireguard-go think it's doing its full attempt schedule, but
we can cut it short conditionally based on what we know from the
control plane.
Updates #7617
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Change-Id: Ia367cf6bd87b2aeedd3c6f4989528acdb6773ca7
Otherwise on OS retransmits, we'd make redundant timers in Go's timer
heap that upon firing just do nothing (well, grab a mutex and check a
map and see that there's nothing to do).
Updates #cleanup
Change-Id: Id30b8b2d629cf9c7f8133a3f7eca5dc79e81facb
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
No need to hold wgLock while using the device to LookupPeer;
that has its own mutex already.
Updates #cleanup
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Change-Id: Ib56049fcc7163cf5a2c2e7e12916f07b4f9d67cb
Tailscaled becomes inoperative if the Tailscale Tunnel wintun adapter is abruptly removed.
wireguard-go closes the device in case of a read error, but tailscaled keeps running.
This adds detection of a closed WireGuard device, triggering a graceful shutdown of tailscaled.
It is then restarted by the tailscaled watchdog service process.
Fixes#11222
Signed-off-by: Nick Khyl <nickk@tailscale.com>
- add a clientmetric with a counter of TCP forwarder drops due to the
max attempts;
- fix varz metric types, as they are all counters.
Updates #8210
Signed-off-by: Anton Tolchanov <anton@tailscale.com>
That's already the default. Avoid the overhead of writing it on one
side and reading it on the other to do nothing.
Updates #cleanup (noticed while researching something else)
Change-Id: I449c88a022271afb9be5da876bfaf438fe5d3f58
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
An increasing number of users have very large subnet route
configurations, which can produce very large amounts of log data when
WireGuard is reconfigured. The logs don't contain the actual routes, so
they're largely useless for diagnostics, so we'll just suppress them.
Fixestailscale/corp#17532
Signed-off-by: James Tucker <james@tailscale.com>
Looking at profiles, we spend a lot of time in winipcfg.LUID.DeleteRoute
looking up the routing table entry for the provided RouteData.
But we already have the row! We previously obtained that data via the full
table dump we did in getInterfaceRoutes. We can make this a lot faster by
hanging onto a reference to the wipipcfg.MibIPforwardRow2 and executing
the delete operation directly on that.
Fixes#11123
Signed-off-by: Aaron Klotz <aaron@tailscale.com>
FileSystemForLocal was listening on the node's Tailscale address,
which potentially exposes the user's view of TailFS shares to other
Tailnet users. Remote nodes should connect to exported shares via
the peerapi.
This removes that code so that FileSystemForLocal is only avaialable
on 100.100.100.100:8080.
Updates tailscale/corp#16827
Signed-off-by: Percy Wegmann <percy@tailscale.com>
Adds support for node attribute tailfs:access. If this attribute is
not present, Tailscale will not accept connections to the local TailFS
server at 100.100.100.100:8080.
Updates tailscale/corp#16827
Signed-off-by: Percy Wegmann <percy@tailscale.com>
Add a WebDAV-based folder sharing mechanism that is exposed to local clients at
100.100.100.100:8080 and to remote peers via a new peerapi endpoint at
/v0/tailfs.
Add the ability to manage folder sharing via the new 'share' CLI sub-command.
Updates tailscale/corp#16827
Signed-off-by: Percy Wegmann <percy@tailscale.com>
This commit implements probing of UDP path lifetime on the tail end of
an active direct connection. Probing configuration has two parts -
Cliffs, which are various timeout cliffs of interest, and
CycleCanStartEvery, which limits how often a probing cycle can start,
per-endpoint. Initially a statically defined default configuration will
be used. The default configuration has cliffs of 10s, 30s, and 60s,
with a CycleCanStartEvery of 24h. Probing results are communicated via
clientmetric counters. Probing is off by default, and can be enabled
via control knob. Probing is purely informational and does not yet
drive any magicsock behaviors.
Updates #540
Signed-off-by: Jordan Whited <jordan@tailscale.com>
When tailscaled is run with "-debug 127.0.0.1:12345", these metrics are
available at:
http://localhost:12345/debug/metrics
Updates #8210
Signed-off-by: Andrew Dunham <andrew@du.nham.ca>
Change-Id: I19db6c445ac1f8344df2bc1066a3d9c9030606f8
This is in response to logs from a customer that show that we're unable
to run netsh due to the following error:
router: firewall: adding Tailscale-Process rule to allow UDP for "C:\\Program Files\\Tailscale\\tailscaled.exe" ...
router: firewall: error adding Tailscale-Process rule: exec: "netsh": cannot run executable found relative to current directory:
There's approximately no reason to ever dynamically look up the path of
a system utility like netsh.exe, so instead let's first look for it
in the System32 directory and only if that fails fall back to the
previous behaviour.
Updates #10804
Signed-off-by: Andrew Dunham <andrew@du.nham.ca>
Change-Id: I68cfeb4cab091c79ccff3187d35f50359a690573
Run `staticcheck` with `U1000` to find unused code. This cleans up about
a half of it. I'll do the other half separately to keep PRs manageable.
Updates #cleanup
Signed-off-by: Andrew Lytvynov <awly@tailscale.com>
The switch in Conn.runDerpReader() on the derp.ReceivedMessage type
contained cases other than derp.ReceivedPacket that fell through to
writing to c.derpRecvCh, which should only be reached for
derp.ReceivedPacket. This can result in the last/previous
derp.ReceivedPacket to be re-handled, effectively creating a duplicate
packet. If the last derp.ReceivedPacket happens to be a
disco.CallMeMaybe it may result in a disco ping scan towards the
originating peer on the endpoints contained.
The change in this commit moves the channel write on c.derpRecvCh and
subsequent select awaiting the result into the derp.ReceivedMessage
case, preventing it from being reached from any other case. Explicit
continue statements are also added to non-derp.ReceivedPacket cases
where they were missing, in order to signal intent to the reader.
Fixes#10586
Signed-off-by: Jordan Whited <jordan@tailscale.com>
This uses the fact that we've received a frame from a given DERP region
within a certain time as a signal that the region is stil present (and
thus can still be a node's PreferredDERP / home region) even if we don't
get a STUN response from that region during a netcheck.
This should help avoid DERP flaps that occur due to losing STUN probes
while still having a valid and active TCP connection to the DERP server.
RELNOTE=Reduce home DERP flapping when there's still an active connection
Updates #8603
Signed-off-by: Andrew Dunham <andrew@du.nham.ca>
Change-Id: If7da6312581e1d434d5c0811697319c621e187a0
* util/linuxfw, wgengine: allow ingress to magicsock UDP port on Linux
Updates #9084.
Currently, we have to tell users to manually open UDP ports on Linux when
certain firewalls (like ufw) are enabled. This change automates the process of
adding and updating those firewall rules as magicsock changes what port it
listens on.
Signed-off-by: Naman Sood <mail@nsood.in>
This will enable the runner to be replaced as a configuration side
effect in a later change.
Updates tailscale/corp#14029
Signed-off-by: James Tucker <james@tailscale.com>
For use in ACL tests, we need a way to check whether a packet is allowed
not just with TCP, but any protocol.
Updates #3561
Signed-off-by: Andrew Lytvynov <awly@tailscale.com>
In DERP homeless mode, a DERP home connection is not sought or
maintained and the local node is not reachable.
Updates #3363
Updates tailscale/corp#396
Change-Id: Ibc30488ac2e3cfe4810733b96c2c9f10a51b8331
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
This is gated behind the silent disco control knob, which is still in
its infancy. Prior to this change disco pong reception was the only
event that could move trustBestAddrUntil forward, so even though we
weren't heartbeating, we would kick off discovery pings every
trustUDPAddrDuration and mirror to DERP.
Updates #540
Signed-off-by: Jordan Whited <jordan@tailscale.com>
This change exposes SilentDisco as a control knob, and plumbs it down to
magicsock.endpoint. No changes are being made to magicsock.endpoint
disco behavior, yet.
Updates #540
Signed-off-by: Jordan Whited <jordan@tailscale.com>
Co-authored-by: Brad Fitzpatrick <bradfitz@tailscale.com>
This change updates log messaging when cleaning up wireguard only peers.
This change also stops us unnecessarily attempting to clean up disco
pings for wireguard only endpoints.
Updates #7826
Signed-off-by: Charlotte Brandhorst-Satzkorn <charlotte@tailscale.com>
TestNewConn now passes as root on Linux. It wasn't closing the BPF
listeners and their goroutines.
The code is still a mess of two Close overlapping code paths, but that
can be refactored later. For now, make the two close paths more similar.
Updates #9945
Change-Id: I8a3cf5fb04d22ba29094243b8e645de293d9ed85
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Prior to an earlier netstack bump this code used a string conversion
path to cover multiple cases of behavior seemingly checking for
unspecified addresses, adding unspecified addresses to v6. The behavior
is now crashy in netstack, as it is enforcing address length in various
areas of the API, one in particular being address removal.
As netstack is now protocol specific, we must not create invalid
protocol addresses - an address is v4 or v6, and the address value
contained inside must match. If a control path attempts to do something
otherwise it is now logged and skipped rather than incorrect addressing
being added.
Fixestailscale/corp#15377
Signed-off-by: James Tucker <james@tailscale.com>
Don't assume Linux lacks UDP_GRO support if it lacks UDP_SEGMENT
support. This mirrors a similar change in wireguard/wireguard-go@177caa7
for consistency sake. We haven't found any issues here, just being
overly paranoid.
Updates #cleanup
Signed-off-by: Jordan Whited <jordan@tailscale.com>
Regression from c15997511d. The callback could be run multiple times
from different endpoints.
Fixes#9801
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
This migrates containerboot to reuse the NetfilterRunner used
by tailscaled instead of manipulating iptables rule itself.
This has the added advantage of now working with nftables and
we can potentially drop the `iptables` command from the container
image in the future.
Updates #9310
Co-authored-by: Irbe Krumina <irbe@tailscale.com>
Signed-off-by: Maisem Ali <maisem@tailscale.com>
Just a refactor to consolidate the firewall detection logic in a single
package so that it can be reused in a later commit by containerboot.
Updates #9310
Signed-off-by: Maisem Ali <maisem@tailscale.com>
Record the number of MTU probes sent, the total bytes sent, the number of times
we got a successful return from an MTU probe of a particular size, and the max
MTU recorded.
Updates #311
Signed-off-by: Val <valerie@tailscale.com>
Automatically probe the path MTU to a peer when peer MTU is enabled, but do not
use the MTU information for anything yet.
Updates #311
Signed-off-by: Val <valerie@tailscale.com>
When sending a CLI ping with a specific size, continue to probe all possible UDP
paths to the peer until we find one with a large enough MTU to accommodate the
ping. Record any peer path MTU information we discover (but don't use it for
anything other than CLI pings).
Updates #311
Signed-off-by: Val <valerie@tailscale.com>
Add a field to record the wire MTU of the path to this address to the
addrLatency struct and rename it addrQuality.
Updates #311
Signed-off-by: Val <valerie@tailscale.com>
Then use it in tailcfg which had it duplicated a couple times.
I think we have it a few other places too.
And use slices.Equal in wgengine/router too. (found while looking for callers)
Updates #cleanup
Change-Id: If5350eee9b3ef071882a3db29a305081e4cd9d23
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
This reverts commit ee90cd02fd.
The outcome is not identical for empty slices. Cloner really needs
tests!
Updates #9601
Signed-off-by: James Tucker <james@tailscale.com>