If we get an non-disco presumably-wireguard-encrypted UDP packet from
an IP:port we don't recognize, rather than drop the packet, give it to
WireGuard anyway and let WireGuard try to figure out who it's from and
tell us.
This uses the new hook added in https://github.com/tailscale/wireguard-go/pull/27
Updates tailscale/corp#20732
Change-Id: I5c61a40143810592f9efac6c12808a87f924ecf2
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Load Balancers often have more than one ingress IP, so allowing us to
add multiple means we can offer multiple options.
Updates #12578
Change-Id: I4aa49a698d457627d2f7011796d665c67d4c7952
Signed-off-by: Lee Briggs <lee@leebriggs.co.uk>
For testing. Lee wants to play with 'AWS Global Accelerator Custom
Routing with Amazon Elastic Kubernetes Service'. If this works well
enough, we can promote it.
Updates #12578
Change-Id: I5018347ed46c15c9709910717d27305d0aedf8f4
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
The DERP Return Path Optimization (DRPO) is over four years old (and
on by default for over two) and we haven't had problems, so time to
remove the emergency shutoff code (controlknob) which we've never
used. The controlknobs are only meant for new features, to mitigate
risk. But we don't want to keep them forever, as they kinda pollute
the code.
Updates #150
Change-Id: If021bc8fd1b51006d8bddd1ffab639bb1abb0ad1
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
And fix up a bogus comment and flesh out some other comments.
Updates #cleanup
Change-Id: Ia60a1c04b0f5e44e8d9587914af819df8e8f442a
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
The logic we added in #11378 would prevent selecting a home DERP if we
have no control connection.
Updates tailscale/corp#18095
Signed-off-by: Andrew Dunham <andrew@du.nham.ca>
Change-Id: I44bb6ac4393989444e4961b8cfa27dc149a33c6e
Palo Alto reported interpreting hairpin probes as LAND attacks, and the
firewalls may be responding to this by shutting down otherwise in use NAT sessions
prematurely. We don't currently make use of the outcome of the hairpin
probes, and they contribute to other user confusion with e.g. the
AirPort Extreme hairpin session workaround. We decided in response to
remove the whole probe feature as a result.
Updates #188
Updates tailscale/corp#19106
Updates tailscale/corp#19116
Signed-off-by: James Tucker <james@tailscale.com>
It was requested by the first customer 4-5 years ago and only used
for a brief moment of time. We later added netmap visibility trimming
which removes the need for this.
It's been hidden by the CLI for quite some time and never documented
anywhere else.
This keeps the CLI flag, though, out of caution. It just returns an
error if it's set to anything but true (its default).
Fixes#12058
Change-Id: I7514ba572e7b82519b04ed603ff9f3bdbaecfda7
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Palo Alto firewalls have a typically hard NAT, but also have a mode
called Persistent DIPP that is supposed to provide consistent port
mapping suitable for STUN resolution of public ports. Persistent DIPP
works initially on most Palo Alto firewalls, but some models/software
versions have a bug which this works around.
The bug symptom presents as follows:
- STUN sessions resolve a consistent public IP:port to start with
- Much later netchecks report the same IP:Port for a subset of
sessions, most often the users active DERP, and/or the port related
to sustained traffic.
- The broader set of DERPs in a full netcheck will now consistently
observe a new IP:Port.
- After this point of observation, new inbound connections will only
succeed to the new IP:Port observed, and existing/old sessions will
only work to the old binding.
In this patch we now advertise the lowest latency global endpoint
discovered as we always have, but in addition any global endpoints that
are observed more than once in a single netcheck report. This should
provide viable endpoints for potential connection establishment across
a NAT with this behavior.
Updates tailscale/corp#19106
Signed-off-by: James Tucker <james@tailscale.com>
In prep for most of the package funcs in net/interfaces to become
methods in a long-lived netmon.Monitor that can cache things. (Many
of the funcs are very heavy to call regularly, whereas the long-lived
netmon.Monitor can subscribe to things from the OS and remember
answers to questions it's asked regularly later)
Updates tailscale/corp#10910
Updates tailscale/corp#18960
Updates #7967
Updates #3299
Change-Id: Ie4e8dedb70136af2d611b990b865a822cd1797e5
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
This has been a TODO for ages. Time to do it.
The goal is to move more network state accessors to netmon.Monitor
where they can be cheaper/cached.
Updates tailscale/corp#10910
Updates tailscale/corp#18960
Updates #7967
Updates #3299
Change-Id: I60fc6508cd2d8d079260bda371fc08b6318bcaf1
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
This adds a health.Tracker to tsd.System, accessible via
a new tsd.System.HealthTracker method.
In the future, that new method will return a tsd.System-specific
HealthTracker, so multiple tsnet.Servers in the same process are
isolated. For now, though, it just always returns the temporary
health.Global value. That permits incremental plumbing over a number
of changes. When the second to last health.Global reference is gone,
then the tsd.System.HealthTracker implementation can return a private
Tracker.
The primary plumbing this does is adding it to LocalBackend and its
dozen and change health calls. A few misc other callers are also
plumbed. Subsequent changes will flesh out other parts of the tree
(magicsock, controlclient, etc).
Updates #11874
Updates #4136
Change-Id: Id51e73cfc8a39110425b6dc19d18b3975eac75ce
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
This moves most of the health package global variables to a new
`health.Tracker` type.
But then rather than plumbing the Tracker in tsd.System everywhere,
this only goes halfway and makes one new global Tracker
(`health.Global`) that all the existing callers now use.
A future change will eliminate that global.
Updates #11874
Updates #4136
Change-Id: I6ee27e0b2e35f68cb38fecdb3b2dc4c3f2e09d68
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Most of the magicsock tests fake the network, simulating packets going
out and coming in. There's no reason to actually hit your router to do
UPnP/NAT-PMP/PCP during in tests. But while debugging thousands of
iterations of tests to deflake some things, I saw it slamming my
router. This stops that.
Updates #11762
Change-Id: I59b9f48f8f5aff1fa16b4935753d786342e87744
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Seems to deflake tstest/integration tests. I can't reproduce it
anymore on one of my VMs that was consistently flaking after a dozen
runs before. Now I can run hundreds of times.
Updates #11649Fixes#7036
Change-Id: I2f7d4ae97500d507bdd78af9e92cd1242e8e44b8
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
We have seen in macOS client logs that the "operation not permitted", a
syscall.EPERM error, is being returned when traffic is attempted to be
sent. This may be caused by security software on the client.
This change will perform a rebind and restun if we receive a
syscall.EPERM error on clients running darwin. Rebinds will only be
called if we haven't performed one specifically for an EPERM error in
the past 5 seconds.
Updates #11710
Signed-off-by: Charlotte Brandhorst-Satzkorn <charlotte@tailscale.com>
Just because we don't have known endpoints for a peer does not mean that
the peer should become unreachable. If we know the peers key, it should
be able to call us, then we can talk back via whatever path it called us
on. First step - don't drop the packet in this context.
Updates tailscale/corp#19106
Signed-off-by: James Tucker <james@tailscale.com>
The netcheck package and the magicksock package coordinate via the
health package, but both sides have time based heuristics through
indirect dependencies. These were misaligned, so the implemented
heuristic aimed at reducing DERP moves while there is active traffic
were non-operational about 3/5ths of the time.
It is problematic to setup a good test for this integration presently,
so instead I added comment breadcrumbs along with the initial fix.
Updates #8603
Signed-off-by: James Tucker <james@tailscale.com>
This pretty much always results in an outage because peers won't
discover our new home region and thus won't be able to establish
connectivity.
Updates tailscale/corp#18095
Signed-off-by: Andrew Dunham <andrew@du.nham.ca>
Change-Id: Ic0d09133f198b528dd40c6383b16d7663d9d37a7
Since link-local addresses are definitionally more likely to be a direct
(lower-latency, more reliable) connection than a non-link-local private
address, give those a bit of a boost when selecting endpoints.
Updates #8097
Signed-off-by: Andrew Dunham <andrew@du.nham.ca>
Change-Id: I93fdeb07de55ba39ba5fcee0834b579ca05c2a4e
This adds a method to wgengine.Engine and plumbed down into magicsock
to add a way to get a type-safe Tailscale-safe wrapper around a
wireguard-go device.Peer that only exposes methods that are safe for
Tailscale to use internally.
It also removes HandshakeAttempts from PeerStatusLite that was just
added as it wasn't needed yet and is now accessible ala cart as needed
from the Peer type accessor.
None of this is used yet.
Updates #7617
Change-Id: I07be0c4e6679883e6eeddf8dbed7394c9e79c5f4
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
This commit implements probing of UDP path lifetime on the tail end of
an active direct connection. Probing configuration has two parts -
Cliffs, which are various timeout cliffs of interest, and
CycleCanStartEvery, which limits how often a probing cycle can start,
per-endpoint. Initially a statically defined default configuration will
be used. The default configuration has cliffs of 10s, 30s, and 60s,
with a CycleCanStartEvery of 24h. Probing results are communicated via
clientmetric counters. Probing is off by default, and can be enabled
via control knob. Probing is purely informational and does not yet
drive any magicsock behaviors.
Updates #540
Signed-off-by: Jordan Whited <jordan@tailscale.com>
The switch in Conn.runDerpReader() on the derp.ReceivedMessage type
contained cases other than derp.ReceivedPacket that fell through to
writing to c.derpRecvCh, which should only be reached for
derp.ReceivedPacket. This can result in the last/previous
derp.ReceivedPacket to be re-handled, effectively creating a duplicate
packet. If the last derp.ReceivedPacket happens to be a
disco.CallMeMaybe it may result in a disco ping scan towards the
originating peer on the endpoints contained.
The change in this commit moves the channel write on c.derpRecvCh and
subsequent select awaiting the result into the derp.ReceivedMessage
case, preventing it from being reached from any other case. Explicit
continue statements are also added to non-derp.ReceivedPacket cases
where they were missing, in order to signal intent to the reader.
Fixes#10586
Signed-off-by: Jordan Whited <jordan@tailscale.com>
This uses the fact that we've received a frame from a given DERP region
within a certain time as a signal that the region is stil present (and
thus can still be a node's PreferredDERP / home region) even if we don't
get a STUN response from that region during a netcheck.
This should help avoid DERP flaps that occur due to losing STUN probes
while still having a valid and active TCP connection to the DERP server.
RELNOTE=Reduce home DERP flapping when there's still an active connection
Updates #8603
Signed-off-by: Andrew Dunham <andrew@du.nham.ca>
Change-Id: If7da6312581e1d434d5c0811697319c621e187a0
* util/linuxfw, wgengine: allow ingress to magicsock UDP port on Linux
Updates #9084.
Currently, we have to tell users to manually open UDP ports on Linux when
certain firewalls (like ufw) are enabled. This change automates the process of
adding and updating those firewall rules as magicsock changes what port it
listens on.
Signed-off-by: Naman Sood <mail@nsood.in>
In DERP homeless mode, a DERP home connection is not sought or
maintained and the local node is not reachable.
Updates #3363
Updates tailscale/corp#396
Change-Id: Ibc30488ac2e3cfe4810733b96c2c9f10a51b8331
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
This is gated behind the silent disco control knob, which is still in
its infancy. Prior to this change disco pong reception was the only
event that could move trustBestAddrUntil forward, so even though we
weren't heartbeating, we would kick off discovery pings every
trustUDPAddrDuration and mirror to DERP.
Updates #540
Signed-off-by: Jordan Whited <jordan@tailscale.com>
This change exposes SilentDisco as a control knob, and plumbs it down to
magicsock.endpoint. No changes are being made to magicsock.endpoint
disco behavior, yet.
Updates #540
Signed-off-by: Jordan Whited <jordan@tailscale.com>
Co-authored-by: Brad Fitzpatrick <bradfitz@tailscale.com>
This change updates log messaging when cleaning up wireguard only peers.
This change also stops us unnecessarily attempting to clean up disco
pings for wireguard only endpoints.
Updates #7826
Signed-off-by: Charlotte Brandhorst-Satzkorn <charlotte@tailscale.com>
TestNewConn now passes as root on Linux. It wasn't closing the BPF
listeners and their goroutines.
The code is still a mess of two Close overlapping code paths, but that
can be refactored later. For now, make the two close paths more similar.
Updates #9945
Change-Id: I8a3cf5fb04d22ba29094243b8e645de293d9ed85
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Don't assume Linux lacks UDP_GRO support if it lacks UDP_SEGMENT
support. This mirrors a similar change in wireguard/wireguard-go@177caa7
for consistency sake. We haven't found any issues here, just being
overly paranoid.
Updates #cleanup
Signed-off-by: Jordan Whited <jordan@tailscale.com>