From 5c42990c2f, not yet released in a stable build.
Caught by existing tests.
Fixes#5685
Change-Id: Ia76bb328809d9644e8b96910767facf627830600
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Baby steps towards turning off heartbeat pings entirely as per #540.
This doesn't change any current magicsock functionality and requires additional
changes to send/disco paths before the flag can be turned on.
Updates #540
Change-Id: Idc9a72748e74145b068d67e6dd4a4ffe3932efd0
Signed-off-by: Jenny Zhang <jz@tailscale.com>
Signed-off-by: Jenny Zhang <jz@tailscale.com>
Incoming disco packets are now dropped unless they match one of the
current bound ports, or have a zero port*.
The BPF filter passes all packets with a disco header to the raw packet
sockets regardless of destination port (in order to avoid needing to
reconfigure BPF on rebind).
If a BPF enabled node has just rebound, due to restart or rebind, it may
receive and reply to disco ping packets destined for ports other than
those which are presently bound. If the pong is accepted, the pinging
node will now assume that it can send WireGuard traffic to the pinged
port - such traffic will not reach the node as it is not destined for a
bound port.
*The zero port is ignored, if received. This is a speculative defense
and would indicate a problem in the receive path, or the BPF filter.
This condition is allowed to pass as it may enable traffic to flow,
however it will also enable problems with the same symptoms this patch
otherwise fixes.
Fixes#5536
Signed-off-by: James Tucker <james@tailscale.com>
1f959edeb0 introduced a regression for JS
where the initial bind no longer occurred at all for JS.
The condition is moved deeper in the call tree to avoid proliferation of
higher level conditions.
Updates #5537
Signed-off-by: James Tucker <james@tailscale.com>
Both RebindingUDPConns now always exist. the initial bind (which now
just calls rebind) now ensures that bind is called for both, such that
they both at least contain a blockForeverConn. Calling code no longer
needs to assert their state.
Signed-off-by: James Tucker <james@tailscale.com>
This is entirely optional (i.e. failing in this code is non-fatal) and
only enabled on Linux for now. Additionally, this new behaviour can be
disabled by setting the TS_DEBUG_DISABLE_AF_PACKET environment variable.
Updates #3824
Replaces #5474
Co-authored-by: Andrew Dunham <andrew@du.nham.ca>
Signed-off-by: David Anderson <danderson@tailscale.com>
The Start method was removed in 4c27e2fa22, but the comment on NewConn
still mentioned it doesn't do anything until this method is called.
Signed-off-by: Kris Brandow <kris.brandow@gmail.com>
This adds a lighter mechanism for endpoint updates from control.
Change-Id: If169c26becb76d683e9877dc48cfb35f90cc5f24
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
A new package can also later record/report which knobs are checked and
set. It also makes the code cleaner & easier to grep for env knobs.
Change-Id: Id8a123ab7539f1fadbd27e0cbeac79c2e4f09751
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
This fixes a deadlock on shutdown.
One goroutine is waiting to send on c.derpRecvCh before unlocking c.mu.
The other goroutine is waiting to lock c.mu before receiving from c.derpRecvCh.
#3736 has a more detailed explanation of the sequence of events.
Fixes#3736
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
Turning this on at the beginning of the 1.21.x dev cycle, for 1.22.
Updates #150
Change-Id: I1de567cfe0be3df5227087de196ab88e60c9eb56
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
The blockForeverConn was only using its sync.Cond one side. Looks like it
was just forgotten.
Fixes#3671
Change-Id: I4ed0191982cdd0bfd451f133139428a4fa48238c
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Bigger changes coming later, but this should improve things a bit in
the meantime.
Rationale:
* 2 minutes -> 45 seconds: 2 minutes was overkill and never considered
phones/battery at the time. It was totally arbitrary. 45 seconds is
also arbitrary but is less than 2 minutes.
* heartbeat from 2 seconds to 3 seconds: in practice this meant two
packets per second (2 pings and 2 pongs every 2 seconds) because the
other side was also pinging us every 2 seconds on their own.
That's just overkill. (see #540 too)
So in the worst case before: when we sent a single packet (say: a DNS
packet), we ended up sending 61 packets over 2 minutes: the 1 DNS
query and then then 60 disco pings (2 minutes / 2 seconds) & received
the same (1 DNS response + 60 pongs). Now it's 15. In 1.22 we plan to
remove this whole timer-based heartbeat mechanism entirely.
The 5 seconds to 6.5 seconds change is just stretching out that
interval so you can still miss two heartbeats (other 3 + 3 seconds
would be greater than 5 seconds). This means that if your peer moves
without telling you, you can have a path out for 6.5 seconds
now instead of 5 seconds before disco finds a new one. That will also
improve in 1.22 when we start doing UDP+DERP at the same time
when confidence starts to go down on a UDP path.
Updates #3363
Change-Id: Ic2314bbdaf42edcdd7103014b775db9cf4facb47
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Treat UDP send EPERM errors as a lost UDP packet, not something super
fatal. That's just the Linux firewall preventing it from going out.
And add a leaf package net/neterror for that (and future) policy that
all three packages can share, with tests.
Updates #3619
Change-Id: Ibdb838c43ee9efe70f4f25f7fc7fdf4607ba9c1d
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Only if the source address isn't on the currently active interface or
a ping of the DERP server fails.
Updates #3619
Change-Id: I6bf06503cff4d781f518b437c8744ac29577acc8
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
We only tracked the transport type (UDP vs DERP), not what they were.
Change-Id: Ia4430c1c53afd4634e2d9893d96751a885d77955
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
It's been a bunch of releases now since the TailscaleIPs slice
replacement was added.
Change-Id: I3bd80e1466b3d9e4a4ac5bedba8b4d3d3e430a03
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
endpoint.discoKey is protected by endpoint.mu.
endpoint.sendDiscoMessage was reading it without holding the lock.
This showed up in a CI failure and is readily reproducible locally.
The fix is in two parts.
First, for Conn.enqueueCallMeMaybe, eliminate the one-line helper method endpoint.sendDiscoMessage; call Conn.sendDiscoMessage directly.
This makes it more natural to read endpoint.discoKey in a context
in which endpoint.mu is already held.
Second, for endpoint.sendDiscoPing, explicitly pass the disco key
as an argument. Again, this makes it easier to read endpoint.discoKey
in a context in which endpoint.mu is already held.
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>