This adds a lighter mechanism for endpoint updates from control.
Change-Id: If169c26becb76d683e9877dc48cfb35f90cc5f24
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
The version string changed slightly. Adapt.
And always check the current Go version to prevent future
accidental regressions. I would have missed this one had
I not explicitly manually checked it.
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
A new package can also later record/report which knobs are checked and
set. It also makes the code cleaner & easier to grep for env knobs.
Change-Id: Id8a123ab7539f1fadbd27e0cbeac79c2e4f09751
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
This fixes a deadlock on shutdown.
One goroutine is waiting to send on c.derpRecvCh before unlocking c.mu.
The other goroutine is waiting to lock c.mu before receiving from c.derpRecvCh.
#3736 has a more detailed explanation of the sequence of events.
Fixes#3736
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
Turning this on at the beginning of the 1.21.x dev cycle, for 1.22.
Updates #150
Change-Id: I1de567cfe0be3df5227087de196ab88e60c9eb56
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
The blockForeverConn was only using its sync.Cond one side. Looks like it
was just forgotten.
Fixes#3671
Change-Id: I4ed0191982cdd0bfd451f133139428a4fa48238c
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Bigger changes coming later, but this should improve things a bit in
the meantime.
Rationale:
* 2 minutes -> 45 seconds: 2 minutes was overkill and never considered
phones/battery at the time. It was totally arbitrary. 45 seconds is
also arbitrary but is less than 2 minutes.
* heartbeat from 2 seconds to 3 seconds: in practice this meant two
packets per second (2 pings and 2 pongs every 2 seconds) because the
other side was also pinging us every 2 seconds on their own.
That's just overkill. (see #540 too)
So in the worst case before: when we sent a single packet (say: a DNS
packet), we ended up sending 61 packets over 2 minutes: the 1 DNS
query and then then 60 disco pings (2 minutes / 2 seconds) & received
the same (1 DNS response + 60 pongs). Now it's 15. In 1.22 we plan to
remove this whole timer-based heartbeat mechanism entirely.
The 5 seconds to 6.5 seconds change is just stretching out that
interval so you can still miss two heartbeats (other 3 + 3 seconds
would be greater than 5 seconds). This means that if your peer moves
without telling you, you can have a path out for 6.5 seconds
now instead of 5 seconds before disco finds a new one. That will also
improve in 1.22 when we start doing UDP+DERP at the same time
when confidence starts to go down on a UDP path.
Updates #3363
Change-Id: Ic2314bbdaf42edcdd7103014b775db9cf4facb47
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Treat UDP send EPERM errors as a lost UDP packet, not something super
fatal. That's just the Linux firewall preventing it from going out.
And add a leaf package net/neterror for that (and future) policy that
all three packages can share, with tests.
Updates #3619
Change-Id: Ibdb838c43ee9efe70f4f25f7fc7fdf4607ba9c1d
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Only if the source address isn't on the currently active interface or
a ping of the DERP server fails.
Updates #3619
Change-Id: I6bf06503cff4d781f518b437c8744ac29577acc8
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
We only tracked the transport type (UDP vs DERP), not what they were.
Change-Id: Ia4430c1c53afd4634e2d9893d96751a885d77955
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
And it updates the build tag style on a couple files.
Change-Id: I84478d822c8de3f84b56fa1176c99d2ea5083237
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
It's been a bunch of releases now since the TailscaleIPs slice
replacement was added.
Change-Id: I3bd80e1466b3d9e4a4ac5bedba8b4d3d3e430a03
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
endpoint.discoKey is protected by endpoint.mu.
endpoint.sendDiscoMessage was reading it without holding the lock.
This showed up in a CI failure and is readily reproducible locally.
The fix is in two parts.
First, for Conn.enqueueCallMeMaybe, eliminate the one-line helper method endpoint.sendDiscoMessage; call Conn.sendDiscoMessage directly.
This makes it more natural to read endpoint.discoKey in a context
in which endpoint.mu is already held.
Second, for endpoint.sendDiscoPing, explicitly pass the disco key
as an argument. Again, this makes it easier to read endpoint.discoKey
in a context in which endpoint.mu is already held.
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
I believe that this should eliminate the flakiness.
If GitHub CI manages to be even slower that can be believed
(and I can believe a lot at this point),
then we should roll this back and make some more invasive changes.
Updates #654Fixes#3247 (I hope)
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
We can do the "maybe delete" check unilaterally:
In the case of an insert, both oldDiscoKey
and ep.discoKey will be the zero value.
And since we don't use pi again, we can skip
giving it a name, which makes scoping clearer.
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
wgengine/wgcfg: introduce wgcfg.NewDevice helper to disable roaming
at all call sites (one real plus several tests).
Fixestailscale/corp#3016.
Signed-off-by: David Anderson <danderson@tailscale.com>
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
And annotate magicsock as a start.
And add localapi and debug handlers with the Prometheus-format
exporter.
Updates #3307
Change-Id: I47c5d535fe54424741df143d052760387248f8d3
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>