On iOS (and possibly other platforms), sometimes our UDP socket would
get stuck in a state where it was bound to an invalid interface (or no
interface) after a network reconfiguration. We can detect this by
actually checking the error codes from sending our STUN packets.
If we completely fail to send any STUN packets, we know something is
very broken. So on the next STUN attempt, let's rebind the UDP socket
to try to correct any problems.
This fixes a problem where iOS would sometimes get stuck using DERP
instead of direct connections until the backend was restarted.
Fixes#2994
Signed-off-by: Avery Pennarun <apenwarr@tailscale.com>
Windows has a public dns.Flush used in router_windows.go.
However that won't work for platforms like Linux, where
we need a different flush mechanism for resolved versus
other implementations.
We're instead adding a FlushCaches method to the dns Manager,
which can be made to work on all platforms as needed.
Fixes https://github.com/tailscale/tailscale/issues/2132
Signed-off-by: Denton Gentry <dgentry@tailscale.com>
Spelling out the command to run for every type
means that changing the command makes for a large, repetitive diff.
Stop doing that.
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
We currently plumb full URLs for DNS resolvers from the control server
down to the client. But when we pass the values into the net/dns
package, we throw away any URL that isn't a bare IP. This commit
continues the plumbing, and gets the URL all the way to the built in
forwarder. (It stops before plumbing URLs into the OS configurations
that can handle them.)
For #2596
Signed-off-by: David Crawshaw <crawshaw@tailscale.com>
And in the process, fix the related confusing error messages from
pinging your own IP or hostname.
Fixes#2803
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
AFAICT this was always present, the log read mid-execution was never safe.
But it seems like the recent magicsock refactoring made the race much
more likely.
Signed-off-by: David Anderson <danderson@tailscale.com>
And add health check errors to ipnstate.Status (tailscale status --json).
Updates #2746
Updates #2775
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
It was useful early in development when disco clients were the
exception and tailscale logs were noisier than today, but now
non-disco is the exception.
Updates #2752
Signed-off-by: David Anderson <danderson@tailscale.com>
Having removed magicconn.Start, there's no need to synchronize startup
of other things to it any more.
Signed-off-by: David Anderson <danderson@tailscale.com>
Over time, other magicsock refactors have made Start effectively a
no-op, except that some other functions choose to panic if called
before Start.
Signed-off-by: David Anderson <danderson@tailscale.com>
We were returning an error almost, but not quite like errConnClosed in
a single codepath, which could still trip the panic on reconfig in the
test logic.
Signed-off-by: David Anderson <danderson@tailscale.com>
Our prod code doesn't eagerly handshake, because our disco layer enables
on-demand handshaking. Configuring both peers to eagerly handshake leads
to WireGuard handshake races that make TestTwoDevicePing flaky.
Signed-off-by: David Anderson <danderson@tailscale.com>
It only existed to override one test-only behavior with a
different test-only behavior, in both cases working around
an annoying feature of our CI environments. Instead, handle
that weirdness entirely in the test code, with a tweaked
TestOnlyPacketListener that gets injected.
Signed-off-by: David Anderson <danderson@tailscale.com>
The docstring said it was meant for use in tests, but it's specifically a
special codepath that is _only_ used in tests, so make the claim stronger.
Signed-off-by: David Anderson <danderson@tailscale.com>
Instead of using the legacy codepath, teach discoEndpoint to handle
peers that have a home DERP, but no disco key. We can still communicate
with them, but only over DERP.
Signed-off-by: David Anderson <danderson@tailscale.com>
This log is quite verbose, it was only to be left in for one
unstable build to help debug a user issue.
This reverts commit 1dd2552032.
Signed-off-by: Denton Gentry <dgentry@tailscale.com>
Intended to help in resolving customer issue with
DNS caching.
We currently exec `ipconfig /flushdns` from two
places:
- SetDNS(), which logs before invoking
- here in router_windows, which doesn't
We'd like to see a positive indication in logs that flushdns
is being run.
As this log is expected to be spammy, it is proposed to
leave this in just long enough to do an unstable 1.13.x build
and then revert it. They won't run an unsigned image that
I build.
Signed-off-by: Denton Gentry <dgentry@tailscale.com>
The number of peers we have will be pretty stable across time.
Allocate roughly the right slice size.
This reduces memory usage when there are many peers.
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
Two optimizations.
Use values instead of pointers.
We were using pointers to make track the "peer in progress" easier.
It's not too hard to do it manually, though.
Make two passes through the data, so that we can size our
return value accurately from the beginning.
This is cheap enough compared to the allocation,
which grows linearly in the number of peers,
that it is worth doing.
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>