Starting with fe68841dc7, some e2e tests
got flaky. Rather than debug them (they're gnarly), just revert to the old
behavior as far as those tests are concerned. The tests were somehow
using magicsock without a private key and expecting it to do ... something.
My goal with fe68841dc7 was to stop log spam
and unnecessary work I saw on the iOS app when when stopping the app.
Instead, only stop doing that work on any transition from
once-had-a-private-key to no-longer-have-a-private-key. That fixes
what I wanted to fix while still making the mysterious e2e tests
happy.
Uses natlab only, because the point of this active discovery test is going to be
that it should get through a lot of obstacles.
Signed-off-by: David Anderson <danderson@tailscale.com>
The deadlock was:
* Conn.Close was called, which acquired c.mu
* Then this goroutine scheduled:
if firstDerp {
startGate = c.derpStarted
go func() {
dc.Connect(ctx)
close(c.derpStarted)
}()
}
* The getRegion hook for that derphttp.Client then ran, which also
tries to acquire c.mu.
This change makes that hook first see if we're already in a closing
state and then it can pretend that region doesn't exist.
wireguard-go uses 3 goroutines per peer (with reasonably large stacks
& buffers).
Rather than tell wireguard-go about all our peers, only tell it about
peers we're actively communicating with. That means we need hooks into
magicsock's packet receiving path and tstun's packet sending path to
lazily create a wireguard peer on demand from the network map.
This frees up lots of memory for iOS (where we have almost nothing
left for larger domains with many users).
We should ideally do this in wireguard-go itself one day, but that'd
be a pretty big change.
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Before this patch, the 250ms sleep would not be interrupted by context cancellation,
which would result in the goroutine sometimes lingering in tests (100ms grace period).
Signed-off-by: Dmytro Shynkevych <dmytro@tailscale.com>
Very rarely, cancellation occurs between a successful send on derpRecvCh
and a call to copyBuf on the receiving side.
Without this patch, this situation results in <-copyBuf blocking indefinitely.
Signed-off-by: Dmytro Shynkevych <dmytro@tailscale.com>
Peers advertising a discovery key know how to speak the discovery
protocol and do their own heartbeats to get through NATs and keep NATs
open. No need for the pinger except for with legacy peers.
There's a lot of confusion around what tailscale status shows, so make it better:
show region names, last write time, and put stars around DERP too if active.
Now stars are always present if activity, and always somewhere.
* fix tailscale status for peers using discovery
* as part of that, pull out disco address selection into reusable
and testable discoEndpoint.addrForSendLocked
* truncate ping/pong logged hex txids in half to eliminate noise
* move a bunch of random time constants into named constants
with docs
* track a history of per-endpoint pong replies for future use &
status display
* add "send" and " got" prefix to discovery message logging
immediately before the frame type so it's easier to read than
searching for the "<-" or "->" arrows earlier in the line; but keep
those as the more reasily machine readable part for later.
Updates #483
Update the mapping from ip:port to discokey, so when we retrieve a
packet from the network, we can find the same conn.Endpoint that we
gave to wireguard-go previously, without making it think we've
roamed. (We did, but we're not using its roaming.)
Updates #483
Ping messages now go out somewhat regularly, pong replies are sent,
and pong replies are now partially handled enough to upgrade off DERP
to LAN.
CallMeMaybe packets are sent & received over DERP, but aren't yet
handled. That's next (and regular maintenance timers), and then WAN
should work.
Updates #483
Starting at yesterday's e96f22e560 (convering some UDPAddrs to
IPPorts), Conn.ReceiveIPv4 could return a nil addr, which would make
its way through wireguard-go and blow up later. The DERP read path
wasn't initializing the addr result parameter any more, and wgRecvAddr
wasn't checking it either.
Fixes#515
And while plumbing, a bit of discovery work I'll need: the
endpointOfAddr map to map from validated paths to the discoEndpoint.
Not being populated yet.
Updates #483
This adds a new magicsock endpoint type only used when both sides
support discovery (that is, are advertising a discovery
key). Otherwise the old code is used.
So far the new code only communicates over DERP as proof that the new
code paths are wired up. None of the actually discovery messaging is
implemented yet.
Support for discovery (generating and advertising a key) are still
behind an environment variable for now.
Updates #483
And track known peers.
Doesn't yet do anything with the messages. (nor does it send any yet)
Start of docs on the message format. More will come in subsequent changes.
Updates #483
If there's been 5 minutes of inactivity, stop doing STUN lookups. That
means NAT mappings will expire, but they can resume later when there's
activity again.
We'll do this for all platforms later.
Updates tailscale/corp#320
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
The magicsock derpReader was holding onto 65KB for each DERP
connection forever, just in case.
Make the derp{,http}.Client be in charge of memory instead. It can
reuse its bufio.Reader buffer space.
Instead of hard-coding the DERP map (except for cmd/tailscale netcheck
for now), get it from the control server at runtime.
And make the DERP map support multiple nodes per region with clients
picking the first one that's available. (The server will balance the
order presented to clients for load balancing)
This deletes the stunner package, merging it into the netcheck package
instead, to minimize all the config hooks that would've been
required.
Also fix some test flakes & races.
Fixes#387 (Don't hard-code the DERP map)
Updates #388 (Add DERP region support)
Fixes#399 (wgengine: flaky tests)
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
If a test calls log.Printf, 'go test' horrifyingly rearranges the
output to no longer be in chronological order, which makes debugging
virtually impossible. Let's stop that from happening by making
log.Printf panic if called from any module, no matter how deep, during
tests.
This required us to change the default error handler in at least one
http.Server, as well as plumbing a bunch of logf functions around,
especially in magicsock and wgengine, but also in logtail and backoff.
To add insult to injury, 'go test' also rearranges the output when a
parent test has multiple sub-tests (all the sub-test's t.Logf is always
printed after all the parent tests t.Logf), so we need to screw around
with a special Logf that can point at the "current" t (current_t.Logf)
in some places. Probably our entire way of using subtests is wrong,
since 'go test' would probably like to run them all in parallel if you
called t.Parallel(), but it definitely can't because the're all
manipulating the shared state created by the parent test. They should
probably all be separate toplevel tests instead, with common
setup/teardown logic. But that's a job for another time.
Signed-off-by: Avery Pennarun <apenwarr@tailscale.com>
The docs on magicsock.Conn stated that they implemented the
wireguard/device.Bind interface, yet this type does not exist. In
reality, the Conn type implements the wireguard/conn.Bind interface.
I also fixed a small typo in the same file.
Signed-off-by: Blake Gentry <blakesgentry@gmail.com>
* remove endpoint discovery noise when results unchanged
* consistently spell derp nodes as "derp-N"
* replace "127.3.3.40:" with "derp-" in CreateEndpoint log output
* stop early DERP setup before SetPrivateKey is called;
it just generates log nosie
* fix stringification of peer ShortStrings (it had an old %x on it,
rendering it garbage)
* describe why derp routes are changing, with one of:
shared home, their home, our home, alt
Add opt-in method to request IPv6 endpoints from the control plane.
For now they should just be skipped. A previous version of this CL was
unconditional and reportedly had problems that I can't reproduce. So
make it a knob until the mystery is solved.
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Breaks something deep in wireguard or magicsock's brainstem, no packets at all
can flow. All received packets fail decryption with "invalid mac1".
This reverts commit 94024355ed.
Signed-off-by: David Anderson <dave@natulte.net>
More steps towards IPv6 transport.
We now send it to tailcontrol, which ignores it.
But it doesn't actually actually support IPv6 yet (outside of STUN).
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>