Commit Graph

131 Commits

Author SHA1 Message Date
Brad Fitzpatrick
221e7d7767 wgengine/magicsock: make log message include DERP port (node) 2020-03-20 13:51:20 -07:00
Brad Fitzpatrick
33bdcabf03 wgengine/magicsock: call stun callback w/ only valid part of STUN packet 2020-03-20 13:44:27 -07:00
David Anderson
0be475ba46 Revert "tailcfg, controlclient, magicsock: request IPv6 endpoints, but ignore them"
Breaks something deep in wireguard or magicsock's brainstem, no packets at all
can flow. All received packets fail decryption with "invalid mac1".

This reverts commit 94024355ed.

Signed-off-by: David Anderson <dave@natulte.net>
2020-03-20 03:26:17 -07:00
Brad Fitzpatrick
94024355ed tailcfg, controlclient, magicsock: request IPv6 endpoints, but ignore them
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2020-03-19 21:01:52 -07:00
Brad Fitzpatrick
60ea635c6d wgengine/magicsock: delete inaccurate comment
I meant to include this in the earlier commit.
2020-03-19 19:48:02 -07:00
Brad Fitzpatrick
a184e05290 wgengine/magicsock: listen on udp6, use it for STUN, report endpoint
More steps towards IPv6 transport.

We now send it to tailcontrol, which ignores it.

But it doesn't actually actually support IPv6 yet (outside of STUN).

Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2020-03-19 13:54:38 -07:00
Brad Fitzpatrick
7caa288213 wgengine/magicsock: rename pconn field to pconn4, in prep for pconn6 2020-03-19 08:49:30 -07:00
David Crawshaw
addbdce296 wgengine, ipn: include number of active DERPs in status
Use this when making the ipn state transition from Starting to
Running. This way a network of quiet nodes with no active
handshaking will still transition to Active.

Signed-off-by: David Crawshaw <crawshaw@tailscale.com>
2020-03-19 17:55:16 +11:00
David Crawshaw
1ad78ce698 magicsock: reconnect to home DERP on key change
Typically the home DERP server is found and set on startup before
magicsock's SetPrivateKey can be called, so no DERP connection is
established. Make sure one is by kicking the home DERP tires in
SetPrivateKey.

Signed-off-by: David Crawshaw <crawshaw@tailscale.com>
2020-03-19 17:53:44 +11:00
David Crawshaw
455ba751d9 magicsock: start connection to HOME derp immediately
The code as written intended to do this, but it repeated the
comparison of derpNum and c.myDerp after c.myDerp had been
updated, so it never executed.

Signed-off-by: David Crawshaw <crawshaw@tailscale.com>
2020-03-19 17:36:30 +11:00
David Anderson
315a5e5355 scripts: add a license header checker.
Signed-off-by: David Anderson <dave@natulte.net>
2020-03-17 21:34:44 -07:00
Brad Fitzpatrick
e085aec8ef all: update to wireguard-go API changes
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2020-03-17 08:53:05 -07:00
Brad Fitzpatrick
db2436c7ff wgengine/magicsock: don't interrupt endpoint updates, merge all mutex into one
Before, endpoint updates were constantly being interrupted and resumed
on Linux due to tons of LinkChange messages from over-zealous Linux
netlink messages (from router_linux.go)

Now that endpoint updates are fast and bounded in time anyway, just
let them run to completion, but note that another needs to be
scheduled after.

Now logs went from pages of noise to just:

root@taildoc:~# grep -i -E 'stun|endpoint update' log
2020/03/13 08:51:29 magicsock.Conn: starting endpoint update (initial)
2020/03/13 08:51:30 magicsock.Conn.ReSTUN: endpoint update active, need another later ("link-change-minor")
2020/03/13 08:51:31 magicsock.Conn: starting endpoint update (link-change-minor)
2020/03/13 08:51:31 magicsock.Conn.ReSTUN: endpoint update active, need another later ("link-change-minor")
2020/03/13 08:51:33 magicsock.Conn: starting endpoint update (link-change-minor)
2020/03/13 08:51:33 magicsock.Conn.ReSTUN: endpoint update active, need another later ("link-change-minor")
2020/03/13 08:51:35 magicsock.Conn: starting endpoint update (link-change-minor)
2020/03/13 08:51:35 magicsock.Conn.ReSTUN: endpoint update active, need another later ("link-change-minor")

Or, seen in another run:

2020/03/13 08:45:41 magicsock.Conn: starting endpoint update (periodic)
2020/03/13 08:46:09 magicsock.Conn: starting endpoint update (periodic)
2020/03/13 08:46:21 magicsock.Conn: starting endpoint update (link-change-major)
2020/03/13 08:46:37 magicsock.Conn: starting endpoint update (periodic)
2020/03/13 08:47:05 magicsock.Conn: starting endpoint update (periodic)

Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2020-03-13 09:34:11 -07:00
Brad Fitzpatrick
db31550854 wgengine: don't Reconfig on boring link changes
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2020-03-13 07:45:59 -07:00
Brad Fitzpatrick
b9c6d3ceb8 netcheck: work behind UDP-blocked networks again, add tests
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2020-03-12 14:49:06 -07:00
Brad Fitzpatrick
bc73dcf204 wgengine/magicsock: don't block in Send waiting for derphttp.Send
Fixes #137
Updates #109
Updates #162
Updates #163

Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2020-03-12 12:19:12 -07:00
Brad Fitzpatrick
8807913be9 wgengine/magicsock: wait for previous DERP goroutines to end before new ones
Updates #109 (hopefully fixes, will wait for graphs to be happy)

Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2020-03-12 12:19:12 -07:00
Brad Fitzpatrick
eff6dcdb4e wgengine/magicsock: log more about why we're re-STUNing 2020-03-12 12:09:25 -07:00
Brad Fitzpatrick
b3ddf51a15 wgengine/magicsock: add a pointer value for logging
Updates #109

Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2020-03-11 15:12:19 -07:00
Brad Fitzpatrick
b0f8931d26 wgengine/magicsock: make a test signature a bit more explicit 2020-03-11 09:51:33 -07:00
David Crawshaw
7ec54e0064 wgengine/magicsock: remove TODO
The TODO above derphttp.NewClient suggests it does network I/O,
but the derphttp client connects lazily and so creating one is
very cheap.

Signed-off-by: David Crawshaw <crawshaw@tailscale.com>
2020-03-11 12:17:37 -04:00
Brad Fitzpatrick
01b4bec33f stunner: re-do how Stunner works
It used to make assumptions based on having Anycast IPs that are super
near. Now we're intentionally going to a bunch of different distant
IPs to measure latency.

Also, optimize how the hairpin detection works. No need to STUN on
that socket. Just use that separate socket for sending, once we know
the other UDP4 socket's endpoint. The trick is: make our test probe
also a STUN packet, so it fits through magicsock's existing STUN
routing.

This drops netcheck from ~5 seconds to ~250-500ms.

Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2020-03-11 08:08:48 -07:00
David Anderson
77af7e5436 wgengine/magicsock: mark test logfunc as a helper.
Signed-off-by: David Anderson <danderson@tailscale.com>
2020-03-10 18:00:37 -07:00
David Anderson
7eda3af034 wgengine/magicsock: clean up derp http servers on shutdown.
Failure to do this leads to fd exhaustion at -count=10000,
and increasingly poor execution north of -count=100.

Signed-off-by: David Anderson <danderson@tailscale.com>
2020-03-10 18:00:37 -07:00
David Anderson
d651715528 wgengine/magicsock: synchronize test STUN shutdown.
Failure to do so triggers either a data race or a panic
in the testing package, due to racey use of t.Logf.

Signed-off-by: David Anderson <danderson@tailscale.com>
2020-03-10 18:00:37 -07:00
David Anderson
86baf60bd4 wgengine/magicsock: synchronize epUpdate cleanup on shutdown.
Signed-off-by: David Anderson <danderson@tailscale.com>
2020-03-10 18:00:37 -07:00
Brad Fitzpatrick
023df9239e Move linkstate boring change filtering to magicsock
So we can at least re-STUN on boring updates.

Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2020-03-10 12:50:03 -07:00
David Anderson
592fec7606 wgengine/magicsock: move device close to uncursed portion of test.
Device close used to suffer from deadlocks, but no longer.

Signed-off-by: David Anderson <danderson@tailscale.com>
2020-03-10 11:57:57 -07:00
Brad Fitzpatrick
a265d7cbff wgengine/magicsock: in STUN-disabled test mode, let endpoint discovery proceed 2020-03-10 11:35:43 -07:00
Brad Fitzpatrick
5c1e443d34 wgengine/monitor: don't call LinkChange when interfaces look unchanged
Basically, don't trust the OS-level link monitor to only tell you
interesting things. Sanity check it.

Also, move the interfaces package into the net directory now that we
have it.
2020-03-10 11:03:19 -07:00
Brad Fitzpatrick
39c0ae1dba derp/derpmap: new DERP config package, merge netcheck into magicsock more
Fixes #153
Updates #162
Updates #163

Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2020-03-10 10:37:25 -07:00
Brad Fitzpatrick
4800926006 wgengine/magicsock: add AddrSet appendDests+UpdateDst tests 2020-03-09 09:13:28 -07:00
David Crawshaw
e201f63230 magicsock: unskip tests that are reliable
Signed-off-by: David Crawshaw <crawshaw@tailscale.com>
2020-03-08 09:29:37 -04:00
David Crawshaw
0f73070a57 wgengine: shut down wireguard on Close
This was (presumably) missing from wgengine because the
interactions between magicsock and wireguard-go meant that the
shutdown never worked. Now those are fixed, actually shut down.

Fixes occasional flake in expanded ipn/e2e_test.

Signed-off-by: David Crawshaw <crawshaw@tailscale.com>
2020-03-08 09:03:27 -04:00
David Crawshaw
ce7f6b2df1 wgengine: have pinger use all single-IP routes
Fixes #139

Signed-off-by: David Crawshaw <crawshaw@tailscale.com>
2020-03-08 07:09:38 -04:00
Martin Baillie
8ae3ba0cf5 wgengine: define default tunname for each platform
The device name "tailscale0" will be used for all platforms except for
OpenBSD where "tun" is enforced by the kernel. `CreateTUN()` in
`wireguard-go` will select the next available "tunX" device name on the
OpenBSD system.

Signed-off-by: Martin Baillie <martin@baillie.email>
2020-03-07 21:40:01 -08:00
David Anderson
bb93d7aaba wgengine/magicsock: plumb logf throughout, and expose in Options.
Signed-off-by: David Anderson <danderson@tailscale.com>
2020-03-07 14:11:28 -08:00
Brad Fitzpatrick
f42b9b6c9a wgengine/magicsock: don't discard UDP packet on UDP+DERP race
Fixes #155

Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2020-03-07 14:09:06 -08:00
David Anderson
e3172ae267 wgengine/magicsock: uncurse TestDeviceStartStop, let CI run it.
Signed-off-by: David Anderson <danderson@tailscale.com>
2020-03-06 20:43:57 -08:00
David Anderson
f265603110 wgengine/magicsock: fix data race in ReceiveIPv4.
The UDP reader goroutine was clobbering `n` and `err` from the
main goroutine, whose accesses are not synchronized the way `b` is.

Signed-off-by: David Anderson <danderson@tailscale.com>
2020-03-06 20:41:15 -08:00
David Anderson
77354d4617 wgengine/magicsock: unblock wireguard-go's read on magicsock shutdown.
wireguard-go closes magicsock, and expects this to unblock reads
so that its internal goroutines can wind down. We were incorrectly
blocking the read indefinitey and breaking this contract.

Signed-off-by: David Anderson <danderson@tailscale.com>
2020-03-06 18:28:47 -08:00
David Anderson
fdee5fb639 wgengine/magicsock: don't mutexly reach inside Conn to tweak DERP settings.
Signed-off-by: David Anderson <danderson@tailscale.com>
2020-03-06 18:28:47 -08:00
David Anderson
643bf14653 wgengine/magicsock: disable the new ping test.
It's extremely flaky in several dimensions, as well as very slow.
It's making the CI completely red all the time without telling us
useful information.

Set RUN_CURSED_TESTS=1 to run locally.
2020-03-06 13:35:59 -08:00
David Anderson
c8ebac2def wgengine/magicsock: try deflaking again.
This change just alters the semantics of the one flaky test, without
trying to speed up timeouts on the others. Empirically, speeding up
the timeouts causes _more_ flakes right now :(
2020-03-06 12:43:49 -08:00
David Anderson
cd1ac63b4c Revert "wgengine/magicsock: temporarily deflake."
This reverts commit c5835c6ced.
2020-03-06 12:37:19 -08:00
David Anderson
c5835c6ced wgengine/magicsock: temporarily deflake.
The remaining flake occurs due to a mysterious packet loss. This
doesn't affect normal tailscaled operations, so until I track down
where the loss occurs and fix it, the flaky test is going to be
lenient about packet loss (but not about whether the spray logic
worked).

Signed-off-by: David Anderson <danderson@tailscale.com>
2020-03-06 12:14:54 -08:00
Brad Fitzpatrick
61d83f759b wgengine/magicsock: remove redundant derpMagicIP comparison
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2020-03-06 11:31:39 -08:00
David Anderson
bd60a750e8 wgengine/magicsock: fix packet spraying test to (mostly) pass.
It previously passed incorrectly due to bugs. With those fixed,
it becomes flaky for 2 reasons. One of them is the wireguard handshake
race, which can eat the 1st sprayed packet and prevent roamAddr
discovery. This change fixes that failure, by spreading the test
traffic out enough that additional spraying occurs.

Signed-Off-By: David Anderson <danderson@tailscale.com>
2020-03-06 11:10:13 -08:00
David Anderson
ef31dd7bb5 wgengine/magicsock: check all 3 fast paths independently.
The previous code would skip the DERP short-circuit if roamAddr
was set, which is not what we wanted. More generally, hitting
any of the fast path conditions is a direct return, so we can
just have 3 standalone branches rather than 'else if' stuff.

Signed-Off-By: David Anderson <danderson@tailscale.com>
2020-03-06 11:10:13 -08:00
David Anderson
05a52746a4 wgengine/magicsock: fix destination selection logic to work with DERP.
The effect is subtle: when we're not spraying packets, and have not yet
figured out a curAddr, and we're not spraying, we end up sending to
whatever the first IP is in the iteration order. In English, that
means "when we have no idea where to send packets, and we've given
up on sending to everyone, just send to the first addr we see in
the list."

This is, in general, what we want, because the addrs are in sorted
preference order, low to high, and DERP is the least preferred
destination. So, when we have no idea where to send, send to DERP,
right?

... Except for very historical reasons, appendDests iterated through
addresses in _reverse_ order, most preferred to least preferred.
crawshaw@ believes this was part of the earliest handshaking
algorithm magicsock had, where it slowly iterated through possible
destinations and poked handshakes to them one at a time.

Anyway, because of this historical reverse iteration, in the case
described above of "we have no idea where to send", the code would
end up sending to the _most_ preferred candidate address, rather
than the _least_ preferred. So when in doubt, we'd end up firing
packets into the blackhole of some LAN address that doesn't work,
and connectivity would not work.

This case only comes up if all your non-DERP connectivity options
have failed, so we more or less failed to detect it because we
didn't have a pathological test box deployed. Worse, codependent
bug 2839854994 made DERP accidentally
work sometimes anyway by incorrectly exploiting roamAddr behavior,
albeit at the cost of making DERP traffic symmetric. In fixing
DERP to once again be asymmetric, we effectively removed the
bandaid that was concealing this bug.

Signed-Off-By: David Anderson <danderson@tailscale.com>
2020-03-06 11:10:13 -08:00