Commit Graph

360 Commits

Author SHA1 Message Date
Josh Bleecher Snyder
a8f61969b9 wgengine/magicsock: remove context arg from listenPacket
It was set to context.Background by all callers, for the same reasons.
Set it locally instead, to simplify call sites.

Signed-off-by: Josh Bleecher Snyder <josharian@gmail.com>
2021-04-28 10:39:28 -07:00
Josh Bleecher Snyder
744de615f1 health, wgenegine: fix receive func health checks for the fourth time
The old implementation knew too much about how wireguard-go worked.
As a result, it missed genuine problems that occurred due to unrelated bugs.

This fourth attempt to fix the health checks takes a black box approach.
A receive func is healthy if one (or both) of these conditions holds:

* It is currently running and blocked.
* It has been executed recently.

The second condition is required because receive functions
are not continuously executing. wireguard-go calls them and then
processes their results before calling them again.

There is a theoretical false positive if wireguard-go go takes
longer than one minute to process the results of a receive func execution.
If that happens, we have other problems.

Updates #1790

Signed-off-by: Josh Bleecher Snyder <josharian@gmail.com>
2021-04-26 17:35:49 -07:00
Josh Bleecher Snyder
0d4c8cb2e1 health: delete ReceiveFunc health checks
They were not doing their job.
They need yet another conceptual re-think.
Start by clearing the decks.

Signed-off-by: Josh Bleecher Snyder <josharian@gmail.com>
2021-04-26 17:35:49 -07:00
Josh Bleecher Snyder
8d7f7fc7ce health, wgenegine: fix receive func health checks yet again
The existing implementation was completely, embarrassingly conceptually broken.

We aren't able to see whether wireguard-go's receive function goroutines
are running or not. All we can do is model that based on what we have done.
This commit fixes that model.

Fixes #1781

Signed-off-by: Josh Bleecher Snyder <josharian@gmail.com>
2021-04-23 08:42:04 -07:00
Josh Bleecher Snyder
5835a3f553 health, wgengine/magicsock: avoid receive function false positives
Avery reported a sub-ms health transition from "receiveIPv4 not running" to "ok".

To avoid these transient false-positives, be more precise about
the expected lifetime of receive funcs. The problematic case is one in which
they were started but exited prior to a call to connBind.Close.
Explicitly represent started vs running state, taking care with the order of updates.

Signed-off-by: Josh Bleecher Snyder <josharian@gmail.com>
2021-04-22 12:48:10 -07:00
Josh Bleecher Snyder
f845aae761 health: track whether magicsock receive functions are running
Signed-off-by: Josh Bleecher Snyder <josharian@gmail.com>
2021-04-22 08:57:36 -07:00
Josh Bleecher Snyder
48e30bb8de wgengine/magicsock: remove named return
Doesn't add anything.

Signed-off-by: Josh Bleecher Snyder <josharian@gmail.com>
2021-04-20 10:12:07 -07:00
Josh Bleecher Snyder
a2a2c0ce1c wgengine/magicsock: fix two comments
Signed-off-by: Josh Bleecher Snyder <josharian@gmail.com>
2021-04-20 10:12:07 -07:00
Josh Bleecher Snyder
b1e624ef04 wgengine/magicsock: remove unnecessary type assertions
Signed-off-by: Josh Bleecher Snyder <josharian@gmail.com>
2021-04-20 10:12:07 -07:00
Josh Bleecher Snyder
98714e784b wgengine/magicsock: improve Rebind logging
We were accidentally logging oldPort -> oldPort.

Log oldPort as well as c.port; if we failed to get the preferred port
in a previous rebind, oldPort might differ from c.port.

Signed-off-by: Josh Bleecher Snyder <josharian@gmail.com>
2021-04-20 10:12:07 -07:00
Josh Bleecher Snyder
15ceacc4c5 wgengine/magicsock: accept a host and port instead of an addr in listenPacket
This simplifies call sites and prevents accidental failure to use net.JoinHostPort.

Signed-off-by: Josh Bleecher Snyder <josharian@gmail.com>
2021-04-20 10:12:07 -07:00
Brad Fitzpatrick
b993d9802a ipn/ipnlocal, etc: require file sharing capability to send/recv files
tailscale/corp#1582

Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2021-04-16 10:58:19 -07:00
Brad Fitzpatrick
762180595d ipn/ipnstate: add PeerStatus.TailscaleIPs slice, deprecate TailAddr
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2021-04-14 08:12:31 -07:00
Brad Fitzpatrick
34d2f5a3d9 tailcfg: add Endpoint, EndpointType, MapRequest.EndpointType
Track endpoints internally with a new tailcfg.Endpoint type that
includes a typed netaddr.IPPort (instead of just a string) and
includes a type for how that endpoint was discovered (STUN, local,
etc).

Use []tailcfg.Endpoint instead of []string internally.

At the last second, send it to the control server as the existing
[]string for endpoints, but also include a new parallel
MapRequest.EndpointType []tailcfg.EndpointType, so the control server
can start filtering out less-important endpoint changes from
new-enough clients. Notably, STUN-discovered endpoints can be filtered
out from 1.6+ clients, as they can discover them amongst each other
via CallMeMaybe disco exchanges started over DERP. And STUN endpoints
change a lot, causing a lot of MapResposne updates. But portmapped
endpoints are worth keeping for now, as they they work right away
without requiring the firewall traversal extra RTT dance.

End result will be less control->client bandwidth. (despite negligible
increase in client->control bandwidth)

Updates tailscale/corp#1543

Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2021-04-13 10:12:14 -07:00
Josh Bleecher Snyder
ba72126b72 wgengine/magicsock: remove RebindingUDPConn.FakeClosed
It existed to work around the frequent opening and closing
of the conn.Bind done by wireguard-go.
The preceding commit removed that behavior,
so we can simply close the connections
when we are done with them.

Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
2021-04-03 10:32:51 -07:00
Josh Bleecher Snyder
69cdc30c6d wgengine/wgcfg: remove Config.ListenPort
We don't use the port that wireguard-go passes to us (via magicsock.connBind.Open).
We ignore it entirely and use the port we selected.

When we tell wireguard-go that we're changing the listen_port,
it calls connBind.Close and then connBind.Open.
And in the meantime, it stops calling the receive functions,
which means that we stop receiving and processing UDP and DERP packets.
And that is Very Bad.

That was never a problem prior to b3ceca1dd7,
because we passed the SkipBindUpdate flag to our wireguard-go fork,
which told wireguard-go not to re-bind on listen_port changes.
That commit eliminated the SkipBindUpdate flag.

We could write a bunch of code to work around the gap.
We could add background readers that process UDP and DERP packets when wireguard-go isn't.
But it's simpler to never create the conditions in which wireguard-go rebinds.

The other scenario in which wireguard-go re-binds is device.Down.
Conveniently, we never call device.Down. We go from device.Up to device.Close,
and the latter only when we're shutting down a magicsock.Conn completely.

Rubber-ducked-by: Avery Pennarun <apenwarr@tailscale.com>
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
2021-04-03 10:32:51 -07:00
Josh Bleecher Snyder
b3ceca1dd7 wgengine/...: split into multiple receive functions
Upstream wireguard-go has changed its receive model.
NewDevice now accepts a conn.Bind interface.

The conn.Bind is stateless; magicsock.Conns are stateful.
To work around this, we add a connBind type that supports
cheap teardown and bring-up, backed by a Conn.

The new conn.Bind allows us to specify a set of receive functions,
rather than having to shoehorn everything into ReceiveIPv4 and ReceiveIPv6.
This lets us plumbing DERP messages directly into wireguard-go,
instead of having to mux them via ReceiveIPv4.

One consequence of the new conn.Bind layer is that
closing the wireguard-go device is now indistinguishable
from the routine bring-up and tear-down normally experienced
by a conn.Bind. We thus have to explicitly close the magicsock.Conn
when the close the wireguard-go device.

One downside of this change is that we are reliant on wireguard-go
to call receiveDERP to process DERP messages. This is fine for now,
but is perhaps something we should fix in the future.

Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
2021-04-02 12:18:54 -07:00
Josh Bleecher Snyder
34d4943357 all: gofmt -s
The code is not obviously better or worse, but this makes the little warning
triangle in my editor go away, and the distraction removal is worth it.

Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
2021-04-01 11:06:14 -07:00
Josh Bleecher Snyder
1df162b05b wgengine/magicsock: adapt CreateEndpoint signature to match wireguard-go
Part of a temporary change to make merging wireguard-go easier.
See https://github.com/tailscale/wireguard-go/pull/45.

Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
2021-04-01 09:55:45 -07:00
Josh Bleecher Snyder
36a85e1760 wgengine/magicsock: don't call t.Fatal in magicStack.IP
It can end up executing an a new goroutine,
at which point instead of immediately stopping test execution, it hangs.
Since this is unexpected anyway, panic instead.
As a bonus, it makes call sites nicer and removes a kludge comment.

Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
2021-03-30 11:48:13 -07:00
David Anderson
016de16b2e net/tstun: rename TUN to Wrapper.
The tstun packagen contains both constructors for generic tun
Devices, and a wrapper that provides additional functionality.

Signed-off-by: David Anderson <danderson@tailscale.com>
2021-03-26 23:15:22 -07:00
David Anderson
588b70f468 net/tstun: merge in wgengine/tstun.
Signed-off-by: David Anderson <danderson@tailscale.com>
2021-03-26 22:31:54 -07:00
Brad Fitzpatrick
81143b6d9a ipn/ipnlocal: start of peerapi between nodes
Also some necessary refactoring of the ipn/ipnstate too.

Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2021-03-25 16:00:35 -07:00
Josh Bleecher Snyder
28af46fb3b wgengine: pass logger as a separate arg to device.NewDevice
Adapt to minor API changes in wireguard-go.
And factor out device.DeviceOptions variables.

Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
2021-03-24 10:39:58 -07:00
Josh Bleecher Snyder
4b77eca2de wgengine/magicsock: check returned error in addTestEndpoint
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
2021-03-24 09:46:36 -07:00
Brad Fitzpatrick
c99f260e40 wgengine/magicsock: prefer IPv6 transport if roughly equivalent latency
Fixes #1566

Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2021-03-23 17:34:01 -07:00
Brad Fitzpatrick
9643d8b34d wgengine/magicsock: add an addrLatency type to combine an IPPort+time.Duration
Updates #1566 (but no behavior changes as of this change)

Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2021-03-23 10:09:10 -07:00
Brad Fitzpatrick
0994a9f7c4 wgengine{,/magicsock}: fix, improve "tailscale ping" to default routes and subnets
e.g.

$ tailscale ping 1.1.1.1
exit node found but not enabled

$ tailscale ping 10.2.200.2
node "tsbfvlan2" found, but not using its 10.2.200.0/24 route

$ sudo tailscale  up --accept-routes
$ tailscale ping 10.2.200.2
pong from tsbfvlan2 (100.124.196.94) via 10.2.200.34:41641 in 1ms

$ tailscale ping mon.ts.tailscale.com
pong from monitoring (100.88.178.64) via DERP(sfo) in 83ms
pong from monitoring (100.88.178.64) via DERP(sfo) in 21ms
pong from monitoring (100.88.178.64) via [2604:a880:4:d1::37:d001]:41641 in 22ms

This necessarily moves code up from magicsock to wgengine, so we can
look at the actual wireguard config.

Fixes #1564

Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2021-03-22 21:29:44 -07:00
Brad Fitzpatrick
7e0d12e7cc wgengine/magicsock: don't update control if only endpoint order changes
Updates #1559

Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2021-03-22 10:37:04 -07:00
Brad Fitzpatrick
32562a82a9 wgengine/magicsock: annotate a few more disco logs as verbose
Fixes #1540

Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2021-03-19 13:24:29 -07:00
Brad Fitzpatrick
c19ed37b0f wgengine/magicsock: mark some legacy debug log output as verbose
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2021-03-18 08:17:59 -07:00
Brad Fitzpatrick
ba8c6d0775 health, controlclient, ipn, magicsock: tell health package state of things
Not yet checking anything. Just plumbing states into the health package.

Updates #1505

Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2021-03-15 15:20:55 -07:00
Brad Fitzpatrick
44ab0acbdb net/portmapper, wgengine/monitor: cache gateway IP info until link changes
Cuts down allocs & CPU in steady state (on regular STUN probes) when network
is unchanging.

Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2021-03-15 14:27:39 -07:00
Brad Fitzpatrick
c81814e4f8 derp{,/derphttp},magicsock: tell DERP server when ping acks can be expected
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2021-03-12 09:55:02 -08:00
Brad Fitzpatrick
c576fea60e wgengine/magicsock: delete unused WhoIs method that was moved elsewhere
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2021-03-11 11:44:01 -08:00
Brad Fitzpatrick
ef7bac2895 tailcfg, net/portmapper, wgengine/magicsock: add NetInfo.HavePortMap
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2021-03-09 15:17:24 -08:00
Brad Fitzpatrick
79d8288f0a wgengine/magicsock, derp, derp/derphttp: respond to DERP server->client pings
No server support yet, but we want Tailscale 1.6 clients to be able to respond
to them when the server can do it.

Updates #1310

Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2021-03-09 13:56:13 -08:00
Brad Fitzpatrick
387e83c8fe wgengine/magicsock: fix Conn.Rebind race that let ErrClosed errors be read
There was a logical race where Conn.Rebind could acquire the
RebindingUDPConn mutex, close the connection, fail to rebind, release
the mutex, and then because the mutex was no longer held, ReceiveIPv4
wouldn't retry reads that failed with net.ErrClosed, letting that
error back to wireguard-go, which would then stop running that receive
IP goroutine.

Instead, keep the RebindingUDPConn mutex held for the entirety of the
replacement in all cases.

Updates tailscale/corp#1289

Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2021-03-08 21:08:35 -08:00
Brad Fitzpatrick
c445e3d327 wgengine/magicsock: fix typo in comment 2021-03-08 15:27:11 -08:00
Brad Fitzpatrick
a6d098c750 wgengine/magicsock: log when DERP connection succeeds
Updates #1310
2021-03-04 09:30:00 -08:00
Brad Fitzpatrick
829eb8363a net/interfaces: sort returned addresses from LocalAddresses
Also change the type to netaddr.IP while here, because it made sorting
easier.

Updates tailscale/corp#1397

Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2021-03-04 07:04:39 -08:00
Brad Fitzpatrick
c3e5903b91 wgengine/magicsock: remove leftover portmapper debug logging
It's already logged at the right time in logEndpointChange.

Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2021-03-02 12:42:45 -08:00
Brad Fitzpatrick
ea3715e3ce wgengine/magicsock: remove TODO about endpoints-over-DERP
It was done in Tailscale 1.4 with CallMeMaybe disco messages
containing endpoints.

Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2021-02-24 21:34:31 -08:00
David Anderson
2404c0ffad ipn/ipnlocal: only filter out default routes when computing the local wg config.
UIs need to see the full unedited netmap in order to know what exit nodes they
can offer to the user.

Signed-off-by: David Anderson <danderson@tailscale.com>
2021-02-24 20:41:56 -08:00
Brad Fitzpatrick
e9e4f1063d wgengine/magicsock: fix discoEndpoint caching bug when a node key changes
Fixes #1391

Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2021-02-23 14:39:15 -08:00
Brad Fitzpatrick
c64bd587ae net/portmapper: add NAT-PMP client, move port mapping service probing
* move probing out of netcheck into new net/portmapper package
* use PCP ANNOUNCE op codes for PCP discovery, rather than causing
  short-lived (sub-second) side effects with a 1-second-expiring map +
  delete.
* track when we heard things from the router so we can be less wasteful
  in querying the router's port mapping services in the future
* use portmapper from magicsock to map a public port

Fixes #1298
Fixes #1080
Fixes #1001
Updates #864

Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2021-02-23 09:07:38 -08:00
Josh Bleecher Snyder
c7e5ab8094 wgengine/magicsock: retry and re-send packets in TestTwoDevicePing
When a handshake race occurs, a queued data packet can get lost.
TestTwoDevicePing expected that the very first data packet would arrive.
This caused occasional flakes.

Change TestTwoDevicePing to repeatedly re-send packets
and succeed when one of them makes it through.

This is acceptable (vs making WireGuard not drop the packets)
because this only affects communication with extremely old clients.
And those extremely old clients will eventually connect,
because the kernel will retry sends on timeout.

Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
2021-02-12 14:18:58 -08:00
Josh Bleecher Snyder
1632f9fd6b wgengine/magicsock: reduce log spam during tests
Only do the type assertion to *net.UDPAddr when addr is non-nil.
This prevents a bunch of log spam during tests.
2021-02-12 10:49:02 -08:00
Josh Bleecher Snyder
88586ec4a4 wgengine/magicsock: remove an alloc from ReceiveIPvN
We modified the standard net package to not allocate a *net.UDPAddr
during a call to (*net.UDPConn).ReadFromUDP if the caller's use
of the *net.UDPAddr does not cause it to escape.
That is https://golang.org/cl/291390.

This is the companion change to magicsock.
There are two changes required.
First, call ReadFromUDP instead of ReadFrom, if possible.
ReadFrom returns a net.Addr, which is an interface, which always allocates.
Second, reduce the lifetime of the returned *net.UDPAddr.
We do this by immediately converting it into a netaddr.IPPort.

We left the existing RebindingUDPConn.ReadFrom method in place,
as it is required to satisfy the net.PacketConn interface.

With the upstream change and both of these fixes in place,
we have removed one large allocation per packet received.

name           old time/op    new time/op    delta
ReceiveFrom-8    16.7µs ± 5%    16.4µs ± 8%     ~     (p=0.310 n=5+5)

name           old alloc/op   new alloc/op   delta
ReceiveFrom-8      112B ± 0%       64B ± 0%  -42.86%  (p=0.008 n=5+5)

name           old allocs/op  new allocs/op  delta
ReceiveFrom-8      3.00 ± 0%      2.00 ± 0%  -33.33%  (p=0.008 n=5+5)

Co-authored-by: Sonia Appasamy <sonia@tailscale.com>
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
2021-02-12 09:52:43 -08:00
Josh Bleecher Snyder
0c673c1344 wgengine/magicsock: unify on netaddr types in addrSet
addrSet maintained duplicate lists of netaddr.IPPorts and net.UDPAddrs.
Unify to use the netaddr type only.

This makes (*Conn).ReceiveIPvN a bit uglier,
but that'll be cleaned up in a subsequent commit.

This is preparatory work to remove an allocation from ReceiveIPv4.

Co-authored-by: Sonia Appasamy <sonia@tailscale.com>
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
2021-02-12 09:52:43 -08:00