Commit Graph

941 Commits

Author SHA1 Message Date
Brad Fitzpatrick
7fcf86a14a wgengine: fix link monitor / magicsock Start race
Fixes #2733

Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2021-08-30 09:12:10 -07:00
Brad Fitzpatrick
83906abc5e wgengine/netstack: clarify a comment
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2021-08-27 11:10:56 -07:00
Brad Fitzpatrick
1925fb584e wgengine/netstack: fix crash in userspace netstack TCP forwarding
Fixes #2658

Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2021-08-25 15:48:05 -07:00
slowy07
ac0353e982 fix: typo spelling grammar
Signed-off-by: slowy07 <slowy.arfy@gmail.com>
2021-08-24 07:55:04 -07:00
Brad Fitzpatrick
37053801bb wgengine/magicsock: restore a bit of logging on node becoming active
Fixes #2695

Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2021-08-23 12:22:23 -07:00
Denton Gentry
6731f934a6 Revert "wgengine: actively log FlushDNS."
This log is quite verbose, it was only to be left in for one
unstable build to help debug a user issue.

This reverts commit 1dd2552032.

Signed-off-by: Denton Gentry <dgentry@tailscale.com>
2021-08-20 18:12:47 -07:00
Denton Gentry
1dd2552032 wgengine: actively log FlushDNS.
Intended to help in resolving customer issue with
DNS caching.

We currently exec `ipconfig /flushdns` from two
places:
- SetDNS(), which logs before invoking
- here in router_windows, which doesn't

We'd like to see a positive indication in logs that flushdns
is being run.

As this log is expected to be spammy, it is proposed to
leave this in just long enough to do an unstable 1.13.x build
and then revert it. They won't run an unsigned image that
I build.

Signed-off-by: Denton Gentry <dgentry@tailscale.com>
2021-08-19 14:43:14 -07:00
Josh Bleecher Snyder
6ef734e493 wgengine: predict min.Peers length across calls
The number of peers we have will be pretty stable across time.
Allocate roughly the right slice size.
This reduces memory usage when there are many peers.

Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
2021-08-18 16:12:45 -07:00
Josh Bleecher Snyder
adf696172d wgengine/userspace: reduce allocations in getStatus
Two optimizations.

Use values instead of pointers.
We were using pointers to make track the "peer in progress" easier.
It's not too hard to do it manually, though.

Make two passes through the data, so that we can size our
return value accurately from the beginning.
This is cheap enough compared to the allocation,
which grows linearly in the number of peers,
that it is worth doing.

Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
2021-08-18 16:12:08 -07:00
Maisem Ali
5c383bdf5d wgengine/router: pass in AmbientCaps when calling ip rule
Signed-off-by: Maisem Ali <maisem@tailscale.com>
2021-08-18 13:28:53 -07:00
Brad Fitzpatrick
39610aeb09 wgengine/magicsock: move debug knobs to their own file, compile out on iOS
No need for these knobs on iOS where you can set the environment
variables anyway.

Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2021-08-15 13:21:22 -07:00
Josh Bleecher Snyder
a5da4ed981 all: gofmt with Go 1.17
This adds "//go:build" lines and tidies up existing "// +build" lines.

Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
2021-08-05 15:54:00 -07:00
Brad Fitzpatrick
a729070252 net/tstun: add start of Linux TAP support, with DHCP+ARP server
Still very much a prototype (hard-coded IPs, etc) but should be
non-invasive enough to submit at this point and iterate from here.

Updates #2589

Co-Author: David Crawshaw <crawshaw@tailscale.com>
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2021-08-05 10:01:45 -07:00
Brad Fitzpatrick
f3c96df162 ipn/ipnstate: move tailscale status "active" determination to tailscaled
Fixes #2579

Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2021-08-04 09:10:49 -07:00
Brad Fitzpatrick
b622c60ed0 derp,wgengine/magicsock: don't assume stringer is in $PATH for go:generate
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2021-08-01 19:14:08 -07:00
Josh Bleecher Snyder
9da4181606 tstime/rate: new package
This is a simplified rate limiter geared for exactly our needs:
A fast, mono.Time-based rate limiter for use in tstun.
It was generated by stripping down the x/time/rate rate limiter
to just our needs and switching it to use mono.Time.

It removes one time.Now call per packet.

Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
2021-07-29 12:56:58 -07:00
Josh Bleecher Snyder
f6e833748b wgengine: use mono.Time
Migrate wgengine to mono.Time for performance-sensitive call sites.

Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
2021-07-29 12:56:58 -07:00
Josh Bleecher Snyder
8a3d52e882 wgengine/magicsock: use mono.Time
magicsock makes multiple calls to Now per packet.
Move to mono.Now. Changing some of the calls to
use package mono has a cascading effect,
causing non-per-packet call sites to also switch.

Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
2021-07-29 12:56:58 -07:00
Brad Fitzpatrick
5c266bdb73 wgengine: re-set DNS config on Linux after a major link change
Updates #2458 (maybe fixes it)

Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2021-07-26 08:01:27 -07:00
Brad Fitzpatrick
95a9adbb97 wgengine/netstack: implement UDP relaying to advertised subnets
TCP was done in 662fbd4a09.

This does the same for UDP.

Tested by hand. Integration tests will have to come later. I'd wanted
to do it in this commit, but the SOCKS5 server needed for interop
testing between two userspace nodes doesn't yet support UDP and I
didn't want to invent some whole new userspace packet injection
interface at this point, as SOCKS seems like a better route, but
that's its own bug.

Fixes #2302

RELNOTE=netstack mode can now UDP relay to subnets

Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2021-07-21 22:32:26 -07:00
Brad Fitzpatrick
ecac74bb65 wgengine/netstack: fix doc comment
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2021-07-21 08:25:05 -07:00
Brad Fitzpatrick
e4fecfe31d wgengine/{monitor,router}: restore Linux ip rules when systemd deletes them
Thanks.

Fixes #1591

Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2021-07-20 15:52:22 -07:00
Brad Fitzpatrick
ed8587f90d wgengine/router: take a link monitor
Prep for #1591 which will need to make Linux's router react to changes
that the link monitor observes.

The router package already depended on the monitor package
transitively. Now it's explicit.

Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2021-07-20 13:43:40 -07:00
Joe Tsai
9a0c8bdd20 util/deephash: make hash type opaque
The fact that Hash returns a [sha256.Size]byte leaks details about
the underlying hash implementation. This could very well be any other
hashing algorithm with a possible different block size.

Abstract this implementation detail away by declaring an opaque type
that is comparable. While we are changing the signature of UpdateHash,
rename it to just Update to reduce stutter (e.g., deephash.Update).

Signed-off-by: Joe Tsai <joetsai@digital-static.net>
2021-07-20 11:03:25 -07:00
Josh Bleecher Snyder
4dbbd0aa4a cmd/addlicense: add command to add licenseheaders to generated code
And use it to make our stringer invocations match the existing code.

Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
2021-07-19 15:31:56 -07:00
Josh Bleecher Snyder
c179580599 wgengine/magicsock: add debug envvar to force all traffic over DERP
This would have been useful during debugging DERP issues recently.

Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
2021-07-19 15:30:50 -07:00
Brad Fitzpatrick
e41193ec4d wgengine/monitor: don't spam about Linux RTM_NEWRULE events
The earlier 2ba36c294b started listening
for ip rule changes and only cared about DELRULE events, buts its subscription
included all rule events, including new ones, which meant we were then
catching our own ip rule creations and logging about how they were unknown.

Stop that log spam.

Updates #1591

Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2021-07-19 14:30:15 -07:00
Brad Fitzpatrick
2ba36c294b wgengine/monitor: subscribe to Linux ip rule events, log on rule deletes
For debugging & working on #1591 where certain versions of systemd-networkd
delete Tailscale's ip rule entries.

Updates #1591

Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2021-07-18 14:50:47 -07:00
Josh Bleecher Snyder
4f4dae32dd wgengine/magicsock: fix latent data race in test
logBufWriter had no serialization.
It just so happens that none of its users currently ever log concurrently.
Make it safe for concurrent use.

Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
2021-07-13 15:14:18 -07:00
julianknodt
fb06ad19e7 wgcfg: Switch to using mem.RO
As Brad suggested, mem.RO allows for a lot of easy perf gains. There were also some smaller
changes outside of mem.RO, such as using hex.Decode instead of hex.DecodeString.

```
name        old time/op    new time/op    delta
FromUAPI-8    14.7µs ± 3%    12.3µs ± 4%  -16.58%  (p=0.008 n=5+5)

name        old alloc/op   new alloc/op   delta
FromUAPI-8    9.52kB ± 0%    7.04kB ± 0%  -26.05%  (p=0.008 n=5+5)

name        old allocs/op  new allocs/op  delta
FromUAPI-8      77.0 ± 0%      29.0 ± 0%  -62.34%  (p=0.008 n=5+5)
```

Signed-off-by: julianknodt <julianknodt@gmail.com>
2021-07-13 13:45:44 -07:00
julianknodt
d349a3231e wgcfg: use string cut instead of string split
Signed-off-by: julianknodt <julianknodt@gmail.com>
2021-07-13 13:45:44 -07:00
julianknodt
664edbe566 wgcfg: add benchmark for FromUAPI
Adds a benchmark for FromUAPI in wgcfg.
It appears that it's not actually that slow, the main allocations are from the scanner and new
config.
Updates #1912.

Signed-off-by: julianknodt <julianknodt@gmail.com>
2021-07-13 13:45:44 -07:00
Brad Fitzpatrick
7e7c4c1bbe tailcfg: break DERPNode.DERPTestPort into DERPPort & InsecureForTests
The DERPTestPort int meant two things before: which port to use, and
whether to disable TLS verification. Users would like to set the port
without disabling TLS, so break it into two options.

Updates #1264

Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2021-07-09 12:30:31 -07:00
Brad Fitzpatrick
92077ae78c wgengine/magicsock: make portmapping async
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2021-07-09 11:15:26 -07:00
Brad Fitzpatrick
700badd8f8 util/deephash: move internal/deephash to util/deephash
No code changes. Just a minor package doc addition about lack of API
stability.
2021-07-02 21:33:02 -07:00
Maisem Ali
ec52760a3d wgengine/router_windows: support toggling local lan access when using
exit nodes.

Signed-off-by: Maisem Ali <maisem@tailscale.com>
2021-06-29 09:22:10 -07:00
Brad Fitzpatrick
722859b476 wgengine/netstack: make SOCKS5 resolve names to IPv6 if self node when no IPv4
For instance, ephemeral nodes with only IPv6 addresses can now
SOCKS5-dial out to names like "foo" and resolve foo's IPv6 address
rather than foo's IPv4 address and get a "no route"
(*tcpip.ErrNoRoute) error from netstack's dialer.

Per https://github.com/tailscale/tailscale/issues/2268#issuecomment-870027626
which is only part of the isuse.

Updates #2268

Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2021-06-28 15:20:37 -07:00
julianknodt
506c2fe8e2 cmd/tailscale: make netcheck use active DERP map, delete static copy
After allowing for custom DERP maps, it's convenient to be able to see their latency in
netcheck. This adds a query to the local tailscaled for the current DERPMap.

Updates #1264

Signed-off-by: julianknodt <julianknodt@gmail.com>
2021-06-28 14:08:47 -07:00
Christine Dodrill
59e9b44f53
wgengine/filter: add a debug flag for filter logs (#2241)
This uses a debug envvar to optionally disable filter logging rate
limits by setting the environment variable
TS_DEBUG_FILTER_RATE_LIMIT_LOGS to "all", and if it matches,
the code will effectively disable the limits on the log rate by
setting the limit to 1 millisecond. This should make sure that all
filter logs will be captured.

Signed-off-by: Christine Dodrill <xe@tailscale.com>
2021-06-25 10:10:26 -04:00
Brad Fitzpatrick
c45bfd4180 wgengine: make dnsIPsOverTailscale also consider DefaultResolvers
Found during a failed experiment debugging something on Android.

Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2021-06-24 12:57:26 -07:00
Brad Fitzpatrick
b92e2ebd24 wgengine/netstack: add Impl.DialContextUDP
Unused so far, but eventually we'll want this for SOCKS5 UDP binds (we
currently only do TCP with SOCKS5), and also for #2102 for forwarding
MagicDNS upstream to Tailscale IPs over netstack.

Updates #2102

Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2021-06-23 22:12:17 -07:00
Brad Fitzpatrick
45e64f2e1a net/dns{,/resolver}: refactor DNS forwarder, send out of right link on macOS/iOS
Fixes #2224
Fixes tailscale/corp#2045

Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2021-06-23 16:04:10 -07:00
David Crawshaw
4ce15505cb wgengine: randomize client port if netmap says to
For testing out #2187

Signed-off-by: David Crawshaw <crawshaw@tailscale.com>
2021-06-23 08:51:37 -07:00
David Crawshaw
5f8ffbe166 magicsock: add SetPreferredPort method
Signed-off-by: David Crawshaw <crawshaw@tailscale.com>
2021-06-23 08:51:37 -07:00
Brad Fitzpatrick
80a4052593 cmd/tailscale, wgengine, tailcfg: don't assume LastSeen is present [mapver 20]
Updates #2107

Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2021-06-11 08:41:16 -07:00
Fletcher Nichol
a49df5cfda wgenine/router: fix OpenBSD route creation
The route creation for the `tun` device was augmented in #1469 but
didn't account for adding IPv4 vs. IPv6 routes. There are 2 primary
changes as a result:

* Ensure that either `-inet` or `-inet6` was used in the
  [`route(8)`](https://man.openbsd.org/route) command
* Use either the `localAddr4` or `localAddr6` for the gateway argument
  depending which destination network is being added

The basis for the approach is based on the implementation from
`router_userspace_bsd.go`, including the `inet()` helper function.

Fixes #2048
References #1469

Signed-off-by: Fletcher Nichol <fnichol@nichol.ca>
2021-06-10 10:48:33 -07:00
Josh Bleecher Snyder
e92fd19484 wgengine/wglog: match upstream wireguard-go's code for wireguardGoString
It is a bit faster.

But more importantly, it matches upstream byte-for-byte,
which ensures there'll be no corner cases in which we disagree.

name        old time/op    new time/op    delta
SetPeers-8    3.58µs ± 0%    3.16µs ± 2%  -11.74%  (p=0.016 n=4+5)

name        old alloc/op   new alloc/op   delta
SetPeers-8    2.53kB ± 0%    2.53kB ± 0%     ~     (all equal)

name        old allocs/op  new allocs/op  delta
SetPeers-8      99.0 ± 0%      99.0 ± 0%     ~     (all equal)

Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
2021-06-04 13:06:28 -07:00
Brad Fitzpatrick
a321c24667 go.mod: update netaddr
Involves minor IPSetBuilder.Set API change.

Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2021-06-02 09:05:06 -07:00
Josh Bleecher Snyder
ddf6c8c729 wgengine/magicsock: delete dead code
Co-authored-by: Adrian Dewhurst <adrian@tailscale.com>
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
2021-05-28 17:02:08 -07:00
Josh Bleecher Snyder
1ece91cede go.mod: upgrade wireguard-windows, de-fork wireguard-go
Pull in the latest version of wireguard-windows.

Switch to upstream wireguard-go.
This requires reverting all of our import paths.

Unfortunately, this has to happen at the same time.
The wireguard-go change is very low risk,
as that commit matches our fork almost exactly.
(The only changes are import paths, CI files, and a go.mod entry.)
So if there are issues as a result of this commit,
the first place to look is wireguard-windows changes.

Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
2021-05-25 13:18:21 -07:00
Josh Bleecher Snyder
ceaaa23962 wgengine/wglog: cache strings
We repeat many peers each time we call SetPeers.
Instead of constructing strings for them from scratch every time,
keep strings alive across iterations.

name        old time/op    new time/op    delta
SetPeers-8    3.58µs ± 1%    2.41µs ± 1%  -32.60%  (p=0.000 n=9+10)

name        old alloc/op   new alloc/op   delta
SetPeers-8    2.53kB ± 0%    1.30kB ± 0%  -48.73%  (p=0.000 n=10+10)

name        old allocs/op  new allocs/op  delta
SetPeers-8      99.0 ± 0%      16.0 ± 0%  -83.84%  (p=0.000 n=10+10)

We could reduce alloc/op 12% and allocs/op 23% if strs had
type map[string]strCache instead of map[string]*strCache,
but that wipes out the execution time impact.
Given that re-use is the most common scenario, let's optimize for it.

Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
2021-05-24 18:41:54 -07:00
Josh Bleecher Snyder
73adbb7a78 wgengine: pass an addressable value to deephash.UpdateHash
This makes deephash more efficient.

Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
2021-05-24 13:51:23 -07:00
Josh Bleecher Snyder
8bf2a38f29 go.mod: update wireguard-go, taking control over iOS memory usage from our fork
Our wireguard-go fork used different values from upstream for
package device's memory limits on iOS.

This was the last blocker to removing our fork.

These values are now vars rather than consts for iOS.

c27ff9b9f6

Adjust them on startup to our preferred values.

Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
2021-05-24 12:03:57 -07:00
Josh Bleecher Snyder
25df067dd0 all: adapt to opaque netaddr types
This commit is a mishmash of automated edits using gofmt:

gofmt -r 'netaddr.IPPort{IP: a, Port: b} -> netaddr.IPPortFrom(a, b)' -w .
gofmt -r 'netaddr.IPPrefix{IP: a, Port: b} -> netaddr.IPPrefixFrom(a, b)' -w .

gofmt -r 'a.IP.Is4 -> a.IP().Is4' -w .
gofmt -r 'a.IP.As16 -> a.IP().As16' -w .
gofmt -r 'a.IP.Is6 -> a.IP().Is6' -w .
gofmt -r 'a.IP.As4 -> a.IP().As4' -w .
gofmt -r 'a.IP.String -> a.IP().String' -w .

And regexps:

\w*(.*)\.Port = (.*)  ->  $1 = $1.WithPort($2)
\w*(.*)\.IP = (.*)  ->  $1 = $1.WithIP($2)

And lots of manual fixups.

Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
2021-05-16 14:52:00 -07:00
Brad Fitzpatrick
5b52b64094 tsnet: add Tailscale-as-a-library package
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2021-05-14 12:46:42 -07:00
Josh Bleecher Snyder
ebcd7ab890 wgengine: remove wireguard-go DeviceOptions
We no longer need them.
This also removes the 32 bytes of prefix junk before endpoints.

Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
2021-05-11 15:30:39 -07:00
Josh Bleecher Snyder
aacb2107ae all: add extra information to serialized endpoints
magicsock.Conn.ParseEndpoint requires a peer's public key,
disco key, and legacy ip/ports in order to do its job.
We currently accomplish that by:

* adding the public key in our wireguard-go fork
* encoding the disco key as magic hostname
* using a bespoke comma-separated encoding

It's a bit messy.

Instead, switch to something simpler: use a json-encoded struct
containing exactly the information we need, in the form we use it.

Our wireguard-go fork still adds the public key to the
address when it passes it to ParseEndpoint, but now the code
compensating for that is just a couple of simple, well-commented lines.
Once this commit is in, we can remove that part of the fork
and remove the compensating code.

Signed-off-by: Josh Bleecher Snyder <josharian@gmail.com>
2021-05-11 15:13:42 -07:00
Josh Bleecher Snyder
98cae48e70 wgengine/wglog: optimize wireguardGoString
The new code is ugly, but much faster and leaner.

name        old time/op    new time/op    delta
SetPeers-8    7.81µs ± 1%    3.59µs ± 1%  -54.04%  (p=0.000 n=9+10)

name        old alloc/op   new alloc/op   delta
SetPeers-8    7.68kB ± 0%    2.53kB ± 0%  -67.08%  (p=0.000 n=10+10)

name        old allocs/op  new allocs/op  delta
SetPeers-8       237 ± 0%        99 ± 0%  -58.23%  (p=0.000 n=10+10)

Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
2021-05-11 14:28:47 -07:00
Josh Bleecher Snyder
9356912053 wgengine/wglog: add BenchmarkSetPeer
Because it showed up on hello profiles.

Cycle through some moderate-sized sets of peers.
This should cover the "small tweaks to netmap"
and the "up/down cycle" cases.

Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
2021-05-11 14:28:47 -07:00
Brad Fitzpatrick
36a26e6a71 internal/deephash: rename from deepprint
Yes, it printed, but that was an implementation detail for hashing.

And coming optimization will make it print even less.

Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2021-05-11 12:11:16 -07:00
Josh Bleecher Snyder
773fcfd007 Revert "wgengine/bench: skip flaky test"
This reverts commit d707e2f7e5.

Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
2021-05-11 11:28:30 -07:00
Josh Bleecher Snyder
68911f6778 wgengine/bench: ignore "engine closing" errors
On benchmark completion, we shut down the wgengine.
If we happen to poll for status during shutdown,
we get an "engine closing" error.
It doesn't hurt anything; ignore it.

Fixes tailscale/corp#1776

Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
2021-05-11 11:28:30 -07:00
Brad Fitzpatrick
d707e2f7e5 wgengine/bench: skip flaky test
Updates tailscale/corp#1776

Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2021-05-11 11:10:21 -07:00
Josh Bleecher Snyder
8d2a90529e wgengine/bench: hold lock in TrafficGen.GotPacket while calling first packet callback
Without any synchronization here, the "first packet" callback can
be delayed indefinitely, while other work continues.
Since the callback starts the benchmark timer, this could skew results.
Worse, if the benchmark manages to complete before the benchmark timer begins,
it'll cause a data race with the benchmark shutdown performed by package testing.
That is what is reported in #1881.

This is a bit unfortunate, in that it means that users of TrafficGen have
to be careful to keep this callback speedy and lightweight and to avoid deadlocks.

Fixes #1881

Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
2021-05-10 09:45:35 -07:00
Josh Bleecher Snyder
a72fb7ac0b wgengine/bench: handle multiple Engine status callbacks
It is possible to get multiple status callbacks from an Engine.
We need to wait for at least one from each Engine.
Without limiting to one per Engine,
wait.Wait can exit early or can panic due to a negative counter.

Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
2021-05-10 09:45:35 -07:00
Josh Bleecher Snyder
6618e82ba2 wgengine/bench: close Engines on benchmark completion
This reduces the speed with which these benchmarks exhaust their supply fds.
Not to zero unfortunately, but it's still helpful when doing long runs.

Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
2021-05-10 09:45:35 -07:00
Josh Bleecher Snyder
ddd85b9d91 wgengine/magicsock: rename discoEndpoint.wgEndpointHostPort to wgEndpoint
Fields rename only.

Part of the general effort to make our code agnostic about endpoint formatting.
It's just a name, but it will soon be a misleading one; be more generic.
Do this as a separate commit because it generates a lot of whitespace changes.

Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
2021-05-06 12:44:22 -07:00
Josh Bleecher Snyder
e0bd3cc70c wgengine/magicsock: use netaddr.MustParseIPPrefix
Delete our bespoke helper.

Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
2021-05-06 12:44:22 -07:00
Josh Bleecher Snyder
bc68e22c5b all: s/CreateEndpoint/ParseEndpoint/ in docs
Upstream wireguard-go renamed the interface method
from CreateEndpoint to ParseEndpoint.
I missed some comments. Fix them.

Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
2021-05-06 12:44:22 -07:00
Josh Bleecher Snyder
9bce1b7fc1 wgengine/wgcfg: make device test endpoint-format-agnostic
By using conn.NewDefaultBind, this test requires that our endpoints
be comprehensible to wireguard-go. Instead, use a no-op bind that
treats endpoints as opaque strings.

Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
2021-05-06 12:44:22 -07:00
Josh Bleecher Snyder
73ad1f804b wgengine/wgcfg: use autogenerated Clone methods
Delete the manually written ones named Copy.

Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
2021-05-06 12:44:22 -07:00
Josh Bleecher Snyder
a0dacba877 wgengine/magicsock: simplify legacy endpoint DstToString
Legacy endpoints (addrSet) currently reconstruct their dst string when requested.

Instead, store the dst string we were given to begin with.
In addition to being simpler and cheaper, this makes less code
aware of how to interpret endpoint strings.

Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
2021-05-06 12:44:22 -07:00
Josh Bleecher Snyder
777c816b34 wgengine/wgcfg: return better errors from DeviceConfig, ReconfigDevice
Prefer the error from the actual wireguard-go device method call,
not {To,From}UAPI, as those tend to be less interesting I/O errors.

Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
2021-05-06 12:44:22 -07:00
Josh Bleecher Snyder
1f6c4ba7c3 wgengine/wgcfg: prevent ReconfigDevice from hanging on error
When wireguard-go's UAPI interface fails with an error, ReconfigDevice hangs.
Fix that by buffering the channel and closing the writer after the call.
The code now matches the corresponding code in DeviceConfig, where I got it right.

Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
2021-05-06 12:44:22 -07:00
Josh Bleecher Snyder
ed63a041bf wgengine/userspace: delete HandshakeDone
It is unused, and has been since early Feb 2021 (Tailscale 1.6).
We can't get delete the DeviceOptions entirely yet;
first #1831 and #1839 need to go in, along with some wireguard-go changes.
Deleting this chunk of code now will make the later commits more clearly correct.

Pingers can now go too.

Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
2021-05-06 11:20:46 -07:00
Brad Fitzpatrick
b8fb8264a5 wgengine/netstack: avoid delivering incoming packets to both netstack + host
The earlier eb06ec172f fixed
the flaky SSH issue (tailscale/corp#1725) by making sure that packets
addressed to Tailscale IPs in hybrid netstack mode weren't delivered
to netstack, but another issue remained:

All traffic handled by netstack was also potentially being handled by
the host networking stack, as the filter hook returned "Accept", which
made it keep processing. This could lead to various random racey chaos
as a function of OS/firewalls/routes/etc.

Instead, once we inject into netstack, stop our caller's packet
processing.

Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2021-05-06 06:43:16 -07:00
Brad Fitzpatrick
1a1123d461 wgengine: fix pendopen debug to not track SYN+ACKs, show Node.Online state
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2021-05-05 15:25:11 -07:00
Brad Fitzpatrick
eb06ec172f wgengine/netstack: don't pass non-subnet traffic to netstack in hybrid mode
Fixes tailscale/corp#1725

Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2021-05-05 13:38:55 -07:00
Brad Fitzpatrick
7629cd6120 net/tsaddr: add NewContainsIPFunc (move from wgengine)
I want to use this from netstack but it's not exported.

Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2021-05-05 13:15:50 -07:00
Josh Bleecher Snyder
47ebd1e9a2 wgengine/router: use net.IP.Equal instead of bytes.Equal to compare IPs
Signed-off-by: Josh Bleecher Snyder <josharian@gmail.com>
2021-05-04 08:54:50 -07:00
Josh Bleecher Snyder
f91c2dfaca wgengine/router: remove unused field
Signed-off-by: Josh Bleecher Snyder <josharian@gmail.com>
2021-05-04 08:54:50 -07:00
Josh Bleecher Snyder
9360f36ebd all: use lower-case letters at the start of error message
Signed-off-by: Josh Bleecher Snyder <josharian@gmail.com>
2021-05-04 08:54:50 -07:00
Josh Bleecher Snyder
64047815b0 wgenengine/magicsock: delete cursed tests
Signed-off-by: Josh Bleecher Snyder <josharian@gmail.com>
2021-05-03 11:09:44 -07:00
Josh Bleecher Snyder
59026a291d wgengine/wglog: improve wireguard-go logging rate limiting
Prior to wireguard-go using printf-style logging,
all wireguard-go logging occurred using format string "%s".
We fixed that but continued to use %s when we rewrote
peer identifiers into Tailscale style.

This commit removes that %sl, which makes rate limiting work correctly.
As a happy side-benefit, it should generate less garbage.

Instead of replacing all wireguard-go peer identifiers
that might occur anywhere in a fully formatted log string,
assume that they only come from args.
Check all args for things that look like *device.Peers
and replace them with appropriately reformatted strings.

There is a variety of ways that this could go wrong
(unusual format verbs or modifiers, peer identifiers
occurring as part of a larger printed object, future API changes),
but none of them occur now, are likely to be added,
or would be hard to work around if they did.

Signed-off-by: Josh Bleecher Snyder <josharian@gmail.com>
2021-04-30 09:45:10 -07:00
Josh Bleecher Snyder
1f94d43b50 wgengine/wglog: delay formatting
The "stop phrases" we use all occur in wireguard-go in the format string.
We can avoid doing a bunch of fmt.Sprintf work when they appear.

Signed-off-by: Josh Bleecher Snyder <josharian@gmail.com>
2021-04-30 09:45:10 -07:00
Josh Bleecher Snyder
20e04418ff net/dns: add GOOS build tags
Fixes #1786

Signed-off-by: Josh Bleecher Snyder <josharian@gmail.com>
2021-04-29 21:34:55 -07:00
Josh Bleecher Snyder
7ee891f5fd all: delete wgcfg.Key and wgcfg.PrivateKey
For historical reasons, we ended up with two near-duplicate
copies of curve25519 key types, one in the wireguard-go module
(wgcfg) and one in the tailscale module (types/wgkey).
Then we moved wgcfg to the tailscale module.
We can now remove the wgcfg key type in favor of wgkey.

Signed-off-by: Josh Bleecher Snyder <josharian@gmail.com>
2021-04-29 14:14:34 -07:00
Josh Bleecher Snyder
9d542e08e2 wgengine/magicsock: always run ReceiveIPv6
One of the consequences of the 	bind refactoring in 6f23087175
is that attempting to bind an IPv6 socket will always
result in c.pconn6.pconn being non-nil.
If the bind fails, it'll be set to a placeholder packet conn
that blocks forever.

As a result, we can always run ReceiveIPv6 and health check it.
This removes IPv4/IPv6 asymmetry and also will allow health checks
to detect any IPv6 receive func failures.

Signed-off-by: Josh Bleecher Snyder <josharian@gmail.com>
2021-04-28 11:07:14 -07:00
Josh Bleecher Snyder
fe50ded95c health: track whether we have a functional udp4 bind
Suggested-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Signed-off-by: Josh Bleecher Snyder <josharian@gmail.com>
2021-04-28 11:07:14 -07:00
Josh Bleecher Snyder
7dc7078d96 wgengine/magicsock: use netaddr.IP in listenPacket
It must be an IP address; enforce that at the type level.

Suggested-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Signed-off-by: Josh Bleecher Snyder <josharian@gmail.com>
2021-04-28 11:07:14 -07:00
Josh Bleecher Snyder
3c543c103a wgengine/magicsock: unify initial bind and rebind
We had two separate code paths for the initial UDP listener bind
and any subsequent rebinds.

IPv6 got left out of the rebind code.
Rather than duplicate it there, unify the two code paths.
Then improve the resulting code:

* Rebind had nested listen attempts to try the user-specified port first,
  and then fall back to :0 if that failed. Convert that into a loop.
* Initial bind tried only the user-specified port.
  Rebind tried the user-specified port and 0.
  But there are actually three ports of interest:
  The one the user specified, the most recent port in use, and 0.
  We now try all three in order, as appropriate.
* In the extremely rare case in which binding to port 0 fails,
  use a dummy net.PacketConn whose reads block until close.
  This will keep the wireguard-go receive func goroutine alive.

As a pleasant side-effect of this, if we decide that
we need to resuscitate #1796, it will now be much easier.

Fixes #1799

Co-authored-by: David Anderson <danderson@tailscale.com>
Signed-off-by: Josh Bleecher Snyder <josharian@gmail.com>
2021-04-28 10:39:28 -07:00
Josh Bleecher Snyder
8fb66e20a4 wgengine/magicsock: remove DefaultPort const
Assume it'll stay at 0 forever, so hard-code it
and delete code conditional on it being non-0.

Signed-off-by: Josh Bleecher Snyder <josharian@gmail.com>
2021-04-28 10:39:28 -07:00
Josh Bleecher Snyder
a8f61969b9 wgengine/magicsock: remove context arg from listenPacket
It was set to context.Background by all callers, for the same reasons.
Set it locally instead, to simplify call sites.

Signed-off-by: Josh Bleecher Snyder <josharian@gmail.com>
2021-04-28 10:39:28 -07:00
Brad Fitzpatrick
bb2141e0cf wgengine: periodically poll engine status for logging side effect
Fixes tailscale/corp#1560

Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2021-04-27 13:55:47 -07:00
Brad Fitzpatrick
3c9dea85e6 wgengine: update a log line from 'weird' to conventional 'unexpected'
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2021-04-27 09:59:25 -07:00
Josh Bleecher Snyder
744de615f1 health, wgenegine: fix receive func health checks for the fourth time
The old implementation knew too much about how wireguard-go worked.
As a result, it missed genuine problems that occurred due to unrelated bugs.

This fourth attempt to fix the health checks takes a black box approach.
A receive func is healthy if one (or both) of these conditions holds:

* It is currently running and blocked.
* It has been executed recently.

The second condition is required because receive functions
are not continuously executing. wireguard-go calls them and then
processes their results before calling them again.

There is a theoretical false positive if wireguard-go go takes
longer than one minute to process the results of a receive func execution.
If that happens, we have other problems.

Updates #1790

Signed-off-by: Josh Bleecher Snyder <josharian@gmail.com>
2021-04-26 17:35:49 -07:00
Josh Bleecher Snyder
0d4c8cb2e1 health: delete ReceiveFunc health checks
They were not doing their job.
They need yet another conceptual re-think.
Start by clearing the decks.

Signed-off-by: Josh Bleecher Snyder <josharian@gmail.com>
2021-04-26 17:35:49 -07:00
Josh Bleecher Snyder
99705aa6b7 net/tstun: split TUN events channel into up/down and MTU
We had a long-standing bug in which our TUN events channel
was being received from simultaneously in two places.

The first is wireguard-go.

At wgengine/userspace.go:366, we pass e.tundev to wireguard-go,
which starts a goroutine (RoutineTUNEventReader)
that receives from that channel and uses events to adjust the MTU
and bring the device up/down.

At wgengine/userspace.go:374, we launch a goroutine that
receives from e.tundev, logs MTU changes, and triggers
state updates when up/down changes occur.

Events were getting delivered haphazardly between the two of them.

We don't really want wireguard-go to receive the up/down events;
we control the state of the device explicitly by calling device.Up.
And the userspace.go loop MTU logging duplicates logging that
wireguard-go does when it received MTU updates.

So this change splits the single TUN events channel into up/down
and other (aka MTU), and sends them to the parties that ought
to receive them.

I'm actually a bit surprised that this hasn't caused more visible trouble.
If a down event went to wireguard-go but the subsequent up event
went to userspace.go, we could end up with the wireguard-go device disappearing.

I believe that this may also (somewhat accidentally) be a fix for #1790.

Signed-off-by: Josh Bleecher Snyder <josharian@gmail.com>
2021-04-26 17:16:51 -07:00
Avery Pennarun
a7fe1d7c46 wgengine/bench: improved rate selection.
The old decay-based one took a while to converge. This new one (based
very loosely on TCP BBR) seems to converge quickly on what seems to be
the best speed.

Signed-off-by: Avery Pennarun <apenwarr@tailscale.com>
2021-04-26 03:51:13 -04:00
Avery Pennarun
a92b9647c5 wgengine/bench: speed test for channels, sockets, and wireguard-go.
This tries to generate traffic at a rate that will saturate the
receiver, without overdoing it, even in the event of packet loss. It's
unrealistically more aggressive than TCP (which will back off quickly
in case of packet loss) but less silly than a blind test that just
generates packets as fast as it can (which can cause all the CPU to be
absorbed by the transmitter, giving an incorrect impression of how much
capacity the total system has).

Initial indications are that a syscall about every 10 packets (TCP bulk
delivery) is roughly the same speed as sending every packet through a
channel. A syscall per packet is about 5x-10x slower than that.

The whole tailscale wireguard-go + magicsock + packet filter
combination is about 4x slower again, which is better than I thought
we'd do, but probably has room for improvement.

Note that in "full" tailscale, there is also a tundev read/write for
every packet, effectively doubling the syscall overhead per packet.

Given these numbers, it seems like read/write syscalls are only 25-40%
of the total CPU time used in tailscale proper, so we do have
significant non-syscall optimization work to do too.

Sample output:

$ GOMAXPROCS=2 go test -bench . -benchtime 5s ./cmd/tailbench
goos: linux
goarch: amd64
pkg: tailscale.com/cmd/tailbench
cpu: Intel(R) Core(TM) i7-4785T CPU @ 2.20GHz
BenchmarkTrivialNoAlloc/32-2         	56340248	        93.85 ns/op	 340.98 MB/s	         0 %lost	       0 B/op	       0 allocs/op
BenchmarkTrivialNoAlloc/124-2        	57527490	        99.27 ns/op	1249.10 MB/s	         0 %lost	       0 B/op	       0 allocs/op
BenchmarkTrivialNoAlloc/1024-2       	52537773	       111.3 ns/op	9200.39 MB/s	         0 %lost	       0 B/op	       0 allocs/op
BenchmarkTrivial/32-2                	41878063	       135.6 ns/op	 236.04 MB/s	         0 %lost	       0 B/op	       0 allocs/op
BenchmarkTrivial/124-2               	41270439	       138.4 ns/op	 896.02 MB/s	         0 %lost	       0 B/op	       0 allocs/op
BenchmarkTrivial/1024-2              	36337252	       154.3 ns/op	6635.30 MB/s	         0 %lost	       0 B/op	       0 allocs/op
BenchmarkBlockingChannel/32-2           12171654	       494.3 ns/op	  64.74 MB/s	         0 %lost	    1791 B/op	       0 allocs/op
BenchmarkBlockingChannel/124-2          12149956	       507.8 ns/op	 244.17 MB/s	         0 %lost	    1792 B/op	       1 allocs/op
BenchmarkBlockingChannel/1024-2         11034754	       528.8 ns/op	1936.42 MB/s	         0 %lost	    1792 B/op	       1 allocs/op
BenchmarkNonlockingChannel/32-2          8960622	      2195 ns/op	  14.58 MB/s	         8.825 %lost	    1792 B/op	       1 allocs/op
BenchmarkNonlockingChannel/124-2         3014614	      2224 ns/op	  55.75 MB/s	        11.18 %lost	    1792 B/op	       1 allocs/op
BenchmarkNonlockingChannel/1024-2        3234915	      1688 ns/op	 606.53 MB/s	         3.765 %lost	    1792 B/op	       1 allocs/op
BenchmarkDoubleChannel/32-2          	 8457559	       764.1 ns/op	  41.88 MB/s	         5.945 %lost	    1792 B/op	       1 allocs/op
BenchmarkDoubleChannel/124-2         	 5497726	      1030 ns/op	 120.38 MB/s	        12.14 %lost	    1792 B/op	       1 allocs/op
BenchmarkDoubleChannel/1024-2        	 7985656	      1360 ns/op	 752.86 MB/s	        13.57 %lost	    1792 B/op	       1 allocs/op
BenchmarkUDP/32-2                    	 1652134	      3695 ns/op	   8.66 MB/s	         0 %lost	     176 B/op	       3 allocs/op
BenchmarkUDP/124-2                   	 1621024	      3765 ns/op	  32.94 MB/s	         0 %lost	     176 B/op	       3 allocs/op
BenchmarkUDP/1024-2                  	 1553750	      3825 ns/op	 267.72 MB/s	         0 %lost	     176 B/op	       3 allocs/op
BenchmarkTCP/32-2                    	11056336	       503.2 ns/op	  63.60 MB/s	         0 %lost	       0 B/op	       0 allocs/op
BenchmarkTCP/124-2                   	11074869	       533.7 ns/op	 232.32 MB/s	         0 %lost	       0 B/op	       0 allocs/op
BenchmarkTCP/1024-2                  	 8934968	       671.4 ns/op	1525.20 MB/s	         0 %lost	       0 B/op	       0 allocs/op
BenchmarkWireGuardTest/32-2          	 1403702	      4547 ns/op	   7.04 MB/s	        14.37 %lost	     467 B/op	       3 allocs/op
BenchmarkWireGuardTest/124-2         	  780645	      7927 ns/op	  15.64 MB/s	         1.537 %lost	     420 B/op	       3 allocs/op
BenchmarkWireGuardTest/1024-2        	  512671	     11791 ns/op	  86.85 MB/s	         0.5206 %lost	     411 B/op	       3 allocs/op
PASS
ok  	tailscale.com/wgengine/bench	195.724s

Updates #414.

Signed-off-by: Avery Pennarun <apenwarr@tailscale.com>
2021-04-26 03:51:13 -04:00