Commit Graph

2893 Commits

Author SHA1 Message Date
Avery Pennarun
a7fe1d7c46 wgengine/bench: improved rate selection.
The old decay-based one took a while to converge. This new one (based
very loosely on TCP BBR) seems to converge quickly on what seems to be
the best speed.

Signed-off-by: Avery Pennarun <apenwarr@tailscale.com>
2021-04-26 03:51:13 -04:00
Avery Pennarun
a92b9647c5 wgengine/bench: speed test for channels, sockets, and wireguard-go.
This tries to generate traffic at a rate that will saturate the
receiver, without overdoing it, even in the event of packet loss. It's
unrealistically more aggressive than TCP (which will back off quickly
in case of packet loss) but less silly than a blind test that just
generates packets as fast as it can (which can cause all the CPU to be
absorbed by the transmitter, giving an incorrect impression of how much
capacity the total system has).

Initial indications are that a syscall about every 10 packets (TCP bulk
delivery) is roughly the same speed as sending every packet through a
channel. A syscall per packet is about 5x-10x slower than that.

The whole tailscale wireguard-go + magicsock + packet filter
combination is about 4x slower again, which is better than I thought
we'd do, but probably has room for improvement.

Note that in "full" tailscale, there is also a tundev read/write for
every packet, effectively doubling the syscall overhead per packet.

Given these numbers, it seems like read/write syscalls are only 25-40%
of the total CPU time used in tailscale proper, so we do have
significant non-syscall optimization work to do too.

Sample output:

$ GOMAXPROCS=2 go test -bench . -benchtime 5s ./cmd/tailbench
goos: linux
goarch: amd64
pkg: tailscale.com/cmd/tailbench
cpu: Intel(R) Core(TM) i7-4785T CPU @ 2.20GHz
BenchmarkTrivialNoAlloc/32-2         	56340248	        93.85 ns/op	 340.98 MB/s	         0 %lost	       0 B/op	       0 allocs/op
BenchmarkTrivialNoAlloc/124-2        	57527490	        99.27 ns/op	1249.10 MB/s	         0 %lost	       0 B/op	       0 allocs/op
BenchmarkTrivialNoAlloc/1024-2       	52537773	       111.3 ns/op	9200.39 MB/s	         0 %lost	       0 B/op	       0 allocs/op
BenchmarkTrivial/32-2                	41878063	       135.6 ns/op	 236.04 MB/s	         0 %lost	       0 B/op	       0 allocs/op
BenchmarkTrivial/124-2               	41270439	       138.4 ns/op	 896.02 MB/s	         0 %lost	       0 B/op	       0 allocs/op
BenchmarkTrivial/1024-2              	36337252	       154.3 ns/op	6635.30 MB/s	         0 %lost	       0 B/op	       0 allocs/op
BenchmarkBlockingChannel/32-2           12171654	       494.3 ns/op	  64.74 MB/s	         0 %lost	    1791 B/op	       0 allocs/op
BenchmarkBlockingChannel/124-2          12149956	       507.8 ns/op	 244.17 MB/s	         0 %lost	    1792 B/op	       1 allocs/op
BenchmarkBlockingChannel/1024-2         11034754	       528.8 ns/op	1936.42 MB/s	         0 %lost	    1792 B/op	       1 allocs/op
BenchmarkNonlockingChannel/32-2          8960622	      2195 ns/op	  14.58 MB/s	         8.825 %lost	    1792 B/op	       1 allocs/op
BenchmarkNonlockingChannel/124-2         3014614	      2224 ns/op	  55.75 MB/s	        11.18 %lost	    1792 B/op	       1 allocs/op
BenchmarkNonlockingChannel/1024-2        3234915	      1688 ns/op	 606.53 MB/s	         3.765 %lost	    1792 B/op	       1 allocs/op
BenchmarkDoubleChannel/32-2          	 8457559	       764.1 ns/op	  41.88 MB/s	         5.945 %lost	    1792 B/op	       1 allocs/op
BenchmarkDoubleChannel/124-2         	 5497726	      1030 ns/op	 120.38 MB/s	        12.14 %lost	    1792 B/op	       1 allocs/op
BenchmarkDoubleChannel/1024-2        	 7985656	      1360 ns/op	 752.86 MB/s	        13.57 %lost	    1792 B/op	       1 allocs/op
BenchmarkUDP/32-2                    	 1652134	      3695 ns/op	   8.66 MB/s	         0 %lost	     176 B/op	       3 allocs/op
BenchmarkUDP/124-2                   	 1621024	      3765 ns/op	  32.94 MB/s	         0 %lost	     176 B/op	       3 allocs/op
BenchmarkUDP/1024-2                  	 1553750	      3825 ns/op	 267.72 MB/s	         0 %lost	     176 B/op	       3 allocs/op
BenchmarkTCP/32-2                    	11056336	       503.2 ns/op	  63.60 MB/s	         0 %lost	       0 B/op	       0 allocs/op
BenchmarkTCP/124-2                   	11074869	       533.7 ns/op	 232.32 MB/s	         0 %lost	       0 B/op	       0 allocs/op
BenchmarkTCP/1024-2                  	 8934968	       671.4 ns/op	1525.20 MB/s	         0 %lost	       0 B/op	       0 allocs/op
BenchmarkWireGuardTest/32-2          	 1403702	      4547 ns/op	   7.04 MB/s	        14.37 %lost	     467 B/op	       3 allocs/op
BenchmarkWireGuardTest/124-2         	  780645	      7927 ns/op	  15.64 MB/s	         1.537 %lost	     420 B/op	       3 allocs/op
BenchmarkWireGuardTest/1024-2        	  512671	     11791 ns/op	  86.85 MB/s	         0.5206 %lost	     411 B/op	       3 allocs/op
PASS
ok  	tailscale.com/wgengine/bench	195.724s

Updates #414.

Signed-off-by: Avery Pennarun <apenwarr@tailscale.com>
2021-04-26 03:51:13 -04:00
Maisem Ali
590792915a wgengine/router{win}: ignore broadcast routes added by Windows when removing routes.
Signed-off-by: Maisem Ali <maisem@tailscale.com>
2021-04-24 14:13:35 -07:00
David Anderson
f6b7d08aea net/dns: work around new NetworkManager in other selection paths.
Further bits of #1788

Signed-off-by: David Anderson <danderson@tailscale.com>
2021-04-23 22:09:00 -07:00
David Anderson
25ce9885a2 net/dns: don't use NM+resolved for NM >=1.26.6.
NetworkManager fixed the bug that forced us to use NetworkManager
if it's programming systemd-resolved, and in the same release also
made NetworkManager ignore DNS settings provided for unmanaged
interfaces... Which breaks what we used to do. So, with versions
1.26.6 and above, we MUST NOT use NetworkManager to indirectly
program systemd-resolved, but thankfully we can talk to resolved
directly and get the right outcome.

Fixes #1788

Signed-off-by: David Anderson <danderson@tailscale.com>
2021-04-23 21:13:19 -07:00
David Anderson
31f81b782e util/cmpver: move into OSS from corp repo.
Signed-off-by: David Anderson <danderson@tailscale.com>
2021-04-23 20:55:45 -07:00
Aleksandar Pesic
7c985e4944 ipn/ipnlocal: add file sharing to windows shell
Updates: tailscale/winmin#33

Signed-off-by: Aleksandar Pesic <peske.nis@gmail.com>
2021-04-23 13:32:33 -07:00
Brad Fitzpatrick
e41075dd4a net/interfaces: work around race fetching routing table
Fixes #1345
Updates golang/go#45736

Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2021-04-23 13:23:19 -07:00
Brad Fitzpatrick
fe53a714bd ipn/ipnlocal: add a LocalBackend.Start fast path if already running
Updates tailscale/corp#1621

Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2021-04-23 12:14:12 -07:00
Brad Fitzpatrick
ad1a595a75 ipn/ipnlocal: close peer API listeners on transition away from Running
Updates tailscale/corp#1621

Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2021-04-23 12:13:50 -07:00
Brad Fitzpatrick
d94ed7310b cmd/tailscale/cli: add test for already-submitted #1777
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2021-04-23 10:56:26 -07:00
Josh Bleecher Snyder
8d7f7fc7ce health, wgenegine: fix receive func health checks yet again
The existing implementation was completely, embarrassingly conceptually broken.

We aren't able to see whether wireguard-go's receive function goroutines
are running or not. All we can do is model that based on what we have done.
This commit fixes that model.

Fixes #1781

Signed-off-by: Josh Bleecher Snyder <josharian@gmail.com>
2021-04-23 08:42:04 -07:00
David Anderson
30f5d706a1 net/dns/resolver: remove unnecessary/racy WaitGroup.
Fixes #1663

Signed-off-by: David Anderson <danderson@tailscale.com>
2021-04-22 19:17:37 -07:00
Brad Fitzpatrick
8a449c4dcd ipn: define NewBackendServer nil as not affecting Backend's NotifyCallback
Updates tailscale/corp#1646

Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2021-04-22 15:56:54 -07:00
David Anderson
30629c430a cmd/tailscale/cli: don't force an interactive login on --reset.
Fixes #1778

Signed-off-by: David Anderson <danderson@tailscale.com>
2021-04-22 15:53:50 -07:00
David Anderson
36d030cc36 ipn/ipnlocal: use fallback default DNS whenever exit nodes are on.
Fixes #1625

Signed-off-by: David Anderson <danderson@tailscale.com>
2021-04-22 15:24:18 -07:00
David Anderson
67ba6aa9fd cmd/tailscale/cli: fix typo in ExitNodeID mapping.
Prevented turning off exit nodes.

Fixes #1777

Signed-off-by: David Anderson <danderson@tailscale.com>
2021-04-22 14:55:29 -07:00
Brad Fitzpatrick
86e85d8934 ipn/ipnlocal: add peerapi goroutine fetch
Between owners.

Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2021-04-22 13:11:51 -07:00
Josh Bleecher Snyder
5835a3f553 health, wgengine/magicsock: avoid receive function false positives
Avery reported a sub-ms health transition from "receiveIPv4 not running" to "ok".

To avoid these transient false-positives, be more precise about
the expected lifetime of receive funcs. The problematic case is one in which
they were started but exited prior to a call to connBind.Close.
Explicitly represent started vs running state, taking care with the order of updates.

Signed-off-by: Josh Bleecher Snyder <josharian@gmail.com>
2021-04-22 12:48:10 -07:00
Brad Fitzpatrick
3411bb959a control/controlclient: fix signRegisterRequest log suppression check on Windows
Fixes #1774

Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2021-04-22 11:59:19 -07:00
Brad Fitzpatrick
2d786821f6 ipn/ipnlocal: put a retry loop around Windows file deletes
oh, Windows.

Updates tailscale/corp#1626

Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2021-04-22 10:04:50 -07:00
Brad Fitzpatrick
11780a4503 cmd/tailscale: only send file basename in push
Fixes #1640

Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2021-04-22 09:33:59 -07:00
Josh Bleecher Snyder
f845aae761 health: track whether magicsock receive functions are running
Signed-off-by: Josh Bleecher Snyder <josharian@gmail.com>
2021-04-22 08:57:36 -07:00
Brad Fitzpatrick
529ef98b2a ipn/ipnlocal: fix approxSize operator precedence
Whoops.

Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2021-04-22 08:44:50 -07:00
Brad Fitzpatrick
820952daba cmd/tailscale: don't print out old authURL on up --force-reauth
Fixes #1671

Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2021-04-22 08:38:07 -07:00
Brad Fitzpatrick
12b4672add wgengine: quiet connection failure diagnostics for exit nodes
The connection failure diagnostic code was never updated enough for
exit nodes, so disable its misleading output when the node it picks
(incorrectly) to diagnose is only an exit node.

Fixes #1754

Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2021-04-22 08:29:20 -07:00
Brad Fitzpatrick
b03c23d2ed ipn/ipnlocal: log on DeleteFile error
Updates tailscale/corp#1626

Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2021-04-22 07:48:18 -07:00
Brad Fitzpatrick
6f52fa02a3 control/controlclient, tailcfg: add Debug.SleepSeconds (mapver 19)
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2021-04-21 22:05:41 -07:00
Brad Fitzpatrick
c91a22c82e cmd/tailscale: don't print auth URL when using a --authkey
Fixes #1755

Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2021-04-21 21:59:00 -07:00
Brad Fitzpatrick
e40e5429c2 cmd/tailscale/cli: make 'tailscale up' protect --advertise-exit-node removal
The new "tailscale up" checks previously didn't protect against
--advertise-exit-node being omitted in the case that
--advertise-routes was also provided. It wasn't done before because
there is no corresponding pref for "--advertise-exit-node"; it's a
helper flag that augments --advertise-routes. But that's an
implementation detail and we can still help users. We just have to
special case that pref and look whether the current routes include
both the v4 and v6 /0 routes.

Fixes #1767

Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2021-04-21 21:45:30 -07:00
Brad Fitzpatrick
a16eb6ac41 cmd/tailscale/cli: show online/offline status in push --file-targets
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2021-04-21 16:06:01 -07:00
Brad Fitzpatrick
dedbd483ea cmd/tailscale/cli: don't require explicit --operator if it matches $USER
This doesn't make --operator implicit (which we might do in the
future), but it at least doesn't require repeating it in the future
when it already matches $USER.

Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2021-04-21 15:49:01 -07:00
Brad Fitzpatrick
2f17a34242 ipn/ipnlocal: fix tailscale status --json AuthURL field
It was getting cleared on notify.

Document that authURL is cleared on notify and add a new field that
isn't, using the new field for the JSON status.

Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2021-04-21 13:42:26 -07:00
Brad Fitzpatrick
09891b9868 ipn/ipnlocal: on fresh lazy-connecting install, start in state NeedsLogin
Fixes #1759

Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2021-04-21 13:25:31 -07:00
Josh Bleecher Snyder
a29b0cf55f wgengine/wglog: allow wireguard-go receive routines to log
I've spent two days searching for a theoretical wireguard-go bug
around receive functions exiting early.

I've found many bugs, but none of the flavor we're looking for.

Restore wireguard-go's logging around starting and stopping receive functions,
so that we can definitively rule in or out this particular theory.

Signed-off-by: Josh Bleecher Snyder <josharian@gmail.com>
2021-04-21 12:29:28 -07:00
Josh Bleecher Snyder
eb2a9d4ce3 wgengine/netstack: log error when acceptUDP fails
I see a bunch of these in some logs I'm looking at,
separated only by a few seconds.
Log the error so we can tell what's going on here.

Signed-off-by: Josh Bleecher Snyder <josharian@gmail.com>
2021-04-21 12:25:01 -07:00
Naman Sood
4a90a91d29
wgengine/netstack: log ForwarderRequest in readable form, only in debug mode (#1758)
* wgengine/netstack: log ForwarderRequest in readable form, only in debug mode

Fixes #1757

Signed-off-by: Naman Sood <mail@nsood.in>
2021-04-21 14:50:48 -04:00
Josh Bleecher Snyder
07c95a0219 wgengine/wgcfg/nmcfg: consolidate exit node log lines
These were getting rate-limited for nodes with many peers.
Consolate the output into single lines, which are nicer anyway.

Signed-off-by: Josh Bleecher Snyder <josharian@gmail.com>
2021-04-21 11:29:30 -07:00
Brad Fitzpatrick
3d4d97601a derp/derpmap: add São Paulo (derp11)
Updates #1499
2021-04-21 11:04:47 -07:00
Brad Fitzpatrick
91c9c33036 cmd/tailscaled: don't block ipnserver startup behind engine init on Windows
With this change, the ipnserver's safesocket.Listen (the localhost
tcp.Listen) happens right away, before any synchronous
TUN/DNS/Engine/etc setup work, which might be slow, especially on
early boot on Windows.

Because the safesocket.Listen starts up early, that means localhost
TCP dials (the safesocket.Connect from the GUI) complete successfully
and thus the GUI avoids the MessageBox error. (I verified that
pacifies it, even without a Listener.Accept; I'd feared that Windows
localhost was maybe special and avoided the normal listener backlog).

Once the GUI can then connect immediately without errors, the various
timeouts then matter less, because the backend is no longer trying to
race against the GUI's timeout. So keep retrying on errors for a
minute, or 10 minutes if the system just booted in the past 10
minutes.

This should fix the problem with Windows 10 desktops auto-logging in
and starting the Tailscale frontend which was then showing a
MessageBox error about failing to connect to tailscaled, which was
slow coming up because the Windows networking stack wasn't up
yet. Fingers crossed.

Fixes #1313 (previously #1187, etc)

Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2021-04-20 22:26:27 -07:00
Alex Brainman
7d8f082ff7 .github/workflows: add --race tests on Linux and Windows
Updates #50
Updates #833

Signed-off-by: Alex Brainman <alex.brainman@gmail.com>
2021-04-20 21:50:26 -07:00
Alex Brainman
7689213aaa cmd/tailscaled: add subcommands to install and remove tailscaled Windows service
This change implements Windows version of install-system-daemon and
uninstall-system-daemon subcommands. When running the commands the
user will install or remove Tailscale Windows service.

Updates #1232

Signed-off-by: Alex Brainman <alex.brainman@gmail.com>
2021-04-20 21:40:59 -07:00
David Anderson
6fd9e28bd0 ipn/ipnlocal: add arpa suffixes to MagicDNS for reverse lookups.
This used to not be necessary, because MagicDNS always did full proxying.
But with split DNS, we need to know which names to route to our resolver,
otherwise reverse lookups break.

This captures the entire CGNAT range, as well as our Tailscale ULA.

Signed-off-by: David Anderson <danderson@tailscale.com>
2021-04-20 18:05:17 -07:00
David Anderson
89c81c26c5 net/dns: fix resolved match domains when no nameservers are provided.
Signed-off-by: David Anderson <danderson@tailscale.com>
2021-04-20 17:10:39 -07:00
David Anderson
4be26b269f net/dns: correctly capture all traffic in non-split configs.
Signed-off-by: David Anderson <danderson@tailscale.com>
2021-04-20 16:57:46 -07:00
David Anderson
ca283ac899 net/dns: remove config in openresolv when given an empty DNS config.
Part of #1720.

Signed-off-by: David Anderson <danderson@tailscale.com>
2021-04-20 16:19:34 -07:00
David Anderson
48d4f14652 ipn/ipnlocal: only set authoritative domains when using MagicDNS.
Otherwise, the existence of authoritative domains forces full
DNS proxying even when no other DNS config is present.

Signed-off-by: David Anderson <danderson@tailscale.com>
2021-04-20 15:52:19 -07:00
David Anderson
53213114ec net/dns: make debian_resolvconf correctly clear DNS configs.
More of #1720.

Signed-off-by: David Anderson <danderson@tailscale.com>
2021-04-20 15:51:14 -07:00
David Anderson
3b1ab78954 net/dns: restore resolv.conf when given an empty config in directManager.
Fixes #1720.

Signed-off-by: David Anderson <danderson@tailscale.com>
2021-04-20 15:14:40 -07:00
Brad Fitzpatrick
f99e63bb17 ipn: don't Logout when Windows GUI disconnects
Logout used to be a no-op, so the ipnserver previously synthensized a Logout
on disconnect. Now that Logout actually invalidates the node key that was
forcing all GUI closes to log people out.

Instead, add a method to LocalBackend to specifically mean "the
Windows GUI closed, please forget all the state".

Fixes tailscale/corp#1591 (ignoring the notification issues, tracked elsewhere)

Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2021-04-20 13:14:10 -07:00