Commit Graph

3520 Commits

Author SHA1 Message Date
Josh Bleecher Snyder
fe50ded95c health: track whether we have a functional udp4 bind
Suggested-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Signed-off-by: Josh Bleecher Snyder <josharian@gmail.com>
2021-04-28 11:07:14 -07:00
Josh Bleecher Snyder
7dc7078d96 wgengine/magicsock: use netaddr.IP in listenPacket
It must be an IP address; enforce that at the type level.

Suggested-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Signed-off-by: Josh Bleecher Snyder <josharian@gmail.com>
2021-04-28 11:07:14 -07:00
Brad Fitzpatrick
4bf6939ee0 ipn/ipnlocal: remove t.Parallel from recently added test
The test modifies a global; it shouldn't be parallel.

Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2021-04-28 11:02:56 -07:00
Josh Bleecher Snyder
3c543c103a wgengine/magicsock: unify initial bind and rebind
We had two separate code paths for the initial UDP listener bind
and any subsequent rebinds.

IPv6 got left out of the rebind code.
Rather than duplicate it there, unify the two code paths.
Then improve the resulting code:

* Rebind had nested listen attempts to try the user-specified port first,
  and then fall back to :0 if that failed. Convert that into a loop.
* Initial bind tried only the user-specified port.
  Rebind tried the user-specified port and 0.
  But there are actually three ports of interest:
  The one the user specified, the most recent port in use, and 0.
  We now try all three in order, as appropriate.
* In the extremely rare case in which binding to port 0 fails,
  use a dummy net.PacketConn whose reads block until close.
  This will keep the wireguard-go receive func goroutine alive.

As a pleasant side-effect of this, if we decide that
we need to resuscitate #1796, it will now be much easier.

Fixes #1799

Co-authored-by: David Anderson <danderson@tailscale.com>
Signed-off-by: Josh Bleecher Snyder <josharian@gmail.com>
2021-04-28 10:39:28 -07:00
Josh Bleecher Snyder
8fb66e20a4 wgengine/magicsock: remove DefaultPort const
Assume it'll stay at 0 forever, so hard-code it
and delete code conditional on it being non-0.

Signed-off-by: Josh Bleecher Snyder <josharian@gmail.com>
2021-04-28 10:39:28 -07:00
Josh Bleecher Snyder
a8f61969b9 wgengine/magicsock: remove context arg from listenPacket
It was set to context.Background by all callers, for the same reasons.
Set it locally instead, to simplify call sites.

Signed-off-by: Josh Bleecher Snyder <josharian@gmail.com>
2021-04-28 10:39:28 -07:00
Brad Fitzpatrick
a48c8991f1 ipn/ipnlocal: add a test for earlier lazy machine key generation change
Updates #1573

Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2021-04-28 08:52:04 -07:00
Brad Fitzpatrick
1e6d512bf0 cmd/tailscale: improve file cp error message in macOS GUI version
Fixes tailscale/corp#1684

Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2021-04-28 08:35:55 -07:00
Brad Fitzpatrick
4512aad889 version: add IsSandboxedMacOS func
For when we need to tweak behavior or errors as a function of which of
3 macOS Tailscale variants we're using. (more accessors coming later
as needed)

Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2021-04-28 08:34:19 -07:00
Brad Fitzpatrick
8efc7834f2 go.mod: bump wireguard-go
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2021-04-28 08:04:26 -07:00
David Anderson
306a094d4b ipn/ipnlocal: remove IPv6 records from MagicDNS.
Fixes #1813.

Signed-off-by: David Anderson <danderson@tailscale.com>
2021-04-28 01:01:56 -07:00
Brad Fitzpatrick
2840afabba version: bump date
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2021-04-27 19:10:25 -07:00
David Anderson
44c2b7dc79 net/dns: on windows, skip site-local v6 resolvers.
Further refinement for tailscale/corp#1662.

Signed-off-by: David Anderson <danderson@tailscale.com>
2021-04-27 18:24:54 -07:00
Brad Fitzpatrick
8554694616 cmd/tailscale: add 'tailscale file get' subcommand
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2021-04-27 15:28:50 -07:00
Brad Fitzpatrick
cafa037de0 cmd/tailscale/cli: rename 'tailscale push' to 'tailscale file cp'
And reverse order, require final colon, and support multiple files.

Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2021-04-27 13:58:56 -07:00
Brad Fitzpatrick
bb2141e0cf wgengine: periodically poll engine status for logging side effect
Fixes tailscale/corp#1560

Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2021-04-27 13:55:47 -07:00
Brad Fitzpatrick
3c9dea85e6 wgengine: update a log line from 'weird' to conventional 'unexpected'
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2021-04-27 09:59:25 -07:00
Brad Fitzpatrick
3bdc9e9cb2 ipn/ipnlocal: prevent a now-expected [unexpected] log message on Windows
Updates #1620

Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2021-04-27 09:58:05 -07:00
Ross Zurowski
b062ac5e86
cmd/tailscale: fix typo in error message (#1807)
Signed-off-by: Ross Zurowski <ross@rosszurowski.com>
2021-04-27 10:16:08 -04:00
Brad Fitzpatrick
5ecc7c7200 cmd/tailscale: make the new 'up' errors prettier and more helpful
Fixes #1746

Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2021-04-26 21:26:29 -07:00
Josh Bleecher Snyder
744de615f1 health, wgenegine: fix receive func health checks for the fourth time
The old implementation knew too much about how wireguard-go worked.
As a result, it missed genuine problems that occurred due to unrelated bugs.

This fourth attempt to fix the health checks takes a black box approach.
A receive func is healthy if one (or both) of these conditions holds:

* It is currently running and blocked.
* It has been executed recently.

The second condition is required because receive functions
are not continuously executing. wireguard-go calls them and then
processes their results before calling them again.

There is a theoretical false positive if wireguard-go go takes
longer than one minute to process the results of a receive func execution.
If that happens, we have other problems.

Updates #1790

Signed-off-by: Josh Bleecher Snyder <josharian@gmail.com>
2021-04-26 17:35:49 -07:00
Josh Bleecher Snyder
0d4c8cb2e1 health: delete ReceiveFunc health checks
They were not doing their job.
They need yet another conceptual re-think.
Start by clearing the decks.

Signed-off-by: Josh Bleecher Snyder <josharian@gmail.com>
2021-04-26 17:35:49 -07:00
Josh Bleecher Snyder
99705aa6b7 net/tstun: split TUN events channel into up/down and MTU
We had a long-standing bug in which our TUN events channel
was being received from simultaneously in two places.

The first is wireguard-go.

At wgengine/userspace.go:366, we pass e.tundev to wireguard-go,
which starts a goroutine (RoutineTUNEventReader)
that receives from that channel and uses events to adjust the MTU
and bring the device up/down.

At wgengine/userspace.go:374, we launch a goroutine that
receives from e.tundev, logs MTU changes, and triggers
state updates when up/down changes occur.

Events were getting delivered haphazardly between the two of them.

We don't really want wireguard-go to receive the up/down events;
we control the state of the device explicitly by calling device.Up.
And the userspace.go loop MTU logging duplicates logging that
wireguard-go does when it received MTU updates.

So this change splits the single TUN events channel into up/down
and other (aka MTU), and sends them to the parties that ought
to receive them.

I'm actually a bit surprised that this hasn't caused more visible trouble.
If a down event went to wireguard-go but the subsequent up event
went to userspace.go, we could end up with the wireguard-go device disappearing.

I believe that this may also (somewhat accidentally) be a fix for #1790.

Signed-off-by: Josh Bleecher Snyder <josharian@gmail.com>
2021-04-26 17:16:51 -07:00
David Anderson
97d2fa2f56 net/dns: work around WSL DNS implementation flaws.
Fixes tailscale/corp#1662

Signed-off-by: David Anderson <danderson@tailscale.com>
2021-04-26 16:54:50 -07:00
Brad Fitzpatrick
ffe6c8e335 cmd/tailscale/cli: don't do a simple up when in state NeedsLogin
Fixes #1780

Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2021-04-26 11:38:56 -07:00
Brad Fitzpatrick
138921ae40 ipn/ipnlocal: always write files to partial files, even in buffered mode
The intention was always that files only get written to *.partial
files and renamed at the end once fully received, but somewhere in the
process that got lost in buffered mode and *.partial files were only
being used in direct receive mode. This fix prevents WaitingFiles
from returning files that are still being transferred.

Updates tailscale/corp#1626

Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2021-04-26 11:34:05 -07:00
Brad Fitzpatrick
5e268e6153 ipn/ipnlocal: use delete marker files to work around Windows delete problems
If DeleteFile fails on Windows due to another process (anti-virus,
probably) having our file open, instead leave a marker file that the
file is logically deleted, and remove it from API calls and clean it
up lazily later.

Updates tailscale/corp#1626

Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2021-04-26 10:59:25 -07:00
Avery Pennarun
a7fe1d7c46 wgengine/bench: improved rate selection.
The old decay-based one took a while to converge. This new one (based
very loosely on TCP BBR) seems to converge quickly on what seems to be
the best speed.

Signed-off-by: Avery Pennarun <apenwarr@tailscale.com>
2021-04-26 03:51:13 -04:00
Avery Pennarun
a92b9647c5 wgengine/bench: speed test for channels, sockets, and wireguard-go.
This tries to generate traffic at a rate that will saturate the
receiver, without overdoing it, even in the event of packet loss. It's
unrealistically more aggressive than TCP (which will back off quickly
in case of packet loss) but less silly than a blind test that just
generates packets as fast as it can (which can cause all the CPU to be
absorbed by the transmitter, giving an incorrect impression of how much
capacity the total system has).

Initial indications are that a syscall about every 10 packets (TCP bulk
delivery) is roughly the same speed as sending every packet through a
channel. A syscall per packet is about 5x-10x slower than that.

The whole tailscale wireguard-go + magicsock + packet filter
combination is about 4x slower again, which is better than I thought
we'd do, but probably has room for improvement.

Note that in "full" tailscale, there is also a tundev read/write for
every packet, effectively doubling the syscall overhead per packet.

Given these numbers, it seems like read/write syscalls are only 25-40%
of the total CPU time used in tailscale proper, so we do have
significant non-syscall optimization work to do too.

Sample output:

$ GOMAXPROCS=2 go test -bench . -benchtime 5s ./cmd/tailbench
goos: linux
goarch: amd64
pkg: tailscale.com/cmd/tailbench
cpu: Intel(R) Core(TM) i7-4785T CPU @ 2.20GHz
BenchmarkTrivialNoAlloc/32-2         	56340248	        93.85 ns/op	 340.98 MB/s	         0 %lost	       0 B/op	       0 allocs/op
BenchmarkTrivialNoAlloc/124-2        	57527490	        99.27 ns/op	1249.10 MB/s	         0 %lost	       0 B/op	       0 allocs/op
BenchmarkTrivialNoAlloc/1024-2       	52537773	       111.3 ns/op	9200.39 MB/s	         0 %lost	       0 B/op	       0 allocs/op
BenchmarkTrivial/32-2                	41878063	       135.6 ns/op	 236.04 MB/s	         0 %lost	       0 B/op	       0 allocs/op
BenchmarkTrivial/124-2               	41270439	       138.4 ns/op	 896.02 MB/s	         0 %lost	       0 B/op	       0 allocs/op
BenchmarkTrivial/1024-2              	36337252	       154.3 ns/op	6635.30 MB/s	         0 %lost	       0 B/op	       0 allocs/op
BenchmarkBlockingChannel/32-2           12171654	       494.3 ns/op	  64.74 MB/s	         0 %lost	    1791 B/op	       0 allocs/op
BenchmarkBlockingChannel/124-2          12149956	       507.8 ns/op	 244.17 MB/s	         0 %lost	    1792 B/op	       1 allocs/op
BenchmarkBlockingChannel/1024-2         11034754	       528.8 ns/op	1936.42 MB/s	         0 %lost	    1792 B/op	       1 allocs/op
BenchmarkNonlockingChannel/32-2          8960622	      2195 ns/op	  14.58 MB/s	         8.825 %lost	    1792 B/op	       1 allocs/op
BenchmarkNonlockingChannel/124-2         3014614	      2224 ns/op	  55.75 MB/s	        11.18 %lost	    1792 B/op	       1 allocs/op
BenchmarkNonlockingChannel/1024-2        3234915	      1688 ns/op	 606.53 MB/s	         3.765 %lost	    1792 B/op	       1 allocs/op
BenchmarkDoubleChannel/32-2          	 8457559	       764.1 ns/op	  41.88 MB/s	         5.945 %lost	    1792 B/op	       1 allocs/op
BenchmarkDoubleChannel/124-2         	 5497726	      1030 ns/op	 120.38 MB/s	        12.14 %lost	    1792 B/op	       1 allocs/op
BenchmarkDoubleChannel/1024-2        	 7985656	      1360 ns/op	 752.86 MB/s	        13.57 %lost	    1792 B/op	       1 allocs/op
BenchmarkUDP/32-2                    	 1652134	      3695 ns/op	   8.66 MB/s	         0 %lost	     176 B/op	       3 allocs/op
BenchmarkUDP/124-2                   	 1621024	      3765 ns/op	  32.94 MB/s	         0 %lost	     176 B/op	       3 allocs/op
BenchmarkUDP/1024-2                  	 1553750	      3825 ns/op	 267.72 MB/s	         0 %lost	     176 B/op	       3 allocs/op
BenchmarkTCP/32-2                    	11056336	       503.2 ns/op	  63.60 MB/s	         0 %lost	       0 B/op	       0 allocs/op
BenchmarkTCP/124-2                   	11074869	       533.7 ns/op	 232.32 MB/s	         0 %lost	       0 B/op	       0 allocs/op
BenchmarkTCP/1024-2                  	 8934968	       671.4 ns/op	1525.20 MB/s	         0 %lost	       0 B/op	       0 allocs/op
BenchmarkWireGuardTest/32-2          	 1403702	      4547 ns/op	   7.04 MB/s	        14.37 %lost	     467 B/op	       3 allocs/op
BenchmarkWireGuardTest/124-2         	  780645	      7927 ns/op	  15.64 MB/s	         1.537 %lost	     420 B/op	       3 allocs/op
BenchmarkWireGuardTest/1024-2        	  512671	     11791 ns/op	  86.85 MB/s	         0.5206 %lost	     411 B/op	       3 allocs/op
PASS
ok  	tailscale.com/wgengine/bench	195.724s

Updates #414.

Signed-off-by: Avery Pennarun <apenwarr@tailscale.com>
2021-04-26 03:51:13 -04:00
Maisem Ali
590792915a wgengine/router{win}: ignore broadcast routes added by Windows when removing routes.
Signed-off-by: Maisem Ali <maisem@tailscale.com>
2021-04-24 14:13:35 -07:00
David Anderson
f6b7d08aea net/dns: work around new NetworkManager in other selection paths.
Further bits of #1788

Signed-off-by: David Anderson <danderson@tailscale.com>
2021-04-23 22:09:00 -07:00
David Anderson
25ce9885a2 net/dns: don't use NM+resolved for NM >=1.26.6.
NetworkManager fixed the bug that forced us to use NetworkManager
if it's programming systemd-resolved, and in the same release also
made NetworkManager ignore DNS settings provided for unmanaged
interfaces... Which breaks what we used to do. So, with versions
1.26.6 and above, we MUST NOT use NetworkManager to indirectly
program systemd-resolved, but thankfully we can talk to resolved
directly and get the right outcome.

Fixes #1788

Signed-off-by: David Anderson <danderson@tailscale.com>
2021-04-23 21:13:19 -07:00
David Anderson
31f81b782e util/cmpver: move into OSS from corp repo.
Signed-off-by: David Anderson <danderson@tailscale.com>
2021-04-23 20:55:45 -07:00
Aleksandar Pesic
7c985e4944 ipn/ipnlocal: add file sharing to windows shell
Updates: tailscale/winmin#33

Signed-off-by: Aleksandar Pesic <peske.nis@gmail.com>
2021-04-23 13:32:33 -07:00
Brad Fitzpatrick
e41075dd4a net/interfaces: work around race fetching routing table
Fixes #1345
Updates golang/go#45736

Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2021-04-23 13:23:19 -07:00
Brad Fitzpatrick
fe53a714bd ipn/ipnlocal: add a LocalBackend.Start fast path if already running
Updates tailscale/corp#1621

Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2021-04-23 12:14:12 -07:00
Brad Fitzpatrick
ad1a595a75 ipn/ipnlocal: close peer API listeners on transition away from Running
Updates tailscale/corp#1621

Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2021-04-23 12:13:50 -07:00
Brad Fitzpatrick
d94ed7310b cmd/tailscale/cli: add test for already-submitted #1777
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2021-04-23 10:56:26 -07:00
Josh Bleecher Snyder
8d7f7fc7ce health, wgenegine: fix receive func health checks yet again
The existing implementation was completely, embarrassingly conceptually broken.

We aren't able to see whether wireguard-go's receive function goroutines
are running or not. All we can do is model that based on what we have done.
This commit fixes that model.

Fixes #1781

Signed-off-by: Josh Bleecher Snyder <josharian@gmail.com>
2021-04-23 08:42:04 -07:00
David Anderson
30f5d706a1 net/dns/resolver: remove unnecessary/racy WaitGroup.
Fixes #1663

Signed-off-by: David Anderson <danderson@tailscale.com>
2021-04-22 19:17:37 -07:00
Brad Fitzpatrick
8a449c4dcd ipn: define NewBackendServer nil as not affecting Backend's NotifyCallback
Updates tailscale/corp#1646

Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2021-04-22 15:56:54 -07:00
David Anderson
30629c430a cmd/tailscale/cli: don't force an interactive login on --reset.
Fixes #1778

Signed-off-by: David Anderson <danderson@tailscale.com>
2021-04-22 15:53:50 -07:00
David Anderson
36d030cc36 ipn/ipnlocal: use fallback default DNS whenever exit nodes are on.
Fixes #1625

Signed-off-by: David Anderson <danderson@tailscale.com>
2021-04-22 15:24:18 -07:00
David Anderson
67ba6aa9fd cmd/tailscale/cli: fix typo in ExitNodeID mapping.
Prevented turning off exit nodes.

Fixes #1777

Signed-off-by: David Anderson <danderson@tailscale.com>
2021-04-22 14:55:29 -07:00
Brad Fitzpatrick
86e85d8934 ipn/ipnlocal: add peerapi goroutine fetch
Between owners.

Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2021-04-22 13:11:51 -07:00
Josh Bleecher Snyder
5835a3f553 health, wgengine/magicsock: avoid receive function false positives
Avery reported a sub-ms health transition from "receiveIPv4 not running" to "ok".

To avoid these transient false-positives, be more precise about
the expected lifetime of receive funcs. The problematic case is one in which
they were started but exited prior to a call to connBind.Close.
Explicitly represent started vs running state, taking care with the order of updates.

Signed-off-by: Josh Bleecher Snyder <josharian@gmail.com>
2021-04-22 12:48:10 -07:00
Brad Fitzpatrick
3411bb959a control/controlclient: fix signRegisterRequest log suppression check on Windows
Fixes #1774

Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2021-04-22 11:59:19 -07:00
Brad Fitzpatrick
2d786821f6 ipn/ipnlocal: put a retry loop around Windows file deletes
oh, Windows.

Updates tailscale/corp#1626

Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2021-04-22 10:04:50 -07:00
Brad Fitzpatrick
11780a4503 cmd/tailscale: only send file basename in push
Fixes #1640

Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2021-04-22 09:33:59 -07:00
Josh Bleecher Snyder
f845aae761 health: track whether magicsock receive functions are running
Signed-off-by: Josh Bleecher Snyder <josharian@gmail.com>
2021-04-22 08:57:36 -07:00