Seeing "frontend-provided legacy machine key" was weird (and not quite
accurate) on Linux machines where it comes from the _daemon key's
persist prefs, not the "frontend".
Make the log message distinguish between the cases.
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
When the service was running without a client (e.g. after a reboot)
and then the owner logs in and the GUI attaches, the computed state
key changed to "" (driven by frontend prefs), and then it was falling
out of server mode, despite the GUI-provided prefs still saying it
wanted server mode.
Also add some logging. And remove a scary "Access denied" from a
user-visible error, making the two possible already-in-use error
messages consistent with each other.
On Windows, we were previously treating a server used by different
users as a fatal error, which meant the second user (upon starting
Tailscale, explicitly or via Start Up programs) got an invasive error
message dialog.
Instead, give it its own IPN state and change the Notify.ErrMessage to
be details in that state. Then the Windows GUI can be less aggresive
about that happening.
Also,
* wait to close the IPN connection until the server ownership state
changes so the GUI doesn't need to repeatedly reconnect to discover
changes.
* fix a bug discovered during testing: on system reboot, the
ipnserver's serverModeUser was getting cleared while the state
transitioned from Unknown to Running. Instead, track 'inServerMode'
explicitly and remove the old accessor method which was error prone.
* fix a rare bug where the client could start up and set the server
mode prefs in its Start call and we wouldn't persist that to the
StateStore storage's prefs start key. (Previously it was only via a
prefs toggle at runtime)
os.IsNotExist doesn't unwrap errors. errors.Is does.
The ioutil.ReadFile ones happened to be fine but I changed them so
we're consistent with the rule: if the error comes from os, you can
use os.IsNotExist, but from any other package, use errors.Is.
(errors.Is always would also work, but not worth updating all the code)
The motivation here was that we were logging about failure to migrate
legacy relay node prefs file on startup, even though the code tried
to avoid that.
See golang/go#41122
It was especially bad on our GUI platforms with a frontend that polls it.
No need to log it every few seconds if it's unchanged. Make it slightly
less allocate-y while I'm here.
If we can't find the mapping from SID ("user ID") -> username, don't
treat that as a fatal. Apparently that happens in the wild for Reasons.
Ignore it for now. It's just a nice-to-have for error messages in the
rare multi-user case.
Updates #869
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
When building with redo, also include the git commit hash
from the proprietary repo, so that we have a precise commit
that identifies all build info (including Go toolchain version).
Add a top-level build script demonstrating to downstream distros
how to burn the right information into builds.
Adjust `tailscale version` to print commit hashes when available.
Fixes#841.
Signed-off-by: David Anderson <danderson@tailscale.com>
Also replaces the IPv6Overlay bool with use of DebugFlags, since
it's currently an experimental configuration.
Signed-off-by: David Anderson <danderson@tailscale.com>
We were creating the controlclient and starting the portpoll concurrently,
which frequently resulted in the first controlclient connection being canceled
by the firsdt portpoll result ~milliseconds later, resulting in another
HTTP request.
Instead, wait a bit for the first portpoll result so it's much less likely to
interrupt our controlclient connection.
Updates tailscale/corp#557
This partially (but not yet fully) migrates Windows to tailscaled's
StateStore storage system.
This adds a new bool Pref, ForceDaemon, defined as:
// ForceDaemon specifies whether a platform that normally
// operates in "client mode" (that is, requires an active user
// logged in with the GUI app running) should keep running after the
// GUI ends and/or the user logs out.
//
// The only current applicable platform is Windows. This
// forced Windows to go into "server mode" where Tailscale is
// running even with no users logged in. This might also be
// used for macOS in the future. This setting has no effect
// for Linux/etc, which always operate in daemon mode.
Then, when ForceDaemon becomes true, we now write use the StateStore
to track which user started it in server mode, and store their prefs
under that key.
The ipnserver validates the connections/identities and informs that
LocalBackend which userid is currently in charge.
The GUI can then enable/disable server mode at runtime, without using
the CLI.
But the "tailscale up" CLI was also fixed, so Windows users can use
authkeys or ACL tags, etc.
Updates #275
It was previously possible for two different Windows users to connect
to the IPN server at once, but it didn't really work. They mostly
stepped on each other's toes and caused chaos.
Now only one can control it, but it can be active for everybody else.
Necessary dependency step for Windows server/headless mode (#275)
While here, finish wiring up the HTTP status page on Windows, now that
all the dependent pieces are available.
If no interfaces are up, calm down and stop spamming so much. It was
noticed as especially bad on Windows, but probably was bad
everywhere. I just have the best network conditions testing on a
Windows VM.
Updates #604
So previous routes aren't shadowing resources that the operating
system might need (Windows Domain Controller, DNS server, corp HTTP
proxy, WinHTTP fetching the PAC file itself, etc).
This effectively detects when we're transitioning from, say, public
wifi to corp wifi and makes Tailscale remove all its routes and stops
its TCP connections and tries connecting to everything anew.
Updates tailscale/corp#653
We depend on DERP for NAT traversal now[0] so disabling it entirely can't
work.
What we'll do instead in the future is let people specify
alternate/additional DERP servers. And perhaps in the future we could
also add a pref for nodes to say when they expect to never need/want
to use DERP for data (but allow it for NAT traversal communication).
But this isn't the right pref and it doesn't work, so delete it.
Fixes#318
[0] https://tailscale.com/blog/how-nat-traversal-works/
It's properly handled later in tsdns.NewMap anyway, but there's work
done in the meantime that can be skipped when a peer lacks a DNS name.
It's also more clear that it's okay for it to be blank.
Also remove rebinding logic from the windows router. Magicsock will
instead rebind based on link change signals.
Signed-off-by: David Anderson <danderson@tailscale.com>
The test's LocalBackend was not shut down (Shutdown both releases
resources and waits for its various goroutines to end). This should
fix the test race we were seeing. It definitely fixes the file
descriptor leak that preventing -race -count=500 from passing before.
Start of making the IPN state machine react to link changes and down
its DNS & routes if necessary to unblock proxy resolution (e.g. for
transitioning from public to corp networks where the corp network has
mandatory proxies and WPAD PAC files that can't be resolved while
using the DNS/routes configured previously)
This change should be a no-op. Just some callback plumbing.
For example:
$ tailscale ping -h
USAGE
ping <hostname-or-IP>
FLAGS
-c 10 max number of pings to send
-stop-once-direct true stop once a direct path is established
-verbose false verbose output
$ tailscale ping mon.ts.tailscale.com
pong from monitoring (100.88.178.64) via DERP(sfo) in 65ms
pong from monitoring (100.88.178.64) via DERP(sfo) in 252ms
pong from monitoring (100.88.178.64) via [2604:a880:2:d1::36:d001]:41641 in 33ms
Fixes#661
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Also, bit of behavior change: on non-nil err but expired context,
don't reset the consecutive failure count. I don't think the old
behavior was intentional.
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
tailscaled receives a SIGPIPE when CLIs disconnect from it. We shouldn't
shut down in that case.
This reverts commit 43b271cb26.
Signed-off-by: David Anderson <danderson@tailscale.com>
ORder of operations to trigger a problem:
- Start an already authed tailscaled, verify you can ping stuff.
- Run `tailscale up`. Notice you can no longer ping stuff.
The problem is that `tailscale up` stops the IPN state machine before
restarting it, which zeros out the packet filter but _not_ the packet
filter hash. Then, upon restarting IPN, the uncleared hash incorrectly
makes the code conclude that the filter doesn't need updating, and so
we stay with a zero filter (reject everything) for ever.
The fix is simply to update the filterHash correctly in all cases,
so that running -> stopped -> running correctly changes the filter
at every transition.
Signed-off-by: David Anderson <danderson@tailscale.com>
We need to emit Prefs when it *has* changed, not when it hasn't.
Test is added in our e2e test, separately.
Fixes: #620
Signed-off-by: Avery Pennarun <apenwarr@tailscale.com>
So a backend in server-an-error state (as used by Windows) can try to
create a new Engine again each time somebody re-connects, relaunching
the GUI app.
(The proper fix is actually fixing Windows issues, but this makes things better
in the short term)
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
This prevents hostname being forced to os.Hostname despite override
when control is contacted for the first time after starting tailscaled.
Signed-off-by: Dmytro Shynkevych <dmytro@tailscale.com>
The StartLoginInteractive command is for delegating the sign-in flow
to a browser. The Android Gooogle Sign-In SDK inverts the flow by
giving the client ID tokens.
Add a new backend command for accepting such tokens by exposing the existing
controlclient.Client.Login support for OAuth2 tokens. Introduce a custom
TokenType to distinguish ID tokens from other OAuth2 tokens.
Signed-off-by: Elias Naur <mail@eliasnaur.com>
We want the macOS Network Extension to share fate with the UI frontend,
so we need the backend to know when the frontend disappears.
One easy way to do that is to reuse the existing TCP server it's
already running (for tailscale status clietns).
We now tell the frontend our ephemeral TCP port number, and then have
the UI connect to it, so the backend can know when it disappears.
There are likely Swift ways of doing this, but I couldn't find them
quickly enough, so I reached for the hammer I knew.
This change adds to tsdns the ability to delegate lookups to upstream nameservers.
This is crucial for setting Magic DNS as the system resolver.
Signed-off-by: Dmytro Shynkevych <dmytro@tailscale.com>
There's a lot of confusion around what tailscale status shows, so make it better:
show region names, last write time, and put stars around DERP too if active.
Now stars are always present if activity, and always somewhere.
It's just a config wrapper that passes "use less memory at the
expense of compression" parameters by default, so that we don't
accidentally construct resource-hungry (de)compressors.
Also includes a benchmark that measures the memory cost of the
small variants vs. the stock variants. The savings are significant
on both compressors (~8x less memory) and decompressors (~1.4x less,
not including the savings from the significantly smaller
window on the compression side - with those savings included it's
more like ~140x smaller).
BenchmarkSmallEncoder-8 56174 19354 ns/op 31 B/op 0 allocs/op
BenchmarkSmallEncoderWithBuild-8 2900 382940 ns/op 1746547 B/op 36 allocs/op
BenchmarkStockEncoder-8 48921 25761 ns/op 286 B/op 0 allocs/op
BenchmarkStockEncoderWithBuild-8 426 2630241 ns/op 13843842 B/op 124 allocs/op
BenchmarkSmallDecoder-8 123814 9344 ns/op 0 B/op 0 allocs/op
BenchmarkSmallDecoderWithBuild-8 41547 27455 ns/op 27694 B/op 31 allocs/op
BenchmarkStockDecoder-8 129832 9417 ns/op 1 B/op 0 allocs/op
BenchmarkStockDecoderWithBuild-8 25561 51751 ns/op 39607 B/op 92 allocs/op
Signed-off-by: David Anderson <danderson@tailscale.com>
This adds a new magicsock endpoint type only used when both sides
support discovery (that is, are advertising a discovery
key). Otherwise the old code is used.
So far the new code only communicates over DERP as proof that the new
code paths are wired up. None of the actually discovery messaging is
implemented yet.
Support for discovery (generating and advertising a key) are still
behind an environment variable for now.
Updates #483
Later we'll want to use the presence of a discovery key as a signal
that the node knows how to participate in discovery. Currently the
code generates keys and sends them to the control server but doesn't
do anything with them, which is a bad state to stay in lest we release
this code and end up with nodes in the future that look like they're
functional with the new discovery protocol but aren't.
So for now, make this opt-in as a debug option for now, until the rest
of it is in.
Updates #483
The zstd library treats that limit as a hard cap on decompressed
size, in the mode we're using it, rather than a window size.
Signed-off-by: David Anderson <danderson@tailscale.com>
Overriding the hostname is required for Android, where os.Hostname
is often just "localhost".
Updates #409
Signed-off-by: Elias Naur <mail@eliasnaur.com>
This removes the use of suppress_ifgroup and fwmark "x/y" notation,
which are, among other things, not available in busybox and centos6.
We also use the return codes from the 'ip' program instead of trying to
parse its output.
I also had to remove the previous hack that routed all of 100.64.0.0/10
by default, because that would add the /10 route into the 'main' route
table instead of the new table 88, which is no good. It was a terrible
hack anyway; if we wanted to capture that route, we should have
captured it explicitly as a subnet route, not as part of the addr. Note
however that this change affects all platforms, so hopefully there
won't be any surprises elsewhere.
Fixes#405
Updates #320, #144
Signed-off-by: Avery Pennarun <apenwarr@tailscale.com>
Instead of hard-coding the DERP map (except for cmd/tailscale netcheck
for now), get it from the control server at runtime.
And make the DERP map support multiple nodes per region with clients
picking the first one that's available. (The server will balance the
order presented to clients for load balancing)
This deletes the stunner package, merging it into the netcheck package
instead, to minimize all the config hooks that would've been
required.
Also fix some test flakes & races.
Fixes#387 (Don't hard-code the DERP map)
Updates #388 (Add DERP region support)
Fixes#399 (wgengine: flaky tests)
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
The compressed blobs we send back and forth are small and infrequent,
which doesn't justify the 8MB * GOMAXPROCS memory that was being
allocated. This was the overwhelming majority of memory use in
tailscaled. On my system it goes from ~100M RSS to ~15M RSS (which is
still suspiciously high, but we can worry about that more later).
Signed-off-by: Avery Pennarun <apenwarr@tailscale.com>
Also stop logging data sent/received from nodes we're not connected to (ie all those `x`s being logged in the `peers: ` line)
Signed-off-by: Wendi <wendi.yu@yahoo.ca>
If a test calls log.Printf, 'go test' horrifyingly rearranges the
output to no longer be in chronological order, which makes debugging
virtually impossible. Let's stop that from happening by making
log.Printf panic if called from any module, no matter how deep, during
tests.
This required us to change the default error handler in at least one
http.Server, as well as plumbing a bunch of logf functions around,
especially in magicsock and wgengine, but also in logtail and backoff.
To add insult to injury, 'go test' also rearranges the output when a
parent test has multiple sub-tests (all the sub-test's t.Logf is always
printed after all the parent tests t.Logf), so we need to screw around
with a special Logf that can point at the "current" t (current_t.Logf)
in some places. Probably our entire way of using subtests is wrong,
since 'go test' would probably like to run them all in parallel if you
called t.Parallel(), but it definitely can't because the're all
manipulating the shared state created by the parent test. They should
probably all be separate toplevel tests instead, with common
setup/teardown logic. But that's a job for another time.
Signed-off-by: Avery Pennarun <apenwarr@tailscale.com>