Concurrent calls to LocalBackend.setWgengineStatus
could result in some of the status updates being dropped.
This was exacerbated by 92077ae78c,
which increases the probability of concurrent status updates,
causing test failures (tailscale/corp#2579).
It's going to take a bit of work to fix this test.
The ipnlocal state machine is difficult to reason about,
particularly in the face of concurrency.
We could fix the test trivially by throwing a new mutex around
setWgengineStatus to serialize calls to it,
but I'd like to at least try to do better than cosmetics.
In the meantime, commit the test.
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
In prep for other bug fixes & tests. It's hard to test when it was
intermingled into LocalBackend.authReconfig.
Now it's a pure function.
And rename variable 'uc' (user config?) to the since idiomatic
'prefs'.
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
We currently plumb full URLs for DNS resolvers from the control server
down to the client. But when we pass the values into the net/dns
package, we throw away any URL that isn't a bare IP. This commit
continues the plumbing, and gets the URL all the way to the built in
forwarder. (It stops before plumbing URLs into the OS configurations
that can handle them.)
For #2596
Signed-off-by: David Crawshaw <crawshaw@tailscale.com>
* Revert "Revert "types/key: add MachinePrivate and MachinePublic.""
This reverts commit 61c3b98a24.
Signed-off-by: David Anderson <danderson@tailscale.com>
* types/key: add ControlPrivate, with custom serialization.
ControlPrivate is just a MachinePrivate that serializes differently
in JSON, to be compatible with how the Tailscale control plane
historically serialized its private key.
Signed-off-by: David Anderson <danderson@tailscale.com>
Plumb throughout the codebase as a replacement for the mixed use of
tailcfg.MachineKey and wgkey.Private/Public.
Signed-off-by: David Anderson <danderson@tailscale.com>
And add health check errors to ipnstate.Status (tailscale status --json).
Updates #2746
Updates #2775
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
The number of packet filters can grow very large,
so this log entry can be very large.
We can get the packet filter server-side,
so reduce verbosity here to just the number of filters present.
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
Now that we have the easier-to-parse go:build build tags,
it is straightforward to simplify them. Yay.
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
The fact that Hash returns a [sha256.Size]byte leaks details about
the underlying hash implementation. This could very well be any other
hashing algorithm with a possible different block size.
Abstract this implementation detail away by declaring an opaque type
that is comparable. While we are changing the signature of UpdateHash,
rename it to just Update to reduce stutter (e.g., deephash.Update).
Signed-off-by: Joe Tsai <joetsai@digital-static.net>
With this, I can now:
* install Tailscale
* stop the GUI
* net stop Tailscale
* net start Tailscale
* tailscale up --unattended
(where the middle three steps simulate what would happen on a Windows
Server Core machine without a GUI)
Fixes#2137
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Regression from 6d10655dc3, which added
UpdatePrefs but didn't write it out to disk.
I'd planned on adding tests to state_test.go which is why I'd earlier
added 46896a9311 to prepare for making
such persistence tests easier to write, but turns out state_test.go
didn't even test UpdatePrefs, so I'm staying out of there.
Instead, this is tested using integration tests.
Fixes#2321
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
We can't access b.netMap without holding b.mu.
We already grabbed it earlier in the function with the lock held.
Introduced in Nov 2020 in 7ea809897d.
Discovered during stress testing.
Apparently it's a pretty rare?
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
After allowing for custom DERP maps, it's convenient to be able to see their latency in
netcheck. This adds a query to the local tailscaled for the current DERPMap.
Updates #1264
Signed-off-by: julianknodt <julianknodt@gmail.com>
We were crashing on in initPeerAPIListener when called from
authReconfig when b.netMap is nil. But authReconfig already returns
before the call to initPeerAPIListener when b.netMap is nil, but it
releases the b.mu mutex before calling initPeerAPIListener which
reacquires it and assumes it's still nil.
The only thing that can be setting it to nil is setNetMapLocked, which
is called by ResetForClientDisconnect, Logout/logout, or Start, all of
which can happen during an authReconfig.
So be more defensive.
Fixes#1996
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
We used to use "redo" for that, but it was pretty vague.
Also, fix the build tags broken in interfaces_default_route_test.go from
a9745a0b68, moving those Linux-specific
tests to interfaces_linux_test.go.
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
The resulting empty Prefs had AllowSingleHosts=false and
Routeall=false, so that on iOS if you did these steps:
- Login and leave running
- Terminate the frontend
- Restart the frontend (fast path restart, missing prefs)
- Set WantRunning=false
- Set WantRunning=true
...then you would have Tailscale running, but with no routes. You would
also accidentally disable the ExitNodeID/IP prefs (symptom: the current
exit node setting didn't appear in the UI), but since nothing
else worked either, you probably didn't notice.
The fix was easy enough. It turns out we already knew about the
problem, so this also fixes one of the BUG entries in state_test.
Fixes: #1918 (BUG-1) and some as-yet-unreported bugs with exit nodes.
Signed-off-by: Avery Pennarun <apenwarr@tailscale.com>
Previously, there was no server round trip required to log out, so when
you asked ipnlocal to Logout(), it could clear the netmap immediately
and switch to NeedsLogin state.
In v1.8, we added a true Logout operation. ipn.Logout() would trigger
an async cc.StartLogout() and *also* immediately switch to NeedsLogin.
Unfortunately, some frontends would see NeedsLogin and immediately
trigger a new StartInteractiveLogin() operation, before the
controlclient auth state machine actually acted on the Logout command,
thus accidentally invalidating the entire logout operation, retaining
the netmap, and violating the user's expectations.
Instead, add a new LogoutFinished signal from controlclient
(paralleling LoginFinished) and, upon starting a logout, don't update
the ipn state machine until it's received.
Updates: #1918 (BUG-2)
Signed-off-by: Avery Pennarun <apenwarr@tailscale.com>
On clean installs we didn't set use iptables, but during upgrades it
looks like we could use old prefs that directed us to go into the iptables
paths that might fail on Synology.
Updates #1995Fixestailscale/tailscale-synology#57 (I think)
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
This leads to a cleaner separation of intent vs. implementation
(Routes is now the only place specifying who handles DNS requests),
and allows for cleaner expression of a configuration that creates
MagicDNS records without serving them to the OS.
Signed-off-by: David Anderson <danderson@tailscale.com>
This code path is very tricky since it was originally designed for the
"re-authenticate to refresh my keys" use case, which didn't want to
lose the original session even if the refresh cycle failed. This is why
it acts differently from the Logout(); Login(); case.
Maybe that's too fancy, considering that it probably never quite worked
at all, for switching between users without logging out first. But it
works now.
This was more invasive than I hoped, but the necessary fixes actually
removed several other suspicious BUG: lines from state_test.go, so I'm
pretty confident this is a significant net improvement.
Fixestailscale/corp#1756.
Signed-off-by: Avery Pennarun <apenwarr@tailscale.com>
If the engine was shutting down from a previous session
(e.closing=true), it would return an error code when trying to get
status. In that case, ipnlocal would never unblock any callers that
were waiting on the status.
Not sure if this ever happened in real life, but I accidentally
triggered it while writing a test.
Signed-off-by: Avery Pennarun <apenwarr@tailscale.com>
Yes, it printed, but that was an implementation detail for hashing.
And coming optimization will make it print even less.
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Needed for the "up checker" to map back from exit node stable IDs (the
ipn.Prefs.ExitNodeID) back to an IP address in error messages.
But also previously requested so people can use it to then make API
calls. The upcoming "tailscale admin" subcommand will probably need it
too.
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
This is needed because the original opts.Prefs field was at some point
subverted for use in frontend->backend state migration for backward
compatibility on some platforms. We still need that feature, but we
also need the feature of providing the full set of prefs from
`tailscale up`, *not* including overwriting the prefs.Persist keys, so
we can't use the original field from `tailscale up`.
`tailscale up` had attempted to compensate for that by doing SetPrefs()
before Start(), but that violates the ipn.Backend contract, which says
you should call Start() before anything else (that's why it's called
Start()). As a result, doing SetPrefs({ControlURL=...,
WantRunning=true}) would cause a connection to the *previous* control
server (because WantRunning=true), and then connect to the *new*
control server only after running Start().
This problem may have been avoided before, but only by pure luck.
It turned out to be relatively harmless since the connection to the old
control server was immediately closed and replaced anyway, but it
created a race condition that could have caused spurious notifications
or rejected keys if the server responded quickly.
As already covered by existing TODOs, a better fix would be to have
Start() get out of the business of state migration altogether. But
we're approaching a release so I want to make the minimum possible fix.
Fixes#1840.
Signed-off-by: Avery Pennarun <apenwarr@tailscale.com>
Per discussion, we want to have only one test assertion library,
and we want to start by exploring quicktest.
This was a mostly mechanical translation.
I think we could make this nicer by defining a few helper
closures at the beginning of the test. Later.
Signed-off-by: Josh Bleecher Snyder <josharian@gmail.com>
This removes the NewLocalBackendWithClientGen constructor added in
b4d04a065f and instead adds
LocalBackend.SetControlClientGetterForTesting, mirroring
LocalBackend.SetHTTPTestClient. NewLocalBackendWithClientGen was
weird in being exported but taking an unexported type. This was noted
during code review:
https://github.com/tailscale/tailscale/pull/1818#discussion_r623155669
which ended in:
"I'll leave it for y'all to clean up if you find some way to do it elegantly."
This is more idiomatic.
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Without this, macOS would fail to display its menu state correctly if you
started it while !WantRunning. It relies on the netmap in order to show
the logged-in username.
Signed-off-by: Avery Pennarun <apenwarr@tailscale.com>
There was logic that would make a "down" tailscale backend (ie.
!WantRunning) refuse to do any network activity. Unfortunately, this
makes the macOS and iOS UI unable to render correctly if they start
while !WantRunning.
Now that we have Prefs.LoggedOut, use that instead. So `tailscale down`
will still allow the controlclient to connect its authroutine, but
pause the maproutine. `tailscale logout` will entirely stop all
activity.
This new behaviour is not obviously correct; it's a bit annoying that
`tailsale down` doesn't terminate all activity like you might expect.
Maybe we should redesign the UI code to render differently when
disconnected, and then revert this change.
Signed-off-by: Avery Pennarun <apenwarr@tailscale.com>
EditPrefs should be just a wrapper around the action of changing prefs,
but someone had added a side effect of calling Login() sometimes. The
side effect happened *after* running the state machine, which would
sometimes result in us going into NeedsLogin immediately before calling
cc.Login().
This manifested as the macOS app not being able to Connect if you
launched it with LoggedOut=false and WantRunning=false. Trying to
Connect() would sent us to the NeedsLogin state instead.
Signed-off-by: Avery Pennarun <apenwarr@tailscale.com>
A very long unit test that verifies the way the controlclient and
ipn.Backend interact.
This is a giant sequential test of the state machine. The test passes,
but only because it's asserting all the wrong behaviour. I marked all
the behaviour I think is wrong with BUG comments, and several
additional test opportunities with TODO.
Note: the new test supercedes TestStartsInNeedsLoginState, which was
checking for incorrect behaviour (although the new test still checks
for the same incorrect behaviour) and assumed .Start() would converge
before returning, which it happens to do, but only for this very
specific case, for the current implementation. You're supposed to wait
for the notifications.
Updates: tailscale/corp#1660
Signed-off-by: Avery Pennarun <apenwarr@tailscale.com>
With this change, shared node names resolve correctly on split DNS-supporting
operating systems.
Fixestailscale/corp#1706
Signed-off-by: David Anderson <danderson@tailscale.com>
The intention was always that files only get written to *.partial
files and renamed at the end once fully received, but somewhere in the
process that got lost in buffered mode and *.partial files were only
being used in direct receive mode. This fix prevents WaitingFiles
from returning files that are still being transferred.
Updates tailscale/corp#1626
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
If DeleteFile fails on Windows due to another process (anti-virus,
probably) having our file open, instead leave a marker file that the
file is logically deleted, and remove it from API calls and clean it
up lazily later.
Updates tailscale/corp#1626
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
It was getting cleared on notify.
Document that authURL is cleared on notify and add a new field that
isn't, using the new field for the JSON status.
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
This used to not be necessary, because MagicDNS always did full proxying.
But with split DNS, we need to know which names to route to our resolver,
otherwise reverse lookups break.
This captures the entire CGNAT range, as well as our Tailscale ULA.
Signed-off-by: David Anderson <danderson@tailscale.com>
Otherwise, the existence of authoritative domains forces full
DNS proxying even when no other DNS config is present.
Signed-off-by: David Anderson <danderson@tailscale.com>
Logout used to be a no-op, so the ipnserver previously synthensized a Logout
on disconnect. Now that Logout actually invalidates the node key that was
forcing all GUI closes to log people out.
Instead, add a method to LocalBackend to specifically mean "the
Windows GUI closed, please forget all the state".
Fixestailscale/corp#1591 (ignoring the notification issues, tracked elsewhere)
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>