This moves most of the health package global variables to a new
`health.Tracker` type.
But then rather than plumbing the Tracker in tsd.System everywhere,
this only goes halfway and makes one new global Tracker
(`health.Global`) that all the existing callers now use.
A future change will eliminate that global.
Updates #11874
Updates #4136
Change-Id: I6ee27e0b2e35f68cb38fecdb3b2dc4c3f2e09d68
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
This is a useful primitive for asynchronous execution of ordered work I
want to use in another change.
Updates tailscale/corp#16833
Signed-off-by: James Tucker <james@tailscale.com>
It was a really a mutable field owned by mapSession that we didn't move
in earlier commits.
Once moved, it's then possible to de-func-ify the code and turn it into
a regular method rather than an installed optional hook.
Noticed while working to move map session lifetimes out of
Direct.sendMapRequest's single-HTTP-connection scope.
Updates #7175
Updates #cleanup
Change-Id: I6446b15793953d88d1cabf94b5943bb3ccac3ad9
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Currently only the top four most popular changes: endpoints, DERP
home, online, and LastSeen.
Updates #1909
Change-Id: I03152da176b2b95232b56acabfb55dcdfaa16b79
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
We would only check if the client was paused, but not
if the client was closed. This meant that a call to
Shutdown may block forever/leak goroutines
Updates #cleanup
Signed-off-by: Maisem Ali <maisem@tailscale.com>
resetControlClientLocked is called while b.mu was held and
would call cc.Shutdown which would wait for the observer queue
to drain.
However, there may be active callbacks from cc already waiting for
b.mu resulting in a deadlock.
This makes it so that resetControlClientLocked does not call
Shutdown, and instead just returns the value.
It also makes it so that any status received from previous cc
are ignored.
Updates tailscale/corp#12827
Signed-off-by: Maisem Ali <maisem@tailscale.com>
We want the overall state (used only for tests) to be computed from
the individual states of each component, rather than moving the state
around by hand in dozens of places.
In working towards that, we found a lot of things to clean up.
Updates #cleanup
Change-Id: Ieaaae5355dfae789a8ec7a56ce212f1d7e3a92db
Co-authored-by: Maisem Ali <maisem@tailscale.com>
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Don't just start goroutines and hope for them to be ordered.
Fixes potential regression from earlier 7074a40c0.
Updates #cleanup
Change-Id: I501a6f3e4e8e6306b958bccdc1e47869991c31f7
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
We have cases where the SetControlClientStatus would result in
a Shutdown call back into the auto client that would block
forever. The right thing to do here is to fix the LocalBackend
state machine but thats a different dumpster fire that we
are slowly making progress towards.
This makes it so that the SetControlClientStatus happens in a
different goroutine so that calls back into the auto client
do not block.
Also add a few missing mu.Unlocks in LocalBackend.Start.
Updates #9181
Signed-off-by: Maisem Ali <maisem@tailscale.com>
Then use the Locked variants in Shutdown while we already hold the lock.
Updates #cleanup
Change-Id: I367d53e6be6f37f783c8f43fc9c4d498d0adf501
Co-authored-by: Maisem Ali <maisem@tailscale.com>
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Don't depend on the server to do it.
Updates #cleanup
Change-Id: I8ff40b02aa877155a71fd4db58cbecb872241ac8
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
They were entirely redundant and 1:1 with the status field
so this turns them into methods instead.
Updates #cleanup
Updates #1909
Change-Id: I7d939750749edf7dae4c97566bbeb99f2f75adbc
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
I'm trying to remove some stuff from the netmap update path.
Updates #1909
Change-Id: Iad2c728dda160cd52f33ef9cf0b75b4940e0ce64
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
De-pointer a *time.Time type, move it after the mutex which guard is,
rename two test-only methods with our conventional "ForTest" suffix.
Updates #cleanup
Change-Id: I4f4d1acd9c2de33d9c3cb6465d7349ed051aa9f9
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
For now the method has only one interface (the same as the func it's
replacing) but it will grow, eventually with the goal to remove the
controlclient.Status type for most purposes.
Updates #1909
Change-Id: I715c8bf95e3f5943055a94e76af98d988558a2f2
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
No need to have it on Auto or be behind a mutex; it's only read/written
from a single goroutine. Move it there.
Updates tailscale/corp#5761
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
See issue. This is a baby step towards passing through deltas
end-to-end from node to control back to node and down to the various
engine subsystems, not computing diffs from two full netmaps at
various levels. This will then let us support larger netmaps without
burning CPU.
But this change itself changes no behavior. It just changes a func
type to an interface with one method. That paves the way for future
changes to then add new NetmapUpdater methods that do more
fine-grained work than updating the whole world.
Updates #1909
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
The read of the synced field for logging takes place outside the lock, and
races with other (locked) writes of this field, including for example the one
at current line 556 in mapRoutine.
Updates tailscale/corp#13856
Change-Id: I056b36d7a93025aafdf73528dd7645f10b791af6
Signed-off-by: M. J. Fromberger <fromberger@tailscale.com>
Instead of having updates replace the map polls, create
a third goroutine which is solely responsible for making
sure that control is aware of the latest client state.
This also makes it so that the streaming map polls are only
broken when there are auth changes, or the client is paused.
Updates tailscale/corp#5761
Signed-off-by: Maisem Ali <maisem@tailscale.com>
We were never resetting the backoff in streaming mapResponses.
The call to `PollNetMap` always returns with an error. Changing that contract
is harder, so manually reset backoff when a netmap is received.
Updates tailscale/corp#12894
Signed-off-by: Maisem Ali <maisem@tailscale.com>
Using log.Printf may end up being printed out to the console, which
is not desirable. I noticed this when I was investigating some client
logs with `sockstats: trace "NetcheckClient" was overwritten by another`.
That turns to be harmless/expected (the netcheck client will fall back
to the DERP client in some cases, which does its own sockstats trace).
However, the log output could be visible to users if running the
`tailscale netcheck` CLI command, which would be needlessly confusing.
Updates tailscale/corp#9230
Signed-off-by: Mihai Parparita <mihai@tailscale.com>
This reverts commit 6eca47b16c and fixes forward.
Previously the first ever streaming MapRequest that a client sent would also
set ReadOnly to true as it didn't have any endpoints and expected/relied on the
map poll to restart as soon as it got endpoints. However with 48f6c1eba4,
we would no longer restart MapRequests as frequently as we used to, so control
would only ever get the first streaming MapRequest which had ReadOnly=true.
Control would treat this as an uninteresting request and would not send it
any further netmaps, while the client would happily stay in the map poll forever
while litemap updates happened in parallel.
This makes it so that we never set `ReadOnly=true` when we are doing a streaming
MapRequest. This is no longer necessary either as most endpoint discovery happens
over disco anyway.
Co-authored-by: Andrew Dunham <andrew@du.nham.ca>
Signed-off-by: Maisem Ali <maisem@tailscale.com>
Prior to this change, if we were in the middle of a lite map update we'd
tear down the entire map session and restart it. With this change, we'll
cancel an in-flight lite map request up to 10 times and restart before
we tear down the streaming map request. We tear down everything after 10
retries to ensure that a steady stream of calls to sendNewMapRequest
doesn't fail to make progress by repeatedly canceling and restarting.
Signed-off-by: Andrew Dunham <andrew@du.nham.ca>
Co-authored-by: Maisem Ali <maisem@tailscale.com>
Change-Id: I9392bf8cf674e7a58ccd1e476039300a359ef3b1
Makes it cheaper/simpler to persist values, and encourages reuse of
labels as opposed to generating an arbitrary number.
Updates tailscale/corp#9230
Updates #3363
Signed-off-by: Mihai Parparita <mihai@tailscale.com>
Uses the hooks added by tailscale/go#45 to instrument the reads and
writes on the major code paths that do network I/O in the client. The
convention is to use "<package>.<type>:<label>" as the annotation for
the responsible code path.
Enabled on iOS, macOS and Android only, since mobile platforms are the
ones we're most interested in, and we are less sensitive to any
throughput degradation due to the per-I/O callback overhead (macOS is
also enabled for ease of testing during development).
For now just exposed as counters on a /v0/sockstats PeerAPI endpoint.
We also keep track of the current interface so that we can break out
the stats by interface.
Updates tailscale/corp#9230
Updates #3363
Signed-off-by: Mihai Parparita <mihai@tailscale.com>
This updates all source files to use a new standard header for copyright
and license declaration. Notably, copyright no longer includes a date,
and we now use the standard SPDX-License-Identifier header.
This commit was done almost entirely mechanically with perl, and then
some minimal manual fixes.
Updates #6865
Signed-off-by: Will Norris <will@tailscale.com>
* Plumb disablement values through some of the internals of TKA enablement.
* Transmit the node's TKA hash at the end of sync so the control plane understands each node's head.
* Implement /machine/tka/disable RPC to actuate disablement on the control plane.
There is a partner PR for the control server I'll send shortly.
Signed-off-by: Tom DNetto <tom@tailscale.com>
At some point we started restarting map polls on health change, but we
don't remember why. Maybe it was a desperate workaround for something.
I'm not sure it ever worked.
Rather than have a haunted graveyard, remove it.
In its place, though, and somewhat as a safety backup, send those
updates over the HTTP/2 noise channel if we have one open. Then if
there was a reason that a map poll restart would help we could do it
server-side. But mostly we can gather error stats and show
machine-level health info for debugging.
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Step 1 of many, cleaning up the direct/auto client & restarting map
requests that leads to all the unnecessary map requests.
Updates tailscale/corp#5761
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Currently, when SetNetInfo is called it sets the value on
hostinfo.NetInfo. However, when SetHostInfo is called it overwrites the
hostinfo field which may mean it also clears out the NetInfo it had just
received.
This commit stores NetInfo separately and combines it into Hostinfo as
needed so that control is always notified of the latest values.
Also, remove unused copies of Hostinfo from ipn.Status and
controlclient.Auto.
Updates #tailscale/corp#4824 (maybe fixes)
Signed-off-by: Maisem Ali <maisem@tailscale.com>