The hijacker on k8s-proxy's reverse proxy is used to stream recordings
to tsrecorder as they pass through the proxy to the kubernetes api
server. The connection to the recorder was using the client's
(e.g., kubectl) context, rather than a dedicated one. This was causing
the recording stream to get cut off in scenarios where the client
cancelled the context before streaming could be completed.
By using a dedicated context, we can continue streaming even if the
client cancels the context (for example if the client request
completes).
Fixes #17404
Signed-off-by: chaosinthecrd <tom@tmlabs.co.uk>
Originally proposed by @bradfitz in #17413.
In practice, a lot of subscribers have only one event type of interest, or a
small number of mostly independent ones. In that case, the overhead of running
and maintaining a goroutine to select on multiple channels winds up being more
noisy than we'd like for the user of the API.
For this common case, add a new SubscriberFunc[T] type that delivers events to
a callback owned by the subscriber, directly on the goroutine belonging to the
client itself. This frees the consumer from the need to maintain their own
goroutine to pull events from the channel, and to watch for closure of the
subscriber.
Before:
    s := eventbus.Subscribe[T](eventClient)
    go func() {
        for {
            select {
            case <-s.Done():
                return
            case e := <-s.Events():
                doSomethingWith(e)
            }
        }
    }()
    // ...
    s.Close()
After:
    func doSomethingWithT(e T) { ... }
    s := eventbus.SubscribeFunc(eventClient, doSomethingWithT)
    // ...
    s.Close()
Moreover, unless the caller wants to explicitly stop the subscriber separately
from its governing client, it need not capture the SubscriberFunc value at all.
One downside of this approach is that a slow or deadlocked callback could block
the client's service routine and thus stall all other subscriptions on that
client. However, this can already happen more broadly if a subscriber fails to
service its delivery channel in a timely manner; it just feeds back more
immediately.
Updates #17487
Change-Id: I64592d786005177aa9fd445c263178ed415784d5
Signed-off-by: M. J. Fromberger <fromberger@tailscale.com>
Since #17376, containerboot crashes on startup in k8s because state
encryption is enabled by default without first checking that it's
compatible with the selected state store. Make sure we only default
state encryption to enabled if it's not going to immediately clash with
other bits of tailscaled config.
Updates tailscale/corp#32909
Change-Id: I76c586772750d6da188cc97b647c6e0c1a8734f0
Signed-off-by: Tom Proctor <tomhjp@users.noreply.github.com>
Saves ~94 KB from the min build.
Updates #12614
Change-Id: I3b0b8a47f80b9fd3b1038c2834b60afa55bf02c2
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Part of making all netlink monitoring code optional.
Updates #17311 (how I got started down this path)
Updates #12614
Change-Id: Ic80d8a7a44dc261c4b8678b3c2241c3b3778370d
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Also pull out interface method only needed in Linux.
Instead of having userspace do the call into the router, just let the
router pick up the change itself.
Updates #15160
Signed-off-by: Claus Lensbøl <claus@tailscale.com>
Before we introduced seamless, the "blocked" state was used to track:
* Whether a login was required for connectivity, and therefore we should
keep the engine deconfigured until that happened
* Whether authentication was in progress
"blocked" would stop authReconfig from running. We want this when a login is
required: if your key has expired we want to deconfigure the engine and keep
it down, so that you don't keep using exit nodes (which won't work because
your key has expired).
Taking the engine down while auth was in progress was undesirable, so we
don't do that with seamless renewal. However, not entering the "blocked"
state meant that we needed to change the logic for when to send
LoginFinished on the IPN bus after seeing StateAuthenticated from the
controlclient. Initially we changed the "if blocked" check to "if blocked or
seamless is enabled" which was correct in other places.
In this place however, it introduced a bug: we are sending LoginFinished
every time we see StateAuthenticated, which happens even on a down & up, or
a profile switch. This in turn made it harder for UI clients to track when
authentication is complete.
Instead we should only send it out if we were blocked (i.e. seamless is
disabled, or our key expired) or an auth was in progress.
Updates tailscale/corp#31476
Updates tailscale/corp#32645
Fixes #17363
Signed-off-by: James Sanderson <jsanderson@tailscale.com>
Saves 45 KB from the min build, no longer pulling in deephash or
util/hashx, both with unsafe code.
It can actually be more efficient to not use deephash, as you don't
have to walk all bytes of all fields recursively to answer that two
things are not equal. Instead, you can just return false at the first
difference you see. And then with views (as we use ~everywhere
nowadays), cloning the old value isn't expensive, as it's just a
pointer under the hood.
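A minimal sketch of the early-exit comparison described above. The
config type and its fields are hypothetical and exist only for this
illustration; they are not the types touched by this change.

```go
package main

import (
	"fmt"
	"slices"
)

// config is a hypothetical value type used only for this illustration.
type config struct {
	Hostname string
	ExitNode string
	Routes   []string
}

// Equal reports whether a and b hold the same settings. It returns false at
// the first difference instead of hashing every field of both values.
func (a *config) Equal(b *config) bool {
	if a == b {
		return true // same pointer, or both nil
	}
	if a == nil || b == nil {
		return false
	}
	if a.Hostname != b.Hostname || a.ExitNode != b.ExitNode {
		return false
	}
	return slices.Equal(a.Routes, b.Routes)
}

func main() {
	old := &config{Hostname: "a", Routes: []string{"10.0.0.0/24"}}
	changed := &config{Hostname: "b", Routes: []string{"10.0.0.0/24"}}
	fmt.Println(old.Equal(changed)) // false, decided at the Hostname check
	fmt.Println(old.Equal(old))     // true
}
```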
Updates #12614
Change-Id: I7b08616b8a09b3ade454bb5e0ac5672086fe8aec
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Historically, and until recently, --extra-small produced a usable build.
When I recently made osrouter be modular in 39e35379d4 (which is
useful in, say, tsnet builds) after also making netstack modular, that
meant --min now lacked both netstack support for routing and system
support for routing, leaving no way to get packets into
wireguard. That's not a nice default for users (we've documented
build_dist.sh in our KB).
Restore --extra-small to producing a usable build, and add --min for
benchmarking purposes.
Updates #12614
Change-Id: I649e41e324a36a0ca94953229c9914046b5dc497
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Some of the test cases access fields of the backend that are supposed to be
locked while the test is running, which can trigger the race detector. I fixed
a few of these in #17411, but I missed these two cases.
Updates #15160
Updates #17192
Change-Id: I45664d5e34320ecdccd2844e0f8b228145aaf603
Signed-off-by: M. J. Fromberger <fromberger@tailscale.com>
Saves ~53 KB from the min build.
Updates #12614
Change-Id: I73f9544a9feea06027c6ebdd222d712ada851299
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
* Add subscribers for AppConnector events.
* Make the RouteAdvertiser interface optional. We cannot yet remove it because
  the tests still depend on it to verify correctness. We will need to separately
  update the test fixtures to remove that dependency.
* Publish RouteInfo via the event bus, so we do not need a callback to do that.
  Replace it with a flag that indicates whether to treat the route info the
  connector has as "definitive" for filtering purposes.
* Update the tests to simplify the construction of AppConnector values now that a
  store callback is no longer required. Also fix a couple of pre-existing racy
  tests that were hidden by not being concurrent in the same way production is.
Updates #15160
Updates #17192
Change-Id: Id39525c0f02184e88feaf0d8a3c05504850e47ee
Signed-off-by: M. J. Fromberger <fromberger@tailscale.com>
If we received a wg engine status while processing an auth URL, there was a
race condition where the authURL could be reset to "" immediately after we
set it.
To fix this we need to check that we are moving from a non-Running state to
a Running state rather than always resetting the URL when we "move" into a
Running state even if that is the current state.
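The transition check can be sketched as follows. State, backend, and
enterState are hypothetical, simplified names for illustration and do
not mirror the real backend code.

```go
package main

import "fmt"

// State and backend are hypothetical, simplified stand-ins for illustration.
type State int

const (
	NeedsLogin State = iota
	Starting
	Running
)

type backend struct {
	state   State
	authURL string
}

// enterState records the new state and clears the pending auth URL only on a
// real transition into Running, not when an update re-reports Running.
func (b *backend) enterState(next State) {
	prev := b.state
	b.state = next
	if prev != Running && next == Running {
		b.authURL = ""
	}
}

func main() {
	b := &backend{state: Running}
	b.authURL = "https://login.example/abc" // set while already Running
	b.enterState(Running)                   // e.g. caused by an engine status update
	fmt.Println(b.authURL != "")            // true: the URL survives

	b.state = Starting
	b.enterState(Running)        // a genuine transition into Running
	fmt.Println(b.authURL == "") // true: the URL is cleared
}
```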
We also need to make sure that we do not return from stopEngineAndWait until
the engine is stopped: before, we would return as soon as we received any
engine status update, but that might have been an update already in-flight
before we asked the engine to stop. Now we wait until we see an update that
is indicative of a stopped engine, or we see that the engine is unblocked
again, which indicates that the engine stopped and then started again while
we were waiting before we checked the state.
Updates #17388
Signed-off-by: James Sanderson <jsanderson@tailscale.com>
Co-authored-by: Nick Khyl <nickk@tailscale.com>
Saves ~102 KB from the min build.
Updates #12614
Change-Id: Ie1d4f439321267b9f98046593cb289ee3c4d6249
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Due to iOS memory limitations in 2020 (see
https://tailscale.com/blog/go-linker, etc) and wireguard-go using
multiple goroutines per peer, commit 16a9cfe2f4 introduced some
convoluted pathways through Tailscale to look at packets before
they're delivered to wireguard-go and lazily reconfigure wireguard on
the fly before delivering a packet, only telling wireguard about peers
that are active.
We eventually want to remove that code and integrate wireguard-go's
configuration with Tailscale's existing netmap tracking.
To make it easier to find that code later, this makes it modular. It
saves 12 KB (of disk) to turn it off (at the expense of lots of RAM),
but that's not really the point. The point is rather making it obvious
(via the new constants) where this code even is.
Updates #12614
Change-Id: I113b040f3e35f7d861c457eaa710d35f47cee1cb
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Explain that this file stays forked from coder/websocket until we can
depend on an upstream release for the helper.
Updates #cleanup
Signed-off-by: kscooo <kscowork@gmail.com>
Switching to a Geneve-encapsulated (peer relay) path in
endpoint.handlePongConnLocked is expected around port rebinds, which end
up clearing endpoint.bestAddr.
Fixes tailscale/corp#33036
Signed-off-by: Jordan Whited <jordan@tailscale.com>
Saves only 12 KB, but notably removes some deps on packages that future
changes can then eliminate entirely.
Updates #12614
Change-Id: Ibf830d3ee08f621d0a2011b1d4cd175427ef50df
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
c2n was already a conditional feature, but it didn't have a
feature/c2n directory before (rather, it was using consts + DCE). This
adds it, and moves some code, which removes the httprec dependency.
Also, remove some unnecessary code from our httprec fork.
Updates #12614
Change-Id: I2fbe538e09794c517038e35a694a363312c426a2
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
As found by @cmol in #17423.
Updates #17423
Change-Id: I1492501f74ca7b57a8c5278ea6cb87a56a4086b9
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Saves 86 KB.
And stop depending on expvar and usermetrics when disabled,
in prep for removing all the expvar/metrics/tsweb stuff.
Updates #12614
Change-Id: I35d2479ddd1d39b615bab32b1fa940ae8cbf9b11
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
This patch removes some code that didn’t get removed before merging
the changes in #16580.
Updates #cleanup
Updates #16551
Signed-off-by: Simon Law <sfllaw@tailscale.com>
The kubestore init function has now been moved to the more explicit path
ipn/store/kubestore, meaning we can now avoid the generic import of
feature/condregister.
Updates #12614
Signed-off-by: chaosinthecrd <tom@tmlabs.co.uk>
When running integration tests on macOS, we get a panic from a nil
pointer dereference when calling `ci.creds.PID()`.
This panic occurs because the `ci.creds != nil` check is insufficient
after a recent refactoring (c45f881) that changed `ci.creds` from a
pointer to the `PeerCreds` interface. Now `ci.creds` always compares as
non-nil, so we enter this block even when the underlying value is nil.
The integration tests fail on macOS when `peercred.Get()` returns the
error `unix.GetsockoptInt: socket is not connected`. This error isn't
new, and the previous code was ignoring it correctly.
Since we trust that `peercred` returns either a usable value or an error,
checking for a nil error is a sufficient and correct gate to prevent the
method call and avoid the panic.
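The pitfall can be reproduced in isolation. In this sketch, PeerCreds,
creds, get, and connInfo are hypothetical stand-ins written for the
example; they are not the real peercred or ipnauth types.

```go
package main

import (
	"errors"
	"fmt"
)

// PeerCreds is a hypothetical interface standing in for the real one.
type PeerCreds interface {
	PID() (int, bool)
}

type creds struct{ pid int }

// PID dereferences c, so calling it on a nil *creds panics.
func (c *creds) PID() (int, bool) { return c.pid, true }

// get mimics a credentials lookup that can fail with a nil pointer.
func get(fail bool) (*creds, error) {
	if fail {
		return nil, errors.New("socket is not connected")
	}
	return &creds{pid: 42}, nil
}

type connInfo struct {
	creds PeerCreds
	err   error
}

func newConnInfo(fail bool) connInfo {
	c, err := get(fail)
	// Storing a nil *creds in the interface field yields a non-nil interface.
	return connInfo{creds: c, err: err}
}

// pidOf gates the method call on the error rather than on creds != nil.
func pidOf(ci connInfo) (int, bool) {
	if ci.err != nil {
		return 0, false
	}
	return ci.creds.PID()
}

func main() {
	ci := newConnInfo(true)
	fmt.Println(ci.creds != nil) // true, even though the pointer inside is nil
	fmt.Println(pidOf(ci))       // 0 false, and no panic
}
```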
Fixes #17421
Signed-off-by: Alex Chan <alexc@tailscale.com>
In the earlier http2 package migration (1d93bdce20, #17394) I had
removed Direct.Close's tracking of the connPool, thinking it wasn't
necessary.
Some tests (in another repo) are strict and like it to tear down the
world and wait, to check for leaked goroutines. And they caught this
letting some goroutines idle past Close, even if they'd eventually
close down on their own.
This restores the connPool accounting and the aggressive close.
Updates #17305
Updates #17394
Change-Id: I5fed283a179ff7c3e2be104836bbe58b05130cc7
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
The control plane will sometimes determine that a node is not online,
while the node is still able to connect to its peers. This patch
doesn’t solve this problem, but it does mitigate it.
This PR introduces the `client-side-reachability` node attribute that
switches the node to completely ignore the online signal from control.
In the future, the client itself should collect reachability data from
active Wireguard flows and Tailscale pings.
Updates #17366
Updates tailscale/corp#30379
Updates tailscale/corp#32686
Signed-off-by: Simon Law <sfllaw@tailscale.com>
A recent change (009d702adf) introduced a deadlock where the
/machine/update-health network request to report the client's health
status update to the control plane was moved to being synchronous
within the eventbus's pump machinery.
I started to instead make the health reporting be async, but then we
realized that in the three years since we added it, it has barely been
used and doesn't pay for itself, given how many HTTP requests it makes.
Instead, delete it all and replace it with a c2n handler, which
provides much more helpful information.
Fixes tailscale/corp#32952
Change-Id: I9e8a5458269ebfdda1c752d7bbb8af2780d71b04
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Saves 262 KB so far. I'm sure I missed some places, but shotizam says
these were the low hanging fruit.
Updates #12614
Change-Id: Ia31c01b454f627e6d0470229aae4e19d615e45e3
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Maybe it matters? At least globally across all nodes?
Fixes #17343
Change-Id: I3f61758ea37de527e16602ec1a6e453d913b3195
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>