We change our invocations of GetExtendedTcpTable to request additional
information about the "module" responsible for the port. In addition to pid,
this output also includes sufficient metadata to enable Windows to resolve
process names and disambiguate svchost processes.
We store the OS-specific output in an OSMetadata field in netstat.Entry, which
portlist may then use as necessary to actually resolve the process/module name.
Signed-off-by: Aaron Klotz <aaron@tailscale.com>
It's long & distracting for how low value it is.
Fixes #6766
Change-Id: I51364f25c0088d9e63deb9f692ba44031f12251b
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
In some configurations, users explicitly do not want to store
tailscale state in k8s secrets, because doing that leads to
some annoying permission issues with sidecar containers.
With this change, TS_KUBE_SECRET="" and TS_STATE_DIR=/foo
will force storage to file when running in kubernetes.
Fixes #6704.
Signed-off-by: David Anderson <danderson@tailscale.com>
The Tailscale logging service has a hard limit on the maximum
log message size that can be accepted.
We want to ensure that netlog messages never exceed
this limit; otherwise a client cannot transmit its logs.
Move the goroutine for periodically dumping netlog messages
from wgengine/netlog to net/connstats.
This allows net/connstats to manage when it dumps messages,
either based on time or by size.
Updates tailscale/corp#8427
Signed-off-by: Joe Tsai <joetsai@digital-static.net>
As a backup plan, just in case the earlier fix's logic wasn't correct
and we want to experiment in the field or give users a quicker
fix.
Updates #5285
Change-Id: I7447466374d11f8f609de6dfbc4d9a944770826d
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
This ensures that we capture the error returned by `Serve` and exit with a
non-zero exit code if one occurs.
Signed-off-by: Anton Tolchanov <anton@tailscale.com>
The operator creates a fair bit of internal cluster state to manage proxying.
Dumping it all in the default namespace is handy for development but rude
for production.
Updates #502
Signed-off-by: David Anderson <danderson@tailscale.com>
Consider the following pattern:
	err1 := foo()
	err2 := bar()
	err3 := baz()
	return multierr.New(err1, err2, err3)
If err1, err2, and err3 are all nil, then multierr.New should not allocate.
Thus, modify the logic of New to count the number of distinct error values
and allocate a slice of exactly the needed size. This also speeds up the
non-empty case, since repeatedly growing a slice with append is slow.
Performance:
    name         old time/op    new time/op    delta
    Empty-24     41.8ns ± 2%     6.4ns ± 1%    -84.73%  (p=0.000 n=10+10)
    NonEmpty-24   120ns ± 3%      69ns ± 1%    -42.01%  (p=0.000 n=9+10)

    name         old alloc/op   new alloc/op   delta
    Empty-24      64.0B ± 0%      0.0B        -100.00%  (p=0.000 n=10+10)
    NonEmpty-24    168B ± 0%       88B ± 0%    -47.62%  (p=0.000 n=10+10)

    name         old allocs/op  new allocs/op  delta
    Empty-24       1.00 ± 0%      0.00        -100.00%  (p=0.000 n=10+10)
    NonEmpty-24    3.00 ± 0%      2.00 ± 0%    -33.33%  (p=0.000 n=10+10)
Signed-off-by: Joe Tsai <joetsai@digital-static.net>
We used to need to do timed requeues in a few places in the reconcile logic,
and the easiest way to do that was to plumb reconcile.Result return values
around. But now we're purely event-driven, so the only thing we care about
is whether or not an error occurred.
Incidentally also fix a very minor bug where headless services would get
completely ignored, rather than reconciled into the correct state. This
shouldn't matter in practice because you can't transition from a headful
to a headless service without a deletion, but for consistency let's avoid
having a path that takes no definite action if a service of interest does
exist.
Updates #502.
Signed-off-by: David Anderson <danderson@tailscale.com>
Previously, we had to do blind timed requeues while waiting for
the tailscale hostname, because we looked up the hostname through
the API. But now the proxy container image writes back its hostname
to the k8s secret, so we get an event-triggered reconcile automatically
when the time is right.
Updates #502
Signed-off-by: David Anderson <danderson@tailscale.com>
As is convention in the k8s world, use zap for structured logging. For
development, OPERATOR_LOGGING=dev switches to a more human-readable output
than JSON.
Updates #502
Signed-off-by: David Anderson <danderson@tailscale.com>
Our reconcile loop gets triggered again when the StatefulSet object
finally disappears (in addition to when its deletion starts, as indicated
by DeletionTimestamp != 0). So we don't need to queue additional
reconciliations to proceed with the remainder of the cleanup; that
happens organically.
Signed-off-by: David Anderson <danderson@tailscale.com>
Previously, if a DNS-over-TCP message was received while there were
existing queries in-flight, and it was over the size limit, we'd close
the 'responses' channel. This would cause those in-flight queries to
send on the closed channel and panic.
Instead, don't close the channel at all and rely on s.ctx being
canceled, which will ensure that in-flight queries don't hang.
Fixes #6725
Signed-off-by: Andrew Dunham <andrew@du.nham.ca>
Change-Id: I8267728ac37ed7ae38ddd09ce2633a5824320097
It's possible for the 'somethingChanged' callback to be registered and
then trigger before the ctx field is assigned; move the assignment
earlier so this can't happen.
Change-Id: Ia7ee8b937299014a083ab40adf31a8b3e0db4ec5
Signed-off-by: Andrew Dunham <andrew@du.nham.ca>
Tests cover configuring a proxy through an annotation rather than a
LoadBalancerClass, and converting between those two modes at runtime.
Updates #502.
Signed-off-by: David Anderson <danderson@tailscale.com>
For other test cases, the operator is going to produce similar generated
objects in several codepaths, and those objects are large. Move them out
to helpers so that the main test code stays a bit more intelligible.
The top-level Service that we start and end with remains in the main test
body, because its shape at the start and end is one of the main things that
varies a lot between test cases.
Updates #502.
Signed-off-by: David Anderson <danderson@tailscale.com>
The test verifies one of the successful reconcile paths, where
a client requests an exposed service via a LoadBalancer class.
Updates #502.
Signed-off-by: David Anderson <danderson@tailscale.com>
Also introduces an intermediary interface for the tailscale client, in
preparation for operator tests that fake out the Tailscale API interaction.
Updates #502.
Signed-off-by: David Anderson <danderson@tailscale.com>
Use multierr.Range to iterate through an error tree
instead of multiple invocations of errors.As.
This scales better as we add more Go error types to the switch.
Signed-off-by: Joe Tsai <joetsai@digital-static.net>
Errors in Go are no longer viewed as a linear chain, but a tree.
See golang/go#53435.
Add a Range function that iterates through an error
in a pre-order, depth-first order.
This matches the iteration order of errors.As in Go 1.20.
This commit also adds the logic for having Error implement the
multi-error version of Unwrap in Go 1.20.
That logic is commented out for now, since it causes "go vet"
to complain about Unwrap having the "wrong" signature.
Signed-off-by: Joe Tsai <joetsai@digital-static.net>
This was initially developed in a separate repo, but for build/release
reasons, and because go module management now limits the damage of importing
k8s things, it is moving into this repo.
At time of commit, the operator enables exposing services over tailscale,
with the 'tailscale' loadBalancerClass. It also currently requires an
unreleased feature to access the Tailscale API, so is not usable yet.
Updates #502.
Signed-off-by: David Anderson <danderson@tailscale.com>
Mainly motivated by wanting to know how much Taildrop is used, but
also useful when tracking down how many invalid requests are
generated.
Signed-off-by: Mihai Parparita <mihai@tailscale.com>
We've been doing a hard kill of the subprocess, which is only safe as long as
both the cli and gui are not running and the subprocess has had the opportunity
to clean up DNS settings etc. If unattended mode is turned on, this is definitely
unsafe.
I changed babysitProc to close the subprocess's stdin to make it shut down, and
then I plumbed a cancel function into the stdin reader on the subprocess side.
Fixes https://github.com/tailscale/corp/issues/5621
Signed-off-by: Aaron Klotz <aaron@tailscale.com>
This is temporary while we work to upstream performance work in
https://github.com/WireGuard/wireguard-go/pull/64. A replace directive
would be less ideal, because dependent code breaks unless it duplicates
the directive.
Signed-off-by: Jordan Whited <jordan@tailscale.com>