Rather than using a string everywhere and needing to clarify that the
string should have the svc: prefix, create a separate type for Service
names.
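
A minimal sketch of the idea, not the actual type added by this change: the
`ServiceName` type, its `Validate` and `WithoutPrefix` methods, and the rule
that something must follow the prefix are all illustrative assumptions.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// servicePrefix is the prefix that every Service name is expected to carry.
const servicePrefix = "svc:"

// ServiceName is a hypothetical distinct type for Service names, so that
// function signatures document the expected format instead of taking a
// bare string.
type ServiceName string

// Validate reports an error unless the name starts with "svc:" and has a
// non-empty remainder. The non-empty rule is an assumption of this sketch.
func (n ServiceName) Validate() error {
	rest, ok := strings.CutPrefix(string(n), servicePrefix)
	if !ok {
		return fmt.Errorf("%q: missing %q prefix", string(n), servicePrefix)
	}
	if rest == "" {
		return errors.New("service name is empty after the prefix")
	}
	return nil
}

// WithoutPrefix returns the name with the "svc:" prefix removed.
func (n ServiceName) WithoutPrefix() string {
	return strings.TrimPrefix(string(n), servicePrefix)
}

func main() {
	for _, n := range []ServiceName{"svc:web", "web", "svc:"} {
		fmt.Printf("%q valid=%v\n", string(n), n.Validate() == nil)
	}
}
```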
Updates tailscale/corp#24607
Change-Id: I720e022f61a7221644bb60955b72cacf42f59960
Signed-off-by: Adrian Dewhurst <adrian@tailscale.com>
We previously baked in the LetsEncrypt x509 root CA for our tlsdial
package.
This moves that out into a new "bakedroots" package and is now also
shared by ipn/ipnlocal's cert validation code (validCertPEM) that
decides whether it's time to fetch a new cert.
Otherwise, a machine without LetsEncrypt roots locally in its system
roots is unable to use tailscale cert/serve and fetch certs.
Fixes #14690
Change-Id: Ic88b3bdaabe25d56b9ff07ada56a27e3f11d7159
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
cmd/k8s-operator: add logic to parse L7 Ingresses in HA mode
- Wrap the Tailscale API client used by the Kubernetes Operator
into a client that knows how to manage VIPServices.
- Create/Delete VIPServices and update serve config for L7 Ingresses
for ProxyGroup.
- Ensure that ingress ProxyGroup proxies mount serve config from a shared ConfigMap.
Updates tailscale/corp#24795
Signed-off-by: Irbe Krumina <irbe@tailscale.com>
Most users should not run into this because it's set in the helm chart
and the deploy manifest, but if namespace is not set we get confusing
authz errors because the kube client tries to fetch some namespaced resources
as though they're cluster-scoped and reports permission denied. Try to
detect namespace from the default projected volume, and otherwise fatal.
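
A rough sketch of the detection order described above. The file path is the
standard location of the service account namespace in the default projected
volume; the function names and the `OPERATOR_NAMESPACE` variable name are
assumptions made for illustration, and the sketch prints the error where the
real code would exit fatally.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"strings"
)

// namespaceFile is where Kubernetes mounts the Pod's namespace by default.
const namespaceFile = "/var/run/secrets/kubernetes.io/serviceaccount/namespace"

// parseNamespace trims the file contents and rejects an empty result.
func parseNamespace(b []byte) (string, error) {
	ns := strings.TrimSpace(string(b))
	if ns == "" {
		return "", errors.New("namespace file is empty")
	}
	return ns, nil
}

// detectNamespace prefers the explicitly configured namespace and falls
// back to reading it from path.
func detectNamespace(configured, path string) (string, error) {
	if configured != "" {
		return configured, nil
	}
	b, err := os.ReadFile(path)
	if err != nil {
		return "", fmt.Errorf("namespace not set and could not be detected: %w", err)
	}
	return parseNamespace(b)
}

func main() {
	// OPERATOR_NAMESPACE is an assumed variable name for this sketch.
	ns, err := detectNamespace(os.Getenv("OPERATOR_NAMESPACE"), namespaceFile)
	if err != nil {
		// A real operator would treat this as fatal and exit.
		fmt.Fprintln(os.Stderr, "fatal:", err)
		return
	}
	fmt.Println("namespace:", ns)
}
```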
Fixes #cleanup
Change-Id: I64b34191e440b61204b9ad30bbfa117abbbe09c3
Signed-off-by: Tom Proctor <tomhjp@users.noreply.github.com>
I moved the actual rename into separate, GOOS-specific files. On
non-Windows, we do a simple os.Rename. On Windows, we first try
ReplaceFile with a fallback to os.Rename if the target file does
not exist.
ReplaceFile is the recommended way to rename the file in this use case,
as it preserves attributes and ACLs set on the target file.
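
For illustration only, the non-Windows path could look roughly like the
following. The names `rename`, `atomicWriteFile` and `example` are
hypothetical; the Windows variant (ReplaceFile with a fallback to os.Rename)
is not shown because it needs Windows-specific APIs.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// rename is the piece that would live in a GOOS-specific file. This is the
// non-Windows flavor: a plain os.Rename.
func rename(src, dst string) error {
	return os.Rename(src, dst)
}

// atomicWriteFile writes data to a temporary file next to path and then
// renames it over path, so readers never observe a partial write.
func atomicWriteFile(path string, data []byte, perm os.FileMode) (err error) {
	f, err := os.CreateTemp(filepath.Dir(path), filepath.Base(path)+".tmp-*")
	if err != nil {
		return err
	}
	tmp := f.Name()
	defer func() {
		if err != nil {
			f.Close() // harmless if already closed
			os.Remove(tmp)
		}
	}()
	if _, err = f.Write(data); err != nil {
		return err
	}
	if err = f.Chmod(perm); err != nil {
		return err
	}
	if err = f.Close(); err != nil {
		return err
	}
	return rename(tmp, path)
}

// example writes the same file twice and returns its final contents.
func example() (string, error) {
	dir, err := os.MkdirTemp("", "atomic-demo")
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(dir)
	path := filepath.Join(dir, "state.json")
	for _, s := range []string{"v1", "v2"} {
		if err := atomicWriteFile(path, []byte(s), 0o600); err != nil {
			return "", err
		}
	}
	b, err := os.ReadFile(path)
	return string(b), err
}

func main() {
	fmt.Println(example())
}
```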
Updates #14428
Signed-off-by: Aaron Klotz <aaron@tailscale.com>
This finishes the work started in #14616.
Updates #8632
Change-Id: I4dc07d45b1e00c3db32217c03b21b8b1ec19e782
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
sync.OnceValue and slices.Compact were both added in Go 1.21.
cmp.Or was added in Go 1.22.
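
As a quick reminder of what these three helpers do, here is a small
self-contained example. The variable and function names are made up for
illustration and are not taken from the changed code.

```go
package main

import (
	"cmp"
	"fmt"
	"slices"
	"sync"
)

// loadCount records how often the initializer below actually ran.
var loadCount int

// hostname runs its function at most once and caches the result.
var hostname = sync.OnceValue(func() string {
	loadCount++
	return "node-1"
})

// dedupSorted sorts a copy of s and drops adjacent duplicates.
// slices.Compact only removes consecutive equal elements, hence the sort.
func dedupSorted(s []string) []string {
	out := slices.Clone(s)
	slices.Sort(out)
	return slices.Compact(out)
}

// portOrDefault returns port unless it is zero, in which case it returns
// the fallback. cmp.Or picks its first non-zero argument.
func portOrDefault(port int) int {
	return cmp.Or(port, 9002)
}

func main() {
	fmt.Println(hostname(), hostname(), loadCount)
	fmt.Println(dedupSorted([]string{"b", "a", "b"}))
	fmt.Println(portOrDefault(0), portOrDefault(80))
}
```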
Updates #8632
Updates #11058
Change-Id: I89ba4c404f40188e1f8a9566c8aaa049be377754
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
cmd/{k8s-operator,containerboot}: reload tailscaled configfile when its contents have changed
Instead of restarting the Kubernetes Operator proxies each time
tailscaled config has changed, this dynamically reloads the configfile
using the new reload endpoint.
The older annotation-based mechanism will be supported until 1.84
to ensure that proxy versions prior to 1.80 keep working with
operator 1.80 and newer.
Updates tailscale/tailscale#13032
Updates tailscale/corp#24795
Signed-off-by: Irbe Krumina <irbe@tailscale.com>
* cmd/k8s-operator,k8s-operator: allow users to set custom labels for the optional ServiceMonitor
Updates tailscale/tailscale#14381
Signed-off-by: Irbe Krumina <irbe@tailscale.com>
Currently this does nothing beyond creating
ProxyGroup resources such as the StatefulSet.
Updates tailscale/corp#24795
Signed-off-by: Irbe Krumina <irbe@tailscale.com>
These erroneously blocked a recent PR, which I fixed by simply
re-running CI. But we might as well fix them anyway.
These are mostly `printf` to `print` and a couple of `!=` to `!Equal()`
Updates #cleanup
Signed-off-by: Will Norris <will@tailscale.com>
The go-httpstat package has a data race when used with connections that
are performing happy-eyeballs connection setups as we are in the DERP
client. There is a long-stale PR upstream to address this, however
revisiting the purpose of this code suggests we don't really need
httpstat here.
The code populates a latency table that may be used to compare to STUN
latency, which is a lightweight RTT check. Switching the reported
timing here to simply the HTTP request RTT avoids the
problematic package.
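
A minimal sketch of timing a whole HTTP request with only the standard
library, assuming a plain GET is an acceptable probe. The function names are
hypothetical, and on a fresh connection the measured time also includes
connection setup.

```go
package main

import (
	"context"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"time"
)

// requestRTT returns the wall-clock time from sending the request until
// the response body has been fully read.
func requestRTT(ctx context.Context, c *http.Client, url string) (time.Duration, error) {
	req, err := http.NewRequestWithContext(ctx, http.MethodGet, url, nil)
	if err != nil {
		return 0, err
	}
	start := time.Now()
	resp, err := c.Do(req)
	if err != nil {
		return 0, err
	}
	defer resp.Body.Close()
	if _, err := io.Copy(io.Discard, resp.Body); err != nil {
		return 0, err
	}
	return time.Since(start), nil
}

// measureLocal times a request against a throwaway local test server.
func measureLocal() (time.Duration, error) {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusNoContent)
	}))
	defer srv.Close()
	return requestRTT(context.Background(), srv.Client(), srv.URL)
}

func main() {
	d, err := measureLocal()
	fmt.Println(d, err)
}
```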
Fixes tailscale/corp#25095
Signed-off-by: James Tucker <james@tailscale.com>
This is the start of an integration/e2e test suite for the tailscale operator.
It currently only tests two major features, ingress proxy and API server proxy,
but we intend to expand it to cover more features over time. It also only
supports manual runs for now. We intend to integrate it into CI checks in a
separate update when we have planned how to securely provide CI with the secrets
required for connecting to a test tailnet.
Updates #12622
Change-Id: I31e464bb49719348b62a563790f2bc2ba165a11b
Co-authored-by: Irbe Krumina <irbe@tailscale.com>
Signed-off-by: Tom Proctor <tomhjp@users.noreply.github.com>
Every so often, the ProxyGroup and other controllers lose an optimistic locking race
with other controllers that update the objects they create. Stop treating
this as an error event, and instead just log an info level log line for it.
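
The intent can be sketched as a small classification step. Everything here
is hypothetical: `errConflict` stands in for whatever conflict error the
Kubernetes client returns, and `classify` is not a function from the
operator.

```go
package main

import (
	"errors"
	"fmt"
)

// errConflict is a stand-in for an optimistic locking conflict, i.e. the
// object changed between our read and our write.
var errConflict = errors.New("conflict: object was modified")

// errForbidden is a stand-in for any other, genuinely unexpected failure.
var errForbidden = errors.New("forbidden")

// outcome describes how a reconcile error should be reported.
type outcome struct {
	level   string // "", "info" or "error"
	requeue bool   // whether to try again
}

// classify downgrades lost optimistic locking races to an info-level log
// line while still retrying; all other errors stay errors.
func classify(err error) outcome {
	switch {
	case err == nil:
		return outcome{}
	case errors.Is(err, errConflict):
		return outcome{level: "info", requeue: true}
	default:
		return outcome{level: "error", requeue: true}
	}
}

func main() {
	wrapped := fmt.Errorf("updating StatefulSet: %w", errConflict)
	fmt.Printf("%+v\n", classify(wrapped))
	fmt.Printf("%+v\n", classify(errForbidden))
}
```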
Fixes #14072
Signed-off-by: Tom Proctor <tomhjp@users.noreply.github.com>
cmd/containerboot,kube/kubetypes,cmd/k8s-operator: detect if Ingress is created in a tailnet that has no HTTPS
This attempts to make Kubernetes Operator L7 Ingress setup failures more explicit:
- the Ingress resource now only advertises an HTTPS endpoint via status.ingress.loadBalancer.hostname when/if the proxy has successfully loaded serve config
- the proxy attempts to catch cases where HTTPS is disabled for the tailnet and logs a warning
Updates tailscale/tailscale#12079
Updates tailscale/tailscale#10407
Signed-off-by: Irbe Krumina <irbe@tailscale.com>
cmd/k8s-operator/deploy/chart: allow reading OAuth creds from a CSI driver's volume and annotating operator's Service account
Updates #14264
Signed-off-by: Oliver Rahner <o.rahner@dke-data.com>
When the operator enables metrics on a proxy, it uses port 9001,
and in the near future it will start using 9002 for the debug endpoint
as well. Make sure we don't choose ports from a range that includes
9001 so that we never clash. Setting TS_SOCKS5_SERVER, TS_HEALTHCHECK_ADDR_PORT,
TS_OUTBOUND_HTTP_PROXY_LISTEN, and PORT could also open arbitrary ports,
so we will need to document that users should not choose ports from the
10000-11000 range for those settings.
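
To illustrate the constraint, here is a sketch of picking a port from a
range that cannot collide with 9001 or 9002. The range bounds follow the
text above, but whether the upper bound is inclusive, the lowest-free-port
strategy, and the name `pickPort` are assumptions of this sketch.

```go
package main

import (
	"errors"
	"fmt"
)

const (
	portRangeStart = 10000 // first port the sketch may hand out
	portRangeEnd   = 11000 // exclusive upper bound (assumption)
)

// pickPort returns the lowest port in [portRangeStart, portRangeEnd) that
// is not marked as used. Ports such as 9001 and 9002 are outside the range
// and therefore can never be returned.
func pickPort(used map[int]bool) (int, error) {
	for p := portRangeStart; p < portRangeEnd; p++ {
		if !used[p] {
			return p, nil
		}
	}
	return 0, errors.New("no free port left in range")
}

func main() {
	used := map[int]bool{10000: true, 10001: true}
	fmt.Println(pickPort(used))
}
```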
Updates #13406
Signed-off-by: Tom Proctor <tomhjp@users.noreply.github.com>
* cmd/k8s-operator,k8s-operator,go.mod: optionally create ServiceMonitor
Adds a new spec.metrics.serviceMonitor field to ProxyClass.
If that's set to true (and metrics are enabled), the operator
will create a Prometheus ServiceMonitor for each proxy to which
the ProxyClass applies.
Additionally, create a metrics Service for each proxy that has
metrics enabled.
Updates tailscale/tailscale#11292
Signed-off-by: Irbe Krumina <irbe@tailscale.com>
We were previously relying on unintended behaviour by runc where
all containers were by default given read/write/mknod permissions
for tun devices.
This behaviour was removed in https://github.com/opencontainers/runc/pull/3468
and released in runc 1.2.
The containerd container runtime, used by Docker and the majority of Kubernetes distributions,
bumped runc to 1.2 in 1.7.24 https://github.com/containerd/containerd/releases/tag/v1.7.24
thus breaking our reference tun mode Tailscale Kubernetes manifests and Kubernetes
operator proxies.
This PR changes all Kubernetes container configs that run Tailscale in tun mode
to privileged. This should not be a breaking change because all these containers would
run in a Pod that already has a privileged init container.
Updates tailscale/tailscale#14256
Updates tailscale/tailscale#10814
Signed-off-by: Irbe Krumina <irbe@tailscale.com>
* cmd/containerboot: serve health on local endpoint
We introduced stable (user) metrics in #14035, and `TS_LOCAL_ADDR_PORT`
with it. Rather than requiring users to specify a new addr/port
combination for each new local endpoint they want the container to
serve, this combines the health check endpoint onto the local addr/port
used by metrics if `TS_ENABLE_HEALTH_CHECK` is used instead of
`TS_HEALTHCHECK_ADDR_PORT`.
`TS_LOCAL_ADDR_PORT` now defaults to binding to all interfaces on 9002
so that it works more seamlessly and with less configuration in
environments other than Kubernetes, where the operator always overrides
the default anyway. In particular, listening on localhost would not be
accessible from outside the container, and many scripted container
environments do not know the IP address of the container before it's
started. Listening on all interfaces allows users to just set one env
var (`TS_ENABLE_METRICS` or `TS_ENABLE_HEALTH_CHECK`) to get a fully
functioning local endpoint they can query from outside the container.
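
A simplified sketch of how one local listener could serve both endpoints.
The environment variable names and the `/metrics` path come from these
commit messages; the `[::]:9002` spelling of the default, the `/healthz`
path, the strict `== "true"` parsing and all Go identifiers are assumptions.

```go
package main

import (
	"fmt"
	"net/http"
	"os"
)

// defaultLocalAddrPort binds to all interfaces on port 9002 (assumed form).
const defaultLocalAddrPort = "[::]:9002"

// localConfig holds the settings for the shared local endpoint.
type localConfig struct {
	addr    string
	health  bool
	metrics bool
}

// loadLocalConfig reads the settings through getenv so it can be tested
// without touching the process environment.
func loadLocalConfig(getenv func(string) string) localConfig {
	addr := getenv("TS_LOCAL_ADDR_PORT")
	if addr == "" {
		addr = defaultLocalAddrPort
	}
	return localConfig{
		addr:    addr,
		health:  getenv("TS_ENABLE_HEALTH_CHECK") == "true",
		metrics: getenv("TS_ENABLE_METRICS") == "true",
	}
}

// newMux registers only the endpoints that were enabled, all on one mux.
func newMux(cfg localConfig) *http.ServeMux {
	mux := http.NewServeMux()
	if cfg.health {
		mux.HandleFunc("/healthz", func(w http.ResponseWriter, r *http.Request) {
			w.WriteHeader(http.StatusOK)
		})
	}
	if cfg.metrics {
		mux.HandleFunc("/metrics", func(w http.ResponseWriter, r *http.Request) {
			fmt.Fprintln(w, "# metrics placeholder")
		})
	}
	return mux
}

func main() {
	cfg := loadLocalConfig(os.Getenv)
	_ = newMux(cfg) // a real program would pass this to http.ListenAndServe(cfg.addr, mux)
	fmt.Printf("%+v\n", cfg)
}
```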
Updates #14035, #12898
Signed-off-by: Tom Proctor <tomhjp@users.noreply.github.com>
Ensure that the ExternalName Service port names are always synced to the
ClusterIP Service, to fix a bug where if users created a Service with
a single unnamed port and later changed to 1+ named ports, the operator
attempted to apply an invalid multi-port Service with an unnamed port.
Also fixes a small internal issue where not-yet Service status conditions
were lost on a spec update.
Updates tailscale/tailscale#10102
Signed-off-by: Irbe Krumina <irbe@tailscale.com>
containerboot:
Adds 3 new environment variables for containerboot, `TS_LOCAL_ADDR_PORT` (default
`"${POD_IP}:9002"`), `TS_METRICS_ENABLED` (default `false`), and `TS_DEBUG_ADDR_PORT`
(default `""`), to configure metrics and debug endpoints. In a follow-up PR, the
health check endpoint will be updated to use the `TS_LOCAL_ADDR_PORT` if
`TS_HEALTHCHECK_ADDR_PORT` hasn't been set.
Users previously only had access to internal debug metrics (which are unstable
and not recommended) via passing the `--debug` flag to tailscaled, but can now
set `TS_METRICS_ENABLED=true` to expose the stable metrics documented at
https://tailscale.com/kb/1482/client-metrics at `/metrics` on the addr/port
specified by `TS_LOCAL_ADDR_PORT`.
Users can also now configure a debug endpoint more directly via the
`TS_DEBUG_ADDR_PORT` environment variable. This is not recommended for production
use, but exposes an internal set of debug metrics and pprof endpoints.
operator:
The `ProxyClass` CRD's `.spec.metrics.enable` field now enables serving the
stable user metrics documented at https://tailscale.com/kb/1482/client-metrics
at `/metrics` on the same "metrics" container port that debug metrics were
previously served on. To smooth the transition for anyone relying on the way the
operator previously consumed this field, we also _temporarily_ serve tailscaled's
internal debug metrics on the same `/debug/metrics` path as before, until 1.82.0
when debug metrics will be turned off by default even if `.spec.metrics.enable`
is set. At that point, anyone who wishes to continue using the internal debug
metrics (not recommended) will need to set the new `ProxyClass` field
`.spec.statefulSet.pod.tailscaleContainer.debug.enable`.
Users who wish to opt out of the transitional behaviour, where enabling
`.spec.metrics.enable` also enables debug metrics, can set
`.spec.statefulSet.pod.tailscaleContainer.debug.enable` to false (recommended).
Separately but related, the operator will no longer specify a host port for the
"metrics" container port definition. This caused scheduling conflicts when k8s
needs to schedule more than one proxy per node, and was not necessary for allowing
the pod's port to be exposed to prometheus scrapers.
Updates #11292
---------
Co-authored-by: Kristoffer Dalby <kristoffer@tailscale.com>
Signed-off-by: Tom Proctor <tomhjp@users.noreply.github.com>
A small follow-up to #14112 that ensures that the operator itself can emit
Events for its kube state store changes.
Updates tailscale/tailscale#14080
Signed-off-by: Irbe Krumina <irbe@tailscale.com>
This is a follow-up to #14112 where our internal kube client was updated
to allow it to emit Events - this updates our sample kube manifests
and tsrecorder manifest templates so they can benefit from this functionality.
Updates tailscale/tailscale#14080
Signed-off-by: Irbe Krumina <irbe@tailscale.com>
Adds functionality to kube client to emit Events.
Updates kube store to emit Events when tailscaled state has been loaded, updated or if any errors were
encountered during those operations.
This should help in cases where an error related to state loading/updating caused the Pod to crash in a loop:
unlike logs of the originally failed container instance, Events associated with the Pod will still be
accessible even after N restarts.
Updates tailscale/tailscale#14080
Signed-off-by: Irbe Krumina <irbe@tailscale.com>
We currently annotate pods with a hash of the tailscaled config so that
we can trigger pod restarts whenever it changes. However, the hash
updates more frequently than necessary, causing more restarts than
necessary. This commit removes two causes: scaling up/down and removing
the auth key after pods have initially authed to control. However, note
that pods will still restart on scale-up/down because of the updated set
of volumes mounted into each pod. Hopefully we can fix that in a planned
follow-up PR.
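
One of the two causes can be illustrated with a sketch that blanks the auth
key before hashing, so dropping the key after first auth leaves the hash
unchanged. The `proxyConfig` struct, its fields and `configHash` are
hypothetical and far simpler than the real tailscaled config.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
)

// proxyConfig is a made-up, minimal stand-in for a tailscaled config file.
type proxyConfig struct {
	Hostname     string `json:"hostname"`
	AcceptRoutes bool   `json:"acceptRoutes"`
	AuthKey      string `json:"authKey,omitempty"`
}

// configHash hashes the config with fields that should not trigger a
// restart cleared first. c is passed by value, so the caller's copy is
// left untouched.
func configHash(c proxyConfig) (string, error) {
	c.AuthKey = ""
	b, err := json.Marshal(c)
	if err != nil {
		return "", err
	}
	sum := sha256.Sum256(b)
	return hex.EncodeToString(sum[:]), nil
}

func main() {
	withKey, _ := configHash(proxyConfig{Hostname: "proxy-0", AuthKey: "tskey-example"})
	withoutKey, _ := configHash(proxyConfig{Hostname: "proxy-0"})
	fmt.Println(withKey == withoutKey)
}
```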
Updates #13406
Signed-off-by: Tom Proctor <tomhjp@users.noreply.github.com>
Or unless the new "ts_debug_websockets" build tag is set.
Updates #1278
Change-Id: Ic4c4f81c1924250efd025b055585faec37a5491d
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Otherwise all the clients only using control/controlhttp for the
ts2021 HTTP client were also pulling in WebSocket libraries, as the
server side always needs to speak websockets, but only GOOS=js clients
speak it.
This doesn't yet totally remove the websocket dependency on Linux because
Linux has an envknob opt-in to act like GOOS=js for manual testing and force
the use of WebSockets for DERP only (not control). We can put that behind
a build tag in a future change to eliminate the dep on all GOOSes.
Updates #1278
Change-Id: I4f60508f4cad52bf8c8943c8851ecee506b7ebc9
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Sets a custom hostinfo app type for ProxyGroup replicas, similarly
to how we do it for all other Kubernetes Operator managed components.
Updates tailscale/tailscale#13406,tailscale/corp#22920
Signed-off-by: Irbe Krumina <irbe@tailscale.com>
This adds a new generic result type (motivated by golang/go#70084) to
try it out, and uses it in the new lineutil package (replacing the old
lineread package), changing that package to return iterators:
sometimes over []byte (when the input is all in memory), but sometimes
iterators over results of []byte, if errors might happen at runtime.
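
The shape of such an API might look roughly like this. It is a sketch under
assumptions, not the code from this change: `Result`, `Value`, `Error`,
`Get`, `Lines` and `collectLines` are illustrative names, and the iterator
is written as a plain push-style function (the same shape as Go 1.23's
iter.Seq) so that it also builds with older toolchains.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"strings"
)

// Result holds either a value or an error.
type Result[T any] struct {
	v   T
	err error
}

// Value wraps a successful value.
func Value[T any](v T) Result[T] { return Result[T]{v: v} }

// Error wraps a failure.
func Error[T any](err error) Result[T] { return Result[T]{err: err} }

// Get unpacks the result into the usual (value, error) pair.
func (r Result[T]) Get() (T, error) { return r.v, r.err }

// Lines returns a push-style iterator over the lines of r. Each yielded
// slice is only valid until the next iteration because the scanner reuses
// its buffer. A read error is delivered as a final error Result.
func Lines(r io.Reader) func(yield func(Result[[]byte]) bool) {
	return func(yield func(Result[[]byte]) bool) {
		sc := bufio.NewScanner(r)
		for sc.Scan() {
			if !yield(Value(sc.Bytes())) {
				return
			}
		}
		if err := sc.Err(); err != nil {
			yield(Error[[]byte](err))
		}
	}
}

// collectLines drains the iterator, copying each line into a string.
func collectLines(input string) ([]string, error) {
	var out []string
	var firstErr error
	Lines(strings.NewReader(input))(func(r Result[[]byte]) bool {
		b, err := r.Get()
		if err != nil {
			firstErr = err
			return false
		}
		out = append(out, string(b))
		return true
	})
	return out, firstErr
}

func main() {
	fmt.Println(collectLines("first\nsecond\n"))
}
```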
Updates #12912
Updates golang/go#70084
Change-Id: Iacdc1070e661b5fb163907b1e8b07ac7d51d3f83
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
In this PR, we add the tailscale syspolicy command with two subcommands: list, which displays
policy settings, and reload, which forces a reload of those settings. We also update the LocalAPI
and LocalClient to facilitate these additions.
Updates #12687
Signed-off-by: Nick Khyl <nickk@tailscale.com>
Now that we have HA for egress proxies, it makes sense to support topology
spread constraints that would allow users to define more complex
topologies of how proxy Pods need to be deployed in relation to other
Pods, across regions, etc.
Updates tailscale/tailscale#13406
Signed-off-by: Irbe Krumina <irbe@tailscale.com>
In this PR, we update the syspolicy package to utilize syspolicy/rsop under the hood,
and remove syspolicy.CachingHandler, syspolicy.windowsHandler and related code
which is no longer used.
We mark the syspolicy.Handler interface and RegisterHandler/SetHandlerForTest functions
as deprecated, but keep them temporarily until they are no longer used in other repos.
We also update the package to register setting definitions for all existing policy settings
and to register the Registry-based, Windows-specific policy stores when running on Windows.
Finally, we update existing internal and external tests to use the new API and add a few more
tests and benchmarks.
Updates #12687
Signed-off-by: Nick Khyl <nickk@tailscale.com>
It had bit-rotted likely during the transition to vector io in
76389d8baf942b10a8f0f4201b7c4b0737a0172c. Tested on Ubuntu 24.04
by creating a netns and doing the DHCP dance to get an IP.
Updates #2589
Signed-off-by: Maisem Ali <maisem@tailscale.com>
Updates tailscale/tailscale#13839
Adds a new blockblame package which can detect common MITM SSL certificates used by network appliances. We use this in `tlsdial` to display a dedicated health warning when we cannot connect to control, and a network appliance MITM attack is detected.
Signed-off-by: Andrea Gottardo <andrea@gottardo.me>
Adds logic to `checkExitNodePrefsLocked` to return an error when
attempting to use exit nodes on a platform where this is not supported.
This mirrors logic that was added to error out when trying to use `ssh`
on an unsupported platform, and has very similar semantics.
Fixes https://github.com/tailscale/tailscale/issues/13724
Signed-off-by: Mario Minardi <mario@tailscale.com>
This helps better distinguish what is generating activity to the
Tailscale public API.
Updates tailscale/corp#23838
Signed-off-by: Percy Wegmann <percy@tailscale.com>
cmd/k8s-operator,k8s-operator/apis: set a readiness condition on egress Services
Set a readiness condition on ExternalName Services that define a tailnet target
to route cluster traffic to via a ProxyGroup's proxies. The condition
is set to true if at least one proxy is currently set up to route.
Updates tailscale/tailscale#13406
Signed-off-by: Irbe Krumina <irbe@tailscale.com>
We don't need to error out and continuously reconcile if ProxyClass
has not (yet) been created; once it gets created, the ProxyGroup
reconciler will get triggered.
Updates tailscale/tailscale#13406
Signed-off-by: Irbe Krumina <irbe@tailscale.com>