8839 Commits

Author SHA1 Message Date
Raj Singh
7480d887cc Add Grafana dashboard for Tailscale K8s operator monitoring
This commit adds a Grafana dashboard for monitoring Tailscale health, connectivity, and performance in Kubernetes environments. The dashboard provides visibility into subnet routers, health messages, and network traffic for Tailscale proxies deployed by the Kubernetes operator.

Signed-off-by: Raj Singh <raj@tailscale.com>
2025-03-24 17:27:07 -05:00
Brad Fitzpatrick
14db99241f net/netmon: use Monitor's tsIfName if set by SetTailscaleInterfaceName
Currently nobody calls SetTailscaleInterfaceName yet, so this is a
no-op. I checked oss, android, and the macOS/iOS client. Nobody calls
this, or ever did.

But I want to in the future.

Updates #15408
Updates #9040

Change-Id: I05dfabe505174f9067b929e91c6e0d8bc42628d7
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2025-03-24 13:34:02 -07:00
Brad Fitzpatrick
156cd53e77 net/netmon: unexport GetState
Baby step towards #15408.

Updates #15408

Change-Id: I11fca6e677af2ad2f065d83aa0d83550143bff29
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2025-03-24 10:43:15 -07:00
Brad Fitzpatrick
5c0e08fbbd tstest/mts: add multiple-tailscaled development tool
To let you easily run multiple tailscaled instances for development
and let you route CLI commands to the right one.

Updates #15145

Change-Id: I06b6a7bf024f341c204f30705b4c3068ac89b1a2
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2025-03-24 10:10:35 -07:00
Brad Fitzpatrick
d0c50c6072 clientupdate: cache CanAutoUpdate, avoid log spam when false
I noticed logs on one of my machines where it can't auto-update with
scary log spam about "failed to apply tailnet-wide default for
auto-updates".

This avoids trying to do the EditPrefs if we know it's just going to
fail anyway.

Updates #282

Change-Id: Ib7db3b122185faa70efe08b60ebd05a6094eed8c
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2025-03-24 09:46:48 -07:00
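The caching idea in the commit above can be sketched roughly as follows. This is only an illustration: `canAutoUpdate`, `applyDefault`, and the `probes` counter are hypothetical names, and the real clientupdate code may cache the result differently. The sketch assumes Go 1.21 or later for `sync.OnceValue`.

```go
package main

import (
	"fmt"
	"sync"
)

// probes counts how often the (hypothetical) expensive check runs.
var probes int

// canAutoUpdate caches the result of the first call; later calls reuse it.
// The body is a placeholder for a real platform capability check.
var canAutoUpdate = sync.OnceValue(func() bool {
	probes++
	return false // assumption: this machine cannot auto-update
})

// applyDefault skips the preference edit when it is known to fail,
// so there is no failed attempt to log.
func applyDefault(enable bool) (applied bool) {
	if !canAutoUpdate() {
		return false
	}
	// ... the preference edit would happen here ...
	return true
}

func main() {
	for i := 0; i < 3; i++ {
		fmt.Println(applyDefault(true))
	}
	fmt.Println("probes:", probes)
}
```

However many times `applyDefault` is called, the capability check itself runs only once.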
Simon Law
6bbf98bef4
all: skip looking for package comments in .git/ repository (#15384)
2025-03-21 14:46:02 -07:00
Brad Fitzpatrick
e1078686b3 safesocket: respect context timeout when sleeping for 250ms in retry loop
Noticed while working on a dev tool that uses local.Client.

Updates #cleanup

Change-Id: I981efff74a5cac5f515755913668bd0508a4aa14
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2025-03-21 10:55:32 -07:00
James Sanderson
c261fb198f tstest: make it clearer where AwaitRunning failed and why
Signed-off-by: James Sanderson <jsanderson@tailscale.com>
2025-03-21 13:09:46 +00:00
James Sanderson
5668de272c tsnet: use test logger for testcontrol and node logs
Updates #cleanup

Signed-off-by: James Sanderson <jsanderson@tailscale.com>
2025-03-21 12:33:36 +00:00
Tom Proctor
005e20a45e
cmd/k8s-operator,internal/client/tailscale: use VIPService annotations for ownership tracking (#15356)
Switch from using the Comment field to a ts-scoped annotation for
tracking which operators are cooperating over ownership of a
VIPService.

Updates tailscale/corp#24795

Change-Id: I72d4a48685f85c0329aa068dc01a1a3c749017bf
Signed-off-by: Tom Proctor <tomhjp@users.noreply.github.com>
2025-03-21 09:08:39 +00:00
Irbe Krumina
196ae1cd74
cmd/k8s-operator,k8s-operator: allow optionally using LE staging endpoint for Ingress (#15360)
cmd/k8s-operator,k8s-operator: allow using LE staging endpoint for Ingress

Optionally allow using the Let's Encrypt staging endpoint to issue
certs for Ingress/HA Ingress, so that it is easier to
experiment with initial Ingress setup without hitting rate limits.

Updates tailscale/corp#24795


Signed-off-by: Irbe Krumina <irbe@tailscale.com>
2025-03-21 08:53:41 +00:00
Nick Khyl
f3f2f72f96 ipn/ipnlocal: do not attempt to start the auditlogger with a nil transport
(*LocalBackend).setControlClientLocked() is called to both set and reset b.cc.
We shouldn't attempt to start the audit logger when b.cc is being reset (i.e., cc is nil).

However, it's fine to start the audit logger if b.cc implements auditlog.Transport, even if it's not a controlclient.Auto but a mock control client.

In this PR, we fix both issues and add an assertion that controlclient.Auto is an auditlog.Transport. This ensures a compile-time failure if controlclient.Auto ever stops being a valid transport due to future interface or implementation changes.

Updates tailscale/corp#26435

Signed-off-by: Nick Khyl <nickk@tailscale.com>
2025-03-20 15:56:54 -05:00
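The two ideas in the commit above, a compile-time interface assertion and a nil check before starting the logger, can be sketched as below. `Transport` and `Auto` here are stand-ins for `auditlog.Transport` and `controlclient.Auto`; the `SendAuditLog` method and the `transportFor` helper are hypothetical and do not reflect the real interface.

```go
package main

import "fmt"

// Transport stands in for auditlog.Transport; the method is hypothetical.
type Transport interface {
	SendAuditLog(msg string) error
}

// Auto stands in for controlclient.Auto.
type Auto struct{}

func (*Auto) SendAuditLog(msg string) error { return nil }

// Compile-time assertion: the build fails if *Auto ever stops
// implementing Transport.
var _ Transport = (*Auto)(nil)

// transportFor reports whether cc can be used as a Transport.
// A nil cc (the "reset" case) never yields a transport, while any
// value implementing Transport, including a mock, does.
func transportFor(cc any) (Transport, bool) {
	if cc == nil {
		return nil, false
	}
	t, ok := cc.(Transport)
	return t, ok
}

func main() {
	_, ok := transportFor(nil)
	fmt.Println("nil client:", ok)
	_, ok = transportFor(&Auto{})
	fmt.Println("Auto client:", ok)
}
```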
Nick Khyl
e07c1573f6 ipn/ipnlocal: do not reset the netmap and packet filter in (*LocalBackend).Start()
Resetting LocalBackend's netmap without also unconfiguring wgengine to reset routes, DNS, and the killswitch
firewall rules may cause connectivity issues until a new netmap is received.

In some cases, such as when bootstrap DNS servers are inaccessible due to network restrictions or other reasons,
or if the control plane is experiencing issues, this can result in a complete loss of connectivity until the user disconnects
and reconnects to Tailscale.

As LocalBackend handles state resets in (*LocalBackend).resetForProfileChangeLockedOnEntry(), and this includes
resetting the netmap, resetting the current netmap in (*LocalBackend).Start() is not necessary.
Moreover, it's harmful if (*LocalBackend).Start() is called more than once for the same profile.

In this PR, we update resetForProfileChangeLockedOnEntry() to reset the packet filter and remove
the redundant resetting of the netmap and packet filter from Start(). We also update the state machine
tests and revise comments that became inaccurate due to previous test updates.

Updates tailscale/corp#27173

Signed-off-by: Nick Khyl <nickk@tailscale.com>
2025-03-20 13:18:23 -05:00
Brad Fitzpatrick
984cd1cab0 cmd/tailscale: add CLI debug command to do raw LocalAPI requests
This adds a portable way to do a raw LocalAPI request without worrying
about the Unix-vs-macOS-vs-Windows ways of hitting the LocalAPI server.
(It was already possible but tedious with 'tailscale debug local-creds')

Updates tailscale/corp#24690

Change-Id: I0828ca55edaedf0565c8db192c10f24bebb95f1b
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2025-03-20 10:07:11 -07:00
Irbe Krumina
f34e08e186
ipn: ensure that conffile is source of truth for advertised services. (#15361)
If conffile is used to configure tailscaled, always update
currently advertised services from conffile, even if they
are empty in the conffile, to ensure that it is possible
to transition to a state where no services are advertised.

Updates tailscale/corp#24795

Signed-off-by: Irbe Krumina <irbe@tailscale.com>
2025-03-20 14:40:36 +00:00
klyubin
3a2c92f08e
web: support Host 100.100.100.100:80 in tailscaled web server
This makes the web server running inside tailscaled on 100.100.100.100:80 support requests with `Host: 100.100.100.100:80` and its IPv6 equivalent.

Prior to this commit, the web server replied to such requests with a redirect to the node's Tailscale IP:5252.

Fixes https://github.com/tailscale/tailscale/issues/14415

Signed-off-by: Alex Klyubin <klyubin@gmail.com>
2025-03-19 16:46:32 +00:00
Tom Proctor
8d84720edb
cmd/k8s-operator: update ProxyGroup config Secrets instead of patch (#15353)
There was a flaky failure case where renaming a TLS hostname for an
ingress might leave the old hostname dangling in tailscaled config. This
happened when the proxygroup reconciler loop had an outdated resource
version of the config Secret in its cache after the
ingress-pg-reconciler loop had very recently written it to delete the
old hostname. As the proxygroup reconciler then did a patch, there was
no conflict and it reinstated the old hostname.

This commit updates the patch to an update operation so that if the
resource version is out of date it will fail with an optimistic lock
error. It also checks for equality to reduce the likelihood that we make
the update API call in the first place, because most of the time the
proxygroup reconciler is not even making an update to the Secret in the
case that the hostname has changed.

Updates tailscale/corp#24795

Change-Id: Ie23a97440063976c9a8475d24ab18253e1f89050
Signed-off-by: Tom Proctor <tomhjp@users.noreply.github.com>
2025-03-19 13:49:36 +00:00
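The difference between a patch and a version-checked update, as described in the commit above, can be illustrated with a small in-memory model. This is not the Kubernetes client API; `Store`, `Secret`, and `ErrConflict` are hypothetical types that only mimic resource-version semantics.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrConflict models an optimistic-lock failure.
var ErrConflict = errors.New("conflict: resource version is out of date")

// Secret is a toy stand-in for the config Secret.
type Secret struct {
	ResourceVersion int
	Hostname        string
}

// Store is a toy stand-in for the API server.
type Store struct{ cur Secret }

// Patch applies the change regardless of which version the caller last saw.
func (s *Store) Patch(hostname string) {
	s.cur.Hostname = hostname
	s.cur.ResourceVersion++
}

// Update rejects writes that are based on a stale copy.
func (s *Store) Update(in Secret) error {
	if in.ResourceVersion != s.cur.ResourceVersion {
		return ErrConflict
	}
	in.ResourceVersion++
	s.cur = in
	return nil
}

func main() {
	s := &Store{cur: Secret{ResourceVersion: 1, Hostname: "old.example"}}
	stale := s.cur // one reconciler caches the Secret

	s.Patch("new.example") // another reconciler renames the hostname

	// Writing the stale copy back now fails instead of reinstating "old.example".
	fmt.Println(s.Update(stale), s.cur.Hostname)
}
```

With a patch in place of the final `Update`, the stale hostname would have been written back without any error.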
Jonathan Nobels
25d5f78c6e
net/dns: expose a function for recompiling the DNS configuration (#15346)
updates tailscale/corp#27145

We require a means to trigger a recompilation of the DNS configuration
to pick up new nameservers for platforms where we blend the interface
nameservers from the OS into our DNS config.

Notably, on Darwin, the only API we have at our disposal will, in rare instances,
return a transient error when querying the interface nameservers on a link change if
they have not been set when we get the AF_ROUTE messages for the link
update.

There's a corresponding change in corp for Darwin clients, to track
the interface nameservers during NEPathMonitor events, and call this
when the nameservers change.

This will also fix the slightly more obscure bug of changing nameservers
while tailscaled is running. That change can now be reflected in
MagicDNS without having to stop the client.

Signed-off-by: Jonathan Nobels <jonathan@tailscale.com>
2025-03-19 09:21:37 -04:00
Irbe Krumina
f50d3b22db
cmd/k8s-operator: configure proxies for HA Ingress to run in cert share mode (#15308)
cmd/k8s-operator: configure HA Ingress replicas to share certs

Creates TLS certs Secret and RBAC that allows HA Ingress replicas
to read/write to the Secret.
Configures HA Ingress replicas to run in read-only mode.

Updates tailscale/corp#24795


Signed-off-by: Irbe Krumina <irbe@tailscale.com>
2025-03-19 12:49:31 +00:00
Tom Proctor
b0095a5da4
cmd/k8s-operator: wait for VIPService before updating HA Ingress status (#15343)
Update the HA Ingress controller to wait until it sees AdvertisedServices
config propagated into at least 1 Pod's prefs before it updates the status
on the Ingress, to ensure the ProxyGroup Pods are ready to serve traffic
before indicating that the Ingress is ready

Updates tailscale/corp#24795

Change-Id: I1b8ce23c9e312d08f9d02e48d70bdebd9e1a4757

Signed-off-by: Tom Proctor <tomhjp@users.noreply.github.com>
2025-03-19 08:53:15 +00:00
David Anderson
e091e71937 util/eventbus: remove debug UI from iOS build
The use of html/template causes reflect-based linker bloat. Longer
term we have options to bring the UI back to iOS, but for now, cut
it out.

Updates #15297

Signed-off-by: David Anderson <dave@tailscale.com>
2025-03-18 17:04:15 -07:00
David Anderson
daa5635ba6 tsweb: split promvarz into an optional dependency
Allows the use of tsweb without pulling in all of the heavy prometheus
client libraries, protobuf and so on.

Updates #15160

Signed-off-by: David Anderson <dave@tailscale.com>
2025-03-18 16:57:04 -07:00
Anton Tolchanov
74ee749386 client/tailscale: add tailnet lock fields to Device struct
These are documented, but have not yet been defined in the client.
https://tailscale.com/api#tag/devices/GET/device/{deviceId}

Updates tailscale/corp#27050

Signed-off-by: Anton Tolchanov <anton@tailscale.com>
2025-03-18 17:03:19 +00:00
Irbe Krumina
34734ba635
ipn/store/kubestore,kube,envknob,cmd/tailscaled/depaware.txt: allow kubestore read/write custom TLS secrets (#15307)
This PR adds some custom logic for reading and writing
kube store values that are TLS certs and keys:
1) when store is initialized, look up additional
TLS Secrets for this node and if found, load TLS certs
from there
2) if the node runs in certs 'read only' mode and
TLS cert and key are not found in the in-memory store,
look those up in a Secret
3) if the node runs in certs 'read only' mode, run
a daily TLS certs reload to memory to get any
renewed certs

Updates tailscale/corp#24795

Signed-off-by: Irbe Krumina <irbe@tailscale.com>
2025-03-18 15:09:22 +00:00
Tom Proctor
ef1e14250c
cmd/k8s-operator: ensure old VIPServices are cleaned up (#15344)
When the Ingress is updated to a new hostname, the controller does not
currently clean up the old VIPService from control. Fix this up to parse
the ownership comment correctly and write a test to enforce the improved
behaviour

Updates tailscale/corp#24795

Change-Id: I792ae7684807d254bf2d3cc7aa54aa04a582d1f5

Signed-off-by: Tom Proctor <tomhjp@users.noreply.github.com>
2025-03-18 12:48:59 +00:00
Anton Tolchanov
b413b70ae2 cmd/proxy-to-grafana: support setting Grafana role via grants
This adds support for using ACL Grants to configure a role for the
auto-provisioned user.

Fixes tailscale/corp#14567

Signed-off-by: Anton Tolchanov <anton@tailscale.com>
2025-03-18 07:26:04 +00:00
License Updater
25b059c0ee licenses: update license notices
Signed-off-by: License Updater <noreply+license-updater@tailscale.com>
2025-03-17 12:50:16 -07:00
James Sanderson
27ef9b666c ipn/ipnlocal: add test for CapMap packet filters
Updates tailscale/corp#20514

Signed-off-by: James Sanderson <jsanderson@tailscale.com>
2025-03-17 11:24:54 +00:00
Andrew Lytvynov
3a4b622276
.github/workflows/govulncheck.yml: send messages to another channel (#15295)
Updates #cleanup

Signed-off-by: Andrew Lytvynov <awly@tailscale.com>
2025-03-14 12:30:29 -07:00
Irbe Krumina
299c5372bd
cmd/containerboot: manage HA Ingress TLS certs from containerboot (#15303)
cmd/containerboot: manage HA Ingress TLS certs from containerboot

When run as an HA Ingress node, containerboot can now determine
whether it should manage TLS certs for the HA Ingress replicas
and call the LocalAPI cert endpoint to ensure initial issuance
and renewal of the shared TLS certs.

Updates tailscale/corp#24795

Signed-off-by: Irbe Krumina <irbe@tailscale.com>
2025-03-14 17:33:08 +00:00
Jordan Whited
8b1e7f646e
net/packet: implement Geneve header serialization (#15301)
Updates tailscale/corp#27100

Signed-off-by: Jordan Whited <jordan@tailscale.com>
2025-03-13 13:33:26 -07:00
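For orientation, the fixed 8-byte Geneve header defined in RFC 8926 can be serialized as in the sketch below. This is not the net/packet implementation; the `GeneveHeader` type, its field names, and `Encode` are hypothetical, and the sketch always writes an option length of zero.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
	"io"
)

// GeneveHeader holds the fixed-header fields from RFC 8926.
// Field names are illustrative only.
type GeneveHeader struct {
	Version  uint8  // 2 bits
	Control  bool   // O bit
	Critical bool   // C bit
	Protocol uint16 // EtherType of the payload
	VNI      uint32 // 24 bits
}

// Encode writes the 8-byte fixed header (no options) into b.
func (h GeneveHeader) Encode(b []byte) error {
	if len(b) < 8 {
		return io.ErrShortBuffer
	}
	if h.Version > 3 || h.VNI > 0xFFFFFF {
		return errors.New("geneve: field out of range")
	}
	b[0] = h.Version << 6 // Ver in the top 2 bits, Opt Len = 0
	b[1] = 0
	if h.Control {
		b[1] |= 0x80
	}
	if h.Critical {
		b[1] |= 0x40
	}
	binary.BigEndian.PutUint16(b[2:4], h.Protocol)
	// VNI occupies the upper 24 bits; the low byte is reserved.
	binary.BigEndian.PutUint32(b[4:8], h.VNI<<8)
	return nil
}

func main() {
	var b [8]byte
	h := GeneveHeader{Protocol: 0x6558, VNI: 1}
	if err := h.Encode(b[:]); err != nil {
		panic(err)
	}
	fmt.Printf("% x\n", b[:])
}
```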
Patrick O'Doherty
f0b395d851
go.mod: update golang.org/x/net to 0.36.0 for govulncheck (#15296)
Updates #cleanup

Signed-off-by: Patrick O'Doherty <patrick@tailscale.com>
2025-03-13 10:37:42 -07:00
M. J. Fromberger
0663412559
util/eventbus: add basic throughput benchmarks (#15284)
Shovel small events through the pipeline as fast as possible in a few basic
configurations, to establish some baseline performance numbers.

Updates #15160

Change-Id: I1dcbbd1109abb7b93aa4dcb70da57f183eb0e60e
Signed-off-by: M. J. Fromberger <fromberger@tailscale.com>
2025-03-13 08:06:20 -07:00
Paul Scott
eb680edbce
cmd/testwrapper: print failed tests preventing retry (#15270)
Updates tailscale/corp#26637

Signed-off-by: Paul Scott <paul@tailscale.com>
2025-03-13 14:21:29 +00:00
Irbe Krumina
cd391b37a6
ipn/ipnlocal, envknob: make it possible to configure the cert client to act in read-only mode (#15250)
* ipn/ipnlocal,envknob: add some primitives for HA replica cert share.

Add an envknob for configuring
an instance's cert store as read-only, so that it
does not attempt to issue or renew TLS credentials,
only reads them from its cert store.
This will be used by the Kubernetes Operator's HA Ingress
to enable multiple replicas serving the same HTTPS endpoint
to be able to share the same cert.

Also includes some minor refactoring to allow adding more tests
for cert retrieval logic.


Signed-off-by: Irbe Krumina <irbe@tailscale.com>
2025-03-13 14:14:03 +00:00
Will Norris
45ecc0f85a tsweb: add title to DebugHandler and helper registration methods
Allow customizing the title on the debug index page.  Also add methods
for registering http.HandlerFunc to make it a little easier on callers.

Updates tailscale/corp#27058

Change-Id: Ia101a4a3005adb9118051b3416f5a64a4a45987d
Signed-off-by: Will Norris <will@tailscale.com>
2025-03-12 19:21:25 -07:00
David Anderson
6d217d81d1 util/eventbus: add a helper program for bus development
The demo program generates a stream of made up bus events between
a number of bus actors, as a way to generate some interesting activity
to show on the bus debug page.

Signed-off-by: David Anderson <dave@tailscale.com>
2025-03-12 17:47:47 -07:00
David Anderson
d83024a63f util/eventbus: add a debug HTTP handler for the bus
Updates #15160

Signed-off-by: David Anderson <dave@tailscale.com>
2025-03-12 17:47:47 -07:00
Andrew Dunham
640b2fa3ae net/netmon, wgengine/magicsock: be quieter with portmapper logs
This adds a new helper to the netmon package that allows us to
rate-limit log messages, so that they only print once per (major)
LinkChange event. We then use this when constructing the portmapper, so
that we don't keep spamming logs forever on the same network.

Updates #13145

Signed-off-by: Andrew Dunham <andrew@du.nham.ca>
Change-Id: I6e7162509148abea674f96efd76be9dffb373ae4
2025-03-12 17:45:26 -04:00
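One way to picture "print once per link change" is the sketch below. It is not the netmon helper added by this commit; `linkLogger`, `Logf`, and `LinkChanged` are hypothetical names, and the real helper may key or reset its state differently.

```go
package main

import (
	"fmt"
	"sync"
)

// linkLogger suppresses repeated messages until the next link change.
type linkLogger struct {
	mu   sync.Mutex
	seen map[string]bool
	logf func(format string, args ...any)
}

func newLinkLogger(logf func(string, ...any)) *linkLogger {
	return &linkLogger{seen: make(map[string]bool), logf: logf}
}

// Logf forwards a message only the first time it is seen
// since the last call to LinkChanged.
func (l *linkLogger) Logf(format string, args ...any) {
	msg := fmt.Sprintf(format, args...)
	l.mu.Lock()
	dup := l.seen[msg]
	l.seen[msg] = true
	l.mu.Unlock()
	if !dup {
		l.logf("%s", msg)
	}
}

// LinkChanged forgets everything logged so far, so messages print again.
func (l *linkLogger) LinkChanged() {
	l.mu.Lock()
	l.seen = make(map[string]bool)
	l.mu.Unlock()
}

func main() {
	l := newLinkLogger(func(f string, a ...any) { fmt.Printf(f+"\n", a...) })
	l.Logf("portmap: no gateway")
	l.Logf("portmap: no gateway") // suppressed
	l.LinkChanged()
	l.Logf("portmap: no gateway") // printed again
}
```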
Jonathan Nobels
52710945f5
control/controlclient, ipn: add client audit logging (#14950)
updates tailscale/corp#26435

Adds client support for sending audit logs to control via /machine/audit-log.
Specifically implements audit logging for user initiated disconnections.

This will require further work to optimize the persistent storage and exclusion
via build tags for mobile:
tailscale/corp#27011
tailscale/corp#27012

Signed-off-by: Jonathan Nobels <jonathan@tailscale.com>
2025-03-12 10:37:03 -04:00
Naman Sood
06ae52d309
words: append to the tail of the wordlists (#15278)
Updates tailscale/corp#14698

Signed-off-by: Naman Sood <mail@nsood.in>
2025-03-11 17:23:21 -04:00
Fran Bull
5ebc135397 tsnet,wgengine: fix src to primary Tailscale IP for TCP dials
Ensure that the src address for a connection is one of the primary
addresses assigned by Tailscale. Not, for example, a virtual IP address.

Updates #14667

Signed-off-by: Fran Bull <fran@tailscale.com>
2025-03-11 13:11:01 -07:00
Patrick O'Doherty
8f0080c7a4
cmd/tsidp: allow CORS requests to openid-configuration (#15229)
Add support for Cross-Origin XHR requests to the openid-configuration
endpoint to enable clients like Grafana's auto-population of OIDC setup
data from its contents.

Updates https://github.com/tailscale/tailscale/issues/10263

Signed-off-by: Patrick O'Doherty <patrick@tailscale.com>
2025-03-11 13:10:22 -07:00
dependabot[bot]
03f7f1860e
.github: Bump peter-evans/create-pull-request from 7.0.7 to 7.0.8 (#15257)
Bumps [peter-evans/create-pull-request](https://github.com/peter-evans/create-pull-request) from 7.0.7 to 7.0.8.
- [Release notes](https://github.com/peter-evans/create-pull-request/releases)
- [Commits](dd2324fc52...271a8d0340)

---
updated-dependencies:
- dependency-name: peter-evans/create-pull-request
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-03-11 11:31:52 -06:00
dependabot[bot]
ce0d8b0fb9
.github: Bump github/codeql-action from 3.28.10 to 3.28.11 (#15258)
Bumps [github/codeql-action](https://github.com/github/codeql-action) from 3.28.10 to 3.28.11.
- [Release notes](https://github.com/github/codeql-action/releases)
- [Changelog](https://github.com/github/codeql-action/blob/main/CHANGELOG.md)
- [Commits](b56ba49b26...6bb031afdd)

---
updated-dependencies:
- dependency-name: github/codeql-action
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-03-11 11:25:35 -06:00
Jonathan Nobels
660b0515b9
safesocket, version: fix safesocket_darwin behavior for cmd/tailscale (#15275)
fixes tailscale/tailscale#15269

Fixes the various CLIs for all of the various flavors of tailscaled on
darwin. The logic in version is updated so that we have methods that
return true only for the actual GUI app (which can be the CLI), and the
order of the checks in localTCPPortAndTokenDarwin is corrected so
that the logic works with all 5 combinations of CLI and tailscaled.

Signed-off-by: Jonathan Nobels <jonathan@tailscale.com>
2025-03-11 13:24:11 -04:00
Tom Proctor
a6e19f2881
ipn/ipnlocal: allow cache hits for testing ACME certs (#15023)
PR #14771 added support for getting certs from alternate ACME servers, but the
certStore caching mechanism breaks unless you install the CA in system roots,
because we check the validity of the cert before allowing a cache hit, which
includes checking for a valid chain back to a trusted CA. For ease of testing,
allow cert cache hits when the chain is unknown to avoid re-issuing the cert
on every TLS request served. We will still get a cache miss when the cert has
expired, as enforced by a test, and this makes it much easier to test against
non-prod ACME servers compared to having to manage the installation of non-prod
CAs on clients.

Updates #14771

Change-Id: I74fe6593fe399bd135cc822195155e99985ec08a
Signed-off-by: Tom Proctor <tomhjp@users.noreply.github.com>
2025-03-11 14:09:46 +00:00
Brad Fitzpatrick
e38e5c38cc ssh/tailssh: fix typo in forwardedEnviron method, add docs
And don't return a comma-separated string. That's kinda weird
signature-wise, and not needed by half the callers anyway. The callers
that care can do the join themselves.

Updates #cleanup

Change-Id: Ib5ad51a3c6b663d868eba14fe9dc54b2609cfb0d
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2025-03-10 20:28:36 -07:00
James Tucker
69b27d2fcf cmd/natc: error and log when IP range is exhausted
natc itself can't immediately fix the problem, but it can more correctly
return an error rather than bad addresses.

Updates tailscale/corp#26968

Signed-off-by: James Tucker <james@tailscale.com>
2025-03-10 10:20:22 -07:00
dependabot[bot]
b9f4c5d246
.github: Bump golangci/golangci-lint-action from 6.3.1 to 6.5.0 (#15046)
Bumps [golangci/golangci-lint-action](https://github.com/golangci/golangci-lint-action) from 6.3.1 to 6.5.0.
- [Release notes](https://github.com/golangci/golangci-lint-action/releases)
- [Commits](2e788936b0...2226d7cb06)

---
updated-dependencies:
- dependency-name: golangci/golangci-lint-action
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Signed-off-by: Mario Minardi <mario@tailscale.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-03-09 13:31:02 -06:00