Adds a new .spec.metrics field to ProxyClass to allow users to optionally serve
client metrics (tailscaled --debug) on <Pod-IP>:9001.
Metrics cannot currently be enabled for proxies that egress traffic to tailnet
and for Ingress proxies with tailscale.com/experimental-forward-cluster-traffic-via-ingress annotation
(because they currently forward all cluster traffic to their respective backends).
The assumption is that users will want to have these metrics enabled
continuously to be able to monitor proxy behaviour (as opposed to enabling
them temporarily for debugging). Hence we expose them on Pod IP to make it
easier to consume them i.e via Prometheus PodMonitor.
Updates tailscale/tailscale#11292
Signed-off-by: Irbe Krumina <irbe@tailscale.com>
This adds a health.Tracker to tsd.System, accessible via
a new tsd.System.HealthTracker method.
In the future, that new method will return a tsd.System-specific
HealthTracker, so multiple tsnet.Servers in the same process are
isolated. For now, though, it just always returns the temporary
health.Global value. That permits incremental plumbing over a number
of changes. When the second to last health.Global reference is gone,
then the tsd.System.HealthTracker implementation can return a private
Tracker.
The primary plumbing this does is adding it to LocalBackend and its
dozen and change health calls. A few misc other callers are also
plumbed. Subsequent changes will flesh out other parts of the tree
(magicsock, controlclient, etc).
Updates #11874
Updates #4136
Change-Id: Id51e73cfc8a39110425b6dc19d18b3975eac75ce
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
In prep for tsd.System Tracker plumbing throughout tailscaled,
defensively permit all methods on Tracker to accept a nil receiver
without crashing, lest I screw something up later. (A health tracking
system that itself causes crashes would be no good.) Methods on nil
receivers should not be called, so a future change will also collect
their stacks (and panic during dev/test), but we should at least not
crash in prod.
This also locks that in with a test using reflect to automatically
call all methods on a nil receiver and check they don't crash.
Updates #11874
Updates #4136
Change-Id: I8e955046ebf370ec8af0c1fb63e5123e6282a9d3
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Previously it was both metadata about the class of warnable item as
well as the value.
Now it's only metadata and the value is per-Tracker.
Updates #11874
Updates #4136
Change-Id: Ia1ed1b6c95d34bc5aae36cffdb04279e6ba77015
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
This moves most of the health package global variables to a new
`health.Tracker` type.
But then rather than plumbing the Tracker in tsd.System everywhere,
this only goes halfway and makes one new global Tracker
(`health.Global`) that all the existing callers now use.
A future change will eliminate that global.
Updates #11874
Updates #4136
Change-Id: I6ee27e0b2e35f68cb38fecdb3b2dc4c3f2e09d68
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
This improves convenience and security.
* Convenience - no need to see nodes that can't share anything with you.
* Security - malicious nodes can't expose shares to peers that aren't
allowed to access their shares.
Updates tailscale/corp#19432
Signed-off-by: Percy Wegmann <percy@tailscale.com>
Allows all users to read all files, and .sh/.cgi files to be
executable.
Updates tailscale/tailscale-qpkg#135
Signed-off-by: Sonia Appasamy <sonia@tailscale.com>
If seamless key renewal is enabled, we typically do not stop the engine
(deconfigure networking). However, if the node key has expired there is
no point in keeping the connection up, and it might actually prevent
key renewal if auth relies on endpoints routed via app connectors.
Fixestailscale/corp#5800
Signed-off-by: Anton Tolchanov <anton@tailscale.com>
Fixestailscale/corp#19459
This PR adds the ability for users of the syspolicy handler to read string arrays from the MDM solution configured on the system.
Signed-off-by: Andrea Gottardo <andrea@gottardo.me>
This change allows for the release/dist/qnap package to be used
outside of the tailscale repo (notably, will be used from corp),
by using an embedded file system for build files which gets
temporarily written to a new folder during qnap build runs.
Without this change, when used from corp, the release/dist/qnap
folder will fail to be found within the corp repo, causing
various steps of the build to fail.
The file renames in this change are to combine the build files
into a /files folder, separated into /scripts and /Tailscale.
Updates tailscale/tailscale-qpkg#135
Signed-off-by: Sonia Appasamy <sonia@tailscale.com>
This helps reduce memory pressure on tailnets with large numbers
of routes.
Updates tailscale/corp#19332
Signed-off-by: Percy Wegmann <percy@tailscale.com>
This PR bumps iptables to a newer version that has a function to detect
'NotExists' errors and uses that function to determine whether errors
received on iptables rule and chain clean up are because the rule/chain
does not exist- if so don't log the error.
Updates corp#19336
Signed-off-by: Irbe Krumina <irbe@tailscale.com>
* cmd/containerboot,util/linuxfw: support proxy backends specified by DNS name
Adds support for optionally configuring containerboot to proxy
traffic to backends configured by passing TS_EXPERIMENTAL_DEST_DNS_NAME env var
to containerboot.
Containerboot will periodically (every 10 minutes) attempt to resolve
the DNS name and ensure that all traffic sent to the node's
tailnet IP gets forwarded to the resolved backend IP addresses.
Currently:
- if the firewall mode is iptables, traffic will be load balanced
accross the backend IP addresses using round robin. There are
no health checks for whether the IPs are reachable.
- if the firewall mode is nftables traffic will only be forwarded
to the first IP address in the list. This is to be improved.
* cmd/k8s-operator: support ExternalName Services
Adds support for exposing endpoints, accessible from within
a cluster to the tailnet via DNS names using ExternalName Services.
This can be done by annotating the ExternalName Service with
tailscale.com/expose: "true" annotation.
The operator will deploy a proxy configured to route tailnet
traffic to the backend IPs that service.spec.externalName
resolves to. The backend IPs must be reachable from the operator's
namespace.
Updates tailscale/tailscale#10606
Signed-off-by: Irbe Krumina <irbe@tailscale.com>
Since the tailscaled binaries that we distribute are static and don't
link cgo, we previously wouldn't fetch group IDs that are returned via
NSS. Try shelling out to the 'id' command, similar to how we call
'getent', to detect such cases.
Updates #11682
Signed-off-by: Andrew Dunham <andrew@du.nham.ca>
Change-Id: I9bdc938bd76c71bc130d44a97cc2233064d64799
There is an undocumented 16KiB limit for text log messages.
However, the limit for JSON messages is 256KiB.
Even worse, logging JSON as text results in significant overhead
since each double quote needs to be escaped.
Instead, use logger.Logf.JSON to explicitly log the info as JSON.
We also modify osdiag to return the information as structured data
rather than implicitly have the package log on our behalf.
This gives more control to the caller on how to log.
Updates #7802
Signed-off-by: Joe Tsai <joetsai@digital-static.net>
Prior to
1613b18f82 (diff-314ba0d799f70c8998940903efb541e511f352b39a9eeeae8d475c921d66c2ac),
nodes could set AutoUpdate.Apply=true on unsupported platforms via
`EditPrefs`. Specifically, this affects tailnets where default
auto-updates are on.
Fix up those invalid prefs on profile reload, as a migration.
Updates #11544
Signed-off-by: Andrew Lytvynov <awly@tailscale.com>
Creates new QNAP builder target, which builds go binaries then uses
docker to build into QNAP packages. Much of the docker/script code
here is pulled over from https://github.com/tailscale/tailscale-qpkg,
with adaptation into our builder structures.
The qnap/Tailscale folder contains static resources needed to build
Tailscale qpkg packages, and is an exact copy of the existing folder
in the tailscale-qpkg repo.
Builds can be run with:
```
sudo ./tool/go run ./cmd/dist build qnap
```
Updates tailscale/tailscale-qpkg#135
Signed-off-by: Sonia Appasamy <sonia@tailscale.com>
Since we already track active SSH connections, it's not hard to
proactively reject updates until those finish. We attempt to do the same
on the control side, but the detection latency for new connections is in
the minutes, which is not fast enough for common short sessions.
Handle a `force=true` query parameter to override this behavior, so that
control can still trigger an update on a server where some long-running
abandoned SSH session is open.
Updates https://github.com/tailscale/corp/issues/18556
Signed-off-by: Andrew Lytvynov <awly@tailscale.com>
It was only obviously unused after the previous change, c39cde79d.
Updates #19334
Change-Id: I9896d5fa692cb4346c070b4a339d0d12340c18f7
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
We were storing server-side lots of:
"Auth":{"Provider":"","LoginName":"","Oauth2Token":null,"AuthKey":""},
That was about 7% of our total storage of pending RegisterRequest
bodies.
Updates tailscale/corp#19327
Change-Id: Ib73842759a2b303ff5fe4c052a76baea0d68ae7d
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
If this happens, it results in us pessimistically closing more
connections than might be necessary, but is more correct since we won't
"miss" a change to the default route interface and keep trying to send
data over a nonexistent interface, or one that can't reach the internet.
Updates tailscale/corp#19124
Signed-off-by: Andrew Dunham <andrew@du.nham.ca>
Change-Id: Ia0b8b04cb8cdcb0da0155fd08751c9dccba62c1a
The Network Location Awareness service identifies networks authenticated against
an Active Directory domain and categorizes them as "Domain Authenticated".
This includes the Tailscale network if a Domain Controller is reachable through it.
If a network is categories as NLM_NETWORK_CATEGORY_DOMAIN_AUTHENTICATED,
it is not possible to override its category, and we shouldn't attempt to do so.
Additionally, our Windows Firewall rules should be compatible with both private
and domain networks.
This fixes both issues.
Fixes#11813
Signed-off-by: Nick Khyl <nickk@tailscale.com>
Containers are typically immutable and should be updated as a whole (and
not individual packages within). Deny enablement of auto-updates in
containers.
Also, add the missing check in EditPrefs in LocalAPI, to catch cases
like tailnet default auto-updates getting enabled for nodes that don't
support it.
Updates #11544
Signed-off-by: Andrew Lytvynov <awly@tailscale.com>
We don't always have the same latest version for all platforms (like
with 1.64.2 is only Synology+Windows), so we should use the OS-specific
result from pkgs JSON response instead of the main Version field.
Updates #11795
Signed-off-by: Andrew Lytvynov <awly@tailscale.com>
Kubernetes cluster domain defaults to 'cluster.local', but can also be customized.
We need to determine cluster domain to set up in-cluster forwarding to our egress proxies.
This was previously hardcoded to 'cluster.local', so was the egress proxies were not usable in clusters with custom domains.
This PR ensures that we attempt to determine the cluster domain by parsing /etc/resolv.conf.
In case the cluster domain cannot be determined from /etc/resolv.conf, we fall back to 'cluster.local'.
Updates tailscale/tailscale#10399,tailscale/tailscale#11445
Signed-off-by: Irbe Krumina <irbe@tailscale.com>
peerapi does not want these, but rclone includes them.
Removing them allows rclone to work with Taildrive configured
as a WebDAV remote.
Updates #cleanup
Signed-off-by: Percy Wegmann <percy@tailscale.com>
peerapi does not want these, but rclone includes them.
Stripping them out allows rclone to work with Taildrive configured
as a WebDAV remote.
Updates #cleanup
Signed-off-by: Percy Wegmann <percy@tailscale.com>
This ensures that MOVE, LOCK and any other verbs that use the Location
header work correctly.
Fixes#11758
Signed-off-by: Percy Wegmann <percy@tailscale.com>
-Move Android impl into interfaces_android.go
-Instead of using ip route to get the interface name, use the one passed in by Android (ip route is restricted in Android 13+ per termux/termux-app#2993)
Follow-up will be to do the same for router
Fixestailscale/corp#19215Fixestailscale/corp#19124
Signed-off-by: kari-ts <kari@tailscale.com>
Some editions of Windows server share the same build number as their
client counterpart; we must use an additional field found in the OS
version information to distinguish between them.
Even though "Distro" has Linux connotations, it is the most appropriate
hostinfo field. What is Windows Server if not an alternate distribution
of Windows? This PR populates Distro with "Server" when applicable.
Fixes#11785
Signed-off-by: Aaron Klotz <aaron@tailscale.com>
This change is safe (self is still safe, by
definition), and makes the code match the comment.
Updates #cleanup
Signed-off-by: Chris Palmer <cpalmer@tailscale.com>
Most of the magicsock tests fake the network, simulating packets going
out and coming in. There's no reason to actually hit your router to do
UPnP/NAT-PMP/PCP during in tests. But while debugging thousands of
iterations of tests to deflake some things, I saw it slamming my
router. This stops that.
Updates #11762
Change-Id: I59b9f48f8f5aff1fa16b4935753d786342e87744
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Turns out, profileManager is not safe for concurrent use and I missed
all the locking infrastructure in LocalBackend, oops.
I was not able to reproduce the race even with `go test -count 100`, but
this seems like an obvious fix.
Fixes#11773
Signed-off-by: Andrew Lytvynov <awly@tailscale.com>