cmd/{containerboot,k8s-operator},kube/kubetypes: unadvertise ingress services on shutdown (#15451)

Ensure no services are advertised as part of shutting down tailscaled.
Prefs are only edited if services are currently advertised, and they're
edited we wait for control's ~15s (+ buffer) delay to failover.

Note that editing prefs will trigger a synchronous write to the state
Secret, so it may fail to persist state if the ProxyGroup is getting
scaled down and therefore has its RBAC deleted at the same time, but that
failure doesn't stop prefs being updated within the local backend,
doesn't  affect connectivity to control, and the state Secret is
about to get deleted anyway, so the only negative side effect is a harmless
error log during shutdown. Control still learns that the node is no
longer advertising the service and triggers the failover.

Note that the first version of this used a PreStop lifecycle hook, but
that only supports GET methods and we need the shutdown to trigger side
effects (updating prefs) so it didn't seem appropriate to expose that
functionality on a GET endpoint that's accessible on the k8s network.

Updates tailscale/corp#24795

Change-Id: I0a9a4fe7a5395ca76135ceead05cbc3ee32b3d3c
Signed-off-by: Tom Proctor <tomhjp@users.noreply.github.com>
This commit is contained in:
Tom Proctor
2025-04-09 10:11:15 +01:00
committed by GitHub
parent 8e1aa86bdb
commit dd95a83a65
5 changed files with 75 additions and 17 deletions

View File

@@ -197,6 +197,16 @@ func pgStatefulSet(pg *tsapi.ProxyGroup, namespace, image, tsFirewallMode string
// This mechanism currently (2025-01-26) rely on the local health check being accessible on the Pod's
// IP, so they are not supported for ProxyGroups where users have configured TS_LOCAL_ADDR_PORT to a custom
// value.
//
// NB: For _Ingress_ ProxyGroups, we run shutdown logic within containerboot
// in reaction to a SIGTERM signal instead of using a pre-stop hook. This is
// because Ingress pods need to unadvertise services, and it's preferable to
// avoid triggering those side-effects from a GET request that would be
// accessible to the whole cluster network (in the absence of NetworkPolicy
// rules).
//
// TODO(tomhjp): add a readiness probe or gate to Ingress Pods. There is a
// small window where the Pod is marked ready but routing can still fail.
if pg.Spec.Type == tsapi.ProxyGroupTypeEgress && !hasLocalAddrPortSet(proxyClass) {
c.Lifecycle = &corev1.Lifecycle{
PreStop: &corev1.LifecycleHandler{