cmd/containerboot,kube/ingressservices: proxy VIPService TCP/UDP traffic to cluster Services (#15897)

cmd/containerboot,kube/ingressservices: proxy VIPService TCP/UDP traffic to cluster Services

This PR is part of the work to implement HA for Kubernetes Operator's
network layer proxy.
Adds logic to containerboot to monitor mounted ingress firewall configuration rules
and update iptables/nftables rules as the config changes.
Also adds new shared types for the ingress configuration.
The implementation is intentionally similar to that for HA for egress proxy.

Updates tailscale/tailscale#15895

Signed-off-by: chaosinthecrd <tom@tmlabs.co.uk>
Signed-off-by: Irbe Krumina <irbe@tailscale.com>
This commit is contained in:
Irbe Krumina
2025-05-19 10:42:03 +01:00
committed by GitHub
parent 469fabd8de
commit 6b97e615d6
10 changed files with 1456 additions and 804 deletions

View File

@@ -9,7 +9,6 @@ import (
"bytes"
"context"
"encoding/json"
"fmt"
"log"
"os"
"path/filepath"
@@ -170,46 +169,3 @@ func readServeConfig(path, certDomain string) (*ipn.ServeConfig, error) {
}
return &sc, nil
}
func ensureServicesNotAdvertised(ctx context.Context, lc *local.Client) error {
prefs, err := lc.GetPrefs(ctx)
if err != nil {
return fmt.Errorf("error getting prefs: %w", err)
}
if len(prefs.AdvertiseServices) == 0 {
return nil
}
log.Printf("serve proxy: unadvertising services: %v", prefs.AdvertiseServices)
if _, err := lc.EditPrefs(ctx, &ipn.MaskedPrefs{
AdvertiseServicesSet: true,
Prefs: ipn.Prefs{
AdvertiseServices: nil,
},
}); err != nil {
// EditPrefs only returns an error if it fails _set_ its local prefs.
// If it fails to _persist_ the prefs in state, we don't get an error
// and we continue waiting below, as control will failover as usual.
return fmt.Errorf("error setting prefs AdvertiseServices: %w", err)
}
// Services use the same (failover XOR regional routing) mechanism that
// HA subnet routers use. Unfortunately we don't yet get a reliable signal
// from control that it's responded to our unadvertisement, so the best we
// can do is wait for 20 seconds, where 15s is the approximate maximum time
// it should take for control to choose a new primary, and 5s is for buffer.
//
// Note: There is no guarantee that clients have been _informed_ of the new
// primary no matter how long we wait. We would need a mechanism to await
// netmap updates for peers to know for sure.
//
// See https://tailscale.com/kb/1115/high-availability for more details.
// TODO(tomhjp): Wait for a netmap update instead of sleeping when control
// supports that.
select {
case <-ctx.Done():
return nil
case <-time.After(20 * time.Second):
return nil
}
}