mirror of
https://github.com/tailscale/tailscale.git
synced 2024-11-29 04:55:31 +00:00
derp: track client-advertised non-ideal DERP connections in more places
In f77821fd63 (released in v1.72.0), we made the client tell a DERP server
when the connection was not its ideal choice (the first node in its region).
But we didn't do anything with that information until now. This adds a
metric about how many such connections are on a given derper, and also
adds a bit to the PeerPresentFlags bitmask so watchers can identify
(and rebalance) them.
Updates tailscale/corp#372
Change-Id: Ief8af448750aa6d598e5939a57c062f4e55962be
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
parent fd77965f23
commit c76a6e5167
@@ -155,7 +155,7 @@ tailscale.com/cmd/tailscale dependencies: (generated by github.com/tailscale/dep
         tailscale.com/util/clientmetric from tailscale.com/net/netcheck+
         tailscale.com/util/cloudenv from tailscale.com/net/dnscache+
         tailscale.com/util/cmpver from tailscale.com/net/tshttpproxy+
-        tailscale.com/util/ctxkey from tailscale.com/types/logger
+        tailscale.com/util/ctxkey from tailscale.com/types/logger+
      💣 tailscale.com/util/deephash from tailscale.com/util/syspolicy/setting
   L  💣 tailscale.com/util/dirwalk from tailscale.com/metrics
         tailscale.com/util/dnsname from tailscale.com/cmd/tailscale/cli+
@@ -147,6 +147,7 @@
 	PeerPresentIsRegular  = 1 << 0
 	PeerPresentIsMeshPeer = 1 << 1
 	PeerPresentIsProber   = 1 << 2
+	PeerPresentNotIdeal   = 1 << 3 // client said derp server is not its Region.Nodes[0] ideal node
 )
 
 var bin = binary.BigEndian
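The new bit is meant for watchers that want to spot (and later rebalance) non-ideal connections. A minimal sketch of how a watcher might test it follows. The constants are redeclared locally so the snippet stands alone, the `byte` underlying type is an assumption, and `isNotIdeal` is a hypothetical helper that is not part of this commit.

```go
package main

import "fmt"

// PeerPresentFlags is redeclared here so the sketch is self-contained.
// The byte underlying type is an assumption, not taken from the diff.
type PeerPresentFlags byte

const (
	PeerPresentIsRegular  PeerPresentFlags = 1 << 0
	PeerPresentIsMeshPeer PeerPresentFlags = 1 << 1
	PeerPresentIsProber   PeerPresentFlags = 1 << 2
	PeerPresentNotIdeal   PeerPresentFlags = 1 << 3
)

// isNotIdeal is a hypothetical helper: it reports whether the
// PeerPresentNotIdeal bit is set, regardless of any other bits.
func isNotIdeal(f PeerPresentFlags) bool {
	return f&PeerPresentNotIdeal != 0
}

func main() {
	fmt.Println(isNotIdeal(PeerPresentNotIdeal))
	fmt.Println(isNotIdeal(PeerPresentIsMeshPeer | PeerPresentNotIdeal))
	fmt.Println(isNotIdeal(PeerPresentIsRegular))
}
```

Testing the bit with a mask, rather than comparing the whole value for equality, keeps the check correct when other flags are set on the same peer.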
@@ -47,6 +47,7 @@
 	"tailscale.com/tstime/rate"
 	"tailscale.com/types/key"
 	"tailscale.com/types/logger"
+	"tailscale.com/util/ctxkey"
 	"tailscale.com/util/mak"
 	"tailscale.com/util/set"
 	"tailscale.com/util/slicesx"
@@ -57,6 +58,16 @@
 // verbosely log whenever DERP drops a packet.
 var verboseDropKeys = map[key.NodePublic]bool{}
 
+// IdealNodeHeader is the HTTP request header sent on DERP HTTP client requests
+// to indicate that they're not connecting to their ideal (Region.Nodes[0]) node.
+// The HTTP header value is the name of the node they wish they were connected
+// to. This is an optional header.
+const IdealNodeHeader = "Ideal-Node"
+
+// IdealNodeContextKey is the context key used to pass the IdealNodeHeader value
+// from the HTTP handler to the DERP server's Accept method.
+var IdealNodeContextKey = ctxkey.New[string]("ideal-node", "")
+
 func init() {
 	keys := envknob.String("TS_DEBUG_VERBOSE_DROPS")
 	if keys == "" {
@@ -133,6 +144,7 @@ type Server struct {
 	sentPong           expvar.Int // number of pong frames enqueued to client
 	accepts            expvar.Int
 	curClients         expvar.Int
+	curClientsNotIdeal expvar.Int
 	curHomeClients     expvar.Int // ones with preferred
 	dupClientKeys      expvar.Int // current number of public keys we have 2+ connections for
 	dupClientConns     expvar.Int // current number of connections sharing a public key
@@ -603,6 +615,9 @@ func (s *Server) registerClient(c *sclient) {
 	}
 	s.keyOfAddr[c.remoteIPPort] = c.key
 	s.curClients.Add(1)
+	if c.isNotIdealConn {
+		s.curClientsNotIdeal.Add(1)
+	}
 	s.broadcastPeerStateChangeLocked(c.key, c.remoteIPPort, c.presentFlags(), true)
 }
 
@@ -693,6 +708,9 @@ func (s *Server) unregisterClient(c *sclient) {
 	if c.preferred {
 		s.curHomeClients.Add(-1)
 	}
+	if c.isNotIdealConn {
+		s.curClientsNotIdeal.Add(-1)
+	}
 }
 
 // addPeerGoneFromRegionWatcher adds a function to be called when peer is gone
@@ -809,8 +827,8 @@ func (s *Server) accept(ctx context.Context, nc Conn, brw *bufio.ReadWriter, rem
 		return fmt.Errorf("receive client key: %v", err)
 	}
 
-	clientAP, _ := netip.ParseAddrPort(remoteAddr)
-	if err := s.verifyClient(ctx, clientKey, clientInfo, clientAP.Addr()); err != nil {
+	remoteIPPort, _ := netip.ParseAddrPort(remoteAddr)
+	if err := s.verifyClient(ctx, clientKey, clientInfo, remoteIPPort.Addr()); err != nil {
 		return fmt.Errorf("client %v rejected: %v", clientKey, err)
 	}
 
@@ -820,8 +838,6 @@ func (s *Server) accept(ctx context.Context, nc Conn, brw *bufio.ReadWriter, rem
 	ctx, cancel := context.WithCancel(ctx)
 	defer cancel()
 
-	remoteIPPort, _ := netip.ParseAddrPort(remoteAddr)
-
 	c := &sclient{
 		connNum: connNum,
 		s:       s,
@@ -838,6 +854,7 @@ func (s *Server) accept(ctx context.Context, nc Conn, brw *bufio.ReadWriter, rem
 		sendPongCh:     make(chan [8]byte, 1),
 		peerGone:       make(chan peerGoneMsg),
 		canMesh:        s.isMeshPeer(clientInfo),
+		isNotIdealConn: IdealNodeContextKey.Value(ctx) != "",
 		peerGoneLim:    rate.NewLimiter(rate.Every(time.Second), 3),
 	}
 
@@ -1511,6 +1528,7 @@ type sclient struct {
 	peerGone       chan peerGoneMsg // write request that a peer is not at this server (not used by mesh peers)
 	meshUpdate     chan struct{}    // write request to write peerStateChange
 	canMesh        bool             // clientInfo had correct mesh token for inter-region routing
+	isNotIdealConn bool             // client indicated it is not its ideal node in the region
 	isDup          atomic.Bool      // whether more than 1 sclient for key is connected
 	isDisabled     atomic.Bool      // whether sends to this peer are disabled due to active/active dups
 	debug          bool             // turn on for verbose logging
@@ -1546,6 +1564,9 @@ func (c *sclient) presentFlags() PeerPresentFlags {
 	if c.canMesh {
 		f |= PeerPresentIsMeshPeer
 	}
+	if c.isNotIdealConn {
+		f |= PeerPresentNotIdeal
+	}
 	if f == 0 {
 		return PeerPresentIsRegular
 	}
@@ -2051,6 +2072,7 @@ func (s *Server) ExpVar() expvar.Var {
 	m.Set("gauge_current_file_descriptors", expvar.Func(func() any { return metrics.CurrentFDs() }))
 	m.Set("gauge_current_connections", &s.curClients)
 	m.Set("gauge_current_home_connections", &s.curHomeClients)
+	m.Set("gauge_current_notideal_connections", &s.curClientsNotIdeal)
 	m.Set("gauge_clients_total", expvar.Func(func() any { return len(s.clientsMesh) }))
 	m.Set("gauge_clients_local", expvar.Func(func() any { return len(s.clients) }))
 	m.Set("gauge_clients_remote", expvar.Func(func() any { return len(s.clientsMesh) - len(s.clients) }))
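The gauge added above only stays accurate if every increment in registerClient is matched by a decrement in unregisterClient for the same connection. A minimal sketch of that pairing, using the standard library's `expvar.Int`, follows. The `server` and `client` types and the `register`/`unregister` names are simplified stand-ins invented for illustration, and decrementing `curClients` in `unregister` is an assumption because that line is not shown in this diff.

```go
package main

import (
	"expvar"
	"fmt"
)

// server is a simplified stand-in for the DERP Server's metric fields.
type server struct {
	curClients         expvar.Int
	curClientsNotIdeal expvar.Int
}

// client is a simplified stand-in for sclient.
type client struct {
	isNotIdealConn bool
}

// register mirrors the increments shown in the registerClient hunk.
func (s *server) register(c *client) {
	s.curClients.Add(1)
	if c.isNotIdealConn {
		s.curClientsNotIdeal.Add(1)
	}
}

// unregister mirrors the decrement shown in the unregisterClient hunk.
// Decrementing curClients here is an assumption made for symmetry.
func (s *server) unregister(c *client) {
	s.curClients.Add(-1)
	if c.isNotIdealConn {
		s.curClientsNotIdeal.Add(-1)
	}
}

func main() {
	var s server
	ideal := &client{}
	notIdeal := &client{isNotIdealConn: true}

	s.register(ideal)
	s.register(notIdeal)
	fmt.Println(s.curClients.Value(), s.curClientsNotIdeal.Value())

	s.unregister(notIdeal)
	fmt.Println(s.curClients.Value(), s.curClientsNotIdeal.Value())
}
```

Because isNotIdealConn is fixed when the connection is accepted, the same condition guards both the increment and the decrement, so the gauge cannot drift for a given connection.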
@@ -498,7 +498,7 @@ func (c *Client) connect(ctx context.Context, caller string) (client *derp.Clien
 	req.Header.Set("Connection", "Upgrade")
 	if !idealNodeInRegion && reg != nil {
 		// This is purely informative for now (2024-07-06) for stats:
-		req.Header.Set("Ideal-Node", reg.Nodes[0].Name)
+		req.Header.Set(derp.IdealNodeHeader, reg.Nodes[0].Name)
 		// TODO(bradfitz,raggi): start a time.AfterFunc for 30m-1h or so to
 		// dialNode(reg.Nodes[0]) and see if we can even TCP connect to it. If
 		// so, TLS handshake it as well (which is mixed up in this massive
@@ -21,6 +21,8 @@
 // Handler returns an http.Handler to be mounted at /derp, serving s.
 func Handler(s *derp.Server) http.Handler {
 	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
+		ctx := r.Context()
+
 		// These are installed both here and in cmd/derper. The check here
 		// catches both cmd/derper run with DERP disabled (STUN only mode) as
 		// well as DERP being run in tests with derphttp.Handler directly,
@@ -66,7 +68,11 @@ func Handler(s *derp.Server) http.Handler {
 				pubKey.UntypedHexString())
 		}
 
-		s.Accept(r.Context(), netConn, conn, netConn.RemoteAddr().String())
+		if v := r.Header.Get(derp.IdealNodeHeader); v != "" {
+			ctx = derp.IdealNodeContextKey.WithValue(ctx, v)
+		}
+
+		s.Accept(ctx, netConn, conn, netConn.RemoteAddr().String())
 	})
 }
 
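Taken together, the handler and accept changes pass the header value to the server through the request context rather than through a new Accept parameter. A rough, self-contained sketch of that flow follows. It uses the standard library's `context.WithValue` as a stand-in for `tailscale.com/util/ctxkey`, and `withIdealNode`, `idealNode`, `accept`, `notIdealFor`, and the node name "1a" are all hypothetical names chosen for illustration.

```go
package main

import (
	"context"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// idealNodeHeader matches the header name used in the diff.
const idealNodeHeader = "Ideal-Node"

// idealNodeKey is an unexported context key type; a stdlib stand-in
// for the ctxkey-based IdealNodeContextKey in the diff.
type idealNodeKey struct{}

func withIdealNode(ctx context.Context, v string) context.Context {
	return context.WithValue(ctx, idealNodeKey{}, v)
}

// idealNode returns the stored value, or "" if none was set.
func idealNode(ctx context.Context) string {
	v, _ := ctx.Value(idealNodeKey{}).(string)
	return v
}

// accept stands in for the server's accept path: a non-empty value
// means the client said this node is not its ideal one.
func accept(ctx context.Context) (isNotIdealConn bool) {
	return idealNode(ctx) != ""
}

// notIdealFor runs one request through a handler shaped like the one
// in the diff and reports what accept concluded.
func notIdealFor(headerValue string) bool {
	var got bool
	h := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		ctx := r.Context()
		if v := r.Header.Get(idealNodeHeader); v != "" {
			ctx = withIdealNode(ctx, v)
		}
		got = accept(ctx)
	})
	req := httptest.NewRequest("GET", "/derp", nil)
	if headerValue != "" {
		req.Header.Set(idealNodeHeader, headerValue)
	}
	h.ServeHTTP(httptest.NewRecorder(), req)
	return got
}

func main() {
	fmt.Println(notIdealFor("1a")) // header present
	fmt.Println(notIdealFor(""))   // header absent
}
```

Carrying the value in the context keeps the Accept signature unchanged for callers that never set the header, which matches the header being documented as optional.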