ipn/ipnlocal: make tkaSyncIfNeeded exclusive with a mutex

Running corp/ipn#TestNetworkLockE2E has a 1/300 chance of failing, and
desk-checking suggests that what's happening is two netmaps racing each
other through tkaSyncIfNeededLocked. The race is possible in the first
place because we release b.mu during network RPCs.

To fix this, we make the TKA sync logic an exclusive section, so two
netmaps must wait for TKA sync to complete serially (which is what we
would want anyway, as the second run through probably won't need to
sync).
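
A minimal sketch of the pattern (illustrative only, not the actual
ipnlocal code): a dedicated mutex is held across the whole sync
operation, including the network RPCs, so a second netmap blocks until
the first one finishes.

    package ipnlocal // sketch only; not the real file layout

    import (
        "sync"

        "tailscale.com/types/netmap"
    )

    var tkaSyncLock sync.Mutex // serializes TKA sync across netmaps

    func tkaSyncIfNeeded(nm *netmap.NetworkMap) error {
        tkaSyncLock.Lock() // a second netmap parks here until the first sync completes
        defer tkaSyncLock.Unlock()
        // ... TKA sync logic, including network RPCs, runs exclusively ...
        return nil
    }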

Signed-off-by: Tom DNetto <tom@tailscale.com>
Author: Tom DNetto
Date: 2022-10-06 11:09:13 -07:00
Committer: Tom
parent 227777154a
commit a515fc517b
3 changed files with 30 additions and 17 deletions

@@ -198,6 +198,14 @@ type LocalBackend struct {
 	// dialPlan is any dial plan that we've received from the control
 	// server during a previous connection; it is cleared on logout.
 	dialPlan atomic.Pointer[tailcfg.ControlDialPlan]
+
+	// tkaSyncLock is used to make tkaSyncIfNeeded an exclusive
+	// section. This is needed to stop two map-responses in quick succession
+	// from racing each other through TKA sync logic / RPCs.
+	//
+	// tkaSyncLock MUST be taken before mu (or inversely, mu must not be held
+	// at the moment that tkaSyncLock is taken).
+	tkaSyncLock sync.Mutex
 }

 // clientGen is a func that creates a control plane client.
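
The take-tkaSyncLock-before-mu rule above implies an acquisition order
inside tkaSyncIfNeeded roughly like the following sketch (the method
body is not part of the hunks shown here, so this is an assumption
consistent with the documented lock order):

    func (b *LocalBackend) tkaSyncIfNeeded(nm *netmap.NetworkMap) error {
        b.tkaSyncLock.Lock() // taken first: the whole sync becomes an exclusive section
        defer b.tkaSyncLock.Unlock()
        b.mu.Lock() // taken second, per the documented lock order
        defer b.mu.Unlock()
        // ... examine nm and run TKA sync, possibly dropping locks around RPCs ...
        return nil
    }
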
@@ -775,9 +783,12 @@ func (b *LocalBackend) setClientStatus(st controlclient.Status) {
 		}
 	}
 	if st.NetMap != nil {
-		if err := b.tkaSyncIfNeededLocked(st.NetMap); err != nil {
+		b.mu.Unlock() // respect locking rules for tkaSyncIfNeeded
+		if err := b.tkaSyncIfNeeded(st.NetMap); err != nil {
 			b.logf("[v1] TKA sync error: %v", err)
 		}
+		b.mu.Lock()
 		if !envknob.TKASkipSignatureCheck() {
 			b.tkaFilterNetmapLocked(st.NetMap)
 		}
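
Note the unlock/lock dance at the call site: setClientStatus holds b.mu
at this point, but the new tkaSyncLock comment requires that mu not be
held when tkaSyncLock is taken. Calling tkaSyncIfNeeded with b.mu still
held would invert the documented lock order and risk deadlock, so b.mu
is released for the duration of the call and re-acquired afterwards.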