There was a logical race where Conn.Rebind could acquire the
RebindingUDPConn mutex, close the connection, fail to rebind, release
the mutex, and then because the mutex was no longer held, ReceiveIPv4
wouldn't retry reads that failed with net.ErrClosed, letting that
error back to wireguard-go, which would then stop running that receive
IP goroutine.
Instead, keep the RebindingUDPConn mutex held for the entirety of the
replacement in all cases.
Updates tailscale/corp#1289
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Also change the type to netaddr.IP while here, because it made sorting
easier.
Updates tailscale/corp#1397
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
UIs need to see the full unedited netmap in order to know what exit nodes they
can offer to the user.
Signed-off-by: David Anderson <danderson@tailscale.com>
* move probing out of netcheck into new net/portmapper package
* use PCP ANNOUNCE op codes for PCP discovery, rather than causing
short-lived (sub-second) side effects with a 1-second-expiring map +
delete.
* track when we heard things from the router so we can be less wasteful
in querying the router's port mapping services in the future
* use portmapper from magicsock to map a public port
Fixes#1298Fixes#1080Fixes#1001
Updates #864
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
When a handshake race occurs, a queued data packet can get lost.
TestTwoDevicePing expected that the very first data packet would arrive.
This caused occasional flakes.
Change TestTwoDevicePing to repeatedly re-send packets
and succeed when one of them makes it through.
This is acceptable (vs making WireGuard not drop the packets)
because this only affects communication with extremely old clients.
And those extremely old clients will eventually connect,
because the kernel will retry sends on timeout.
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
We modified the standard net package to not allocate a *net.UDPAddr
during a call to (*net.UDPConn).ReadFromUDP if the caller's use
of the *net.UDPAddr does not cause it to escape.
That is https://golang.org/cl/291390.
This is the companion change to magicsock.
There are two changes required.
First, call ReadFromUDP instead of ReadFrom, if possible.
ReadFrom returns a net.Addr, which is an interface, which always allocates.
Second, reduce the lifetime of the returned *net.UDPAddr.
We do this by immediately converting it into a netaddr.IPPort.
We left the existing RebindingUDPConn.ReadFrom method in place,
as it is required to satisfy the net.PacketConn interface.
With the upstream change and both of these fixes in place,
we have removed one large allocation per packet received.
name old time/op new time/op delta
ReceiveFrom-8 16.7µs ± 5% 16.4µs ± 8% ~ (p=0.310 n=5+5)
name old alloc/op new alloc/op delta
ReceiveFrom-8 112B ± 0% 64B ± 0% -42.86% (p=0.008 n=5+5)
name old allocs/op new allocs/op delta
ReceiveFrom-8 3.00 ± 0% 2.00 ± 0% -33.33% (p=0.008 n=5+5)
Co-authored-by: Sonia Appasamy <sonia@tailscale.com>
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
addrSet maintained duplicate lists of netaddr.IPPorts and net.UDPAddrs.
Unify to use the netaddr type only.
This makes (*Conn).ReceiveIPvN a bit uglier,
but that'll be cleaned up in a subsequent commit.
This is preparatory work to remove an allocation from ReceiveIPv4.
Co-authored-by: Sonia Appasamy <sonia@tailscale.com>
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
I based my estimation of the required timeout based on locally
observed behavior. But CI machines are worse than my local machine.
16s was enough to reduce flakiness but not eliminate it. Bump it up again.
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
We removed the "fast retry" code from our wireguard-go fork.
As a result, pings can take longer to transit when retries are required.
Allow that.
Fixes#1277
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
The fix can make this test run unconditionally.
This moves code from 5c619882bc for
testability but doesn't fix it yet. The #1282 problem remains (when I
wrote its wake-up mechanism, I forgot there were N DERP readers
funneling into 1 UDP reader, and the code just isn't correct at all
for that case).
Also factor out some test helper code from BenchmarkReceiveFrom.
The refactoring in magicsock.go for testability should have no
behavior change.
Magicsock started dropping all traffic internally when Tailscale is
shut down, to avoid spurious wireguard logspam. This made the benchmark
not receive anything. Setting a dummy private key is sufficient to get
magicsock to pass traffic for benchmarking purposes.
Fixes#1270.
Signed-off-by: David Anderson <danderson@tailscale.com>
Use tb.Cleanup to simplify both the API and the implementation.
One behavior change: When the number of goroutines shrinks, don't log.
I've never found these logs to be useful, and they frequently add noise.
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
This is mostly code movement from the wireguard-go repo.
Most of the new wgcfg package corresponds to the wireguard-go wgcfg package.
wgengine/wgcfg/device{_test}.go was device/config{_test}.go.
There were substantive but simple changes to device_test.go to remove
internal package device references.
The API of device.Config (now wgcfg.DeviceConfig) grew an error return;
we previously logged the error and threw it away.
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
Rewrite log lines on the fly, based on the set of known peers.
This enables us to use upstream wireguard-go logging,
but maintain the Tailscale-style peer public key identifiers
that the rest of our systems (and people) expect.
Fixes#1183
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
Also, don't try to use IPv6 LinkLocalUnicast addresses for now. Like endpoints
exchanged with control, we share them but don't yet use them.
Updates #1172
c8c493f3d9 made it always say
`created=false` which scared me when I saw it, as that would've implied
things were broken much worse. Fortunately the logging was just wrong.
DstToString is used in two places in wireguard-go: Logging and uapi.
We are switching to use uapi for wireguard-go config.
To preserve existing behavior, we need the full set of addrs.
And for logging, having the full set of addrs seems useful.
(The Addrs method itself is slated for removal. When that happens,
the implementation will move to DstToString.)
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
To save CPU and wakeups, don't run the DERP cleanup timer regularly
unless there is a non-home DERP connection open.
Also eliminates the goroutine, moving to a time.AfterFunc.
Updates #1034
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
This reverts commit 08baa17d9a.
It caused deadlocks due to lock ordering violations.
It was not the right fix, and thus should simply be reverted
while we look for the right fix (if we haven't already found it
in the interim; we've fixed other logging-after-test issues).
Fixes#1161
context.cancelCtx.Done involves a mutex and isn't as cheap as I
previously assumed. Convert the donec method into a struct field and
store the channel value once. Our one magicsock.Conn gets one pointer
larger, but it cuts ~1% of the CPU time of the ReceiveFrom benchmark
and removes a bubble from the --svg output :)
22507adf54 stopped relying on
our fork of wireguard-go's UpdateDst callback.
As a result, we can unwind that code,
and the extra return value of ReceiveIPv{4,6}.
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
TwoDevicePing is explicitly testing the behavior of the legacy codepath, everything
else is happy to assume that code no longer exists.
Signed-off-by: David Anderson <danderson@tailscale.com>
Previously, this benchmark relied on behavior of the legacy
receive codepath, which I changed in 22507adf. With this
change, the benchmark instead relies on the new active discovery
path.
Signed-off-by: David Anderson <danderson@tailscale.com>