I noticed portlist when looking at some profiles and hadn't looked at
the code much before. This is a first pass over it. It allocates a
fair bit. More love remains, but this does a bit:
name old time/op new time/op delta
GetList-8 9.92ms ± 8% 9.64ms ±12% ~ (p=0.247 n=10+10)
name old alloc/op new alloc/op delta
GetList-8 931kB ± 0% 869kB ± 0% -6.70% (p=0.000 n=10+10)
name old allocs/op new allocs/op delta
GetList-8 4.59k ± 0% 3.69k ± 1% -19.71% (p=0.000 n=10+10)
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Before, endpoint updates were constantly being interrupted and resumed
on Linux due to tons of LinkChange messages from over-zealous Linux
netlink messages (from router_linux.go)
Now that endpoint updates are fast and bounded in time anyway, just
let them run to completion, but note that another needs to be
scheduled after.
Now logs went from pages of noise to just:
root@taildoc:~# grep -i -E 'stun|endpoint update' log
2020/03/13 08:51:29 magicsock.Conn: starting endpoint update (initial)
2020/03/13 08:51:30 magicsock.Conn.ReSTUN: endpoint update active, need another later ("link-change-minor")
2020/03/13 08:51:31 magicsock.Conn: starting endpoint update (link-change-minor)
2020/03/13 08:51:31 magicsock.Conn.ReSTUN: endpoint update active, need another later ("link-change-minor")
2020/03/13 08:51:33 magicsock.Conn: starting endpoint update (link-change-minor)
2020/03/13 08:51:33 magicsock.Conn.ReSTUN: endpoint update active, need another later ("link-change-minor")
2020/03/13 08:51:35 magicsock.Conn: starting endpoint update (link-change-minor)
2020/03/13 08:51:35 magicsock.Conn.ReSTUN: endpoint update active, need another later ("link-change-minor")
Or, seen in another run:
2020/03/13 08:45:41 magicsock.Conn: starting endpoint update (periodic)
2020/03/13 08:46:09 magicsock.Conn: starting endpoint update (periodic)
2020/03/13 08:46:21 magicsock.Conn: starting endpoint update (link-change-major)
2020/03/13 08:46:37 magicsock.Conn: starting endpoint update (periodic)
2020/03/13 08:47:05 magicsock.Conn: starting endpoint update (periodic)
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
This removes the need for go-cmp, which is extremely bloaty so we had
to leave it out of iOS. As a result, we had also left it out of macOS,
and so we didn't print netmap diffs at all on darwin-based platforms.
Oops.
As a bonus, the output format of the new function is way better.
Minor oddity: because I used the dumbest possible diff algorithm, the
sort order is a bit dumb. We print all "removed" lines and then print
all "added" lines, rather than doing the usual diff-like thing of
interspersing them. This probably doesn't matter (maybe it's an
improvement).
The .Concise() view had grown hard to read over time. Originally, we
assumed a peer almost always had just one endpoint and one-or-more
allowedips. With magicsock, we now almost always have multiple
endpoints per peer. And empirically, almost every peer has only one
allowedip.
Change their order so we can line up allowedips vertically. Also do
some tweaking to make multiple endpoints easier to read.
While we're here, add a column to show the home DERP server of each
peer, if any.
We log it once upon receiving the first copy of the map, then
subsequently when a new one appears, but only if we haven't logged one
less than 5 minutes ago.
This avoids overly cluttering the log (as we did before, logging the
netmap every time one appeared, which could be hundreds of lines every
few seconds), but still gives the log enough context to help in
diagnosing problems retroactively.
Without the recent write deadline introduction, this test fails.
They still do interfere, but the interference is now bound by
the write deadline. Many improvements are possible.
Signed-off-by: David Crawshaw <crawshaw@tailscale.com>
If Alice attempts to send a packet to Bob and the DERP server
encounters an error on the socket to Bob, we should not disconnect
Alice for that.
Signed-off-by: David Crawshaw <crawshaw@tailscale.com>
This is a lot like wiring up a local UDP socket, read and write
deadlines work. The big difference is the Block feature, which
lets you stop the packet flow without breaking the connection.
This lets you emulate broken sockets and test timeouts actually
work.
Signed-off-by: David Crawshaw <crawshaw@tailscale.com>
* 'master' of github.com:tailscale/tailscale:
netcheck: fix data races for staggler STUN packets arriving after GetReport
wgengine/magicsock: add a pointer value for logging
netcheck: ignore IPv4 STUN failures if we saw at least one reply
netcheck: ignore IPv6 STUN failures
derp: add clients_replaced counter
version: bump OSS version datestamp.
The TODO above derphttp.NewClient suggests it does network I/O,
but the derphttp client connects lazily and so creating one is
very cheap.
Signed-off-by: David Crawshaw <crawshaw@tailscale.com>
It used to make assumptions based on having Anycast IPs that are super
near. Now we're intentionally going to a bunch of different distant
IPs to measure latency.
Also, optimize how the hairpin detection works. No need to STUN on
that socket. Just use that separate socket for sending, once we know
the other UDP4 socket's endpoint. The trick is: make our test probe
also a STUN packet, so it fits through magicsock's existing STUN
routing.
This drops netcheck from ~5 seconds to ~250-500ms.
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Failure to do this leads to fd exhaustion at -count=10000,
and increasingly poor execution north of -count=100.
Signed-off-by: David Anderson <danderson@tailscale.com>
Failure to do so triggers either a data race or a panic
in the testing package, due to racey use of t.Logf.
Signed-off-by: David Anderson <danderson@tailscale.com>
Basically, don't trust the OS-level link monitor to only tell you
interesting things. Sanity check it.
Also, move the interfaces package into the net directory now that we
have it.