Getting the benefit of this optimization requires help from the Go toolchain.
The changes are upstream at https://golang.org/cl/320929,
and have been pulled into the Tailscale fork at
728ecc58fd.
It also requires building with the build tag tailscale_go.

name              old time/op    new time/op    delta
Hash-8            14.0µs ± 0%    13.6µs ± 0%    -2.88%   (p=0.008 n=5+5)
HashMapAcyclic-8  24.3µs ± 1%    21.2µs ± 1%    -12.47%  (p=0.008 n=5+5)

name              old alloc/op   new alloc/op   delta
Hash-8            2.16kB ± 0%    1.58kB ± 0%    -27.01%  (p=0.008 n=5+5)
HashMapAcyclic-8  2.53kB ± 0%    0.15kB ± 0%    -93.99%  (p=0.008 n=5+5)

name              old allocs/op  new allocs/op  delta
Hash-8            77.0 ± 0%      49.0 ± 0%      -36.36%  (p=0.008 n=5+5)
HashMapAcyclic-8  202 ± 0%       4 ± 0%         -98.02%  (p=0.008 n=5+5)

Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
The acyclic map code interacts badly with netaddr.IPs.
One of the netaddr.IP fields is an *intern.Value,
and we use a few sentinel values.
Those sentinel values make many of the netaddr data structures appear cyclic.
One option would be to replace the cycle-detection code with
a Floyd-style (tortoise-and-hare) algorithm. The downside is that this will take
longer to detect cycles, particularly if the cycle is long.
This problem is exacerbated by the fact that the acyclic cycle detection
code shares a single visited map for the entire data structure,
not just the subsection of the data structure localized to the map.
Unfortunately, the extra allocations and work (and code) to use per-map
visited maps make this option not viable.
Instead, continue to special-case netaddr data types.
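
For reference, the constant-space alternative weighed above is presumably
Floyd's tortoise-and-hare cycle detection. The sketch below is illustrative
only and is not the deephash code: the node type and hasCycle function are
hypothetical, and it covers only a chain with a single successor per element,
whereas deephash walks arbitrary values. It shows why no visited map is needed,
and why detection can lag: the two pointers meet only after a number of steps
proportional to the length of the chain plus the cycle.

```go
package main

import "fmt"

// node is a hypothetical singly linked structure used only for illustration.
type node struct {
	next *node
}

// hasCycle reports whether following next pointers from head ever loops.
// It uses Floyd's tortoise-and-hare: slow advances one step per iteration,
// fast advances two. They can only meet again if the chain contains a cycle.
func hasCycle(head *node) bool {
	slow, fast := head, head
	for fast != nil && fast.next != nil {
		slow = slow.next
		fast = fast.next.next
		if slow == fast {
			return true
		}
	}
	// fast ran off the end, so the chain terminates.
	return false
}

func main() {
	a, b, c := &node{}, &node{}, &node{}
	a.next, b.next = b, c
	fmt.Println(hasCycle(a)) // false: a -> b -> c -> nil

	c.next = a
	fmt.Println(hasCycle(a)) // true: a -> b -> c -> a
}
```

A visited map, by contrast, reports a cycle the first time a pointer is seen
twice, at the cost of the allocations discussed above.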

name              old time/op    new time/op    delta
Hash-8            22.4µs ± 0%    14.0µs ± 0%    -37.59%  (p=0.008 n=5+5)
HashMapAcyclic-8  23.8µs ± 0%    24.3µs ± 1%    +1.75%   (p=0.008 n=5+5)

name              old alloc/op   new alloc/op   delta
Hash-8            2.49kB ± 0%    2.16kB ± 0%    ~        (p=0.079 n=4+5)
HashMapAcyclic-8  2.53kB ± 0%    2.53kB ± 0%    ~        (all equal)

name              old allocs/op  new allocs/op  delta
Hash-8            86.0 ± 0%      77.0 ± 0%      -10.47%  (p=0.008 n=5+5)
HashMapAcyclic-8  202 ± 0%       202 ± 0%       ~        (all equal)

Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
Hash and xor each entry instead, then write final xor'ed result.

name    old time/op    new time/op    delta
Hash-4  33.6µs ± 4%    34.6µs ± 3%    +3.03%  (p=0.013 n=10+9)

name    old alloc/op   new alloc/op   delta
Hash-4  1.86kB ± 0%    1.77kB ± 0%    -5.10%  (p=0.000 n=10+9)

name    old allocs/op  new allocs/op  delta
Hash-4  51.0 ± 0%      49.0 ± 0%      -3.92%  (p=0.000 n=10+10)

Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Yes, it printed, but that was an implementation detail of hashing.
And a coming optimization will make it print even less.
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>