This helps pprof better identify which Go kinds take the most time
since the kind is always in the function name.
There is a minor adjustment where we hash the length of the map
to be more on the cautious side.
Signed-off-by: Joe Tsai <joetsai@digital-static.net>
Rather than having two copies of []fieldInfo,
just maintain one and perform merging in the same pass.
Signed-off-by: Joe Tsai <joetsai@digital-static.net>
This helps pprof better identify which Go kinds take the most time
since the kind is always in the function name.
Signed-off-by: Joe Tsai <joetsai@digital-static.net>
Use of reflect.Value.SetXXX panics if the provided argument was
obtained from an unexported struct field.
Instead, pass an unsafe.Pointer around and convert to a
reflect.Value when necessary (i.e., for maps and interfaces).
Converting from unsafe.Pointer to reflect.Value guarantees that
none of the read-only bits will be populated.
When running in race mode, we attach type information to the pointer
so that we can type check every pointer operation.
This also type-checks that direct memory hashing is within
the valid range of a struct value.
We add test cases that previously caused deephash to panic,
but now pass.
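
The read-only behavior described above can be reproduced outside deephash. The following is a minimal sketch, not the deephash code itself: the record type and both helper functions are hypothetical names chosen for illustration. It shows that a reflect.Value reached through an unexported field refuses SetInt, while a reflect.Value rebuilt from an unsafe.Pointer with reflect.NewAt does not carry the read-only flag.

```go
package main

import (
	"fmt"
	"reflect"
	"unsafe"
)

// record is a hypothetical type with an unexported field.
type record struct {
	hidden int
}

// setViaReflect sets the field through the reflect.Value obtained by
// walking the struct. It reports whether the call panicked.
func setViaReflect(r *record, x int64) (panicked bool) {
	defer func() { panicked = recover() != nil }()
	reflect.ValueOf(r).Elem().Field(0).SetInt(x)
	return false
}

// setViaPointer rebuilds a reflect.Value from a raw pointer to the field.
// A value produced by reflect.NewAt has no read-only flag set.
func setViaPointer(r *record, x int64) {
	p := unsafe.Pointer(&r.hidden)
	reflect.NewAt(reflect.TypeOf(r.hidden), p).Elem().SetInt(x)
}

func main() {
	r := &record{}
	fmt.Println("reflect path panicked:", setViaReflect(r, 1))
	setViaPointer(r, 2)
	fmt.Println("value after pointer path:", r.hidden)
}
```

The pointer path is only safe when the pointer really refers to a value of the stated type, which is the property the race-mode type checks mentioned above are meant to verify.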
Performance:
    name              old time/op  new time/op  delta
    Hash              14.1µs ± 1%  14.1µs ± 1%    ~     (p=0.590 n=10+9)
    HashPacketFilter  2.53µs ± 2%  2.44µs ± 1%  -3.79%  (p=0.000 n=9+10)
    TailcfgNode       1.45µs ± 1%  1.43µs ± 0%  -1.36%  (p=0.000 n=9+9)
    HashArray          318ns ± 2%   318ns ± 2%    ~     (p=0.541 n=10+10)
    HashMapAcyclic    32.9µs ± 1%  31.6µs ± 1%  -4.16%  (p=0.000 n=10+9)
There is a slight performance gain due to the use of unsafe.Pointer
over reflect.Value methods. Also, passing an unsafe.Pointer (1 word)
on the stack is cheaper than passing a reflect.Value (3 words).
Performance gains are diminishing since SHA-256 hashing now dominates the runtime.
Signed-off-by: Joe Tsai <joetsai@digital-static.net>
When built with "deephash_debug", print the set of HashXXX methods.
Example usage:
    $ go test -run=GetTypeHasher/string_slice -tags=deephash_debug
    U64(2)+U64(3)+S("foo")+U64(3)+S("bar")+FIN
Signed-off-by: Joe Tsai <joetsai@digital-static.net>
Rather than separate functions to hash each kind,
just rely on the fact that these are direct memory hashable,
thus simplifying the code.
Signed-off-by: Joe Tsai <joetsai@digital-static.net>
Every implementation of typeHasherFunc always returns true,
which implies that the slow path is no longer executed.
Delete it.
h.hashValueWithType(v, ti, ...) is deleted as it is equivalent to:
    ti.hasher()(h, v)
h.hashValue(v, ...) is deleted as it is equivalent to:
    ti := getTypeInfo(v.Type())
    ti.hasher()(h, v)
Signed-off-by: Joe Tsai <joetsai@digital-static.net>
Add support for maps and interfaces to the fast path.
Add cycle-detection to the pointer handling logic.
This logic is mostly copied from the slow path.
A future commit will delete the slow path once
the fast path never falls back to the slow path.
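
To illustrate the idea of cycle detection during pointer traversal, here is a minimal sketch under stated assumptions. It is not the deephash implementation: the node type, the tag bytes, and the visit map keyed by pointer are all hypothetical. The sketch records each pointer currently being hashed and, when it meets one again, hashes its depth instead of recursing.

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
	"fmt"
	"hash"
)

// node is a hypothetical recursive type.
type node struct {
	val  uint64
	next *node
}

// hashNode hashes a linked list that may loop back on itself.
// visit maps each pointer currently being hashed to its traversal depth.
func hashNode(h hash.Hash, n *node, visit map[*node]uint64) {
	var buf [9]byte
	if n == nil {
		h.Write(buf[:]) // tag 0: nil pointer
		return
	}
	if depth, ok := visit[n]; ok {
		buf[0] = 2 // tag 2: back-reference to a pointer already being hashed
		binary.LittleEndian.PutUint64(buf[1:], depth)
		h.Write(buf[:])
		return
	}
	visit[n] = uint64(len(visit))
	buf[0] = 1 // tag 1: ordinary node
	binary.LittleEndian.PutUint64(buf[1:], n.val)
	h.Write(buf[:])
	hashNode(h, n.next, visit)
	delete(visit, n)
}

// sum builds a list from vals, optionally links the tail back to the head,
// and returns the SHA-256 of the traversal.
func sum(ring bool, vals ...uint64) [sha256.Size]byte {
	var head, tail *node
	for _, v := range vals {
		n := &node{val: v}
		if head == nil {
			head = n
		} else {
			tail.next = n
		}
		tail = n
	}
	if ring && tail != nil {
		tail.next = head
	}
	h := sha256.New()
	hashNode(h, head, map[*node]uint64{})
	var out [sha256.Size]byte
	h.Sum(out[:0])
	return out
}

func main() {
	fmt.Println("ring is stable:", sum(true, 1, 2, 3) == sum(true, 1, 2, 3))
	fmt.Println("ring differs from list:", sum(true, 1, 2, 3) != sum(false, 1, 2, 3))
}
```

The bookkeeping in this sketch runs for every pointer; the commit avoids that cost by only activating the tracking for recursive types.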
Performance:
    name                 old time/op  new time/op  delta
    Hash-24              18.5µs ± 1%  14.9µs ± 2%  -19.52%  (p=0.000 n=10+10)
    HashPacketFilter-24  2.54µs ± 1%  2.60µs ± 1%   +2.19%  (p=0.000 n=10+10)
    HashMapAcyclic-24    31.6µs ± 1%  30.5µs ± 1%   -3.42%  (p=0.000 n=9+8)
    TailcfgNode-24       1.44µs ± 2%  1.43µs ± 1%     ~     (p=0.171 n=10+10)
    HashArray-24          324ns ± 1%   324ns ± 2%     ~     (p=0.425 n=9+9)
The additional cycle detection logic doesn't incur much slow down
since it only activates if a type is recursive, which does not apply
for any of the types that we care about.
There is a notable performance boost since we switch from the fast path
to the slow path less often. Most notably, a struct with a field that
could not be handled by the fast path would previously cause
the entire struct to go through the slow path.
Signed-off-by: Joe Tsai <joetsai@digital-static.net>
There are 5 types that we care about that implement AppendTo:
    key.DiscoPublic
    key.NodePublic
    netip.Prefix
    netipx.IPRange
    netip.Addr
The key types are thin wrappers around [32]byte and are memory hashable.
The netip.Prefix and netipx.IPRange types are thin wrappers over netip.Addr
and are hashable by default if netip.Addr is hashable.
The netip.Addr type is the only one with a complex structure where
the default behavior of deephash does not hash it correctly due to the presence
of the intern.Value type.
Drop support for AppendTo and instead add specialized hashing for netip.Addr
that would be semantically equivalent to == on the netip.Addr values.
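
As an illustration of what "semantically equivalent to ==" can mean for netip.Addr, the sketch below encodes the three things that == distinguishes: the address family, the address bytes, and the zone. The encoding and the names appendAddr and sumOf are assumptions made for this example and may differ from what deephash actually writes.

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"net/netip"
)

// appendAddr appends a canonical encoding of a to b.
// BitLen is 0 for the zero Addr, 32 for IPv4 and 128 for IPv6, so an IPv4
// address and its IPv4-mapped IPv6 form encode differently, just as they
// compare unequal under ==.
func appendAddr(b []byte, a netip.Addr) []byte {
	b = append(b, byte(a.BitLen()))
	a16 := a.As16()
	b = append(b, a16[:]...)
	// The zone comes last after a fixed-size prefix, so the encoding of a
	// single address needs no length prefix to stay unambiguous.
	return append(b, a.Zone()...)
}

// sumOf parses s and returns the SHA-256 of its encoding.
func sumOf(s string) [sha256.Size]byte {
	return sha256.Sum256(appendAddr(nil, netip.MustParseAddr(s)))
}

func main() {
	fmt.Println(sumOf("1.2.3.4") == sumOf("1.2.3.4"))
	fmt.Println(sumOf("1.2.3.4") == sumOf("::ffff:1.2.3.4"))
	fmt.Println(sumOf("fe80::1") == sumOf("fe80::1%eth0"))
}
```

Hashing the raw struct memory would not have this property, because the zone is held behind a pointer whose value is unrelated to the zone's contents.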
The AppendTo support was already broken prior to this change.
It was fully removed (intentionally or not) in #4870.
It was partially restored in #4858 for the fast path,
but still broken in the slow path.
Just drop support for it altogether.
This does mean we lack any ability for types to hash themselves.
In the future we can add support for types that implement:
    interface { DeepHash() Sum }
Test and fuzz cases were added for the relevant types that
used to rely on the AppendTo method.
FuzzAddr has been executed on 1 billion samples without issues.
Signed-off-by: Joe Tsai <joetsai@digital-static.net>
Rename Hash as Block512 to indicate that this is a general-purpose
hash.Hash for any algorithm that operates on 512-bit block sizes.
While we rename the package as hashx in this commit,
a subsequent commit will move the sha256x package to hashx.
This is done separately to avoid confusing git.
Signed-off-by: Joe Tsai <joetsai@digital-static.net>
Also, rename canMemHash to typeIsMemHashable to be consistent.
There are zero changes to the semantics.
Signed-off-by: Joe Tsai <joetsai@digital-static.net>
Any type that is memory hashable must not be recursive since
there are definitely no pointers involved to make a cycle.
Signed-off-by: Joe Tsai <joetsai@digital-static.net>
Put the t.Size() == 0 check first since this is applicable in all cases.
Drop the last struct field conditional since this is covered by the
sumFieldSize check at the end.
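
The check ordering described above can be sketched as follows. This is a simplified illustration, not the deephash function: the set of leaf kinds here is deliberately limited to bool and integers, and the packed and padded types and the memHashable helper are hypothetical. It puts the zero-size check first and relies on a final field-size sum to reject structs with padding.

```go
package main

import (
	"fmt"
	"reflect"
)

// typeIsMemHashable reports whether values of type t can be hashed by
// reading their memory directly. Simplified: only bool and integer kinds
// count as leaves in this sketch.
func typeIsMemHashable(t reflect.Type) bool {
	if t.Size() == 0 {
		return true // applies to every kind, so check it first
	}
	switch t.Kind() {
	case reflect.Bool,
		reflect.Int, reflect.Int8, reflect.Int16, reflect.Int32, reflect.Int64,
		reflect.Uint, reflect.Uint8, reflect.Uint16, reflect.Uint32, reflect.Uint64,
		reflect.Uintptr:
		return true
	case reflect.Array:
		return typeIsMemHashable(t.Elem())
	case reflect.Struct:
		var sumFieldSize uintptr
		for i := 0; i < t.NumField(); i++ {
			ft := t.Field(i).Type
			if !typeIsMemHashable(ft) {
				return false
			}
			sumFieldSize += ft.Size()
		}
		// Any padding, including trailing padding, makes the sum smaller
		// than the struct size.
		return sumFieldSize == t.Size()
	}
	return false
}

type packed struct{ A, B uint32 }

type padded struct {
	A uint8
	B uint64
}

// memHashable is a convenience wrapper for calling the check on a type.
func memHashable[T any]() bool {
	return typeIsMemHashable(reflect.TypeOf((*T)(nil)).Elem())
}

func main() {
	fmt.Println(memHashable[packed](), memHashable[padded](), memHashable[string]())
}
```

Because no pointer-bearing kind is ever accepted, a type that passes this check cannot contain a cycle.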
Signed-off-by: Joe Tsai <joetsai@digital-static.net>
Hashing []any is slow since hashing of interfaces is slow.
Hashing of interfaces is slow since we pessimistically assume
that cycles can occur through them and start cycle tracking.
Drop the variadic signature of Update and fix callers to pass in
an anonymous struct so that we are hashing concrete types
near the root of the value tree.
Signed-off-by: Joe Tsai <joetsai@digital-static.net>
Formatting a time.Time as RFC3339 is slow.
See https://go.dev/issue/54093
Now that we have efficient hashing of fixed-width integers,
just hash the time.Time as a binary value.
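
A minimal sketch of hashing a time.Time as fixed-width integers rather than as formatted text is shown below. The choice of fields (Unix seconds, nanoseconds, zone offset), their widths, and the names appendTime and timeSum are assumptions for illustration; the exact layout used by deephash may differ.

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
	"fmt"
	"time"
)

// appendTime appends a 16-byte binary encoding of t to b:
// 8 bytes of Unix seconds, 4 bytes of nanoseconds, 4 bytes of zone offset.
func appendTime(b []byte, t time.Time) []byte {
	_, offset := t.Zone()
	b = binary.LittleEndian.AppendUint64(b, uint64(t.Unix()))
	b = binary.LittleEndian.AppendUint32(b, uint32(t.Nanosecond()))
	b = binary.LittleEndian.AppendUint32(b, uint32(offset))
	return b
}

// timeSum hashes the time given by sec and nsec in a fixed zone that is
// offsetSec seconds east of UTC.
func timeSum(sec, nsec int64, offsetSec int) [sha256.Size]byte {
	t := time.Unix(sec, nsec).In(time.FixedZone("", offsetSec))
	return sha256.Sum256(appendTime(nil, t))
}

func main() {
	fmt.Println(timeSum(1, 2, 0) == timeSum(1, 2, 0))
	fmt.Println(timeSum(1, 2, 0) == timeSum(1, 2, 3600))
}
```

Including the zone offset keeps two times that denote the same instant in different zones distinct, which mirrors how their RFC 3339 renderings would differ.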
Performance:
    name            old time/op  new time/op  delta
    Hash-24         19.0µs ± 1%  18.6µs ± 1%   -2.03%  (p=0.000 n=10+9)
    TailcfgNode-24  1.79µs ± 1%  1.40µs ± 1%  -21.74%  (p=0.000 n=10+9)
Signed-off-by: Joe Tsai <joetsai@digital-static.net>
Switch deephash to use sha256x.Hash.
We add sha256x.HashString to efficiently hash a string.
It uses unsafe under the hood to convert a string to a []byte.
We also modify sha256x.Hash to export the underlying hash.Hash
for testing purposes so that we can intercept all hash.Hash calls.
Performance:
    name                 old time/op  new time/op  delta
    Hash-24              19.8µs ± 1%  19.2µs ± 1%  -3.01%  (p=0.000 n=10+10)
    HashPacketFilter-24  2.61µs ± 0%  2.53µs ± 1%  -3.01%  (p=0.000 n=8+10)
    HashMapAcyclic-24    31.3µs ± 1%  29.8µs ± 0%  -4.80%  (p=0.000 n=10+9)
    TailcfgNode-24       1.83µs ± 1%  1.82µs ± 2%    ~     (p=0.305 n=10+10)
    HashArray-24          344ns ± 2%   323ns ± 1%  -6.02%  (p=0.000 n=9+10)
The performance gains are not as dramatic as sha256x over sha256 due to:
1. most of the hashing already occurring through the direct memory hashing logic, and
2. what does not go through direct memory hashing is slowed down by reflect.
Signed-off-by: Joe Tsai <joetsai@digital-static.net>
In Go 1.19, the reflect.Value.MapRange method uses "function outlining"
so that the allocation of reflect.MapIter is inlinable by the caller.
If the iterator doesn't escape the caller, it can be stack allocated.
See https://go.dev/cl/400675
Performance:
    name               old time/op   new time/op   delta
    HashMapAcyclic-24  31.9µs ± 2%   32.1µs ± 1%   ~  (p=0.075 n=10+10)

    name               old alloc/op  new alloc/op  delta
    HashMapAcyclic-24   0.00B         0.00B        ~  (all equal)
Signed-off-by: Joe Tsai <joetsai@digital-static.net>
The hash.Hash provided by sha256.New is much more efficient
if we always provide it with data a multiple of the block size.
This avoids double-copying of data into the internal block
of sha256.digest.x. Effectively, we are managing a block ourselves
to ensure we only ever call hash.Hash.Write with full blocks.
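
The block-management idea can be sketched as below. This is an illustrative wrapper written for this note, not the package's actual type: blockWriter, its fields, and matchesSum256 are hypothetical names. The wrapper buffers input so the underlying hash.Hash receives only whole 64-byte blocks until the final flush, and the helper checks the result against sha256.Sum256.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"fmt"
	"hash"
)

const blockSize = sha256.BlockSize // 64 bytes

// blockWriter buffers writes so that h only sees whole blocks,
// apart from the final partial block flushed by Sum.
type blockWriter struct {
	h  hash.Hash
	x  [blockSize]byte // pending partial block
	nx int             // number of valid bytes in x
}

func (w *blockWriter) Write(b []byte) {
	// Top up a partially filled buffer first.
	if w.nx > 0 {
		n := copy(w.x[w.nx:], b)
		w.nx += n
		b = b[n:]
		if w.nx < blockSize {
			return
		}
		w.h.Write(w.x[:])
		w.nx = 0
	}
	// Pass whole blocks straight through without copying.
	if n := len(b) &^ (blockSize - 1); n > 0 {
		w.h.Write(b[:n])
		b = b[n:]
	}
	// Stash whatever is left for the next call.
	w.nx = copy(w.x[:], b)
}

// Sum flushes the pending bytes and returns the digest.
// The writer should not be reused after Sum in this sketch.
func (w *blockWriter) Sum() [sha256.Size]byte {
	w.h.Write(w.x[:w.nx])
	w.nx = 0
	var out [sha256.Size]byte
	w.h.Sum(out[:0])
	return out
}

// matchesSum256 writes chunks of the given sizes and reports whether the
// digest equals sha256.Sum256 over the concatenated input.
func matchesSum256(sizes ...int) bool {
	w := &blockWriter{h: sha256.New()}
	var all []byte
	for i, n := range sizes {
		chunk := bytes.Repeat([]byte{byte(i + 1)}, n)
		w.Write(chunk)
		all = append(all, chunk...)
	}
	return w.Sum() == sha256.Sum256(all)
}

func main() {
	fmt.Println(matchesSum256(3, 61, 200, 0, 7))
}
```

The digest is unchanged by the buffering; only the sizes of the writes reaching the underlying hash differ.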
Performance:
    name  old time/op  new time/op  delta
    Hash  33.5µs ± 1%  20.6µs ± 1%  -38.40%  (p=0.000 n=10+9)
The logic has gone through CPU-hours of fuzzing.
Signed-off-by: Joe Tsai <joetsai@digital-static.net>
The logic of deephash is both simpler and easier to reason about
if values are always addressable.
In Go, the composite kinds are slices, arrays, maps, structs,
interfaces, pointers, channels, and functions,
where we define "composite" as a Go value that encapsulates
some other Go value (e.g., a map is a collection of key-value entries).
In the cases of pointers and slices, the sub-values are always addressable.
In the cases of arrays and structs, the sub-values are always addressable
if and only if the parent value is addressable.
In the case of maps and interfaces, the sub-values are never addressable.
To make them addressable, we need to copy them onto the heap.
For the purposes of deephash, we do not care about channels and functions.
For all non-composite kinds (e.g., strings and ints), they are only addressable
if obtained from one of the composite kinds that produce addressable values
(i.e., pointers, slices, addressable arrays, and addressable structs).
A non-addressable, non-composite kind can be made addressable by
allocating it on the heap, obtaining a pointer to it, and dereferencing it.
Thus, if we can ensure that values are addressable at the entry points,
and shallow copy sub-values whenever we encounter an interface or map,
then we can ensure that all values are always addressable and
assume such property throughout all the logic.
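
The addressability rules above can be observed directly with the reflect package. The following sketch uses a hypothetical helper named addressable that shallow-copies a value onto the heap when it is not already addressable; it is an illustration of the approach, not the deephash code.

```go
package main

import (
	"fmt"
	"reflect"
)

// addressable returns v unchanged if it is already addressable.
// Otherwise it shallow-copies v into newly allocated memory and
// returns the addressable copy.
func addressable(v reflect.Value) reflect.Value {
	if v.CanAddr() {
		return v
	}
	p := reflect.New(v.Type())
	p.Elem().Set(v)
	return p.Elem()
}

// mapElem reports the addressability of a map element before and after
// copying, along with the copied value.
func mapElem() (before, after bool, val int64) {
	m := map[string]int{"a": 1}
	e := reflect.ValueOf(m).MapIndex(reflect.ValueOf("a"))
	a := addressable(e)
	return e.CanAddr(), a.CanAddr(), a.Int()
}

// sliceElemAddressable and arrayElemAddressable show that slice elements
// are always addressable, while elements of a non-addressable array are not.
func sliceElemAddressable() bool { return reflect.ValueOf([]int{1}).Index(0).CanAddr() }
func arrayElemAddressable() bool { return reflect.ValueOf([1]int{1}).Index(0).CanAddr() }

func main() {
	fmt.Println(mapElem())
	fmt.Println(sliceElemAddressable(), arrayElemAddressable())
}
```

The copy is shallow: pointers and slices inside the copied value still refer to the original backing memory, which is sufficient because their sub-values are addressable regardless.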
Performance:
    name                 old time/op  new time/op  delta
    Hash-24              21.5µs ± 1%  19.7µs ± 1%  -8.29%  (p=0.000 n=9+9)
    HashPacketFilter-24  2.61µs ± 1%  2.62µs ± 0%  +0.29%  (p=0.037 n=10+9)
    HashMapAcyclic-24    30.8µs ± 1%  30.9µs ± 1%    ~     (p=0.400 n=9+10)
    TailcfgNode-24       1.84µs ± 1%  1.84µs ± 2%    ~     (p=0.928 n=10+10)
    HashArray-24          324ns ± 2%   332ns ± 2%  +2.45%  (p=0.000 n=10+10)
Signed-off-by: Joe Tsai <joetsai@digital-static.net>
The Do function assists in calling functions that must succeed.
It only interacts well with functions that return (T, err).
Signatures with more return arguments are not supported.
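
A helper of this shape might look like the sketch below. The generic signature is an assumption inferred from the description (a function taking the (T, error) pair and panicking on error); the actual declaration in the package may differ. The parse and panics helpers are hypothetical and exist only to exercise it.

```go
package main

import (
	"fmt"
	"strconv"
)

// Do returns v if err is nil and panics otherwise.
// It accepts the results of any function returning (T, error).
func Do[T any](v T, err error) T {
	if err != nil {
		panic(err)
	}
	return v
}

// parse converts s to an int and panics if s is not a valid integer.
func parse(s string) int {
	return Do(strconv.Atoi(s))
}

// panics reports whether calling f panics.
func panics(f func()) (p bool) {
	defer func() { p = recover() != nil }()
	f()
	return false
}

func main() {
	fmt.Println(parse("42"))
	fmt.Println(panics(func() { parse("not a number") }))
}
```

Go only allows a multi-value call to be forwarded when its results match the parameter list exactly, which is why functions returning more than two values cannot be passed this way.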
Signed-off-by: Joe Tsai <joetsai@digital-static.net>
We have very similar code in corp; moving it to util/precompress allows
it to be reused.
Updates #5133
Signed-off-by: Mihai Parparita <mihai@tailscale.com>
Clients may have platform-specific metrics they would like uploaded
(e.g. extracted from MetricKit on iOS). Add a new local API endpoint
that allows metrics to be updated by a simple name/value JSON-encoded
struct.
Signed-off-by: Mihai Parparita <mihai@tailscale.com>
And rewrite cloud detection to try to do only zero or one metadata
discovery request for all clouds, only doing a first (or second) as
confidence increases. Work remains for Windows, but a start.
And add Cloud to tailcfg.Hostinfo, which helped with testing using
"tailscale debug hostinfo".
Updates #4983 (Linux only)
Updates #4984
Change-Id: Ib03337089122ce0cb38c34f724ba4b4812bc614e
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
And remove the GCP special-casing from ipn/ipnlocal; do it only in the
forwarder for *.internal.
Fixes #4980
Fixes #4981
Change-Id: I5c481e96d91f3d51d274a80fbd37c38f16dfa5cb
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
This does three things:
* If you're on GCP, it adds a *.internal DNS split route to the
metadata server, so we never break GCP DNS names. This lets people
have some Tailscale nodes on GCP and some not (e.g. laptops at home)
without having to add a Tailnet-wide *.internal DNS route.
If you already have such a route, though, it won't overwrite it.
* If the 100.100.100.100 DNS forwarder has nowhere to forward to,
it forwards it to the GCP metadata IP, which forwards to 8.8.8.8.
This means there are never errNoUpstreams ("upstream nameservers not set")
errors on GCP due to e.g. mangled /etc/resolv.conf (GCP default VMs
don't have systemd-resolved, so it's likely a DNS supremacy fight).
* It makes the DNS fallback mechanism use the GCP metadata IP as a
fallback before our hosted HTTP-based fallbacks
I created a default GCP VM from their web wizard. It has no
systemd-resolved.
I then made its /etc/resolv.conf be empty and deleted its GCP
hostnames in /etc/hosts.
I then logged in to a tailnet with no global DNS settings.
With this, tailscaled writes /etc/resolv.conf (direct mode, as no
systemd-resolved) and sets it to 100.100.100.100, which then has
regular DNS via the metadata IP and *.internal DNS via the metadata IP
as well. If the tailnet configures explicit DNS servers, those are used
instead, except for *.internal.
This also adds a new util/cloudenv package based on version/distro
where the cloud type is only detected once. We'll likely expand it in
the future for other clouds, doing variants of this change for other
popular cloud environments.
Fixes #4911
RELNOTES=Google Cloud DNS improvements
Change-Id: I19f3c2075983669b2b2c0f29a548da8de373c7cf
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
(breaking up parts of another change)
This adds a PacketFilter hashing benchmark with an input that both
contains every possible field and is somewhat representative of
the shape of real packet filters.
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Regression from 09afb8e35b, in which the
same reflect.Value scratch value was being used as the map iterator
copy destination.
Also: make nil and empty maps hash differently, add test.
Fixes #4871
Co-authored-by: Josh Bleecher Snyder <josharian@gmail.com>
Change-Id: I67f42524bc81f694c1b7259d6682200125ea4a66
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
AFAICT this isn't documented on MSDN, but based on the issue referenced below,
NRPT rules do not work when a rule specifies more than 50 domains.
This patch modifies our NRPT rule generator to split the list of domains
into chunks as necessary, and write a separate rule for each chunk.
For compatibility reasons, we continue to use the hard-coded rule ID, but
as additional rules are required, we generate new GUIDs. Those GUIDs are
stored under the Tailscale registry path so that we know which rules are ours.
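
The chunking step can be sketched as follows. This is a generic illustration rather than the NRPT rule generator itself: the constant name, chunkDomains, and chunkSizes are hypothetical, and the registry writes and GUID generation are left out.

```go
package main

import "fmt"

// maxDomainsPerRule is the per-rule limit taken from the description above.
const maxDomainsPerRule = 50

// chunkDomains splits domains into consecutive slices of at most size
// entries. The returned slices share memory with the input.
func chunkDomains(domains []string, size int) [][]string {
	var chunks [][]string
	for len(domains) > size {
		chunks = append(chunks, domains[:size])
		domains = domains[size:]
	}
	if len(domains) > 0 {
		chunks = append(chunks, domains)
	}
	return chunks
}

// chunkSizes generates n placeholder domains and returns the size of each
// chunk that would become a separate rule.
func chunkSizes(n int) []int {
	domains := make([]string, n)
	for i := range domains {
		domains[i] = fmt.Sprintf("d%d.example.com", i)
	}
	var sizes []int
	for _, c := range chunkDomains(domains, maxDomainsPerRule) {
		sizes = append(sizes, len(c))
	}
	return sizes
}

func main() {
	fmt.Println(chunkSizes(120))
}
```

With this split, the first chunk can keep the hard-coded rule ID and each further chunk gets its own generated GUID.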
I made some changes to winutils to add additional helper functions in support
of both the code and its test: I added additional registry accessors, and also
moved some token accessors from paths to util/winutil.
Fixes https://github.com/tailscale/coral/issues/63
Signed-off-by: Aaron Klotz <aaron@tailscale.com>
I wrote this code way back at the beginning of my tenure at Tailscale when we
had concerns about needing to restore deleted machine keys from backups.
We never ended up using this functionality, and the code is now getting in the
way, so we might as well remove it.
Signed-off-by: Aaron Klotz <aaron@tailscale.com>
The prefix is a signal to tsweb to treat this as a gauge metric when
generating the Prometheus version.
Signed-off-by: Mihai Parparita <mihai@tailscale.com>