Still very much a prototype (hard-coded IPs, etc) but should be
non-invasive enough to submit at this point and iterate from here.
Updates #2589
Co-Author: David Crawshaw <crawshaw@tailscale.com>
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Prior to Tailscale 1.12 it detected UPnP on any port.
Starting with Tailscale 1.11.x, it stopped detecting UPnP on all ports.
Then start plumbing its discovered Location header port number to the
code that was assuming port 5000.
Fixes#2109
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
This was the proximate cause of #2579.
#2582 is a deeper fix, but this will remain
as a footgun, so may as well fix it too.
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
The index for every struct field or slice element and
the number of fields for the struct is unncessary.
The hashing of Go values is unambiguous because every type (except maps)
encodes in a parsable manner. So long as we know the type information,
we could theoretically decode every value (except for maps).
At a high level:
* numbers are encoded as fixed-width records according to precision.
* strings (and AppendTo output) are encoded with a fixed-width length,
followed by the contents of the buffer.
* slices are prefixed by a fixed-width length, followed by the encoding
of each value. So long as we know the type of each element, we could
theoretically decode each element.
* arrays are encoded just like slices, but elide the length
since it is determined from the Go type.
* maps are encoded first with a byte indicating whether it is a cycle.
If a cycle, it is followed by a fixed-width index for the pointer,
otherwise followed by the SHA-256 hash of its contents. The encoding of maps
is not decodeable, but a SHA-256 hash is sufficient to avoid ambiguities.
* interfaces are encoded first with a byte indicating whether it is nil.
If not nil, it is followed by a fixed-width index for the type,
and then the encoding for the underlying value. Having the type be encoded
first ensures that the value could theoretically be decoded next.
* pointers are encoded first with a byte indicating whether it is
1) nil, 2) a cycle, or 3) newly seen. If a cycle, it is followed by
a fixed-width index for the pointer. If newly seen, it is followed by
the encoding for the pointed-at value.
Removing unnecessary details speeds up hashing:
name old time/op new time/op delta
Hash-8 76.0µs ± 1% 55.8µs ± 2% -26.62% (p=0.000 n=10+10)
HashMapAcyclic-8 61.9µs ± 0% 62.0µs ± 0% ~ (p=0.666 n=9+9)
TailcfgNode-8 10.2µs ± 1% 7.5µs ± 1% -26.90% (p=0.000 n=10+9)
HashArray-8 1.07µs ± 1% 0.70µs ± 1% -34.67% (p=0.000 n=10+9)
Signed-off-by: Joe Tsai <joetsai@digital-static.net>
For testing pfSense clients "behind" pfSense on Digital Ocean where
the main interface still exists. This is easier for debugging.
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Given that https://github.com/golang/go/issues/42888 is coming, this
catches most practical panics without interfering in our development
environments.
Signed-off-by: David Crawshaw <crawshaw@tailscale.com>
Instead of hashing the humanly formatted forms of a number,
hash the native machine bits of the integers themselves.
There is a small performance gain for this:
name old time/op new time/op delta
Hash-8 75.7µs ± 1% 76.0µs ± 2% ~ (p=0.315 n=10+9)
HashMapAcyclic-8 63.1µs ± 3% 61.3µs ± 1% -2.77% (p=0.000 n=10+10)
TailcfgNode-8 10.3µs ± 1% 10.2µs ± 1% -1.48% (p=0.000 n=10+10)
HashArray-8 1.07µs ± 1% 1.05µs ± 1% -1.79% (p=0.000 n=10+10)
Signed-off-by: Joe Tsai <joetsai@digital-static.net>
The swapping of bufio.Writer between hasher and mapHasher is subtle.
Just embed a hasher in mapHasher to avoid complexity here.
No notable change in performance:
name old time/op new time/op delta
Hash-8 76.7µs ± 1% 77.0µs ± 1% ~ (p=0.182 n=9+10)
HashMapAcyclic-8 62.4µs ± 1% 62.5µs ± 1% ~ (p=0.315 n=10+9)
TailcfgNode-8 10.3µs ± 1% 10.3µs ± 1% -0.62% (p=0.004 n=10+9)
HashArray-8 1.07µs ± 1% 1.06µs ± 1% -0.98% (p=0.001 n=8+9)
Signed-off-by: Joe Tsai <joetsai@digital-static.net>
This is a simplified rate limiter geared for exactly our needs:
A fast, mono.Time-based rate limiter for use in tstun.
It was generated by stripping down the x/time/rate rate limiter
to just our needs and switching it to use mono.Time.
It removes one time.Now call per packet.
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
magicsock makes multiple calls to Now per packet.
Move to mono.Now. Changing some of the calls to
use package mono has a cascading effect,
causing non-per-packet call sites to also switch.
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
There's a call to Now once per packet.
Move to mono.Now.
Though the current implementation provides high precision,
we document it to be coarse, to preserve the ability
to switch to a coarse monotonic time later.
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
Package mono provides a fast monotonic time.
Its primary advantage is that it is fast:
It is approximately twice as fast as time.Now.
This is because time.Now uses two clock calls,
one for wall time and one for monotonic time.
We ask for the current time 4-6 times per network packet.
At ~50ns per call to time.Now, that's enough to show
up in CPU profiles.
Package mono is a first step towards addressing that.
It is designed to be a near drop-in replacement for package time.
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
Go 1.17 switches to a register ABI on amd64 platforms.
Part of that switch is that go and defer calls use an argument-less
closure, which allocates. This means that we have an extra
alloc in some DNS work. That's unfortunate but not a showstopper,
and I don't see a clear path to fixing it.
The other performance benefits from the register ABI will all
but certainly outweigh this extra alloc.
Fixes#2545
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
The kr/pty module moved to creack/pty per the kr/pty README[1].
creack/pty brings in support for a number of OS/arch combos that
are lacking in kr/pty.
Run `go mod tidy` while here.
[1] https://github.com/kr/pty/blob/master/README.md
Signed-off-by: Aaron Bieber <aaron@bolddaemon.com>
I don't know how to get access to a real packet. Basing this commit
entirely off:
+------------+--------------+------------------------------+
| Field Name | Field Type | Description |
+------------+--------------+------------------------------+
| NAME | domain name | MUST be 0 (root domain) |
| TYPE | u_int16_t | OPT (41) |
| CLASS | u_int16_t | requestor's UDP payload size |
| TTL | u_int32_t | extended RCODE and flags |
| RDLEN | u_int16_t | length of all RDATA |
| RDATA | octet stream | {attribute,value} pairs |
+------------+--------------+------------------------------+
From https://datatracker.ietf.org/doc/html/rfc6891#section-6.1.2
Signed-off-by: David Crawshaw <crawshaw@tailscale.com>
The handoff between tstun.Wrap's Read and poll methods
is one of the per-packet hotspots. It shows up in pprof.
Making outbound buffered increases throughput.
It is hard to measure exactly how much, because the numbers
are highly variable, but I'd estimate it at about 1%,
using the best observed max throughput across three runs.
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
The handoff between tstun.Wrap's Read and poll methods
is one of the per-packet hotspots. It shows up in pprof.
Making outbound buffered increases throughput.
It is hard to measure exactly how much, because the numbers
are highly variable, but I'd estimate it at about 1%,
using the best observed max throughput across three runs.
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
Tested manually with:
$ go test -v ./net/dnscache/ -dial-test=bogusplane.dev.tailscale.com:80
Where bogusplane has three A records, only one of which works.
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>