The netcheck package and the magicksock package coordinate via the
health package, but both sides have time based heuristics through
indirect dependencies. These were misaligned, so the implemented
heuristic aimed at reducing DERP moves while there is active traffic
were non-operational about 3/5ths of the time.
It is problematic to setup a good test for this integration presently,
so instead I added comment breadcrumbs along with the initial fix.
Updates #8603
Signed-off-by: James Tucker <james@tailscale.com>
This uses the fact that we've received a frame from a given DERP region
within a certain time as a signal that the region is stil present (and
thus can still be a node's PreferredDERP / home region) even if we don't
get a STUN response from that region during a netcheck.
This should help avoid DERP flaps that occur due to losing STUN probes
while still having a valid and active TCP connection to the DERP server.
RELNOTE=Reduce home DERP flapping when there's still an active connection
Updates #8603
Signed-off-by: Andrew Dunham <andrew@du.nham.ca>
Change-Id: If7da6312581e1d434d5c0811697319c621e187a0
Netcheck no longer performs I/O itself, instead it makes requests via
SendPacket and expects users to route reply traffic to
ReceiveSTUNPacket.
Netcheck gains a Standalone function that stands up sockets and
goroutines to implement I/O when used in a standalone fashion.
Magicsock now unconditionally routes STUN traffic to the netcheck.Client
that it hosts, and plumbs the send packet sink.
The CLI is updated to make use of the Standalone mode.
Fixes#8723
Signed-off-by: James Tucker <james@tailscale.com>
If the absolute value of the difference between the current
PreferredDERP's latency and the best latency is <= 10ms, don't change
it and instead prefer the previous value.
This is in addition to the existing hysteresis that tries to remain
on the previous DERP region if the relative improvement is small, but
handles nodes that have low latency to >1 DERP region better.
Updates #8603
Signed-off-by: Andrew Dunham <andrew@du.nham.ca>
Change-Id: I1e34c94178f8c9a68a69921c5bc0227337514c70
This allows providing additional information to the client about how to
select a home DERP region, such as preferring a given DERP region over
all others.
Updates #8603
Signed-off-by: Andrew Dunham <andrew@du.nham.ca>
Change-Id: I7c4a270f31d8585112fab5408799ffba5b75266f
This test was either fixed by intermediate changes or was mis-flagged as
failing during #7876 triage.
Updates #7876
Signed-off-by: James Tucker <jftucker@gmail.com>
This change adds a v6conn to the pinger to enable sending pings to v6
addrs.
Updates #7826
Signed-off-by: Charlotte Brandhorst-Satzkorn <charlotte@tailscale.com>
Looks like on some systems there's an IPv6 address, but then opening
a IPv6 UDP socket fails later. Probably some firewall. Tolerate it
better and don't crash.
To repro: check the "udp6" to something like "udp7" (something that'll
fail) and run "go run ./cmd/tailscale netcheck" on a machine with
active IPv6. It used to crash and now it doesn't.
Fixes#7949
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
On some platforms (notably macOS and iOS) we look up the default
interface to bind outgoing connections to. This is both duplicated
work and results in logspam when the default interface is not available
(i.e. when a phone has no connectivity, we log an error and thus cause
more things that we will try to upload and fail).
Fixed by passing around a netmon.Monitor to more places, so that we can
use its cached interface state.
Fixes#7850
Updates #7621
Signed-off-by: Mihai Parparita <mihai@tailscale.com>
So we're staying within the netip.Addr/AddrPort consistently and
avoiding allocs/conversions to the legacy net addr types.
Updates #5162
Change-Id: I59feba60d3de39f773e68292d759766bac98c917
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
We accidentally switched to ./tool/go in
4022796484 which resulted in no longer
running Windows builds, as this is attempting to run a bash script.
I was unable to quickly fix the various tests that have regressed, so
instead I've added skips referencing #7876, which we need to back and
fix.
Updates #7262
Updates #7876
Signed-off-by: James Tucker <james@tailscale.com>
This also adds a bunch of tests for this function to ensure that we're
returning the proper IP(s) in all cases.
Signed-off-by: Andrew Dunham <andrew@du.nham.ca>
Change-Id: I0d9d57170dbab5f2bf07abdf78ecd17e0e635399
Using log.Printf may end up being printed out to the console, which
is not desirable. I noticed this when I was investigating some client
logs with `sockstats: trace "NetcheckClient" was overwritten by another`.
That turns to be harmless/expected (the netcheck client will fall back
to the DERP client in some cases, which does its own sockstats trace).
However, the log output could be visible to users if running the
`tailscale netcheck` CLI command, which would be needlessly confusing.
Updates tailscale/corp#9230
Signed-off-by: Mihai Parparita <mihai@tailscale.com>
If multiple Go channels have a value (or are closed), receiving from
them all in a select will nondeterministically return one of the two
arms. In this case, it's possible that the hairpin check timer will have
expired between when we start checking and before we check at all, but
the hairpin packet has already been received. In such cases, we'd
nondeterministically set report.HairPinning.
Instead, check if we have a value in our results channel first, then
select on the value and timeout channel after. Also, add a test that
catches this particular failure.
Fixes#1795
Change-Id: I842ab0bd38d66fabc6cabf2c2c1bb9bd32febf35
Signed-off-by: Andrew Dunham <andrew@du.nham.ca>
Makes it cheaper/simpler to persist values, and encourages reuse of
labels as opposed to generating an arbitrary number.
Updates tailscale/corp#9230
Updates #3363
Signed-off-by: Mihai Parparita <mihai@tailscale.com>
We have many function pointers that we replace for the duration of test and
restore it on test completion, add method to do that.
Signed-off-by: Maisem Ali <maisem@tailscale.com>
Uses the hooks added by tailscale/go#45 to instrument the reads and
writes on the major code paths that do network I/O in the client. The
convention is to use "<package>.<type>:<label>" as the annotation for
the responsible code path.
Enabled on iOS, macOS and Android only, since mobile platforms are the
ones we're most interested in, and we are less sensitive to any
throughput degradation due to the per-I/O callback overhead (macOS is
also enabled for ease of testing during development).
For now just exposed as counters on a /v0/sockstats PeerAPI endpoint.
We also keep track of the current interface so that we can break out
the stats by interface.
Updates tailscale/corp#9230
Updates #3363
Signed-off-by: Mihai Parparita <mihai@tailscale.com>
This updates all source files to use a new standard header for copyright
and license declaration. Notably, copyright no longer includes a date,
and we now use the standard SPDX-License-Identifier header.
This commit was done almost entirely mechanically with perl, and then
some minimal manual fixes.
Updates #6865
Signed-off-by: Will Norris <will@tailscale.com>
The derpers don't allow whitespace in the challenge.
Change-Id: I93a8b073b846b87854fba127b5c1d80db205f658
Signed-off-by: Andrew Dunham <andrew@tailscale.com>
We had previously added this to the netcheck report in #5087 but never
copied it into the NetInfo struct. Additionally, add it to log lines so
it's visible to support.
Change-Id: Ib6266f7c6aeb2eb2a28922aeafd950fe1bf5627e
Signed-off-by: Andrew Dunham <andrew@tailscale.com>
The Lufthansa in-flight wifi generates a synthetic 204 response to the
DERP server's /generate_204 endpoint. This PR adds a basic
challenge/response to the endpoint; something sufficiently complicated
that it's unlikely to be implemented by a captive portal. We can then
check for the expected response to verify whether we're being MITM'd.
Follow-up to #5601
Signed-off-by: Andrew Dunham <andrew@du.nham.ca>
Change-Id: I94a68c9a16a7be7290200eea6a549b64f02ff48f
If netcheck happens before there's a derpmap.
This seems to only affect Headscale because it doesn't send a derpmap
as early?
Change-Id: I51e0dfca8e40623e04702bc9cc471770ca20d2c2
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
This doesn't change any behaviour for now, other than maybe running a
full netcheck more often. The intent is to start gathering data on
captive portals, and additionally, seeing this in the 'tailscale
netcheck' command should provide a bit of additional information to
users.
Updates #1634
Change-Id: I6ba08f9c584dc0200619fa97f9fde1a319f25c76
Signed-off-by: Andrew Dunham <andrew@du.nham.ca>
The io/ioutil package has been deprecated as of Go 1.16 [1]. This commit
replaces the existing io/ioutil functions with their new definitions in
io and os packages.
Reference: https://golang.org/doc/go1.16#ioutil
Signed-off-by: Eng Zer Jun <engzerjun@gmail.com>
Convert ParseResponse and Response to use netip.AddrPort instead of
net.IP and separate port.
Fixes#5281
Signed-off-by: Kris Brandow <kris.brandow@gmail.com>
This lets us distinguish "no IPv6 because the device's ISP doesn't
offer IPv6" from "IPv6 is unavailable/disabled in the OS".
Signed-off-by: David Anderson <danderson@tailscale.com>
Well, goimports actually (which adds the normal import grouping order we do)
Change-Id: I0ce1b1c03185f3741aad67c14a7ec91a838de389
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
In cases where tailscale is operating behind a MITM proxy, we need to consider
that a lot more of the internals of our HTTP requests are visible and may be
used as part of authorization checks. As such, we need to 'behave' as closely
as possible to ideal.
- Some proxies do authorization or consistency checks based the on Host header
or HTTP URI, instead of just the IP/hostname/SNI. As such, we need to
construct a `*http.Request` with a valid URI everytime HTTP is going to be
used on the wire, even if its over TLS.
Aside from the singular instance in net/netcheck, I couldn't find anywhere
else a http.Request was constructed incorrectly.
- Some proxies may deny requests, typically by returning a 403 status code. We
should not consider these requests as a valid latency check, so netcheck
semantics have been updated to consider >299 status codes as a failed probe.
Signed-off-by: Tom DNetto <tom@tailscale.com>
A new package can also later record/report which knobs are checked and
set. It also makes the code cleaner & easier to grep for env knobs.
Change-Id: Id8a123ab7539f1fadbd27e0cbeac79c2e4f09751
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Treat UDP send EPERM errors as a lost UDP packet, not something super
fatal. That's just the Linux firewall preventing it from going out.
And add a leaf package net/neterror for that (and future) policy that
all three packages can share, with tests.
Updates #3619
Change-Id: Ibdb838c43ee9efe70f4f25f7fc7fdf4607ba9c1d
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>