Avoid selecting an endpoint as "better" than the current endpoint if the
total latency improvement is less than 1%. This adds some hysteresis to
avoid flapping between endpoints for a minimal improvement in latency.
Signed-off-by: Andrew Dunham <andrew@du.nham.ca>
Change-Id: If8312e1768ea65c4b4d4e13d8de284b3825d7a73
This passes the *dnscache.Resolver down from the Direct client into the
Noise client and from there into the controlhttp client. This retains
the Resolver so that it can share state across calls instead of creating
a new resolver.
Updates #4845
Updates #6110
Signed-off-by: Andrew Dunham <andrew@du.nham.ca>
Change-Id: Ia5d6af1870f3b5b5d7dd5685d775dcf300aec7af
Every time we change `installer.sh`, run it in a few docker
containers based on different Linux distros, just as a simple test.
Also includes a few changes to the installer script itself to make
installation work in docker:
- install dnf config-manager command before running it
- run zypper in non-interactive mode
- update pacman indexes before installing packages
Updates https://github.com/tailscale/corp/issues/8952
Signed-off-by: Anton Tolchanov <anton@tailscale.com>
I need this for a corp change where I have a set as a queue, and make a
different decisison if the set is empty.
Updates tailscale/corp#10344
Signed-off-by: James Tucker <james@tailscale.com>
Can't have a dupe when the dupe is wrong. Clearly we need to up
our spell checking game. Did anyone say AI?
Signed-off-by: Charlotte Brandhorst-Satzkorn <charlotte@tailscale.com>
The action cache restore process either matches the restore key pattern
exactly, or uses a matching prefix with the most recent date.
If the restore key is an exact match, then no updates are uploaded, but
if we've just computed tests executions for more recent code then we
will likely want to use those results in future runs.
Appending run_id to the cache key will give us an always new key, and
then we will be restore a recently uploaded cache that is more likely
has a higher overlap with the code being tested.
Updates #7975
Signed-off-by: James Tucker <james@tailscale.com>
DERP doesn't support HTTP/2. If an HTTP/2 proxy was placed in front of
a DERP server requests would fail because the connection would
be initialized with HTTP/2, which the DERP client doesn't support.
Signed-off-by: Kyle Carberry <kyle@carberry.com>
This change adds a v6conn to the pinger to enable sending pings to v6
addrs.
Updates #7826
Signed-off-by: Charlotte Brandhorst-Satzkorn <charlotte@tailscale.com>
We need to always specify tags when creating an AuthKey from an OAuth key.
Check for that, and reuse the `--advertise-tags` param.
Updates #7982
Signed-off-by: Maisem Ali <maisem@tailscale.com>
This fix does not seem ideal, but the test infrastructure using a local
goos doesn't seem to avoid all of the associated challenges, but is
somewhat deeply tied to the setup.
The core issue this addresses for now is that when run on Windows there
can be no code paths that attempt to use an invalid UID string, which on
Windows is described in [1].
For the goos="linux" tests, we now explicitly skip the affected
migration code if runtime.GOOS=="windows", and for the Windows test we
explicitly use the running users uid, rather than just the string
"user1". We also now make the case where a profile exists and has
already been migrated a non-error condition toward the outer API.
Updates #7876
[1] https://learn.microsoft.com/en-us/windows-server/identity/ad-ds/manage/understand-security-identifiers
Signed-off-by: James Tucker <jftucker@gmail.com>
Benchmark flags prevent test caching, so benchmarks are now executed
independently of tests.
Fixes#7975
Signed-off-by: James Tucker <james@tailscale.com>
Previously we would error out when the recording server disappeared after the in memory
buffer filled up for the io.Copy. This makes it so that we handle failing open correctly
in that path.
Updates tailscale/corp#9967
Signed-off-by: Maisem Ali <maisem@tailscale.com>
Fixes#6784
This PR makes it so that we can persist the tailscaled state with
intelligent tiering which increases the capacity from 4kb to 8kb
Signed-off-by: Marwan Sulaiman <marwan@tailscale.com>
Looks like on some systems there's an IPv6 address, but then opening
a IPv6 UDP socket fails later. Probably some firewall. Tolerate it
better and don't crash.
To repro: check the "udp6" to something like "udp7" (something that'll
fail) and run "go run ./cmd/tailscale netcheck" on a machine with
active IPv6. It used to crash and now it doesn't.
Fixes#7949
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
This adds support to try dialing out to multiple recorders each
with a 5s timeout and an overall 30s timeout. It also starts respecting
the actions `OnRecordingFailure` field if set, if it is not set
it fails open.
Updates tailscale/corp#9967
Signed-off-by: Maisem Ali <maisem@tailscale.com>
This allows control to specify how to handle situations where the recorder
isn't available.
Updates tailscale/corp#9967
Signed-off-by: Maisem Ali <maisem@tailscale.com>
A follow-up PR will start using this field after we set it in our
production DERPMap.
Updates #7925
Signed-off-by: Andrew Dunham <andrew@du.nham.ca>
Change-Id: Idb41b79e6055dddb8944f79d91ad4a186ace98c7
On some platforms (notably macOS and iOS) we look up the default
interface to bind outgoing connections to. This is both duplicated
work and results in logspam when the default interface is not available
(i.e. when a phone has no connectivity, we log an error and thus cause
more things that we will try to upload and fail).
Fixed by passing around a netmon.Monitor to more places, so that we can
use its cached interface state.
Fixes#7850
Updates #7621
Signed-off-by: Mihai Parparita <mihai@tailscale.com>
We are seeing indications that some devices are still getting into an
upload loop. Bump logInterval in case these devices are on slow
connections that are taking more than 3 seconds to uploads sockstats.
Updates #7719
Signed-off-by: Will Norris <will@tailscale.com>
We're using it in more and more places, and it's not really specific to
our use of Wireguard (and does more just link/interface monitoring).
Also removes the separate interface we had for it in sockstats -- it's
a small enough package (we already pull in all of its dependencies
via other paths) that it's not worth the extra complexity.
Updates #7621
Updates #7850
Signed-off-by: Mihai Parparita <mihai@tailscale.com>
I manually tested that the code path that relaxes pipe permissions is
not executed when run with elevated priviliges, and the test also passes
in that case.
Updates #7876
Signed-off-by: James Tucker <jftucker@gmail.com>
The test is re-enabled for Windows with a relaxed time assertion.
On Windows the runtime poller currently does not have sufficient
resolution to meet the normal requirements for this test.
See https://github.com/golang/go/issues/44343 for background.
Updates #7876
Signed-off-by: James Tucker <jftucker@gmail.com>
This is a follow-up to #7905 that adds two more linters and fixes the corresponding findings. As per the previous PR, this only flags things that are "obviously" wrong, and fixes the issues found.
Signed-off-by: Andrew Dunham <andrew@du.nham.ca>
Change-Id: I8739bdb7bc4f75666a7385a7a26d56ec13741b7c
Without this, the peer fails to do anything over the PeerAPI if it
has a masquerade address.
```
Apr 19 13:58:15 hydrogen tailscaled[6696]: peerapi: invalid request from <ip>:58334: 100.64.0.1/32 not found in self addresses
```
Updates #8020
Signed-off-by: Maisem Ali <maisem@tailscale.com>
Found this when adding a test that does a ping over PeerAPI.
Our integration tests set up a trafficTrap to ensure that tailscaled
does not call out to the internet, and it does so via a HTTP_PROXY.
When adding a test for pings over PeerAPI, it triggered the trap and investigation
lead to the realization that we were not removing the Proxy when trying to
dial out to the PeerAPI.
Updates tailscale/corp#8020
Signed-off-by: Maisem Ali <maisem@tailscale.com>
Exposes some internal state of the sockstats package via the C2N and
PeerAPI endpoints, so that it can be used for debugging. For now this
includes the estimated radio on percentage and a second-by-second view
of the times the radio was active.
Also fixes another off-by-one error in the radio on percentage that
was leading to >100% values (if n seconds have passed since we started
to monitor, there may be n + 1 possible seconds where the radio could
have been on).
Updates tailscale/corp#9230
Signed-off-by: Mihai Parparita <mihai@tailscale.com>
When splitting the radio monitor usage array, we were splitting at now %
3600 to get values into chronological order. This caused the value for
the final second to be included at the beginning of the ordered slice
rather than the end. If there was activity during that final second, an
extra five seconds of high power usage would get recorded in some cases.
This could result in a final calculation of greater than 100% usage.
This corrects that by splitting values at (now+1 % 3600).
This also simplifies the percentage calculation by always rounding
values down, which is sufficient for our usage.
Signed-off-by: Will Norris <will@tailscale.com>
It's somewhat common (e.g. when a phone has no reception), and leads to
lots of logspam.
Updates #7850
Signed-off-by: Mihai Parparita <mihai@tailscale.com>