Wait 2 minutes before we start reporting battery usage. There is always
radio activity on initial startup, which gets reported as 100% high
power usage. Let that settle before we report usage data.
Updates tailscale/corp#9230
Signed-off-by: Will Norris <will@tailscale.com>
`interfaces.Tailscale()` returns all zero values when it finds no
Tailscale interface and encounters no errors. The netns package was
treating no error as a signal that it would receive a non-zero pointer
value leading to nil pointer dereference.
Observed in:
```
--- FAIL: TestGetInterfaceIndex (0.00s)
--- FAIL: TestGetInterfaceIndex/IP_and_port (0.00s)
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x2 addr=0x0 pc=0x1029eb7d8]
goroutine 7 [running]:
testing.tRunner.func1.2({0x102a691e0, 0x102bc05c0})
/Users/raggi/.cache/tailscale-go/src/testing/testing.go:1526 +0x1c8
testing.tRunner.func1()
/Users/raggi/.cache/tailscale-go/src/testing/testing.go:1529 +0x384
panic({0x102a691e0, 0x102bc05c0})
/Users/raggi/.cache/tailscale-go/src/runtime/panic.go:884 +0x204
tailscale.com/net/netns.getInterfaceIndex(0x14000073f28, 0x1028d0284?, {0x1029ef3b7, 0xa})
/Users/raggi/src/github.com/tailscale/tailscale/net/netns/netns_darwin.go:114 +0x228
tailscale.com/net/netns.TestGetInterfaceIndex.func2(0x14000138000)
/Users/raggi/src/github.com/tailscale/tailscale/net/netns/netns_darwin_test.go:37 +0x54
testing.tRunner(0x14000138000, 0x140000551b0)
/Users/raggi/.cache/tailscale-go/src/testing/testing.go:1576 +0x10c
created by testing.(*T).Run
/Users/raggi/.cache/tailscale-go/src/testing/testing.go:1629 +0x368
FAIL tailscale.com/net/netns 0.824s
```
Fixes#8064
Signed-off-by: James Tucker <jftucker@gmail.com>
This is part of an effort to clean up tailscaled initialization between
tailscaled, tailscaled Windows service, tsnet, and the mac GUI.
Updates #8036
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
In the case where the exit node requires SNAT, we would SNAT all traffic not just the
traffic meant to go through the exit node. This was a result of the default route being
added to the routing table which would match basically everything.
In this case, we need to account for all peers in the routing table not just the ones
that require NAT.
Fix and add a test.
Updates tailscale/corp#8020
Signed-off-by: Maisem Ali <maisem@tailscale.com>
This change adds a v6conn to the pinger to enable sending pings to v6
addrs.
Updates #7826
Signed-off-by: Charlotte Brandhorst-Satzkorn <charlotte@tailscale.com>
Looks like on some systems there's an IPv6 address, but then opening
a IPv6 UDP socket fails later. Probably some firewall. Tolerate it
better and don't crash.
To repro: check the "udp6" to something like "udp7" (something that'll
fail) and run "go run ./cmd/tailscale netcheck" on a machine with
active IPv6. It used to crash and now it doesn't.
Fixes#7949
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
On some platforms (notably macOS and iOS) we look up the default
interface to bind outgoing connections to. This is both duplicated
work and results in logspam when the default interface is not available
(i.e. when a phone has no connectivity, we log an error and thus cause
more things that we will try to upload and fail).
Fixed by passing around a netmon.Monitor to more places, so that we can
use its cached interface state.
Fixes#7850
Updates #7621
Signed-off-by: Mihai Parparita <mihai@tailscale.com>
We're using it in more and more places, and it's not really specific to
our use of Wireguard (and does more just link/interface monitoring).
Also removes the separate interface we had for it in sockstats -- it's
a small enough package (we already pull in all of its dependencies
via other paths) that it's not worth the extra complexity.
Updates #7621
Updates #7850
Signed-off-by: Mihai Parparita <mihai@tailscale.com>
This is a follow-up to #7905 that adds two more linters and fixes the corresponding findings. As per the previous PR, this only flags things that are "obviously" wrong, and fixes the issues found.
Signed-off-by: Andrew Dunham <andrew@du.nham.ca>
Change-Id: I8739bdb7bc4f75666a7385a7a26d56ec13741b7c
Found this when adding a test that does a ping over PeerAPI.
Our integration tests set up a trafficTrap to ensure that tailscaled
does not call out to the internet, and it does so via a HTTP_PROXY.
When adding a test for pings over PeerAPI, it triggered the trap and investigation
lead to the realization that we were not removing the Proxy when trying to
dial out to the PeerAPI.
Updates tailscale/corp#8020
Signed-off-by: Maisem Ali <maisem@tailscale.com>
Exposes some internal state of the sockstats package via the C2N and
PeerAPI endpoints, so that it can be used for debugging. For now this
includes the estimated radio on percentage and a second-by-second view
of the times the radio was active.
Also fixes another off-by-one error in the radio on percentage that
was leading to >100% values (if n seconds have passed since we started
to monitor, there may be n + 1 possible seconds where the radio could
have been on).
Updates tailscale/corp#9230
Signed-off-by: Mihai Parparita <mihai@tailscale.com>
When splitting the radio monitor usage array, we were splitting at now %
3600 to get values into chronological order. This caused the value for
the final second to be included at the beginning of the ordered slice
rather than the end. If there was activity during that final second, an
extra five seconds of high power usage would get recorded in some cases.
This could result in a final calculation of greater than 100% usage.
This corrects that by splitting values at (now+1 % 3600).
This also simplifies the percentage calculation by always rounding
values down, which is sufficient for our usage.
Signed-off-by: Will Norris <will@tailscale.com>
It's somewhat common (e.g. when a phone has no reception), and leads to
lots of logspam.
Updates #7850
Signed-off-by: Mihai Parparita <mihai@tailscale.com>
This adds an initial and intentionally minimal configuration for
golang-ci, fixes the issues reported, and adds a GitHub Action to check
new pull requests against this linter configuration.
Signed-off-by: Andrew Dunham <andrew@du.nham.ca>
Change-Id: I8f38fbc315836a19a094d0d3e986758b9313f163
Exclude traffic with 100.100.100.100 (for IPv4) and
with fd7a:115c:a1e0::53 (for IPv6) since this traffic with the
Tailscale service running locally on the node.
This traffic never left the node.
It also happens to be a high volume amount of traffic since
DNS requests occur over UDP with each request coming from a
unique port, thus resulting in many discrete traffic flows.
Fixestailscale/corp#10554
Signed-off-by: Joe Tsai <joetsai@digital-static.net>
Redoes the approach from #5550 and #7539 to explicitly pass in the logf
function, instead of having global state that can be overridden.
Signed-off-by: Mihai Parparita <mihai@tailscale.com>
This is a continuation of the earlier 2a67beaacf but more aggressive;
this now remembers that we failed to find the "home" router IP so we
don't try again later on the next call.
Updates #7621
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
So we're staying within the netip.Addr/AddrPort consistently and
avoiding allocs/conversions to the legacy net addr types.
Updates #5162
Change-Id: I59feba60d3de39f773e68292d759766bac98c917
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
We accidentally switched to ./tool/go in
4022796484 which resulted in no longer
running Windows builds, as this is attempting to run a bash script.
I was unable to quickly fix the various tests that have regressed, so
instead I've added skips referencing #7876, which we need to back and
fix.
Updates #7262
Updates #7876
Signed-off-by: James Tucker <james@tailscale.com>
To get the tree green again for other people.
Updates #7866
Change-Id: Ibdad2e1408e5f0c97e49a148bfd77aad17c2c5e5
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
This also adds a bunch of tests for this function to ensure that we're
returning the proper IP(s) in all cases.
Signed-off-by: Andrew Dunham <andrew@du.nham.ca>
Change-Id: I0d9d57170dbab5f2bf07abdf78ecd17e0e635399
This makes `omitempty` actually work, and saves bytes in each map response.
Updates tailscale/corp#8020
Signed-off-by: Maisem Ali <maisem@tailscale.com>
At the current unoptimized memory utilization of the various data structures,
100k IPv6 routes consumes in the ballpark of 3-4GiB, which risks OOMing our
386 test machine.
Until we have the optimizations to (drastically) reduce that consumption,
skip the test that bloats too much for 32-bit machines.
Signed-off-by: David Anderson <danderson@tailscale.com>
Using log.Printf may end up being printed out to the console, which
is not desirable. I noticed this when I was investigating some client
logs with `sockstats: trace "NetcheckClient" was overwritten by another`.
That turns to be harmless/expected (the netcheck client will fall back
to the DERP client in some cases, which does its own sockstats trace).
However, the log output could be visible to users if running the
`tailscale netcheck` CLI command, which would be needlessly confusing.
Updates tailscale/corp#9230
Signed-off-by: Mihai Parparita <mihai@tailscale.com>
We use it to gate code that depends on custom Go toolchain, but it's
currently only passed in the corp runners. Add a set on OSS so that we
can catch regressions earlier.
To specifically test sockstats this required adding a build tag to
explicitly enable them -- they're normally on for iOS, macOS and Android
only, and we don't run tests on those platforms normally.
Signed-off-by: Mihai Parparita <mihai@tailscale.com>
power state is very roughly approximated based on observed network
activity and AT&T's state transition timings for a typical 3G radio.
Updates tailscale/corp#9230
Updates #3363
Signed-off-by: Will Norris <will@tailscale.com>
This commit implements UDP offloading for Linux. GSO size is passed to
and from the kernel via socket control messages. Support is probed at
runtime.
UDP GSO is dependent on checksum offload support on the egress netdev.
UDP GSO will be disabled in the event sendmmsg() returns EIO, which is
a strong signal that the egress netdev does not support checksum
offload.
Updates tailscale/corp#8734
Signed-off-by: Jordan Whited <jordan@tailscale.com>
Noted on #5915 TS_DEBUG_MTU was not used consistently everywhere.
Extract the default into a function that can apply this centrally and
use it everywhere.
Added envknob.Lookup{Int,Uint}Sized to make it easier to keep CodeQL
happy when using converted values.
Updates #5915
Signed-off-by: James Tucker <james@tailscale.com>
When running a SOCKS or HTTP proxy, configure the tshttpproxy package to
drop those addresses from any HTTP_PROXY or HTTPS_PROXY environment
variables.
Fixes#7407
Signed-off-by: Andrew Dunham <andrew@du.nham.ca>
Change-Id: I6cd7cad7a609c639780484bad521c7514841764b
This adds support to make exit nodes and subnet routers work
when in scenarios where NAT is required.
It also updates the NATConfig to be generated from a `wgcfg.Config` as
that handles merging prefs with the netmap, so it has the required information
about whether an exit node is already configured and whether routes are accepted.
Updates tailscale/corp#8020
Signed-off-by: Maisem Ali <maisem@tailscale.com>
This commit updates the wireguard-go dependency to pull in fixes for
the tun package, specifically 052af4a and aad7fca.
Signed-off-by: Jordan Whited <jordan@tailscale.com>
If multiple Go channels have a value (or are closed), receiving from
them all in a select will nondeterministically return one of the two
arms. In this case, it's possible that the hairpin check timer will have
expired between when we start checking and before we check at all, but
the hairpin packet has already been received. In such cases, we'd
nondeterministically set report.HairPinning.
Instead, check if we have a value in our results channel first, then
select on the value and timeout channel after. Also, add a test that
catches this particular failure.
Fixes#1795
Change-Id: I842ab0bd38d66fabc6cabf2c2c1bb9bd32febf35
Signed-off-by: Andrew Dunham <andrew@du.nham.ca>
This adds support in tstun to utitilize the SelfNodeV4MasqAddrForThisPeer and
perform the necessary modifications to the packet as it passes through tstun.
Currently this only handles ICMP, UDP and TCP traffic.
Subnet routers and Exit Nodes are also unsupported.
Updates tailscale/corp#8020
Co-authored-by: Melanie Warrick <warrick@tailscale.com>
Signed-off-by: Maisem Ali <maisem@tailscale.com>
Followup to #7518 to also export client metrics when the active interface
is cellular.
Updates tailscale/corp#9230
Updates #3363
Signed-off-by: Mihai Parparita <mihai@tailscale.com>
In May 2021, Azure App Services used 172.16.x.x addresses:
```
10: eth0@if11: <BROADCAST,MULTICAST,UP,LOWER_UP,M-DOWN> mtu 1500 qdisc noqueue state UP
link/ether 02:42:ac:10:01:03 brd ff:ff:ff:ff:ff:ff
inet 172.16.1.3/24 brd 172.16.1.255 scope global eth0
valid_lft forever preferred_lft forever
```
Now it uses link-local:
```
2: eth0@if6: <BROADCAST,MULTICAST,UP,LOWER_UP,M-DOWN> mtu 1500 qdisc noqueue state UP
link/ether 8a:30:1f:50:1d:23 brd ff:ff:ff:ff:ff:ff
inet 169.254.129.3/24 brd 169.254.129.255 scope global eth0
valid_lft forever preferred_lft forever
```
This is reasonable for them to choose to do, it just broke the handling in net/interfaces.
This PR proposes to:
1. Always allow link-local in LocalAddresses() if we have no better
address available.
2. Continue to make isUsableV4() conditional on an environment we know
requires it.
I don't love the idea of having to discover these environments one by
one, but I don't understand the consequences of making isUsableV4()
return true unconditionally. It makes isUsableV4() essentially always
return true and perform no function.
Fixes https://github.com/tailscale/tailscale/issues/7603
Signed-off-by: Denton Gentry <dgentry@tailscale.com>
They're not needed for the sockstats logger, and they're somewhat
expensive to return (since they involve the creation of a map per
label). We now have a separate GetInterfaces() method that returns
them instead (which we can still use in the PeerAPI debug endpoint).
If changing sockstatlog to sample at 10,000 Hz (instead of the default
of 10Hz), the CPU usage would go up to 59% on a iPhone XS. Removing the
per-interface stats drops it to 20% (a no-op implementation of Get that
returns a fixed value is 16%).
Updates tailscale/corp#9230
Updates #3363
Signed-off-by: Mihai Parparita <mihai@tailscale.com>
Followup to #7499 to make validation a separate function (
GetWithValidation vs. Get). This way callers that don't need it don't
pay the cost of a syscall per active TCP socket.
Also clears the conn on close, so that we don't double-count the stats.
Also more consistently uses Go doc comments for the exported API of the
sockstats package.
Updates tailscale/corp#9230
Updates #3363
Signed-off-by: Mihai Parparita <mihai@tailscale.com>
Though not fine-grained enough to be useful for detailed analysis, we
might as well export that we gather as client metrics too, since we have
an upload/analysis pipeline for them.
clientmetric.Metric.Add is an atomic add, so it's pretty cheap to also
do per-packet.
Updates tailscale/corp#9230
Updates #3363
Signed-off-by: Mihai Parparita <mihai@tailscale.com>
withSockStats may be called before setLinkMonitor, in which case we
don't have a populated knownInterfaces map. Since we pre-populate the
per-interface counters at creation time, we would end up with an
empty map. To mitigate this, we do an on-demand request for the list of
interfaces.
This would most often happen with the logtail instrumentation, since we
initialize it very early on.
Updates tailscale/corp#9230
Signed-off-by: Mihai Parparita <mihai@tailscale.com>
We can use the TCP_CONNECTION_INFO getsockopt() on Darwin to get
OS-collected tx/rx bytes for TCP sockets. Since this API is not available
for UDP sockets (or on Linux/Android), we can't rely on it for actual
stats gathering.
However, we can use it to validate the stats that we collect ourselves
using read/write hooks, so that we can be more confident in them. We
do need additional hooks from the Go standard library (added in
tailscale/go#59) to be able to collect them.
Updates tailscale/corp#9230
Updates #3363
Signed-off-by: Mihai Parparita <mihai@tailscale.com>
Makes it cheaper/simpler to persist values, and encourages reuse of
labels as opposed to generating an arbitrary number.
Updates tailscale/corp#9230
Updates #3363
Signed-off-by: Mihai Parparita <mihai@tailscale.com>
Despite the fact that WSL configuration is still disabled by default, we
continue to log the machine's list of WSL distros as a diagnostic measure.
Unfortunately I have seen the "wsl.exe -l" command hang indefinitely. This patch
adds a (more than reasonable) 10s timeout to ensure that tailscaled does not get
stuck while executing this operation.
I also modified the Windows implementation of NewOSConfigurator to do the
logging asynchronously, since that information is not required in order to
continue starting up.
Fixes#7476
Signed-off-by: Aaron Klotz <aaron@tailscale.com>
Conforms to RFC 1929.
To support Java HTTP clients via libtailscale, who offer no other
reliable hooks into their sockets.
Signed-off-by: David Crawshaw <crawshaw@tailscale.com>
Per a packet capture provided, some gateways will reply to a UPnP
discovery packet with a UDP packet with a source port that does not come
from the UPnP port. Accept these packets with a log message.
Updates #7377
Signed-off-by: Andrew Dunham <andrew@du.nham.ca>
Change-Id: I5d4d5b2a0275009ed60f15c20b484fe2025d094b
We were previously sending a lower-case "udp" protocol, whereas other
implementations like miniupnp send an upper-case "UDP" protocol. For
compatibility, use an upper-case protocol instead.
Updates #7377
Signed-off-by: Andrew Dunham <andrew@du.nham.ca>
Change-Id: I4aed204f94e4d51b7a256d29917af1536cb1b70f
Some devices don't let you UPnP portmap a port below 1024, so let's just
avoid that range of ports entirely.
Updates #7377
Signed-off-by: Andrew Dunham <andrew@du.nham.ca>
Change-Id: Ib7603b1c9a019162cdc4fa21744a2cae48bb1d86
Return a mock set of interfaces and a mock gateway during this test and
verify that LikelyHomeRouterIP returns the outcome we expect. Also
verify that we return an error if there are no IPv4 addresses available.
Follow-up to #7447
Signed-off-by: Andrew Dunham <andrew@du.nham.ca>
Change-Id: I8f06989e7f1f0bebd108861cbff17b820ed2e6e4
We have many function pointers that we replace for the duration of test and
restore it on test completion, add method to do that.
Signed-off-by: Maisem Ali <maisem@tailscale.com>
We weren't filtering out IPv6 addresses from this function, so we could
be returning an IPv4 gateway IP and an IPv6 self IP. Per the function
comments, only return IPv4 addresses for the self IP.
Signed-off-by: Andrew Dunham <andrew@du.nham.ca>
Change-Id: If19a4aadc343fbd4383fc5290befa0eff006799e
Now that we're using rand.Shuffle in a few locations, create a generic
shuffle function and use it instead. While we're at it, move the
interleaveSlices function to the same package for use.
Signed-off-by: Andrew Dunham <andrew@du.nham.ca>
Change-Id: I0b00920e5b3eea846b6cedc30bd34d978a049fd3
The debug flag on tailscaled isn't available in the macOS App Store
build, since we don't have a tailscaled binary; move it to the
'tailscale debug' CLI that is available on all platforms instead,
accessed over LocalAPI.
Updates #7377
Signed-off-by: Andrew Dunham <andrew@du.nham.ca>
Change-Id: I47bffe4461e036fab577c2e51e173f4003592ff7
Followup to #7177 to avoid adding extra dependencies to the CLI. We
instead declare an interface for the link monitor.
Signed-off-by: Mihai Parparita <mihai@tailscale.com>
This ensures that we're trying multiple returned IPs, since the DERP
servers return the same response to all queries. This should increase
the chances that we eventually reach a working IP.
Signed-off-by: Andrew Dunham <andrew@du.nham.ca>
Change-Id: Ie8d4fb93df96da910fae49ae71bf3e402b9fdecc
Uses the hooks added by tailscale/go#45 to instrument the reads and
writes on the major code paths that do network I/O in the client. The
convention is to use "<package>.<type>:<label>" as the annotation for
the responsible code path.
Enabled on iOS, macOS and Android only, since mobile platforms are the
ones we're most interested in, and we are less sensitive to any
throughput degradation due to the per-I/O callback overhead (macOS is
also enabled for ease of testing during development).
For now just exposed as counters on a /v0/sockstats PeerAPI endpoint.
We also keep track of the current interface so that we can break out
the stats by interface.
Updates tailscale/corp#9230
Updates #3363
Signed-off-by: Mihai Parparita <mihai@tailscale.com>
Exposes the delegated interface data added by #7248 in the debug
endpoint. I would have found it useful when working on that PR, and
it may be handy in the future as well.
Also makes the interfaces table slightly easier to parse by adding
borders to it. To make then nicer-looking, the CSP was relaxed to allow
inline styles.
Signed-off-by: Mihai Parparita <mihai@tailscale.com>
With #6566 we added an external mechanism for getting the default
interface, and used it on macOS and iOS (see tailscale/corp#8201).
The goal was to be able to get the default physical interface even when
using an exit node (in which case the routing table would say that the
Tailscale utun* interface is the default).
However, the external mechanism turns out to be unreliable in some
cases, e.g. when multiple cellular interfaces are present/toggled (I
have occasionally gotten my phone into a state where it reports the pdp_ip1
interface as the default, even though it can't actually route traffic).
It was observed that `ifconfig -v` on macOS reports an "effective interface"
for the Tailscale utn* interface, which seems promising. By examining
the ifconfig source code, it turns out that this is done via a
SIOCGIFDELEGATE ioctl syscall. Though this is a private API, it appears
to have been around for a long time (e.g. it's in the 10.13 xnu release
at https://opensource.apple.com/source/xnu/xnu-4570.41.2/bsd/net/if_types.h.auto.html)
and thus is unlikely to go away.
We can thus use this ioctl if the routing table says that a utun*
interface is the default, and go back to the simpler mechanism that
we had before #6566.
Updates #7184
Updates #7188
Signed-off-by: Mihai Parparita <mihai@tailscale.com>
As part of the work on #7248 I wanted to know all of the flags on the
RouteMessage struct that we get back from macOS. Though it doesn't turn
out to be useful (when using an exit node/Tailscale is the default route,
the flags for the physical interface routes are the same), it still seems
useful from a debugging/comprehensiveness perspective.
Adds additional Darwin flags that were output once I enabled this mode.
Signed-off-by: Mihai Parparita <mihai@tailscale.com>
With #6566 we started to more aggressively bind to the default interface
on Darwin. We are seeing some reports of the wrong cellular interface
being chosen on iOS. To help with the investigation, this adds to knobs
to control the behavior changes:
- CapabilityDebugDisableAlternateDefaultRouteInterface disables the
alternate function that we use to get the default interface on macOS
and iOS (implemented in tailscale/corp#8201). We still log what it
would have returned so we can see if it gets things wrong.
- CapabilityDebugDisableBindConnToInterface is a bigger hammer that
disables binding of connections to the default interface altogether.
Updates #7184
Updates #7188
Signed-off-by: Mihai Parparita <mihai@tailscale.com>
It was originally added to control memory use on iOS (#2490), but then
was relaxed conditionally when running on iOS 15 (#3098). Now that we
require iOS 15, there's no need for the limit at all, so simplify back
to the original state.
Signed-off-by: Mihai Parparita <mihai@tailscale.com>
GetProxyConnectHeader (golang/go#41048) was upstreamed in Go 1.16 and
OnProxyConnectResponse (golang/go#54299) in Go 1.20, thus we no longer
need to guard their use by the tailscale_go build tag.
Updates #7123
Signed-off-by: Mihai Parparita <mihai@tailscale.com>
Add the envknob TS_DEBUG_EXIT_NODE_DNS_NET_PKG, which enables more
verbose debug logging when calling the handleExitNodeDNSQueryWithNetPkg
function. This function is currently only called on Windows and Android.
Signed-off-by: Andrew Dunham <andrew@du.nham.ca>
Change-Id: Ieb3ca7b98837d7dc69cd9ca47609c1c52e3afd7b
When we make a connection to a server, we previously would verify with
the system roots, and then fall back to verifying with our baked-in
Let's Encrypt root if the system root cert verification failed.
We now explicitly check for, and log a health error on, self-signed
certificates. Additionally, we now always verify against our baked-in
Let's Encrypt root certificate and log an error if that isn't
successful. We don't consider this a health failure, since if we ever
change our server certificate issuer in the future older non-updated
versions of Tailscale will no longer be healthy despite being able to
connect.
Updates #3198
Change-Id: I00be5ceb8afee544ee795e3c7a2815476abc4abf
Signed-off-by: Andrew Dunham <andrew@du.nham.ca>
Updates #7123
Updates #6257 (more to do in other repos)
Change-Id: I073e2a6d81a5d7fbecc29caddb7e057ff65239d0
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Update all code generation tools, and those that check for license
headers to use the new standard header.
Also update copyright statement in LICENSE file.
Fixes#6865
Signed-off-by: Will Norris <will@tailscale.com>
This updates all source files to use a new standard header for copyright
and license declaration. Notably, copyright no longer includes a date,
and we now use the standard SPDX-License-Identifier header.
This commit was done almost entirely mechanically with perl, and then
some minimal manual fixes.
Updates #6865
Signed-off-by: Will Norris <will@tailscale.com>
Follow-up to #7065 with some comments from Brad's review.
Signed-off-by: Andrew Dunham <andrew@du.nham.ca>
Change-Id: Ia1219f4fa25479b2dada38ffe421065b408c5954
The API documentation does claim to output empty strings under certain
conditions, but we're sometimes seeing nil pointers in the wild, not empty
strings.
Fixes https://github.com/tailscale/corp/issues/8878
Signed-off-by: Aaron Klotz <aaron@tailscale.com>
When turned on via environment variable (off by default), this will use
the BSD routing APIs to query what interface index a socket should be
bound to, rather than binding to the default interface in all cases.
Updates #5719
Updates #5940
Signed-off-by: Andrew Dunham <andrew@du.nham.ca>
Change-Id: Ib4c919471f377b7a08cd3413f8e8caacb29fee0b
This allows users to temporarily enable/disable dnscache logging via a
new node capability, to aid in debugging strange connectivity issues.
Signed-off-by: Andrew Dunham <andrew@du.nham.ca>
Change-Id: I46cf2596a8ae4c1913880a78d0033f8b668edc08
I typoed/brainoed in the earlier 3582628691
Change-Id: Ic198a6f9911f195d9da9fc5259b5784a4b15e5e3
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
01b90df2fa added SCTP support before
(with explicit parsing for ports) and
69de3bf7bf tried to add support for
arbitrary IP protocols (as long as the ACL permited a port of "*",
since we might not know how to find ports from an arbitrary IP
protocol, if it even has such a concept). But apparently that latter
commit wasn't tested end-to-end enough. It had a lot of tests, but the
tests made assumptions about layering that either weren't true, or
regressed since 1.20. Notably, it didn't remove the (*Filter).pre
bidirectional filter that dropped all "unknown" protocol packets both
leaving and entering, even if there were explicit protocol matches
allowing them in.
Also, don't map all unknown protocols to 0. Keep their IP protocol
number parsed so it's matchable by later layers. Only reject illegal
things.
Fixes#6423
Updates #2162
Updates #2163
Change-Id: I9659b3ece86f4db51d644f9b34df78821758842c
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Gateway devices operating as an HA pair w/VRRP or CARP may send UPnP
replies from static addresses rather than the floating gateway address.
This commit relaxes our source address verification such that we parse
responses from non-gateway IPs, and re-point the UPnP root desc
URL to the gateway IP. This ensures we are still interfacing with the
gateway device (assuming L2 security intact), even though we got a
root desc from a non-gateway address.
This relaxed handling is required for ANY port mapping to work on certain
OPNsense/pfsense distributions using CARP at the time of writing, as
miniupnpd may only listen on the static, non-gateway interface address
for PCP and PMP.
Fixes#5502
Signed-off-by: Jordan Whited <jordan@tailscale.com>
With a42a594bb3, iOS uses netstack and
hence there are no longer any platforms which use the legacy MagicDNS path. As such, we remove it.
We also normalize the limit for max in-flight DNS queries on iOS (it was 64, now its 256 as per other platforms).
It was 64 for the sake of being cautious about memory, but now we have 50Mb (iOS-15 and greater) instead of 15Mb
so we have the spare headroom.
Signed-off-by: Tom DNetto <tom@tailscale.com>
Go now includes the GOROOT bin directory in $PATH while running tests
and generate, so it is no longer necessary to construct a path using
runtime.GOROOT().
Fixes#6689
Signed-off-by: James Tucker <james@tailscale.com>
We saw a few cases where we hit this limit; bumping to 4k seems
relatively uncontroversial.
Change-Id: I218fee3bc0d2fa5fde16eddc36497a73ebd7cbda
Signed-off-by: Andrew Dunham <andrew@du.nham.ca>
We change our invocations of GetExtendedTcpTable to request additional
information about the "module" responsible for the port. In addition to pid,
this output also includes sufficient metadata to enable Windows to resolve
process names and disambiguate svchost processes.
We store the OS-specific output in an OSMetadata field in netstat.Entry, which
portlist may then use as necessary to actually resolve the process/module name.
Signed-off-by: Aaron Klotz <aaron@tailscale.com>
The Tailscale logging service has a hard limit on the maximum
log message size that can be accepted.
We want to ensure that netlog messages never exceed
this limit otherwise a client cannot transmit logs.
Move the goroutine for periodically dumping netlog messages
from wgengine/netlog to net/connstats.
This allows net/connstats to manage when it dumps messages,
either based on time or by size.
Updates tailscale/corp#8427
Signed-off-by: Joe Tsai <joetsai@digital-static.net>
Previously, if a DNS-over-TCP message was received while there were
existing queries in-flight, and it was over the size limit, we'd close
the 'responses' channel. This would cause those in-flight queries to
send on the closed channel and panic.
Instead, don't close the channel at all and rely on s.ctx being
canceled, which will ensure that in-flight queries don't hang.
Fixes#6725
Signed-off-by: Andrew Dunham <andrew@du.nham.ca>
Change-Id: I8267728ac37ed7ae38ddd09ce2633a5824320097
This is temporary while we work to upstream performance work in
https://github.com/WireGuard/wireguard-go/pull/64. A replace directive
is less ideal as it breaks dependent code without duplication of the
directive.
Signed-off-by: Jordan Whited <jordan@tailscale.com>
We would call parsedPacketPool.Get() for all packets received in Read/Write.
This was wasteful and not necessary, fetch a single *packet.Parsed for
all packets.
Signed-off-by: Maisem Ali <maisem@tailscale.com>
This commit updates the wireguard-go dependency and implements the
necessary changes to the tun.Device and conn.Bind implementations to
support passing vectors of packets in tailscaled. This significantly
improves throughput performance on Linux.
Updates #414
Signed-off-by: Jordan Whited <jordan@tailscale.com>
Signed-off-by: James Tucker <james@tailscale.com>
Co-authored-by: James Tucker <james@tailscale.com>
x/exp/slices now has ContainsFunc (golang/go#53983) so we can delete
our versions.
Change-Id: I5157a403bfc1b30e243bf31c8b611da25e995078
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
We were previously only doing this for tailscaled-on-Darwin, but it also
appears to help on iOS. Otherwise, when we rebind magicsock UDP
connections after a cellular -> WiFi interface change they still keep
using cellular one.
To do this correctly when using exit nodes, we need to exclude the
Tailscale interface when getting the default route, otherwise packets
cannot leave the tunnel. There are native macOS/iOS APIs that we can
use to do this, so we allow those clients to override the implementation
of DefaultRouteInterfaceIndex.
Updates #6565, may also help with #5156
Signed-off-by: Mihai Parparita <mihai@tailscale.com>
I couldn't find any logs that indicated which mode it was running in so adding that.
Also added a gauge metric for dnsMode.
Signed-off-by: Maisem Ali <maisem@tailscale.com>
Previously, tstun.Wrapper and magicsock.Conn managed their
own statistics data structure and relied on an external call to
Extract to extract (and reset) the statistics.
This makes it difficult to ensure a maximum size on the statistics
as the caller has no introspection into whether the number
of unique connections is getting too large.
Invert the control flow such that a *connstats.Statistics
is registered with tstun.Wrapper and magicsock.Conn.
Methods on non-nil *connstats.Statistics are called for every packet.
This allows the implementation of connstats.Statistics (in the future)
to better control when it needs to flush to ensure
bounds on maximum sizes.
The value registered into tstun.Wrapper and magicsock.Conn could
be an interface, but that has two performance detriments:
1. Method calls on interface values are more expensive since
they must go through a virtual method dispatch.
2. The implementation would need a sync.Mutex to protect the
statistics value instead of using an atomic.Pointer.
Given that methods on constats.Statistics are called for every packet,
we want reduce the CPU cost on this hot path.
Signed-off-by: Joe Tsai <joetsai@digital-static.net>
The fix in 4fc8538e2 was sufficient for IPv6. Browsers (can?) send the
IPv6 literal, even without a port number, in brackets.
Updates tailscale/corp#7948
Change-Id: I0e429d3de4df8429152c12f251ab140b0c8f6b77
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
There was a mechanism in tshttpproxy to note that a Windows proxy
lookup failed and to stop hitting it so often. But that turns out to
fire a lot (no PAC file configured at all results in a proxy lookup),
so after the first proxy lookup, we were enabling the "omg something's
wrong, stop looking up proxies" bit for awhile, which was then also
preventing the normal Go environment-based proxy lookups from working.
This at least fixes environment-based proxies.
Plenty of other Windows-specific proxy work remains (using
WinHttpGetIEProxyConfigForCurrentUser instead of just PAC files,
ignoring certain types of errors, etc), but this should fix
the regression reported in #4811.
Updates #4811
Change-Id: I665e1891897d58e290163bda5ca51a22a017c5f9
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Run an inotify goroutine and watch if another program takes over
/etc/inotify.conf. Log if so.
For now this only logs. In the future I want to wire it up into the
health system to warn (visible in "tailscale status", etc) about the
situation, with a short URL to more info about how you should really
be using systemd-resolved if you want programs to not fight over your
DNS files on Linux.
Updates #4254 etc etc
Change-Id: I86ad9125717d266d0e3822d4d847d88da6a0daaa
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Noticed this while debugging something else, we would reset all routes if
either `--advertise-exit-node` or `--advertise-routes` were set. This handles
correctly updating them.
Also added tests.
Signed-off-by: Maisem Ali <maisem@tailscale.com>
The derpers don't allow whitespace in the challenge.
Change-Id: I93a8b073b846b87854fba127b5c1d80db205f658
Signed-off-by: Andrew Dunham <andrew@tailscale.com>
The //go:build syntax was introduced in Go 1.17:
https://go.dev/doc/go1.17#build-lines
gofmt has kept the +build and go:build lines in sync since
then, but enough time has passed. Time to remove them.
Done with:
perl -i -npe 's,^// \+build.*\n,,' $(git grep -l -F '+build')
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
To collect some data on how widespread this is and whether there's
any correlation between different versions of Windows, etc.
Updates #4811
Change-Id: I003041d0d7e61d2482acd8155c1a4ed413a2c5c4
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Instead of returning a custom error, use ErrGetBaseConfigNotSupported
that seems to be intended for this use case. This fixes DNS resolution
on macOS clients compiled from source.
Signed-off-by: Anton Tolchanov <anton@tailscale.com>
The netlog.Message type is useful to depend on from other packages,
but doing so would transitively cause gvisor and other large packages
to be linked in.
Avoid this problem by moving all network logging types to a single package.
We also update staticcheck to take in:
003d277bcf
Signed-off-by: Joe Tsai <joetsai@digital-static.net>
It looks like this was left by mistake in 4a3e2842.
Change-Id: Ie4e3d5842548cd2e8533b3552298fb1ce9ba761a
Signed-off-by: Andrew Dunham <andrew@tailscale.com>
On Android, the system resolver can return IPv4 addresses as IPv6-mapped
addresses (i.e. `::ffff:a.b.c.d`). After the switch to `net/netip`
(19008a3), this case is no longer handled and a response like this will
be seen as failure to resolve any IPv4 addresses.
Handle this case by simply calling `Unmap()` on the returned IPs. Fixes#5698.
Signed-off-by: Peter Cai <peter@typeblog.net>
We had previously added this to the netcheck report in #5087 but never
copied it into the NetInfo struct. Additionally, add it to log lines so
it's visible to support.
Change-Id: Ib6266f7c6aeb2eb2a28922aeafd950fe1bf5627e
Signed-off-by: Andrew Dunham <andrew@tailscale.com>
The ResolvConfMode property is documented to return how systemd-resolved
is currently managing /etc/resolv.conf. Include that information in the
debug line, when available, to assist in debugging DNS issues.
Signed-off-by: Andrew Dunham <andrew@du.nham.ca>
Change-Id: I1ae3a257df1d318d0193a8c7f135c458ec45093e
The Lufthansa in-flight wifi generates a synthetic 204 response to the
DERP server's /generate_204 endpoint. This PR adds a basic
challenge/response to the endpoint; something sufficiently complicated
that it's unlikely to be implemented by a captive portal. We can then
check for the expected response to verify whether we're being MITM'd.
Follow-up to #5601
Signed-off-by: Andrew Dunham <andrew@du.nham.ca>
Change-Id: I94a68c9a16a7be7290200eea6a549b64f02ff48f
Instead of treating any interface with a non-ifscope route as a
potential default gateway, now verify that a given route is
actually a default route (0.0.0.0/0 or ::/0).
Fixes#5879
Signed-off-by: Anton Tolchanov <anton@tailscale.com>
We removed it in #4806 in favor of the built-in functionality from the
nhooyr.io/websocket package. However, it has an issue with deadlines
that has not been fixed yet (see nhooyr/websocket#350). Temporarily
go back to using a custom wrapper (using the fix from our fork) so that
derpers will stop closing connections too aggressively.
Updates #5921
Signed-off-by: Mihai Parparita <mihai@tailscale.com>
If netcheck happens before there's a derpmap.
This seems to only affect Headscale because it doesn't send a derpmap
as early?
Change-Id: I51e0dfca8e40623e04702bc9cc471770ca20d2c2
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
The Logger type managers a logtail.Logger for extracting
statistics from a tstun.Wrapper.
So long as Shutdown is called, it ensures that logtail
and statistic gathering resources are properly cleared up.
Signed-off-by: Joe Tsai <joetsai@digital-static.net>
Rename StatisticsEnable as SetStatisticsEnabled to be consistent
with other similarly named methods.
Rename StatisticsExtract as ExtractStatistics to follow
the convention where methods start with a verb.
It was originally named with Statistics as a prefix so that
statistics related methods would sort well in godoc,
but that property no longer holds.
Signed-off-by: Joe Tsai <joetsai@digital-static.net>
If Wrapper.StatisticsEnable is enabled,
then per-connection counters are maintained.
If enabled, Wrapper.StatisticsExtract must be periodically called
otherwise there is unbounded memory growth.
Signed-off-by: Joe Tsai <joetsai@digital-static.net>
High-level API:
type Statistics struct { ... }
type Counts struct { TxPackets, TxBytes, RxPackets, RxBytes uint64 }
func (*Statistics) UpdateTx([]byte)
func (*Statistics) UpdateRx([]byte)
func (*Statistics) Extract() map[flowtrack.Tuple]Counts
The API accepts a []byte instead of a packet.Parsed so that a future
implementation can directly hash the address and port bytes,
which are contiguous in most IP packets.
This will be useful for a custom concurrent-safe hashmap implementation.
Signed-off-by: Joe Tsai <joetsai@digital-static.net>
Most visible when using tsnet.Server, but could have resulted in dropped
messages in a few other places too.
Fixes#5743
Signed-off-by: Mihai Parparita <mihai@tailscale.com>
I brain-o'ed the math earlier. The NextDNS prefix is /32 (actually
/33, but will guarantee last bit is 0), so we have 128-32 = 96 bits
(12 bytes) of config/profile ID that we can extract. NextDNS doesn't
currently use all those, but might.
Updates #2452
Change-Id: I249bd28500c781e45425fd00fd3f46893ae226a2
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
- removed some in-flow time calls
- increase buffer size to 2MB to overcome syscall cost
- move relative time computation from record to report time
Signed-off-by: James Tucker <james@tailscale.com>
The fragment offset is an 8 byte offset rather than a byte offset, so
the short packet limit is now in fragment block size in order to compare
with the offset value.
The packet flags are in the first 3 bits of the flags/frags byte, and
so after conversion to a uint16 little endian value they are at the
start, not the end of the value - the mask for extracting "more
fragments" is adjusted to match this byte.
Extremely short fragments less than 80 bytes are dropped, but fragments
over 80 bytes are now accepted.
Fixes#5727
Signed-off-by: James Tucker <james@tailscale.com>
This doesn't change any behaviour for now, other than maybe running a
full netcheck more often. The intent is to start gathering data on
captive portals, and additionally, seeing this in the 'tailscale
netcheck' command should provide a bit of additional information to
users.
Updates #1634
Change-Id: I6ba08f9c584dc0200619fa97f9fde1a319f25c76
Signed-off-by: Andrew Dunham <andrew@du.nham.ca>
The io/ioutil package has been deprecated as of Go 1.16 [1]. This commit
replaces the existing io/ioutil functions with their new definitions in
io and os packages.
Reference: https://golang.org/doc/go1.16#ioutil
Signed-off-by: Eng Zer Jun <engzerjun@gmail.com>
The plan has changed. Doing query parameters rather than path +
heades. NextDNS added support for query parameters.
Updates #2452
Change-Id: I4783c0a06d6af90756d9c80a7512644ba702388c
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
For debugging a macOS-specific magicsock issue. macOS runs in
bind-to-interface mode always. This lets me force Linux into the same
mode as macOS, even if the Linux kernel supports SO_MARK, as it
usually does.
Updates #2331 etc
Change-Id: Iac9e4a7429c1781337e716ffc914443b7aa2869d
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Clarify & verify that some DoH URLs can be sent over tailcfg
in some limited cases.
Updates #2452
Change-Id: Ibb25db77788629c315dc26285a1059a763989e24
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
NextDNS is unique in that users create accounts and then get
user-specific DNS IPs & DoH URLs.
For DoH, the customer ID is in the URL path.
For IPv6, the IP address includes the customer ID in the lower bits.
For IPv4, there's a fragile "IP linking" mechanism to associate your
public IPv4 with an assigned NextDNS IPv4 and that tuple maps to your
customer ID.
We don't use the IP linking mechanism.
Instead, NextDNS is DoH-only. Which means using NextDNS necessarily
shunts all DNS traffic through 100.100.100.100 (programming the OS to
use 100.100.100.100 as the global resolver) because operating systems
can't usually do DoH themselves.
Once it's in Tailscale's DoH client, we then connect out to the known
NextDNS IPv4/IPv6 anycast addresses.
If the control plane sends the client a NextDNS IPv6 address, we then
map it to the corresponding NextDNS DoH with the same client ID, and
we dial that DoH server using the combination of v4/v6 anycast IPs.
Updates #2452
Change-Id: I3439d798d21d5fc9df5a2701839910f5bef85463
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
This is especially helpful as we launch newer DERPs over time, and older
clients have progressively out-of-date static DERP maps baked in. After
this, as long as the client has successfully connected once, it'll cache
the most recent DERP map it knows about.
Resolves an in-code comment from @bradfitz
Signed-off-by: Andrew Dunham <andrew@du.nham.ca>
If ExtraRecords (Hosts) are specified without a corresponding split
DNS route and global DNS is specified, then program the host OS DNS to
use 100.100.100.100 so it can blend in those ExtraRecords.
Updates #1543
Change-Id: If49014a5ecc8e38978ff26e54d1f74fe8dbbb9bc
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Somehow I accidentally set the wrong registry value here.
It should be DisableDynamicUpdate=1 and not EnableDNSUpdate=0.
This is a regression from 545639e.
Signed-off-by: Maisem Ali <maisem@tailscale.com>
See https://mullvad.net/en/help/dns-over-https-and-dns-over-tls/
The Mullvad DoH servers appear to only speak HTTP/2 and
the use of a non-nil DialContext in the http.Transport
means that ForceAttemptHTTP2 must be set to true to be
able to use them.
Signed-off-by: Nahum Shalman <nahamu@gmail.com>
This works around the 2.3s delay in short name lookups when SNR is
enabled.
C:\Windows\System32\drivers\etc\hosts file. We only add known hosts that
match the search domains, and we populate the list in order of
Search Domains so that our matching algorithm mimics what Windows would
otherwise do itself if SNR was off.
Updates #1659
Signed-off-by: Maisem Ali <maisem@tailscale.com>
Convert ParseResponse and Response to use netip.AddrPort instead of
net.IP and separate port.
Fixes#5281
Signed-off-by: Kris Brandow <kris.brandow@gmail.com>
Like LLMNR, NetBIOS also adds resolution delays and we don't support it
anyway so just disable it on the interface.
Updates #1659
Signed-off-by: Maisem Ali <maisem@tailscale.com>
Currently we forward unmatched queries to the default resolver on
Windows. This results in duplicate queries being issued to the same
resolver which is just wasted.
Updates #1659
Signed-off-by: Maisem Ali <maisem@tailscale.com>