Previously, if we had a umask set (e.g. 0027) that prevented creating a
world-readable file, /etc/resolv.conf would be created without the o+r
bit and thus other users may be unable to resolve DNS.
Since a umask only applies to file creation, chmod the file after
creation and before renaming it to ensure that it has the appropriate
permissions.
Updates #12609
Signed-off-by: Andrew Dunham <andrew@du.nham.ca>
Change-Id: I2a05d64f4f3a8ee8683a70be17a7da0e70933137
Fixestailscale/corp#20677
Replaces the original attempt to rectify this (by injecting a netMon
event) which was both heavy handed, and missed cases where the
netMon event was "minor".
On apple platforms, the fetching the interface's nameservers can
and does return an empty list in certain situations. Apple's API
in particular is very limiting here. The header hints at notifications
for dns changes which would let us react ahead of time, but it's all
private APIs.
To avoid remaining in the state where we end up with no
nameservers but we absolutely need them, we'll react
to a lack of upstream nameservers by attempting to re-query
the OS.
We'll rate limit this to space out the attempts. It seems relatively
harmless to attempt a reconfig every 5 seconds (triggered
by an incoming query) if the network is in this broken state.
Missing nameservers might possibly be a persistent condition
(vs a transient error), but that would also imply that something
out of our control is badly misconfigured.
Tested by randomly returning [] for the nameservers. When switching
between Wifi networks, or cell->wifi, this will randomly trigger
the bug, and we appear to reliably heal the DNS state.
Signed-off-by: Jonathan Nobels <jonathan@tailscale.com>
This is implemented via GetBestInterfaceEx. Should we encounter errors
or fail to resolve a valid, non-Tailscale interface, we fall back to
returning the index for the default interface instead.
Fixes#12551
Signed-off-by: Aaron Klotz <aaron@tailscale.com>
I meant to do this in the earlier change and had a git fail.
To atone, add a test too while I'm here.
Updates #12486
Updates #12507
Change-Id: I4943b454a2530cb5047636f37136aa2898d2ffc7
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
I noticed we were allocating these every time when they could just
share the same memory. Rather than document ownership, just lock it
down with a view.
I was considering doing all of the fields but decided to just do this
one first as test to see how infectious it became. Conclusion: not
very.
Updates #cleanup (while working towards tailscale/corp#20514)
Change-Id: I8ce08519de0c9a53f20292adfbecd970fe362de0
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
For pprof cosmetic/confusion reasons more than performance, but it
might have tiny speed benefit.
Updates #12486
Change-Id: I40e03714f3afa3a7e7f5e1fa99b81c7e889b91b6
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
So profiles show more useful names than just func1, func2, func3, etc.
There will still be func1 on them all, but the symbol before will say
what the lookup type is.
Updates #12486
Change-Id: I910b024a7861394eb83d07f5a899eae338cb1f22
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
This moves NewContainsIPFunc from tsaddr to new ipset package.
And wgengine/filter types gets split into wgengine/filter/filtertype,
so netmap (and thus the CLI, etc) doesn't need to bring in ipset,
bart, etc.
Then add a test making sure the CLI deps don't regress.
Updates #1278
Change-Id: Ia246d6d9502bbefbdeacc4aef1bed9c8b24f54d5
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
NewContainsIPFunc was previously documented as performing poorly if
there were many netip.Prefixes to search over. As such, we never it used it
in such cases.
This updates it to use bart at a certain threshold (over 6 prefixes,
currently), at which point the bart lookup overhead pays off.
This is currently kinda useless because we're not using it. But now we
can and get wins elsewhere. And we can remove the caveat in the docs.
goos: darwin
goarch: arm64
pkg: tailscale.com/net/tsaddr
│ before │ after │
│ sec/op │ sec/op vs base │
NewContainsIPFunc/empty-8 2.215n ± 11% 2.239n ± 1% +1.08% (p=0.022 n=10)
NewContainsIPFunc/cidr-list-1-8 17.44n ± 0% 17.59n ± 6% +0.89% (p=0.000 n=10)
NewContainsIPFunc/cidr-list-2-8 27.85n ± 0% 28.13n ± 1% +1.01% (p=0.000 n=10)
NewContainsIPFunc/cidr-list-3-8 36.05n ± 0% 36.56n ± 13% +1.41% (p=0.000 n=10)
NewContainsIPFunc/cidr-list-4-8 43.73n ± 0% 44.38n ± 1% +1.50% (p=0.000 n=10)
NewContainsIPFunc/cidr-list-5-8 51.61n ± 2% 51.75n ± 0% ~ (p=0.101 n=10)
NewContainsIPFunc/cidr-list-10-8 95.65n ± 0% 68.92n ± 0% -27.94% (p=0.000 n=10)
NewContainsIPFunc/one-ip-8 4.466n ± 0% 4.469n ± 1% ~ (p=0.491 n=10)
NewContainsIPFunc/two-ip-8 8.002n ± 1% 7.997n ± 4% ~ (p=0.697 n=10)
NewContainsIPFunc/three-ip-8 27.98n ± 1% 27.75n ± 0% -0.82% (p=0.012 n=10)
geomean 19.60n 19.07n -2.71%
Updates #12486
Change-Id: I2e2320cc4384f875f41721374da536bab995c1ce
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Without this rule, Windows 8.1 and newer devices issue parallel DNS requests to DNS servers
associated with all network adapters, even when "Override local DNS" is enabled and/or
a Mullvad exit node is being used, resulting in DNS leaks.
This also adds "disable-local-dns-override-via-nrpt" nodeAttr that can be used to disable
the new behavior if needed.
Fixestailscale/corp#20718
Signed-off-by: Nick Khyl <nickk@tailscale.com>
Updates tailscale/tailscale#4136
This PR is the first round of work to move from encoding health warnings as strings and use structured data instead. The current health package revolves around the idea of Subsystems. Each subsystem can have (or not have) a Go error associated with it. The overall health of the backend is given by the concatenation of all these errors.
This PR polishes the concept of Warnable introduced by @bradfitz a few weeks ago. Each Warnable is a component of the backend (for instance, things like 'dns' or 'magicsock' are Warnables). Each Warnable has a unique identifying code. A Warnable is an entity we can warn the user about, by setting (or unsetting) a WarningState for it. Warnables have:
- an identifying Code, so that the GUI can track them as their WarningStates come and go
- a Title, which the GUIs can use to tell the user what component of the backend is broken
- a Text, which is a function that is called with a set of Args to generate a more detailed error message to explain the unhappy state
Additionally, this PR also begins to send Warnables and their WarningStates through LocalAPI to the clients, using ipn.Notify messages. An ipn.Notify is only issued when a warning is added or removed from the Tracker.
In a next PR, we'll get rid of subsystems entirely, and we'll start using structured warnings for all errors affecting the backend functionality.
Signed-off-by: Andrea Gottardo <andrea@gottardo.me>
This commit introduces a userspace program for managing an experimental
eBPF XDP STUN server program. derp/xdp contains the eBPF pseudo-C along
with a Go pkg for loading it and exporting its metrics.
cmd/xdpderper is a package main user of derp/xdp.
Updates tailscale/corp#20689
Signed-off-by: Jordan Whited <jordan@tailscale.com>
Fixestailscale/corp#20677
On macOS sleep/wake, we're encountering a condition where reconfigure the network
a little bit too quickly - before apple has set the nameservers for our interface.
This results in a persistent condition where we have no upstream resolver and
fail all forwarded DNS queries.
No upstream nameservers is a legitimate configuration, and we have no (good) way
of determining when Apple is ready - but if we need to forward a query, and we
have no nameservers, then something has gone badly wrong and the network is
very broken.
A simple fix here is to simply inject a netMon event, which will go through the
configuration dance again when we hit the SERVFAIL condition.
Tested by artificially/randomly returning [] for the list of nameservers in the bespoke
ipn-bridge code responsible for getting the nameservers.
Signed-off-by: Jonathan Nobels <jonathan@tailscale.com>
As an alterative to #11935 using #12003.
Updates #11935
Signed-off-by: Andrew Dunham <andrew@du.nham.ca>
Change-Id: I05f643fe812ceeaec5f266e78e3e529cab3a1ac3
When we're starting child processes on Windows that are CLI programs that
don't need to output to a console, we should pass in DETACHED_PROCESS as a
CreationFlag on SysProcAttr. This prevents the OS from even creating a console
for the child (and paying the associated time/space penalty for new conhost
processes). This is more efficient than letting the OS create the console
window and then subsequently trying to hide it, which we were doing at a few
callsites.
Fixes#12270
Signed-off-by: Aaron Klotz <aaron@tailscale.com>
As quad-100 is an authoritative server for 4via6 domains, it should always return responses
with a response code of 0 (indicating no error) when resolving records for these domains.
If there's no resource record of the specified type (e.g. A), it should return a response
with an empty answer section rather than NXDomain. Such a response indicates that there
is at least one RR of a different type (e.g., AAAA), suggesting the Windows stub resolver
to look for it.
Fixestailscale/corp#20767
Signed-off-by: Nick Khyl <nickk@tailscale.com>
Add a new TS_EXPERIMENTAL_ENABLE_FORWARDING_OPTIMIZATIONS env var
that can be set for tailscale/tailscale container running as
a subnet router or exit node to enable UDP GRO forwarding
for improved performance.
See https://tailscale.com/kb/1320/performance-best-practices#linux-optimizations-for-subnet-routers-and-exit-nodes
This is currently considered an experimental approach;
the configuration support is partially to allow further experimentation
with containerized environments to evaluate the performance
improvements.
Updates tailscale/tailscale#12295
Signed-off-by: Irbe Krumina <irbe@tailscale.com>
Updates corp#15802.
Adds the ability for control to disable the recently added change that uses split DNS in more cases on iOS. This will allow us to disable the feature if it leads to regression in production. We plan to remove this knob once we've verified that the feature works properly.
Signed-off-by: Andrea Gottardo <andrea@gottardo.me>
This bug was introduced in e6b84f215 (May 2020) but was only used in
tests when stringifying probeProto values on failure so it wasn't
noticed for a long time.
But then it was moved into non-test code in 8450a18aa (Jun 2024) and I
didn't notice during the code movement that it was wrong. It's still
only used in failure paths in logs, but having wrong/ambiguous
debugging information isn't the best.
Whoops.
Updates tailscale/corp#20654
Change-Id: I296c727ed1c292a04db7b46ecc05c07fc1abc774
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
This adds a new ListenPacket function on tsnet.Server
which acts mostly like `net.ListenPacket`.
Unlike `Server.Listen`, this requires listening on a
specific IP and does not automatically listen on both
V4 and V6 addresses of the Server when the IP is unspecified.
To test this, it also adds UDP support to tsdial.Dialer.UserDial
and plumbs it through the localapi. Then an associated test
to make sure the UDP functionality works from both sides.
Updates #12182
Signed-off-by: Maisem Ali <maisem@tailscale.com>
Updates https://github.com/tailscale/corp/issues/15802.
On iOS exclusively, this PR adds logic to use a split DNS configuration in more cases, with the goal of improving battery life. Acting as the global DNS resolver on iOS should be avoided, as it leads to frequent wakes of IPNExtension.
We try to determine if we can have Tailscale only handle DNS queries for resources inside the tailnet, that is, all routes in the DNS configuration do not require a custom resolver (this is the case for app connectors, for instance).
If so, we set all Routes as MatchDomains. This enables a split DNS configuration which will help preserve battery life. Effectively, for the average Tailscale user who only relies on MagicDNS to resolve *.ts.net domains, this means that Tailscale DNS will only be used for those domains.
This PR doesn't affect users with Override Local DNS enabled. For these users, there should be no difference and Tailscale will continue acting as a global DNS resolver.
Signed-off-by: Andrea Gottardo <andrea@tailscale.com>
Palo Alto reported interpreting hairpin probes as LAND attacks, and the
firewalls may be responding to this by shutting down otherwise in use NAT sessions
prematurely. We don't currently make use of the outcome of the hairpin
probes, and they contribute to other user confusion with e.g. the
AirPort Extreme hairpin session workaround. We decided in response to
remove the whole probe feature as a result.
Updates #188
Updates tailscale/corp#19106
Updates tailscale/corp#19116
Signed-off-by: James Tucker <james@tailscale.com>
Palo Alto firewalls have a typically hard NAT, but also have a mode
called Persistent DIPP that is supposed to provide consistent port
mapping suitable for STUN resolution of public ports. Persistent DIPP
works initially on most Palo Alto firewalls, but some models/software
versions have a bug which this works around.
The bug symptom presents as follows:
- STUN sessions resolve a consistent public IP:port to start with
- Much later netchecks report the same IP:Port for a subset of
sessions, most often the users active DERP, and/or the port related
to sustained traffic.
- The broader set of DERPs in a full netcheck will now consistently
observe a new IP:Port.
- After this point of observation, new inbound connections will only
succeed to the new IP:Port observed, and existing/old sessions will
only work to the old binding.
In this patch we now advertise the lowest latency global endpoint
discovered as we always have, but in addition any global endpoints that
are observed more than once in a single netcheck report. This should
provide viable endpoints for potential connection establishment across
a NAT with this behavior.
Updates tailscale/corp#19106
Signed-off-by: James Tucker <james@tailscale.com>
In this commit I updated the Ipv6 range we use to generate Control D DOH ip, we were using the NextDNSRanges to generate Control D DOH ip, updated to use the correct range.
Updates: #7946
Signed-off-by: Kevin Liang <kevinliang@tailscale.com>
In a configuration where the local node (ip1) has a different IP (ip2)
that it uses to communicate with a peer (ip3) we would do UDP flow
tracking on the `ip2->ip3` tuple. When we receive the response from
the peer `ip3->ip2` we would dnat it back to `ip3->ip1` which would
then not match the flow track state and the packet would get dropped.
To fix this, we should do flow tracking on the `ip1->ip3` tuple instead
of `ip2->ip3` which requires doing SNAT after the running filterPacketOutboundToWireGuard.
Updates tailscale/corp#19971, tailscale/corp#8020
Signed-off-by: Maisem Ali <maisem@tailscale.com>
It was documented as such but seems to have been dropped in a
refactor, restore the behavior. This brings down the time it
takes to run a single integration test by 2s which adds up
quite a bit.
Updates tailscale/corp#19786
Signed-off-by: Maisem Ali <maisem@tailscale.com>
This adds a new bool that can be sent down from control
to do jailing on the client side. Previously this would
only be done from control by modifying the packet filter
we sent down to clients. This would result in a lot of
additional work/CPU on control, we could instead just
do this on the client. This has always been a TODO which
we keep putting off, might as well do it now.
Updates tailscale/corp#19623
Signed-off-by: Maisem Ali <maisem@tailscale.com>
This plumbs a packet filter for jailed nodes through to the
tstun.Wrapper; the filter for a jailed node is equivalent to a "shields
up" filter. Currently a no-op as there is no way for control to
tell the client whether a peer is jailed.
Updates tailscale/corp#19623
Co-authored-by: Andrew Dunham <andrew@du.nham.ca>
Signed-off-by: Maisem Ali <maisem@tailscale.com>
Change-Id: I5ccc5f00e197fde15dd567485b2a99d8254391ad
Now that tsdial.Dialer.UserDial has been updated to honor the configured routes
and dial external network addresses without going through Tailscale, while also being
able to dial a node/subnet router on the tailnet, we can start using UserDial to forward
DNS requests. This is primarily needed for DNS over TCP when forwarding requests
to internal DNS servers, but we also update getKnownDoHClientForProvider to use it.
Updates tailscale/corp#18725
Signed-off-by: Nick Khyl <nickk@tailscale.com>
This refactors the peerConfig struct to allow storing more
details about a peer and not just the masq addresses. To be
used in a follow up change.
As a side effect, this also makes the DNAT logic on the inbound
packet stricter. Previously it would only match against the packets
dst IP, not it also takes the src IP into consideration. The beahvior
is at parity with the SNAT case.
Updates tailscale/corp#19623
Co-authored-by: Andrew Dunham <andrew@du.nham.ca>
Signed-off-by: Maisem Ali <maisem@tailscale.com>
Change-Id: I5f40802bebbf0f055436eb8824e4511d0052772d
We'd like to use tsdial.Dialer.UserDial instead of SystemDial for DNS over TCP.
This is primarily necessary to properly dial internal DNS servers accessible
over Tailscale and subnet routes. However, to avoid issues when switching
between Wi-Fi and cellular, we need to ensure that we don't retain connections
to any external addresses on the old interface. Therefore, we need to determine
which dialer to use internally based on the configured routes.
This plumbs routes and localRoutes from router.Config to tsdial.Dialer,
and updates UserDial to use either the peer dialer or the system dialer,
depending on the network address and the configured routes.
Updates tailscale/corp#18725
Fixes#4529
Signed-off-by: Nick Khyl <nickk@tailscale.com>
While debugging a failing test in airplane mode on macOS, I noticed
netcheck logspam about ICMP socket creation permission denied errors.
Apparently macOS just can't do those, or at least not in airplane
mode. Not worth spamming about.
Updates #cleanup
Change-Id: I302620cfd3c8eabb25202d7eef040c01bd8a843c
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
To aid in debugging exactly what's going wrong, instead of the
not-particularly-useful "dns udp query: context deadline exceeded" error
that we currently get.
Updates #3786
Updates #10768
Updates #11620
(etc.)
Signed-off-by: Andrew Dunham <andrew@du.nham.ca>
Change-Id: I76334bf0681a8a2c72c90700f636c4174931432c
So that we can use this for additional, non-NAT configuration without it
being confusing.
Updates #cleanup
Signed-off-by: Andrew Dunham <andrew@du.nham.ca>
Change-Id: I1658d59c9824217917a94ee76d2d08f0a682986f
This was a holdover from the older, pre-BART days and is no longer
necessary.
Updates #cleanup
Signed-off-by: Andrew Dunham <andrew@du.nham.ca>
Change-Id: I71b892bab1898077767b9ff51cef33d59c08faf8
Updates tailscale/corp#18960
Tests in corp called us using the wrong logging calls. Removed.
This is logged downstream anyway.
Signed-off-by: Jonathan Nobels <jonathan@tailscale.com>