updates tailscale/corp#21823
Misconfigured, broken, or blocked DNS will often present as
"internet is broken'" to the end user. This plumbs the health tracker
into the dns manager and forwarder and adds a health warning
with a 5 second delay that is raised on failures in the forwarder and
lowered on successes.
Signed-off-by: Jonathan Nobels <jonathan@tailscale.com>
And some misc doc tweaks for idiomatic Go style.
Updates #cleanup
Change-Id: I3ca45f78aaca037f433538b847fd6a9571a2d918
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Fixestailscale/corp#20677
Replaces the original attempt to rectify this (by injecting a netMon
event) which was both heavy handed, and missed cases where the
netMon event was "minor".
On apple platforms, the fetching the interface's nameservers can
and does return an empty list in certain situations. Apple's API
in particular is very limiting here. The header hints at notifications
for dns changes which would let us react ahead of time, but it's all
private APIs.
To avoid remaining in the state where we end up with no
nameservers but we absolutely need them, we'll react
to a lack of upstream nameservers by attempting to re-query
the OS.
We'll rate limit this to space out the attempts. It seems relatively
harmless to attempt a reconfig every 5 seconds (triggered
by an incoming query) if the network is in this broken state.
Missing nameservers might possibly be a persistent condition
(vs a transient error), but that would also imply that something
out of our control is badly misconfigured.
Tested by randomly returning [] for the nameservers. When switching
between Wifi networks, or cell->wifi, this will randomly trigger
the bug, and we appear to reliably heal the DNS state.
Signed-off-by: Jonathan Nobels <jonathan@tailscale.com>
Without this rule, Windows 8.1 and newer devices issue parallel DNS requests to DNS servers
associated with all network adapters, even when "Override local DNS" is enabled and/or
a Mullvad exit node is being used, resulting in DNS leaks.
This also adds "disable-local-dns-override-via-nrpt" nodeAttr that can be used to disable
the new behavior if needed.
Fixestailscale/corp#20718
Signed-off-by: Nick Khyl <nickk@tailscale.com>
Updates corp#15802.
Adds the ability for control to disable the recently added change that uses split DNS in more cases on iOS. This will allow us to disable the feature if it leads to regression in production. We plan to remove this knob once we've verified that the feature works properly.
Signed-off-by: Andrea Gottardo <andrea@gottardo.me>
Updates https://github.com/tailscale/corp/issues/15802.
On iOS exclusively, this PR adds logic to use a split DNS configuration in more cases, with the goal of improving battery life. Acting as the global DNS resolver on iOS should be avoided, as it leads to frequent wakes of IPNExtension.
We try to determine if we can have Tailscale only handle DNS queries for resources inside the tailnet, that is, all routes in the DNS configuration do not require a custom resolver (this is the case for app connectors, for instance).
If so, we set all Routes as MatchDomains. This enables a split DNS configuration which will help preserve battery life. Effectively, for the average Tailscale user who only relies on MagicDNS to resolve *.ts.net domains, this means that Tailscale DNS will only be used for those domains.
This PR doesn't affect users with Override Local DNS enabled. For these users, there should be no difference and Tailscale will continue acting as a global DNS resolver.
Signed-off-by: Andrea Gottardo <andrea@tailscale.com>
The goal is to move more network state accessors to netmon.Monitor
where they can be cheaper/cached. But first (this change and others)
we need to make sure the one netmon.Monitor is plumbed everywhere.
Some notable bits:
* tsdial.NewDialer is added, taking a now-required netmon
* because a tsdial.Dialer always has a netmon, anything taking both
a Dialer and a NetMon is now redundant; take only the Dialer and
get the NetMon from that if/when needed.
* netmon.NewStatic is added, primarily for tests
Updates tailscale/corp#10910
Updates tailscale/corp#18960
Updates #7967
Updates #3299
Change-Id: I877f9cb87618c4eb037cee098241d18da9c01691
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
This moves most of the health package global variables to a new
`health.Tracker` type.
But then rather than plumbing the Tracker in tsd.System everywhere,
this only goes halfway and makes one new global Tracker
(`health.Global`) that all the existing callers now use.
A future change will eliminate that global.
Updates #11874
Updates #4136
Change-Id: I6ee27e0b2e35f68cb38fecdb3b2dc4c3f2e09d68
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
This removes a potentially increased boot delay for certain boot
topologies where they block on ExecStartPre that may have socket
activation dependencies on other system services (such as
systemd-resolved and NetworkManager).
Also rename cleanup to clean up in affected/immediately nearby places
per code review commentary.
Fixes#11599
Signed-off-by: James Tucker <james@tailscale.com>
Config.singleResolverSet returns true if all routes have the same resolvers,
even if the routes have no resolvers. If none of the routes have a specific
resolver, the default should be used instead. Therefore, check for more than
0 instead of nil.
Signed-off-by: Ryan Petris <ryan@petris.net>
We weren't correctly retrying truncated requests to an upstream DNS
server with TCP. Instead, we'd return a truncated request to the user,
even if the user was querying us over TCP and thus able to handle a
large response.
Also, add an envknob and controlknob to allow users/us to disable this
behaviour if it turns out to be buggy (✨ DNS ✨).
Updates #9264
Signed-off-by: Andrew Dunham <andrew@du.nham.ca>
Change-Id: Ifb04b563839a9614c0ba03e9c564e8924c1a2bfd
On some platforms (notably macOS and iOS) we look up the default
interface to bind outgoing connections to. This is both duplicated
work and results in logspam when the default interface is not available
(i.e. when a phone has no connectivity, we log an error and thus cause
more things that we will try to upload and fail).
Fixed by passing around a netmon.Monitor to more places, so that we can
use its cached interface state.
Fixes#7850
Updates #7621
Signed-off-by: Mihai Parparita <mihai@tailscale.com>
We're using it in more and more places, and it's not really specific to
our use of Wireguard (and does more just link/interface monitoring).
Also removes the separate interface we had for it in sockstats -- it's
a small enough package (we already pull in all of its dependencies
via other paths) that it's not worth the extra complexity.
Updates #7621
Updates #7850
Signed-off-by: Mihai Parparita <mihai@tailscale.com>
This updates all source files to use a new standard header for copyright
and license declaration. Notably, copyright no longer includes a date,
and we now use the standard SPDX-License-Identifier header.
This commit was done almost entirely mechanically with perl, and then
some minimal manual fixes.
Updates #6865
Signed-off-by: Will Norris <will@tailscale.com>
With a42a594bb3, iOS uses netstack and
hence there are no longer any platforms which use the legacy MagicDNS path. As such, we remove it.
We also normalize the limit for max in-flight DNS queries on iOS (it was 64, now its 256 as per other platforms).
It was 64 for the sake of being cautious about memory, but now we have 50Mb (iOS-15 and greater) instead of 15Mb
so we have the spare headroom.
Signed-off-by: Tom DNetto <tom@tailscale.com>
We saw a few cases where we hit this limit; bumping to 4k seems
relatively uncontroversial.
Change-Id: I218fee3bc0d2fa5fde16eddc36497a73ebd7cbda
Signed-off-by: Andrew Dunham <andrew@du.nham.ca>
Previously, if a DNS-over-TCP message was received while there were
existing queries in-flight, and it was over the size limit, we'd close
the 'responses' channel. This would cause those in-flight queries to
send on the closed channel and panic.
Instead, don't close the channel at all and rely on s.ctx being
canceled, which will ensure that in-flight queries don't hang.
Fixes#6725
Signed-off-by: Andrew Dunham <andrew@du.nham.ca>
Change-Id: I8267728ac37ed7ae38ddd09ce2633a5824320097
Most visible when using tsnet.Server, but could have resulted in dropped
messages in a few other places too.
Fixes#5743
Signed-off-by: Mihai Parparita <mihai@tailscale.com>
NextDNS is unique in that users create accounts and then get
user-specific DNS IPs & DoH URLs.
For DoH, the customer ID is in the URL path.
For IPv6, the IP address includes the customer ID in the lower bits.
For IPv4, there's a fragile "IP linking" mechanism to associate your
public IPv4 with an assigned NextDNS IPv4 and that tuple maps to your
customer ID.
We don't use the IP linking mechanism.
Instead, NextDNS is DoH-only. Which means using NextDNS necessarily
shunts all DNS traffic through 100.100.100.100 (programming the OS to
use 100.100.100.100 as the global resolver) because operating systems
can't usually do DoH themselves.
Once it's in Tailscale's DoH client, we then connect out to the known
NextDNS IPv4/IPv6 anycast addresses.
If the control plane sends the client a NextDNS IPv6 address, we then
map it to the corresponding NextDNS DoH with the same client ID, and
we dial that DoH server using the combination of v4/v6 anycast IPs.
Updates #2452
Change-Id: I3439d798d21d5fc9df5a2701839910f5bef85463
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
If ExtraRecords (Hosts) are specified without a corresponding split
DNS route and global DNS is specified, then program the host OS DNS to
use 100.100.100.100 so it can blend in those ExtraRecords.
Updates #1543
Change-Id: If49014a5ecc8e38978ff26e54d1f74fe8dbbb9bc
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
This works around the 2.3s delay in short name lookups when SNR is
enabled.
C:\Windows\System32\drivers\etc\hosts file. We only add known hosts that
match the search domains, and we populate the list in order of
Search Domains so that our matching algorithm mimics what Windows would
otherwise do itself if SNR was off.
Updates #1659
Signed-off-by: Maisem Ali <maisem@tailscale.com>
Currently we forward unmatched queries to the default resolver on
Windows. This results in duplicate queries being issued to the same
resolver which is just wasted.
Updates #1659
Signed-off-by: Maisem Ali <maisem@tailscale.com>
As discussed in previous PRs, we can register for notifications when group
policies are updated and act accordingly.
This patch changes nrptRuleDatabase to receive notifications that group policy
has changed and automatically move our NRPT rules between the local and
group policy subkeys as needed.
Signed-off-by: Aaron Klotz <aaron@tailscale.com>
Fixes https://github.com/tailscale/corp/issues/5198
The upstream forwarder will block indefinitely on `udpconn.ReadFrom` if no
reply is recieved, due to the lack of deadline on the connection object.
There still isn't a deadline on the connection object, but the automatic closing
of the context on deadline expiry will close the connection via `closeOnCtxDone`,
unblocking the read and resulting in a normal teardown.
Signed-off-by: Tom DNetto <tom@tailscale.com>
* net/dns, wgengine: implement DNS over TCP
Signed-off-by: Tom DNetto <tom@tailscale.com>
* wgengine/netstack: intercept only relevant port/protocols to quad-100
Signed-off-by: Tom DNetto <tom@tailscale.com>
This were intended to be pushed to #4408, but in my excitement I
forgot to git push :/ better late than never.
Signed-off-by: Tom DNetto <tom@tailscale.com>
Moves magicDNS-specific handling out of Resolver & into dns.Manager. This
greatly simplifies the Resolver to solely issuing queries and returning
responses, without channels.
Enforcement of max number of in-flight magicDNS queries, assembly of
synthetic UDP datagrams, and integration with wgengine for
recieving/responding to magicDNS traffic is now entirely in Manager.
This path is being kept around, but ultimately aims to be deleted and
replaced with a netstack-based path.
This commit is part of a series to implement magicDNS using netstack.
Signed-off-by: Tom DNetto <tom@tailscale.com>
Two changes in one:
* make DoH upgrades an explicitly scheduled send earlier, when we come
up with the resolvers-and-delay send plan. Previously we were
getting e.g. four Google DNS IPs and then spreading them out in
time (for back when we only did UDP) but then later we added DoH
upgrading at the UDP packet layer, which resulted in sometimes
multiple DoH queries to the same provider running (each doing happy
eyeballs dialing to 4x IPs themselves) for each of the 4 source IPs.
Instead, take those 4 Google/Cloudflare IPs and schedule 5 things:
first the DoH query (which can use all 4 IPs), and then each of the
4 IPs as UDP later.
* clean up the dnstype.Resolver.Addr confusion; half the code was
using it as an IP string (as documented) as half was using it as
an IP:port (from some prior type we used), primarily for tests.
Instead, document it was being primarily an IP string but also
accepting an IP:port for tests, then add an accessor method on it
to get the IPPort and use that consistently everywhere.
Change-Id: Ifdd72b9e45433a5b9c029194d50db2b9f9217b53
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
* net/dns, net/dns/resolver, wgengine: refactor DNS request path
Previously, method calls into the DNS manager/resolver types handled DNS
requests rather than DNS packets. This is fine for UDP as one packet
corresponds to one request or response, however will not suit an
implementation that supports DNS over TCP.
To support PRs implementing this in the future, wgengine delegates
all handling/construction of packets to the magic DNS endpoint, to
the DNS types themselves. Handling IP packets at this level enables
future support for both UDP and TCP.
Signed-off-by: Tom DNetto <tom@tailscale.com>
Fixes#3660
RELNOTE=MagicDNS now works over IPv6 when CGNAT IPv4 is disabled.
Change-Id: I001e983df5feeb65289abe5012dedd177b841b45
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Not done yet, but this move more of the outbound dial special casing
from random packages into tsdial, which aspires to be the one unified
place for all outbound dialing shenanigans.
Then this plumbs it all around, so everybody is ultimately
holding on to the same dialer.
As of this commit, macOS/iOS using an exit node should be able to
reach to the exit node's DoH DNS proxy over peerapi, doing the sockopt
to stay within the Network Extension.
A number of steps remain, including but limited to:
* move a bunch more random dialing stuff
* make netstack-mode tailscaled be able to use exit node's DNS proxy,
teaching tsdial's resolver to use it when an exit node is in use.
Updates #1713
Change-Id: I1e8ee378f125421c2b816f47bc2c6d913ddcd2f5
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Lets the systemd-resolved OSConfigurator report health changes
for out of band config resyncs.
Updates #3327
Signed-off-by: David Anderson <danderson@tailscale.com>
Windows has a public dns.Flush used in router_windows.go.
However that won't work for platforms like Linux, where
we need a different flush mechanism for resolved versus
other implementations.
We're instead adding a FlushCaches method to the dns Manager,
which can be made to work on all platforms as needed.
Fixes https://github.com/tailscale/tailscale/issues/2132
Signed-off-by: Denton Gentry <dgentry@tailscale.com>