I noticed while debugging a test failure elsewhere that our failure
logs (when verbosity is cranked up) were uselessly attributing dial
failures to failure to dial an invalid IP address (this IPv6 address
we didn't have), rather than showing me the actual IPv4 connection
failure.
Updates #13597 (tangentially)
Change-Id: I45ffbefbc7e25ebfb15768006413a705b941dae5
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
As an alterative to #11935 using #12003.
Updates #11935
Signed-off-by: Andrew Dunham <andrew@du.nham.ca>
Change-Id: I05f643fe812ceeaec5f266e78e3e529cab3a1ac3
The goal is to move more network state accessors to netmon.Monitor
where they can be cheaper/cached. But first (this change and others)
we need to make sure the one netmon.Monitor is plumbed everywhere.
Some notable bits:
* tsdial.NewDialer is added, taking a now-required netmon
* because a tsdial.Dialer always has a netmon, anything taking both
a Dialer and a NetMon is now redundant; take only the Dialer and
get the NetMon from that if/when needed.
* netmon.NewStatic is added, primarily for tests
Updates tailscale/corp#10910
Updates tailscale/corp#18960
Updates #7967
Updates #3299
Change-Id: I877f9cb87618c4eb037cee098241d18da9c01691
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
The other IP types don't appear to be imported anymore, and after a scan
through I couldn't see any substantial usage of other representations,
so I think this TODO is complete.
Updates #cleanup
Signed-off-by: James Tucker <james@tailscale.com>
This change introduces tstime.NewClock and tstime.ClockOpts as a new way
to construct tstime.Clock. This is a subset of #8464 as a stepping stone
so that we can update our internal code to use the new API before making
the second round of changes.
Updates #8463
Change-Id: Ib26edb60e5355802aeca83ed60e4fdf806c90e27
Signed-off-by: Adrian Dewhurst <adrian@tailscale.com>
On some platforms (notably macOS and iOS) we look up the default
interface to bind outgoing connections to. This is both duplicated
work and results in logspam when the default interface is not available
(i.e. when a phone has no connectivity, we log an error and thus cause
more things that we will try to upload and fail).
Fixed by passing around a netmon.Monitor to more places, so that we can
use its cached interface state.
Fixes#7850
Updates #7621
Signed-off-by: Mihai Parparita <mihai@tailscale.com>
We have many function pointers that we replace for the duration of test and
restore it on test completion, add method to do that.
Signed-off-by: Maisem Ali <maisem@tailscale.com>
Now that we're using rand.Shuffle in a few locations, create a generic
shuffle function and use it instead. While we're at it, move the
interleaveSlices function to the same package for use.
Signed-off-by: Andrew Dunham <andrew@du.nham.ca>
Change-Id: I0b00920e5b3eea846b6cedc30bd34d978a049fd3
This updates all source files to use a new standard header for copyright
and license declaration. Notably, copyright no longer includes a date,
and we now use the standard SPDX-License-Identifier header.
This commit was done almost entirely mechanically with perl, and then
some minimal manual fixes.
Updates #6865
Signed-off-by: Will Norris <will@tailscale.com>
This allows users to temporarily enable/disable dnscache logging via a
new node capability, to aid in debugging strange connectivity issues.
Signed-off-by: Andrew Dunham <andrew@du.nham.ca>
Change-Id: I46cf2596a8ae4c1913880a78d0033f8b668edc08
This is temporary while we work to upstream performance work in
https://github.com/WireGuard/wireguard-go/pull/64. A replace directive
is less ideal as it breaks dependent code without duplication of the
directive.
Signed-off-by: Jordan Whited <jordan@tailscale.com>
On Android, the system resolver can return IPv4 addresses as IPv6-mapped
addresses (i.e. `::ffff:a.b.c.d`). After the switch to `net/netip`
(19008a3), this case is no longer handled and a response like this will
be seen as failure to resolve any IPv4 addresses.
Handle this case by simply calling `Unmap()` on the returned IPs. Fixes#5698.
Signed-off-by: Peter Cai <peter@typeblog.net>
And remove the GCP special-casing from ipn/ipnlocal; do it only in the
forwarder for *.internal.
Fixes#4980Fixes#4981
Change-Id: I5c481e96d91f3d51d274a80fbd37c38f16dfa5cb
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
This does three things:
* If you're on GCP, it adds a *.internal DNS split route to the
metadata server, so we never break GCP DNS names. This lets people
have some Tailscale nodes on GCP and some not (e.g. laptops at home)
without having to add a Tailnet-wide *.internal DNS route.
If you already have such a route, though, it won't overwrite it.
* If the 100.100.100.100 DNS forwarder has nowhere to forward to,
it forwards it to the GCP metadata IP, which forwards to 8.8.8.8.
This means there are never errNoUpstreams ("upstream nameservers not set")
errors on GCP due to e.g. mangled /etc/resolv.conf (GCP default VMs
don't have systemd-resolved, so it's likely a DNS supremacy fight)
* makes the DNS fallback mechanism use the GCP metadata IP as a
fallback before our hosted HTTP-based fallbacks
I created a default GCP VM from their web wizard. It has no
systemd-resolved.
I then made its /etc/resolv.conf be empty and deleted its GCP
hostnames in /etc/hosts.
I then logged in to a tailnet with no global DNS settings.
With this, tailscaled writes /etc/resolv.conf (direct mode, as no
systemd-resolved) and sets it to 100.100.100.100, which then has
regular DNS via the metadata IP and *.internal DNS via the metadata IP
as well. If the tailnet configures explicit DNS servers, those are used
instead, except for *.internal.
This also adds a new util/cloudenv package based on version/distro
where the cloud type is only detected once. We'll likely expand it in
the future for other clouds, doing variants of this change for other
popular cloud environments.
Fixes#4911
RELNOTES=Google Cloud DNS improvements
Change-Id: I19f3c2075983669b2b2c0f29a548da8de373c7cf
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Simplify the ability to reason about the DoH dialing code by reusing the
dnscache's dialer we already have.
Also, reduce the scope of the "ip" variable we don't want to close over.
This necessarily adds a new field to dnscache.Resolver:
SingleHostStaticResult, for when the caller already knows the IPs to be
returned.
Change-Id: I9f2aef7926f649137a5a3e63eebad6a3fffa48c0
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
When the context is canceled, dc.dialOne returns an error from line 345.
This causes the defer on line 312 to try to resolve the host again, which
triggers a dns lookup of "127.0.0.1" from derp.
Updates tailscale/corp#4475
Signed-off-by: Maisem Ali <maisem@tailscale.com>
If we've already connected to a certain name's IP in the past, don't
assume the problem was DNS related. That just puts unnecessarily load
on our bootstrap DNS servers during regular restarts of Tailscale
infrastructure components.
Also, if we do do a bootstrap DNS lookup and it gives the same IP(s)
that we already tried, don't try them again.
Change-Id: I743e8991a7f957381b8e4c1508b8e9d0df1782fe
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
No behavior changes (intended, at least).
This is in prep for future changes to this package, which would get
too complicated in the current style.
Change-Id: Ic260f8e34ae2f64f34819d4a56e38bee8d8ac5ce
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
A new package can also later record/report which knobs are checked and
set. It also makes the code cleaner & easier to grep for env knobs.
Change-Id: Id8a123ab7539f1fadbd27e0cbeac79c2e4f09751
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Go 1.17 added a HandshakeContext func to take care of timeouts during
TLS handshaking, so switch from our homegrown goroutine implementation
to the standard way.
Signed-off-by: David Anderson <danderson@tailscale.com>
Cancelling the context makes the timeout goroutine race with the write that
reports a successful TLS handshake, so you can end up with a successful TLS
handshake that mysteriously reports that it timed out after ~0s in flight.
The context is always canceled and cleaned up as the function exits, which
happens mere microseconds later, so just let function exit clean up and
thereby avoid races.
Signed-off-by: David Anderson <danderson@tailscale.com>
This is enough to handle the DNS queries as generated by Go's
net package (which our HTTP/SOCKS client uses), and the responses
generated by the ExitDNS DoH server.
This isn't yet suitable for putting on 100.100.100.100 where a number
of different DNS clients would hit it, as this doesn't yet do
EDNS0. It might work, but it's untested and likely incomplete.
Likewise, this doesn't handle anything about truncation, as the
exchanges are entirely in memory between Go or DoH. That would also
need to be handled later, if/when it's hooked up to 100.100.100.100.
Updates #3507
Change-Id: I1736b0ad31eea85ea853b310c52c5e6bf65c6e2a
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Tested manually with:
$ go test -v ./net/dnscache/ -dial-test=bogusplane.dev.tailscale.com:80
Where bogusplane has three A records, only one of which works.
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Cache DNS results of earlier login.tailscale.com control dials, and use
them for future dials if DNS is slow or broken.
Fixes various issues with trickier setups with the domain's DNS server
behind a subnet router.
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>