57 Commits

Author SHA1 Message Date
Anton Tolchanov
c2af1cd9e3 prober: support multiple probes running concurrently
Some probes might need to run for longer than their scheduling interval,
so this change relaxes the 1-at-a-time restriction, allowing us to
configure probe concurrency and timeout separately. The default values
remain the same (concurrency of 1; timeout of 80% of interval).

Updates tailscale/corp#25479

Signed-off-by: Anton Tolchanov <anton@tailscale.com>
2025-01-30 12:22:23 +00:00
Percy Wegmann
2729942638 prober: fix nil pointer access in tcp-in-tcp probes
If unable to accept a connection from the bandwidth probe listener,
return from the goroutine immediately since the accepted connection
will be nil.

Updates tailscale/corp#25958

Signed-off-by: Percy Wegmann <percy@tailscale.com>
2025-01-21 12:44:56 -06:00
Jordan Whited
00bd906797
prober: remove DERP pub key copying overheads in qd and non-tun measures (#14659)
Updates tailscale/corp#25883

Signed-off-by: Jordan Whited <jordan@tailscale.com>
2025-01-15 16:28:49 -08:00
Jordan Whited
84b0379dd5
prober: remove per-packet DERP pub key copying overheads (#14658)
Updates tailscale/corp#25883

Signed-off-by: Jordan Whited <jordan@tailscale.com>
2025-01-15 15:47:26 -08:00
Percy Wegmann
6ccde369ff prober: record total bytes transferred in DERP bandwidth probes
This will enable Prometheus queries to look at the bandwidth over time windows,
for example 'increase(derp_bw_bytes_total)[1h] / increase(derp_bw_transfer_time_seconds_total)[1h]'.

Fixes commit a51672cafd8b6c4e87915a55bda1491eb7cbee84.

Updates tailscale/corp#25503

Signed-off-by: Percy Wegmann <percy@tailscale.com>
2025-01-13 12:41:30 -06:00
Percy Wegmann
cd795d8a7f prober: support filtering regions by region ID in addition to code
Updates tailscale/corp#25758

Signed-off-by: Percy Wegmann <percy@tailscale.com>
2025-01-10 12:33:19 -06:00
Percy Wegmann
a51672cafd prober: record total bytes transferred in DERP bandwidth probes
This will enable Prometheus queries to look at the bandwidth over time windows,
for example 'increase(derp_bw_bytes_total)[1h] / increase(derp_bw_transfer_time_seconds_total)[1h]'.

Updates tailscale/corp#25503

Signed-off-by: Percy Wegmann <percy@tailscale.com>
2025-01-09 09:22:44 -06:00
Percy Wegmann
c81a95dd53 prober: clone histogram buckets before handing to Prometheus for derp_qd_probe_delays_seconds
Updates tailscale/corp#25697

Signed-off-by: Percy Wegmann <percy@tailscale.com>
2025-01-08 12:02:12 -06:00
Percy Wegmann
5095efd628 prober: make histogram buckets cumulative
Histogram buckets should include counts for all values under the bucket ceiling,
not just those between the ceiling and the next lower ceiling.

See https://prometheus.io/docs/tutorials/understanding_metric_types/\#histogram

Updates tailscale/corp#24522

Signed-off-by: Percy Wegmann <percy@tailscale.com>
2024-12-20 10:28:37 -06:00
Percy Wegmann
00a4504cf1 cmd/derpprobe,prober: add ability to perform continuous queuing delay measurements against DERP servers
This new type of probe sends DERP packets sized similarly to CallMeMaybe packets
at a rate of 10 packets per second. It records the round-trip times in a Prometheus
histogram. It also keeps track of how many packets are dropped. Packets that fail to
arrive within 5 seconds are considered dropped.

Updates tailscale/corp#24522

Signed-off-by: Percy Wegmann <percy@tailscale.com>
2024-12-19 10:45:56 -06:00
Brad Fitzpatrick
2506b81471 prober: fix WithBandwidthProbing behavior with optional tunAddress
1ed9bd76d682299376f404521cf1958a7f9bea7a meant to make tunAddress be optional.

Updates tailscale/corp#24635

Change-Id: Idc4a8540b294e480df5bd291967024c04df751c0
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2024-12-16 12:18:54 -08:00
Percy Wegmann
1ed9bd76d6 prober: perform DERP bandwidth probes over TUN device to mimic real client
Updates tailscale/corp#24635

Co-authored-by: Mario Minardi <mario@tailscale.com>
Signed-off-by: Percy Wegmann <percy@tailscale.com>
2024-12-13 15:50:47 -06:00
Mario Minardi
ea3d0bcfd4
prober,derp/derphttp: make dev-mode DERP probes work without TLS (#14347)
Make dev-mode DERP probes work without TLS. Properly dial port `3340`
when not using HTTPS when dialing nodes in `derphttp_client`. Skip
verifying TLS state in `newConn` if we are not running a prober.

Updates tailscale/corp#24635

Signed-off-by: Percy Wegmann <percy@tailscale.com>
Co-authored-by: Percy Wegmann <percy@tailscale.com>
2024-12-10 10:51:03 -07:00
Percy Wegmann
1355f622be cmd/derpprobe,prober: add ability to restrict derpprobe to a single region
Updates #24522

Co-authored-by: Mario Minardi <mario@tailscale.com>
Signed-off-by: Percy Wegmann <percy@tailscale.com>
2024-11-15 13:42:58 -06:00
Anton Tolchanov
46db698333 prober: make status page more clear
Updates tailscale/corp#20583

Signed-off-by: Anton Tolchanov <anton@tailscale.com>
2024-08-08 17:34:29 +01:00
Anton Tolchanov
9106187a95 prober: support JSON response in RunHandler
Updates tailscale/corp#20583

Signed-off-by: Anton Tolchanov <anton@tailscale.com>
2024-08-06 11:27:59 +01:00
Anton Tolchanov
9b08399d9e prober: add a status page handler
This change adds an HTTP handler with a table showing a list of all
probes, their status, and a button that allows triggering a specific
probe.

Updates tailscale/corp#20583

Signed-off-by: Anton Tolchanov <anton@tailscale.com>
2024-08-06 11:27:59 +01:00
Anton Tolchanov
153a476957 prober: add an HTTP endpoint for triggering a probe
- Keep track of the last 10 probe results and successful probe
  latencies;
- Add an HTTP handler that triggers a given probe by name and returns it
  result as a plaintext HTML page, showing recent probe results as a
  baseline

Updates tailscale/corp#20583

Signed-off-by: Anton Tolchanov <anton@tailscale.com>
2024-08-06 11:27:59 +01:00
Anton Tolchanov
c8f258a904 prober: propagate DERPMap request creation errors
Updates tailscale/corp#8497

Signed-off-by: Anton Tolchanov <anton@tailscale.com>
2024-07-09 13:43:51 +01:00
Brad Fitzpatrick
8a11a43c28 cmd/derpprobe: support 'local' derpmap to get derp map via LocalAPI
To make it easier for people to monitor their custom DERP fleet.

Updates tailscale/corp#20654

Change-Id: Id8af22936a6d893cc7b6186d298ab794a2672524
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2024-06-06 13:14:27 -07:00
Brad Fitzpatrick
6877d44965 prober: plumb a now-required netmon to derphttp
Updates #11896

Change-Id: Ie2f9cd024d85b51087d297aa36c14a9b8a2b8129
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2024-05-15 10:35:26 -04:00
Brad Fitzpatrick
3672f29a4e net/netns, net/dns/resolver, etc: make netmon required in most places
The goal is to move more network state accessors to netmon.Monitor
where they can be cheaper/cached. But first (this change and others)
we need to make sure the one netmon.Monitor is plumbed everywhere.

Some notable bits:

* tsdial.NewDialer is added, taking a now-required netmon

* because a tsdial.Dialer always has a netmon, anything taking both
  a Dialer and a NetMon is now redundant; take only the Dialer and
  get the NetMon from that if/when needed.

* netmon.NewStatic is added, primarily for tests

Updates tailscale/corp#10910
Updates tailscale/corp#18960
Updates #7967
Updates #3299

Change-Id: I877f9cb87618c4eb037cee098241d18da9c01691
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2024-04-27 12:17:45 -07:00
Brad Fitzpatrick
7c1d6e35a5 all: use Go 1.22 range-over-int
Updates #11058

Change-Id: I35e7ef9b90e83cac04ca93fd964ad00ed5b48430
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2024-04-16 15:32:38 -07:00
Anton Tolchanov
5336362e64 prober: export probe class and metrics from bandwidth prober
- Wrap each prober function into a probe class that allows associating
  metric labels and custom metrics with a given probe;
- Make sure all existing probe classes set a `class` metric label;
- Move bandwidth probe size from being a metric label to a separate
  gauge metric; this will make it possible to use it to calculate
  average used bandwidth using a PromQL query;
- Also export transfer time for the bandwidth prober (more accurate than
  the total probe time, since it excludes connection establishment
  time).

Updates tailscale/corp#17912

Signed-off-by: Anton Tolchanov <anton@tailscale.com>
2024-04-08 12:02:58 +01:00
Anton Tolchanov
21671ca374 prober: remove unused notification code
Signed-off-by: Anton Tolchanov <anton@tailscale.com>
2024-04-08 12:02:58 +01:00
Andrew Dunham
7d7d159824 prober: support creating multiple probes in ForEachAddr
So that we can e.g. check TLS on multiple ports for a given IP.

Updates tailscale/corp#16367

Signed-off-by: Andrew Dunham <andrew@du.nham.ca>
Change-Id: I81d840a4c88138de1cbb2032b917741c009470e6
2024-04-04 13:04:16 -04:00
Andrew Dunham
ac574d875c prober: add helper function to check all IPs for a DNS hostname
This allows us to check all IP addresses (and address families) for a
given DNS hostname while dynamically discovering new IPs and removing
old ones as they're no longer valid.

Also add a testable example that demonstrates how to use it.

Alternative to #11610
Updates tailscale/corp#16367

Signed-off-by: Andrew Dunham <andrew@du.nham.ca>
Change-Id: I6d6f39bafc30e6dfcf6708185d09faee2a374599
2024-04-04 11:11:33 -04:00
Anton Tolchanov
f12d2557f9 prober: add a DERP bandwidth probe
Updates tailscale/corp#17912

Signed-off-by: Anton Tolchanov <anton@tailscale.com>
2024-03-13 13:36:45 +00:00
Anton Tolchanov
5018683d58 prober: remove unused derp prober latency measurements
Signed-off-by: Anton Tolchanov <anton@tailscale.com>
2024-03-13 13:36:45 +00:00
Anton Tolchanov
205a10b51a prober: export probe counters and cumulative latency
Updates #cleanup

Signed-off-by: Anton Tolchanov <anton@tailscale.com>
2024-03-13 13:36:45 +00:00
Brad Fitzpatrick
a4a909a20b prober: add TLS probe constructor to split dial addr from cert name
So we can probe load balancers by their unique DNS name but without
asking for that cert name.

Updates tailscale/corp#13050

Change-Id: Ie4c0a2f951328df64281ed1602b4e624e3c8cf2e
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2024-02-19 09:03:13 -08:00
Anton Tolchanov
869b34ddeb prober: log HTTP response body on failure
Signed-off-by: Anton Tolchanov <anton@tailscale.com>
2023-12-13 14:30:16 +00:00
Thomas Kosiewski
96a80fcce3 Add support for custom DERP port in TLS prober
Updates #10146

Signed-off-by: Thomas Kosiewski <thoma471@googlemail.com>
2023-11-07 12:56:13 +00:00
valscale
f314fa4a4a
prober: fix data race when altering derpmap (#8397)
Move the clearing of STUNOnly flag to the updateMap() function.

Fixes #8395

Signed-off-by: Val <valerie@tailscale.com>
2023-06-21 10:16:31 -07:00
valscale
88097b836a
prober: allow monitoring of nodes marked as STUN only in default derpmap (#8391)
prober uses NewRegionClient() to connect to a derper using a faked up
single-node region, but NewRegionClient() fails to connect if there is
no non-STUN only client in the region. Set the STUN only flag to false
before we call NewRegionClient() so we can monitor nodes marked as
STUN only in the default derpmap.

Updates #11492

Signed-off-by: Val <valerie@tailscale.com>
2023-06-20 12:04:55 -07:00
Mihai Parparita
7330aa593e all: avoid repeated default interface lookups
On some platforms (notably macOS and iOS) we look up the default
interface to bind outgoing connections to. This is both duplicated
work and results in logspam when the default interface is not available
(i.e. when a phone has no connectivity, we log an error and thus cause
more things that we will try to upload and fail).

Fixed by passing around a netmon.Monitor to more places, so that we can
use its cached interface state.

Fixes #7850
Updates #7621

Signed-off-by: Mihai Parparita <mihai@tailscale.com>
2023-04-20 15:46:01 -07:00
Andrew Dunham
280255acae
various: add golangci-lint, fix issues (#7905)
This adds an initial and intentionally minimal configuration for
golang-ci, fixes the issues reported, and adds a GitHub Action to check
new pull requests against this linter configuration.

Signed-off-by: Andrew Dunham <andrew@du.nham.ca>
Change-Id: I8f38fbc315836a19a094d0d3e986758b9313f163
2023-04-17 18:38:24 -04:00
Anton Tolchanov
c153e6ae2f prober: migrate to Prometheus metric library
This provides an example of using native Prometheus metrics with tsweb.

Prober library seems to be the only user of PrometheusVar, so I am
removing support for it in tsweb.

Updates https://github.com/tailscale/corp/issues/10205

Signed-off-by: Anton Tolchanov <anton@tailscale.com>
2023-04-11 08:37:32 +01:00
Anton Tolchanov
7083246409 prober: only record latency for successful probes
This will make it easier to track probe latency on a dashboard.

Updates https://github.com/tailscale/corp/issues/9916

Signed-off-by: Anton Tolchanov <anton@tailscale.com>
2023-03-22 09:24:20 +00:00
Anton Tolchanov
e59dc29a55 prober: log client pubkeys on derp mesh probe failures
Updates https://github.com/tailscale/corp/issues/9916

Signed-off-by: Anton Tolchanov <anton@tailscale.com>
2023-03-20 17:41:37 +00:00
Anton Tolchanov
100d8e909e cmd/derpprobe: migrate to the prober framework
`prober.DERP` was created in #5988 based on derpprobe. Having used it
instead of derpprobe for a few months, I think we have enough confidence
that it works and can now migrate derpprobe to use the prober framework
and get rid of code duplication.

A few notable changes in behaviour:
- results of STUN probes over IPv4 and IPv6 are now reported separately;
- TLS probing now includes OCSP verification;
- probe names in the output have changed;
- ability to send Slack notification from the prober has been removed.
  Instead, the prober now exports metrics in Expvar (/debug/vars) and
  Prometheus (/debug/varz) formats.

Fixes https://github.com/tailscale/corp/issues/8497

Signed-off-by: Anton Tolchanov <anton@tailscale.com>
2023-01-31 10:47:42 +00:00
Will Norris
71029cea2d all: update copyright and license headers
This updates all source files to use a new standard header for copyright
and license declaration.  Notably, copyright no longer includes a date,
and we now use the standard SPDX-License-Identifier header.

This commit was done almost entirely mechanically with perl, and then
some minimal manual fixes.

Updates #6865

Signed-off-by: Will Norris <will@tailscale.com>
2023-01-27 15:36:29 -08:00
Brad Fitzpatrick
a1b4ab34e6 util/httpm: add new package for prettier HTTP method constants
See package doc.

Change-Id: Ibbfc8e1f98294217c56f3a9452bd93ffa3103572
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2023-01-26 19:44:07 -08:00
Andrew Dunham
06b55ab50f prober: fix test flake
This was tested by running 10000 test iterations and observing no flakes
after this change was made.

Change-Id: Ib036fd03a3a17800132c53c838cc32bfe2961306
Signed-off-by: Andrew Dunham <andrew@tailscale.com>
2022-11-02 09:58:40 -04:00
Anton Tolchanov
bd47e28638 prober: optionally spread probes over time
By default all probes with the same probe interval that have been added
together will run on a synchronized schedule, which results in spiky
resource usage and potential throttling by third-party systems (for
example, OCSP servers used by the TLS probes).

To address this, prober can now run in "spread" mode that will
introduce a random delay before the first run of each probe.

Signed-off-by: Anton Tolchanov <anton@tailscale.com>
2022-10-21 09:41:53 +01:00
Anton Tolchanov
69f61dcad8 prober: add a DERP probe manager based on derpprobe
This ensures that each DERP server is probed individually (TLS and STUN)
and also manages per-region mesh probing. Actual probing code has been
copied from cmd/derpprobe.

Signed-off-by: Anton Tolchanov <anton@tailscale.com>
2022-10-20 13:54:34 +01:00
Denton Gentry
b55761246b prober: add utilities to generate alerts and warnings.
sendAlert will trigger the Incident Response system.
sendWarning will post to Slack.

Co-authored-by: M. J. Fromberger <fromberger@tailscale.com>
Signed-off-by: Denton Gentry <dgentry@tailscale.com>
2022-10-16 23:34:04 -07:00
Anton Tolchanov
26af329fde prober: expand certificate verification logic in the TLS prober
TLS prober now checks validity period for all server certificates
and verifies OCSP revocation status for the leaf cert.

Signed-off-by: Anton Tolchanov <anton@tailscale.com>
2022-10-14 15:00:38 +01:00
Josh Soref
d4811f11a0 all: fix spelling mistakes
Signed-off-by: Josh Soref <2119212+jsoref@users.noreply.github.com>
2022-09-29 13:36:13 -07:00
Brad Fitzpatrick
4950fe60bd syncs, all: move to using Go's new atomic types instead of ours
Fixes #5185

Change-Id: I850dd532559af78c3895e2924f8237ccc328449d
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2022-08-04 07:47:59 -07:00