Thanks to @nshalman and @Soypete for debugging!
Updates #6054
Change-Id: I74550cc31f8a257b37351b8152634c768e1e0a8a
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
By default, `http.Transport` keeps idle connections open hoping to re-use them in the future. Combined with a separate transport per request in HTTP proxy this results in idle connection leak.
Fixes#6773
These aren't handled, but it's not an error to get one.
Fixes#6806
Signed-off-by: Andrew Dunham <andrew@du.nham.ca>
Change-Id: I1fcb9032ac36420aa72a048bf26f58360b9461f9
"look up" is the verb. "lookup" is a noun.
Change-Id: I81c99e12c236488690758fb5c121e7e4e1622a36
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
We saw a few cases where we hit this limit; bumping to 4k seems
relatively uncontroversial.
Change-Id: I218fee3bc0d2fa5fde16eddc36497a73ebd7cbda
Signed-off-by: Andrew Dunham <andrew@du.nham.ca>
We change our invocations of GetExtendedTcpTable to request additional
information about the "module" responsible for the port. In addition to pid,
this output also includes sufficient metadata to enable Windows to resolve
process names and disambiguate svchost processes.
We store the OS-specific output in an OSMetadata field in netstat.Entry, which
portlist may then use as necessary to actually resolve the process/module name.
Signed-off-by: Aaron Klotz <aaron@tailscale.com>
It's long & distracting for how low value it is.
Fixes#6766
Change-Id: I51364f25c0088d9e63deb9f692ba44031f12251b
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
In some configurations, user explicitly do not want to store
tailscale state in k8s secrets, because doing that leads to
some annoying permission issues with sidecar containers.
With this change, TS_KUBE_SECRET="" and TS_STATE_DIR=/foo
will force storage to file when running in kubernetes.
Fixes#6704.
Signed-off-by: David Anderson <danderson@tailscale.com>
The Tailscale logging service has a hard limit on the maximum
log message size that can be accepted.
We want to ensure that netlog messages never exceed
this limit otherwise a client cannot transmit logs.
Move the goroutine for periodically dumping netlog messages
from wgengine/netlog to net/connstats.
This allows net/connstats to manage when it dumps messages,
either based on time or by size.
Updates tailscale/corp#8427
Signed-off-by: Joe Tsai <joetsai@digital-static.net>
As backup plan, just in case the earlier fix's logic wasn't correct
and we want to experiment in the field or have users have a quicker
fix.
Updates #5285
Change-Id: I7447466374d11f8f609de6dfbc4d9a944770826d
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
This ensures that we capture error returned by `Serve` and exit with a
non-zero exit code if it happens.
Signed-off-by: Anton Tolchanov <anton@tailscale.com>
The operator creates a fair bit of internal cluster state to manage proxying,
dumping it all in the default namespace is handy for development but rude
for production.
Updates #502
Signed-off-by: David Anderson <danderson@tailscale.com>
Consider the following pattern:
err1 := foo()
err2 := bar()
err3 := baz()
return multierr.New(err1, err2, err3)
If err1, err2, and err3 are all nil, then multierr.New should not allocate.
Thus, modify the logic of New to count the number of distinct error values
and allocate the exactly needed slice. This also speeds up non-empty error
situation since repeatedly growing with append is slow.
Performance:
name old time/op new time/op delta
Empty-24 41.8ns ± 2% 6.4ns ± 1% -84.73% (p=0.000 n=10+10)
NonEmpty-24 120ns ± 3% 69ns ± 1% -42.01% (p=0.000 n=9+10)
name old alloc/op new alloc/op delta
Empty-24 64.0B ± 0% 0.0B -100.00% (p=0.000 n=10+10)
NonEmpty-24 168B ± 0% 88B ± 0% -47.62% (p=0.000 n=10+10)
name old allocs/op new allocs/op delta
Empty-24 1.00 ± 0% 0.00 -100.00% (p=0.000 n=10+10)
NonEmpty-24 3.00 ± 0% 2.00 ± 0% -33.33% (p=0.000 n=10+10)
Signed-off-by: Joe Tsai <joetsai@digital-static.net>
We used to need to do timed requeues in a few places in the reconcile logic,
and the easiest way to do that was to plumb reconcile.Result return values
around. But now we're purely event-driven, so the only thing we care about
is whether or not an error occurred.
Incidentally also fix a very minor bug where headless services would get
completely ignored, rather than reconciled into the correct state. This
shouldn't matter in practice because you can't transition from a headful
to a headless service without a deletion, but for consistency let's avoid
having a path that takes no definite action if a service of interest does
exist.
Updates #502.
Signed-off-by: David Anderson <danderson@tailscale.com>
Previously, we had to do blind timed requeues while waiting for
the tailscale hostname, because we looked up the hostname through
the API. But now the proxy container image writes back its hostname
to the k8s secret, so we get an event-triggered reconcile automatically
when the time is right.
Updates #502
Signed-off-by: David Anderson <danderson@tailscale.com>
As is convention in the k8s world, use zap for structured logging. For
development, OPERATOR_LOGGING=dev switches to a more human-readable output
than JSON.
Updates #502
Signed-off-by: David Anderson <danderson@tailscale.com>
Our reconcile loop gets triggered again when the StatefulSet object
finally disappears (in addition to when its deletion starts, as indicated
by DeletionTimestamp != 0). So, we don't need to queue additional
reconciliations to proceed with the remainder of the cleanup, that
happens organically.
Signed-off-by: David Anderson <danderson@tailscale.com>
Previously, if a DNS-over-TCP message was received while there were
existing queries in-flight, and it was over the size limit, we'd close
the 'responses' channel. This would cause those in-flight queries to
send on the closed channel and panic.
Instead, don't close the channel at all and rely on s.ctx being
canceled, which will ensure that in-flight queries don't hang.
Fixes#6725
Signed-off-by: Andrew Dunham <andrew@du.nham.ca>
Change-Id: I8267728ac37ed7ae38ddd09ce2633a5824320097
It's possible for the 'somethingChanged' callback to be registered and
then trigger before the ctx field is assigned; move the assignment
earlier so this can't happen.
Change-Id: Ia7ee8b937299014a083ab40adf31a8b3e0db4ec5
Signed-off-by: Andrew Dunham <andrew@du.nham.ca>