54 Commits

Author SHA1 Message Date
Josh Bleecher Snyder
e45d51b060 logtail: add a few new methods to PublicID
These are for use in our internal systems.

Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
2022-01-31 14:14:10 -08:00
Josh Bleecher Snyder
9fe5ece833 logtail: cap the buffer size in encodeText
This started as an attempt to placate GitHub's code scanner,
but it's also probably generally a good idea.

Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
2022-01-13 14:37:27 -08:00
Brad Fitzpatrick
40e2b312b6 ipn/ipnserver, logpolicy: move Windows disk logging up earlier
This moves the Windows-only initialization of the filelogger into
logpolicy. Previously we only did it when babysitting the tailscaled
subprocess, but this meant that log messages from the service itself
never made it to disk. Examples that weren't logged to disk:

* logtail unable to dial out,
* DNS flush messages from the service
* svc.ChangeRequest messages (#3581)

This is basically the same fix as #3571 but staying in the Logf type,
and avoiding build-tagged file (which wasn't quite a goal, but
happened and seemed nice)

Fixes #3570

Co-authored-by: Aaron Klotz <aaron@tailscale.com>
Change-Id: Iacd80c4720b7218365ec80ae143339d030842702
2021-12-16 12:33:04 -08:00
Brad Fitzpatrick
3b541c833e util/clientmetric, logtail: log metric changes
Updates #3307

Change-Id: I1399ebd786f6ff7defe6e11c0eb651144c071574
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2021-11-16 08:06:31 -08:00
Maisem Ali
05e55f4a0b logtail/filch: limit buffer file size to 50MB
Signed-off-by: Maisem Ali <maisem@tailscale.com>
2021-10-29 13:31:30 -07:00
Josh Bleecher Snyder
94fb42d4b2 all: use testingutil.MinAllocsPerRun
There are a few remaining uses of testing.AllocsPerRun:
Two in which we only log the number of allocations,
and one in which dynamically calculate the allocations
target based on a different AllocsPerRun run.

This also allows us to tighten the "no allocs"
test in wgengine/filter.

Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
2021-10-28 12:48:37 -07:00
Josh Bleecher Snyder
0c038b477f logtail: add a re-usable buffer for uploads
This avoids a per-upload alloc (which in practice
often means per-log-line), up to 4k.

Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
2021-08-17 12:56:57 -07:00
Josh Bleecher Snyder
278e7de9c9 logtail: always send a json array
The code goes to some effort to send a single JSON object
when there's only a single line and a JSON array when there
are multiple lines.

It makes the code more complex and more expensive;
when we add a second line, we have to use a second buffer
to duplicate the first one after adding a leading square brackets.

The savings come to two bytes. Instead, always send an array.

Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
2021-08-17 12:56:57 -07:00
Josh Bleecher Snyder
93284209bc logtail/filch: preallocate a scanner buffer
Scanning log lines is a frequent source of allocations.
Pre-allocate a re-usable buffer.

This still doesn't help when there are giant log lines.
Those will still be problematic from an iOS memory perspective.
For more on that, see https://github.com/tailscale/corp/issues/2423.

(For those who cannot follow that link, it is a discussion
of particular problematic types of log lines for
particular categories of customers. The "categories of customers"
part is the reason that it is a private issue.)

There is also a latent bug here. If we ever encounter
a log line longer than bufio.MaxScanTokenSize,
then bufio.Scan will return an error,
and we'll truncate the file and discard the rest of the log.
That's not good, but bufio.MaxScanTokenSize is really big,
so it probably doesn't matter much in practice now.
Unfortunately, it does prevent us from easily capping the potential
memory usage here, on pain of losing log entries.

Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
2021-08-17 12:32:10 -07:00
Josh Bleecher Snyder
a5da4ed981 all: gofmt with Go 1.17
This adds "//go:build" lines and tidies up existing "// +build" lines.

Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
2021-08-05 15:54:00 -07:00
Josh Bleecher Snyder
0373ba36f3 logtail: fix typo in comment
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
2021-08-02 14:32:02 -07:00
David Crawshaw
1606ef5219 logtail: print panics from previous runs on stderr
Signed-off-by: David Crawshaw <crawshaw@tailscale.com>
2021-08-02 14:31:35 -07:00
David Crawshaw
297b3d6fa4 staticcheck.conf: turn off noisy lint errors
Signed-off-by: David Crawshaw <crawshaw@tailscale.com>
2021-06-18 15:48:20 -07:00
Josh Bleecher Snyder
bb8ce48a6b logtail: allow changing log level concurrently
When tailscaled starts up, these lines run:

func run() error {
	// ...
	pol := logpolicy.New("tailnode.log.tailscale.io")
	pol.SetVerbosityLevel(args.verbose)
	// ...
}

If there are old log entries present, they immediate start getting uploaded. This races with the call to pol.SetVerbosityLevel.

This manifested itself as a test failure in tailscale.com/tstest/integration
when run with -race:

WARNING: DATA RACE
Read at 0x00c0001bc970 by goroutine 24:
  tailscale.com/logtail.(*Logger).Write()
      /Users/josh/t/corp/oss/logtail/logtail.go:517 +0x27c
  log.(*Logger).Output()
      /Users/josh/go/ts/src/log/log.go:184 +0x2b8
  log.Printf()
      /Users/josh/go/ts/src/log/log.go:323 +0x94
  tailscale.com/logpolicy.newLogtailTransport.func1()
      /Users/josh/t/corp/oss/logpolicy/logpolicy.go:509 +0x36c
  net/http.(*Transport).dial()
      /Users/josh/go/ts/src/net/http/transport.go:1168 +0x238
  net/http.(*Transport).dialConn()
      /Users/josh/go/ts/src/net/http/transport.go:1606 +0x21d0
  net/http.(*Transport).dialConnFor()
      /Users/josh/go/ts/src/net/http/transport.go:1448 +0xe4

Previous write at 0x00c0001bc970 by main goroutine:
  tailscale.com/logtail.(*Logger).SetVerbosityLevel()
      /Users/josh/t/corp/oss/logtail/logtail.go:131 +0x98
  tailscale.com/logpolicy.(*Policy).SetVerbosityLevel()
      /Users/josh/t/corp/oss/logpolicy/logpolicy.go:463 +0x60
  main.run()
      /Users/josh/t/corp/oss/cmd/tailscaled/tailscaled.go:178 +0x50
  main.main()
      /Users/josh/t/corp/oss/cmd/tailscaled/tailscaled.go:163 +0x71c

Goroutine 24 (running) created at:
  net/http.(*Transport).queueForDial()
      /Users/josh/go/ts/src/net/http/transport.go:1417 +0x4d8
  net/http.(*Transport).getConn()
      /Users/josh/go/ts/src/net/http/transport.go:1371 +0x5b8
  net/http.(*Transport).roundTrip()
      /Users/josh/go/ts/src/net/http/transport.go:585 +0x7f4
  net/http.(*Transport).RoundTrip()
      /Users/josh/go/ts/src/net/http/roundtrip.go:17 +0x30
  net/http.send()
      /Users/josh/go/ts/src/net/http/client.go:251 +0x4f0
  net/http.(*Client).send()
      /Users/josh/go/ts/src/net/http/client.go:175 +0x148
  net/http.(*Client).do()
      /Users/josh/go/ts/src/net/http/client.go:717 +0x1d0
  net/http.(*Client).Do()
      /Users/josh/go/ts/src/net/http/client.go:585 +0x358
  tailscale.com/logtail.(*Logger).upload()
      /Users/josh/t/corp/oss/logtail/logtail.go:367 +0x334
  tailscale.com/logtail.(*Logger).uploading()
      /Users/josh/t/corp/oss/logtail/logtail.go:289 +0xec


Rather than complicate the logpolicy API,
allow the verbosity to be adjusted concurrently.

Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
2021-05-25 15:29:21 -07:00
Josh Bleecher Snyder
61e411344f logtail/filch: add staticcheck annotation
To work around a staticcheck bug when running with GOOS=windows.

Signed-off-by: Josh Bleecher Snyder <josharian@gmail.com>
2021-05-04 08:54:50 -07:00
Brad Fitzpatrick
b34fbb24e8 logtail: reduce PublicID.UnmarshalText from 2 allocs to 0
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2021-04-15 10:42:12 -07:00
Brad Fitzpatrick
b4cf837d8a logtail: use link monitor to determine when to retry after upload failure
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2021-03-02 16:30:57 -08:00
Brad Fitzpatrick
c0cdca6d06 cmd/tailscaled, logtail: share link monitor from wgengine to logtail
Part of overall effort to clean up, unify, use link monitoring more,
and make Tailscale quieter when all networks are down. This is especially
bad on macOS where we can get killed for not being polite it seems.
(But we should be polite in any case)

Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2021-03-01 22:09:43 -08:00
Denton Gentry
013da6660e logtail: add tests
+ add a test for parseAndRemoveLogLevel()
+ add a test for drainPendingMessages()
+ test JSON log encoding including several special cases

Other tests frequently send logs but a) don't check the result and
b) do so by happenstance, such that the code in encode() was not
consistently being exercised and leading to spurious changes in
code coverage. These tests attempt to more systematically test
the logging function.

This is the second attempt to add these tests, the first attempt
(in https://github.com/tailscale/tailscale/pull/1114) had two issues:
1. httptest.NewServer creates multiple goroutine handlers, and
   logtail uses goroutines to upload, but the first version had no
   locking in the server to guard this.
   Moved data handling into channels to get synchronization.
2. The channel to notify the test of the arrival of data had a depth
   of 1, in cases where the Logger sent multiple uploads it would
   block the server.

This resulted in the first iteration of these tests being flaky,
and we reverted it.

This new version of the tests has passed with
    go test -race -count=10000
and seems solid.

Signed-off-by: Denton Gentry <dgentry@tailscale.com>
2021-01-15 19:11:40 -08:00
Denton Gentry
ce058c8280
Revert "Add logtail tests (#1114)" (#1116)
This reverts commit e4f53e9b6f1a4d3d6f00091ddef617989b1ea3e4.

At least two of these tests are flakey, reverting until they can be
made more robust.

Signed-off-by: Denton Gentry <dgentry@tailscale.com>
2021-01-12 15:48:11 -08:00
Denton Gentry
e4f53e9b6f
Add logtail tests (#1114)
* logtail: test parseAndRemoveLogLevel()

Signed-off-by: Denton Gentry <dgentry@tailscale.com>

* logtail: test JSON log encoding.

Expand TestUploadMessages to also exercise the encoding functions
in logtail, like JSON logging and timestamps.

Other tests frequently send logs but a) don't check the result and
b) do so by happenstance, such that the lines in encode() were not
consistently being exercised and leading to spurious changes in
code coverage.

Signed-off-by: Denton Gentry <dgentry@tailscale.com>

* logtail: add a test for drainPendingMessages

Make the client buffer some messages before the upload server
becomes available.

Signed-off-by: Denton Gentry <dgentry@tailscale.com>

* logtail: use %q, raw strings, and io.WriteString

%q escapes binary characters for us.

raw strings avoid so much backslash escaping

Signed-off-by: Denton Gentry <dgentry@tailscale.com>
2021-01-12 13:31:45 -08:00
Denton Gentry
b771a1363b logtail: start a local server for TestFastShutdown
Right now TestFastShutdown tries to upload logs to localhost:1234,
which will most likely respond with an error. However if one has an
actual service running on port 1234, it would receive a connection
attempting to POST every time the unit test runs.

Start a local server and direct the upload there instead.

Signed-off-by: Denton Gentry <dgentry@tailscale.com>
2021-01-12 04:29:28 -08:00
Denton Gentry
2c328da094 logtail: add a test to upload logs to local server
Start an HTTP server to accept POST requests, and upload some logs to
it. Check that uploaded logs were received.

Code in logtail:drainPending was not being reliably exercised by other
tests. This shows up in code coverage reports, as lines of code in
drainPending are alternately added and subtracted from code coverage.
This test will reliably exercise and verify this code.

Signed-off-by: Denton Gentry <dgentry@tailscale.com>
2021-01-12 04:29:28 -08:00
Brad Fitzpatrick
d6e9fb1df0 all: adjust Unix permissions for those without umasks
Fixes tailscale/corp#1165

Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2021-01-11 19:24:41 -08:00
Brad Fitzpatrick
57dd247376 cmd/tailscaled, logpolicy, logtail: support log levels
Log levels can now be specified with "[v1] " or "[v2] " substrings
that are then stripped and filtered at the final logger. This follows
our existing "[unexpected]" etc convention and doesn't require a
wholesale reworking of our logging at the moment.

cmd/tailscaled then gets a new --verbose=N flag to take a log level
that controls what gets logged to stderr (and thus systemd, syslog,
etc). Logtail is unaffected by --verbose.

This commit doesn't add annotations to any existing log prints. That
is in the next commit.

Updates #924
Updates #282

Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2020-12-21 12:59:33 -08:00
Brad Fitzpatrick
d97ee12179 logtail, logpolicy: remove an unidiomatic use of an interface 2020-12-21 09:03:39 -08:00
Josh Bleecher Snyder
3a7402aa2d logtail: help the server be more efficient
Add content length hints to headers.
The server can use these hints to more efficiently select buffers.

Stop attempting to compress tiny requests.
The bandwidth savings are negligible (and sometimes negative!),
and it makes extra work for the server.

Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
2020-11-24 12:00:32 -08:00
Josh Bleecher Snyder
883a11f2a8 logtail: fix typo in comment
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
2020-11-17 13:43:54 -08:00
Brad Fitzpatrick
7a2a3955d3 logtail/filch: skip a broken test on Windows
Add a TODO with some notes about why it's skipped for now.

Updates #50
2020-10-14 21:33:01 -07:00
Josh Bleecher Snyder
585a0d8997 all: use testing.T.TempDir
Bit of Friday cleanup.

Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
2020-10-02 20:31:31 -07:00
Brad Fitzpatrick
dd97111d06 backoff: update to Go style, document a bit, make 30s explicit
Also, bit of behavior change: on non-nil err but expired context,
don't reset the consecutive failure count. I don't think the old
behavior was intentional.

Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2020-08-09 09:36:26 -07:00
Elias Naur
bca9fe35ba logtail: return correct write size from logger.Write
Signed-off-by: Elias Naur <mail@eliasnaur.com>
2020-07-27 11:06:41 -07:00
Avery Pennarun
6f590f5b52 logtail: we missed a case for the backoff timer.
We want to run bo.Backoff() after every upload, regardless. If
upload==true but err!=nil, we weren't backing off, which caused some
very-high-throughput log upload retries in bad network conditions.

Updates #282.

Signed-off-by: Avery Pennarun <apenwarr@tailscale.com>
2020-06-05 03:55:45 -04:00
Avery Pennarun
5eb09c8f5e filch_test: clarify the use of os.RemoveAll().
Signed-off-by: Avery Pennarun <apenwarr@tailscale.com>
2020-05-27 18:50:44 -04:00
Avery Pennarun
4f128745d8 magicsock/test: oops, fix a data race in nested-test logf hack.
Signed-off-by: Avery Pennarun <apenwarr@tailscale.com>
2020-05-13 23:50:09 -04:00
Avery Pennarun
08acb502e5 Add tstest.PanicOnLog(), and fix various problems detected by this.
If a test calls log.Printf, 'go test' horrifyingly rearranges the
output to no longer be in chronological order, which makes debugging
virtually impossible. Let's stop that from happening by making
log.Printf panic if called from any module, no matter how deep, during
tests.

This required us to change the default error handler in at least one
http.Server, as well as plumbing a bunch of logf functions around,
especially in magicsock and wgengine, but also in logtail and backoff.

To add insult to injury, 'go test' also rearranges the output when a
parent test has multiple sub-tests (all the sub-test's t.Logf is always
printed after all the parent tests t.Logf), so we need to screw around
with a special Logf that can point at the "current" t (current_t.Logf)
in some places. Probably our entire way of using subtests is wrong,
since 'go test' would probably like to run them all in parallel if you
called t.Parallel(), but it definitely can't because the're all
manipulating the shared state created by the parent test. They should
probably all be separate toplevel tests instead, with common
setup/teardown logic. But that's a job for another time.

Signed-off-by: Avery Pennarun <apenwarr@tailscale.com>
2020-05-13 23:12:35 -04:00
Avery Pennarun
64db026c8b backoff: add a LogLongerThan configuration.
Some programs use frequent short-duration backoffs even under non-error
conditions. They can set this to avoid logging short backoffs when
things are operating normally, but still get messages when longer
backoffs kick in.

Signed-off-by: Avery Pennarun <apenwarr@tailscale.com>
2020-04-29 05:41:39 -04:00
David Crawshaw
2372530964 logtail/backoff: only log backoffs > 2sec
Signed-off-by: David Crawshaw <crawshaw@tailscale.com>
2020-04-29 17:41:48 +10:00
Brad Fitzpatrick
5d67365cc9 logtail: add PrivateID.IsZero method 2020-04-27 07:33:23 -07:00
Brad Fitzpatrick
c726c1eec9 logtail: add const DefaultHost with default server name 2020-04-26 08:31:07 -07:00
Brad Fitzpatrick
3464114b88 logtail: add ParsePublicID that doesn't allocate 2020-04-13 08:54:35 -07:00
Brad Fitzpatrick
996bf9cae7 logtail: don't send a User-Agent
Just useless bytes on the wire. Especially with HTTP/1.

Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2020-04-05 13:16:56 -07:00
Brad Fitzpatrick
bdc55d7091 logtail: add ParsePrivateID 2020-04-05 09:15:15 -07:00
Brad Fitzpatrick
e71a7c7a2c logtail: read to EOF on chunked response
We'll be fixing the server so this won't trigger in practice,
but it demos the connection reuse problem.

Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2020-04-04 16:12:15 -07:00
Brad Fitzpatrick
fcb6a34f4b logtail: reduce allocations encoding text
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2020-04-04 16:12:02 -07:00
David Crawshaw
6ddbcab71e logtail: rename the unused CheckLogs to DrainLogs
Its semantics has changed slightly, this will let us use it to
drive batched logging in special circumstances.

Signed-off-by: David Crawshaw <crawshaw@tailscale.com>
2020-03-20 13:26:33 +11:00
David Crawshaw
51a12d1307 filch: a few minor comments
Signed-off-by: David Crawshaw <crawshaw@tailscale.com>
2020-03-20 13:20:45 +11:00
Brad Fitzpatrick
8f9fa6a842 logtail: minor style/simplification changes 2020-03-15 22:41:50 -07:00
David Anderson
de0239375a logtail/filch: use x/sys/unix instead of syscall.
In particular, the Dup2 syscall is not defined in the syscall
package on arm64, but is defined in x/sys/unix.

Signed-off-by: David Anderson <dave@natulte.net>
2020-02-13 23:56:39 -08:00
David Anderson
8380d8575a logtail: remove unused code from old sentinel logic.
Signed-off-by: David Anderson <dave@natulte.net>
2020-02-11 10:33:36 -08:00