Bounded DeliveredEvent queues reduce memory usage, but they can deadlock under load.
Two common scenarios trigger deadlocks when the number of events published in a short
period exceeds twice the queue capacity (there's a PublishedEvent queue of the same size):
- a subscriber tries to acquire a mutex that is held by a publisher, or
- a subscriber for A events publishes B events.
Avoiding these scenarios is not practical: it would limit the eventbus's usefulness and
reduce its adoption, pushing us back to callbacks and other legacy mechanisms. These
deadlocks have already occurred on customer devices, dev machines, and in tests. They also
make it harder to identify and fix slow subscribers and similar issues we have been seeing recently.
Choosing an arbitrarily large fixed queue capacity would only mask the problem. A client running
on a sufficiently large and complex customer environment can exceed any meaningful constant limit,
since event volume depends on the number of peers and other factors. Behavior also changes
based on scheduling of publishers and subscribers by the Go runtime, OS, and hardware, as the issue
is essentially a race between publishers and subscribers. Additionally, on lower-end devices,
an unreasonably high constant capacity is practically the same as using unbounded queues.
Therefore, this PR changes the event queue implementation to be unbounded by default.
The PublishedEvent queue keeps its existing capacity of 16 items, while subscribers'
DeliveredEvent queues become unbounded.
This change fixes known deadlocks and makes the system stable under load,
at the cost of higher potential memory usage, including cases where a queue grows
during an event burst and does not shrink when load decreases.
Further improvements can be implemented in the future as needed.
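To make the memory trade-off concrete, here is a minimal sketch of an unbounded
slice-backed FIFO with exactly this behavior (illustrative only, not the queue
implementation in this PR):
    type queue[T any] struct{ buf []T }

    func (q *queue[T]) push(v T) {
        q.buf = append(q.buf, v) // grows as needed during a burst
    }

    func (q *queue[T]) pop() (v T, ok bool) {
        if len(q.buf) == 0 {
            return v, false
        }
        v, q.buf = q.buf[0], q.buf[1:]
        return v, true // the backing array's capacity is retained, not shrunk
    }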
Fixes #17973
Fixes #18012
Signed-off-by: Nick Khyl <nickk@tailscale.com>
As of 2025-11-20, publishing more events than the eventbus's
internal queues can hold may deadlock if a subscriber tries
to publish events itself.
This commit adds a test that demonstrates this deadlock,
and skips it until the bug is fixed.
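The added test is shaped roughly like this (a simplified sketch; the event
types, counts, and names are illustrative, and imports are elided):
    type A struct{}
    type B struct{}

    func TestSubscriberPublishDeadlock(t *testing.T) {
        bus := eventbus.New()
        defer bus.Close()
        cli := bus.Client("test")
        pubA := eventbus.Publish[A](cli)
        pubB := eventbus.Publish[B](cli)
        subA := eventbus.Subscribe[A](cli)
        go func() {
            for {
                select {
                case <-subA.Done():
                    return
                case <-subA.Events():
                    pubB.Publish(B{}) // a subscriber publishing: the trigger
                }
            }
        }()
        for i := 0; i < 1000; i++ { // far more events than the queues hold
            pubA.Publish(A{})
        }
    }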
Updates #18012
Signed-off-by: Nick Khyl <nickk@tailscale.com>
As of 2025-11-20, publishing more events than the eventbus's
internal queues can hold may deadlock if a subscriber tries
to acquire a mutex that can also be held by a publisher.
This commit adds a test that demonstrates this deadlock,
and skips it until the bug is fixed.
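Roughly the shape of this test (a simplified sketch; names are illustrative
and imports are elided):
    type Event struct{}

    func TestPublisherMutexDeadlock(t *testing.T) {
        bus := eventbus.New()
        defer bus.Close()
        cli := bus.Client("test")
        pub := eventbus.Publish[Event](cli)
        sub := eventbus.Subscribe[Event](cli)
        var mu sync.Mutex
        go func() {
            for {
                select {
                case <-sub.Done():
                    return
                case <-sub.Events():
                    mu.Lock() // waits on the publishing goroutine...
                    mu.Unlock()
                }
            }
        }()
        for i := 0; i < 1000; i++ {
            mu.Lock()
            pub.Publish(Event{}) // ...which blocks here once the queues fill
            mu.Unlock()
        }
    }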
Updates #17973
Signed-off-by: Nick Khyl <nickk@tailscale.com>
Prior to this change, a SubscriberFunc treated invoking the subscriber's
function as the completion of delivery. That meant that when closing the
subscriber, the callback could continue to execute for some time after
the close returned.
For channel-based subscribers, that works OK because the close takes effect
before the subscriber ever sees the event. To make the two subscriber types
symmetric, we should also wait for the callback to finish before returning.
This ensures that a Close of the client means the same thing with both kinds of
subscriber.
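In usage terms (illustrative; handleEvent stands in for any callback):
    s := eventbus.SubscribeFunc(cli, handleEvent)
    // ...
    s.Close()
    // After Close returns, handleEvent is guaranteed not to be running,
    // matching the behavior of channel-based subscribers.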
Updates #17638
Change-Id: I82fd31bcaa4e92fab07981ac0e57e6e3a7d9d60b
Signed-off-by: M. J. Fromberger <fromberger@tailscale.com>
Add options to the eventbus.Bus to plumb in a logger.
Route that logger into the subscriber machinery, and trigger a log message to
it when a subscriber fails to respond to its delivered events for 5s or more.
The log message includes the package, filename, and line number of the call
site that created the subscription.
Add tests that verify this works.
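Roughly how this looks from the caller's side (a sketch; the WithLogger option
name here is hypothetical and may differ from the actual API):
    bus := eventbus.New(eventbus.WithLogger(log.Printf)) // hypothetical option
    cli := bus.Client("example")
    s := eventbus.Subscribe[T](cli)
    // If s fails to respond to its delivered events for 5s or more, the
    // bus logs a message naming the package, file, and line of the
    // Subscribe call above.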
Updates #17680
Change-Id: I0546516476b1e13e6a9cf79f19db2fe55e56c698
Signed-off-by: M. J. Fromberger <fromberger@tailscale.com>
Before synctest, timers were needed to allow the events to flow into the
test bus. There is still a timer, but it is not derived from the test
deadline and its value is mostly arbitrary, since synctest renders it
practically non-existent.
With this approach, tests that do not need to test for the absence of
events do not rely on synctest.
Updates #15160
Signed-off-by: Claus Lensbøl <claus@tailscale.com>
With a channel subscriber, the subscription processing always occurs on another
goroutine. The SubscriberFunc (prior to this commit) runs its callbacks on the
client's own goroutine. This changes the semantics, though: In addition to more
directly pushing back on the publisher, a publisher and subscriber can deadlock
in a SubscriberFunc but succeed with a Subscriber. They should behave
equivalently regardless of which interface they use.
Arguably the caller should deal with this by creating its own goroutine if it
needs to. However, that loses much of the benefit of the SubscriberFunc API, as
it will need to manage the lifecycle of that goroutine. So, for practical
ergonomics, let's make the SubscriberFunc do this management on the user's
behalf. (We discussed doing this in #17432, but decided not to do it yet). We
can optimize this approach further, if we need to, without changing the API.
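Schematically, each SubscriberFunc now runs something like this internally
(simplified; done, events, and callback stand in for the subscriber's state):
    go func() {
        for {
            select {
            case <-done: // the subscriber or its client is closing
                return
            case e := <-events: // handed off from the client's service routine
                callback(e)
            }
        }
    }()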
Updates #17487
Change-Id: I19ea9e8f246f7b406711f5a16518ef7ff21a1ac9
Signed-off-by: M. J. Fromberger <fromberger@tailscale.com>
Originally proposed by @bradfitz in #17413.
In practice, a lot of subscribers have only one event type of interest, or a
small number of mostly independent ones. In that case, the overhead of running
and maintaining a goroutine to select on multiple channels winds up being more
noisy than we'd like for the user of the API.
For this common case, add a new SubscriberFunc[T] type that delivers events to
a callback owned by the subscriber, directly on the goroutine belonging to the
client itself. This frees the consumer from the need to maintain their own
goroutine to pull events from the channel, and to watch for closure of the
subscriber.
Before:
    s := eventbus.Subscribe[T](eventClient)
    go func() {
        for {
            select {
            case <-s.Done():
                return
            case e := <-s.Events():
                doSomethingWith(e)
            }
        }
    }()
    // ...
    s.Close()
After:
    func doSomethingWithT(e T) { ... }
    s := eventbus.SubscribeFunc(eventClient, doSomethingWithT)
    // ...
    s.Close()
Moreover, unless the caller wants to explicitly stop the subscriber separately
from its governing client, it need not capture the SubscriberFunc value at all.
One downside of this approach is that a slow or deadlocked callback could block
the client's service routine and thus stall all other subscriptions on that
client. However, this can already happen more broadly if a subscriber fails to
service its delivery channel in a timely manner; it just feeds back more immediately.
Updates #17487
Change-Id: I64592d786005177aa9fd445c263178ed415784d5
Signed-off-by: M. J. Fromberger <fromberger@tailscale.com>
Saves 442 KB. Lock it with a new min test.
Updates #12614
Change-Id: Ia7bf6f797b6cbf08ea65419ade2f359d390f8e91
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
I'm trying to remove the "regexp" and "regexp/syntax" packages from
our minimal builds. But tsweb pulls in regexp (via net/http/pprof etc.),
and util/eventbus was importing tsweb for no reason.
Updates #12614
Change-Id: Ifa8c371ece348f1dbf80d6b251381f3ed39d5fbd
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Some systems need to be able to tell, alongside other channel operations, whether
the monitored goroutine has finished (notably the relay server in this case, but
others seem likely to be similarly situated).
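For example (a sketch; m is the Monitor, while sub and handle are
illustrative):
    select {
    case <-m.Done(): // the monitored goroutine has returned
        return
    case ev := <-sub.Events():
        handle(ev)
    }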
Updates #15160
Change-Id: I5f0f3fae827b07f9b7102a3b08f60cda9737fe28
Signed-off-by: M. J. Fromberger <fromberger@tailscale.com>
It is a programming error to Publish or Subscribe on a closed Client, but now
the way you discover that is by getting a panic from down in the machinery of
the bus after the client state has been cleaned up.
To provide a more helpful error, let's panic explicitly when that happens and
say what went wrong ("the client is closed"), by preventing subscriptions from
interleaving with closure of the client. With this change, either an attachment
fails outright (because the client is already closed) or completes and then
shuts down in good order in the normal course.
This does not change the semantics of the client, publishers, or subscribers;
it just makes the failure more eager so we can attach explanatory text.
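For illustration:
    cli := bus.Client("example")
    cli.Close()
    // This now panics right away with a clear message ("the client is
    // closed") instead of failing deep inside the bus machinery:
    _ = eventbus.Subscribe[T](cli)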
Updates #15160
Change-Id: Ia492f4c1dea7535aec2cdcc2e5ea5410ed5218d2
Signed-off-by: M. J. Fromberger <fromberger@tailscale.com>
A common pattern in event bus usage is to run a goroutine to service a
collection of subscribers on a single bus client. To have an orderly shutdown,
however, we need a way to wait for such a goroutine to be finished.
This commit adds a Monitor type that makes this pattern easier to wire up:
rather than having to track all the subscribers and an extra channel, the
component need only track the client and the monitor. For example:
    cli := bus.Client("example")
    m := cli.Monitor(func(c *eventbus.Client) {
        s1 := eventbus.Subscribe[T](cli)
        s2 := eventbus.Subscribe[U](cli)
        for {
            select {
            case <-c.Done():
                return
            case t := <-s1.Events():
                processT(t)
            case u := <-s2.Events():
                processU(u)
            }
        }
    })
To shut down the client and wait for the goroutine, the caller can write:
    m.Close()
which closes cli and waits for the goroutine to finish. Or, separately:
    cli.Close()
    // do other stuff
    m.Wait()
While the goroutine management is not explicitly tied to subscriptions, it is a
common enough pattern that this seems like a useful simplification in practice.
Updates #15160
Change-Id: I657afda1cfaf03465a9dce1336e9fd518a968bca
Signed-off-by: M. J. Fromberger <fromberger@tailscale.com>
Pulls out the last callback logic and ensures timers are still running.
The eventbustest package is updated to support checking for the absence of events.
Updates #15160
Signed-off-by: Claus Lensbøl <claus@tailscale.com>
And another case of the same typo in a comment elsewhere.
Updates #cleanup
Change-Id: Iaa9d865a1cf83318d4a30263c691451b5d708c9c
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
When tests run in parallel, events from multiple tests on the same bus can
interfere with each other. This is working as intended, but for the test cases
we want to control exactly what goes through the bus.
To fix that, allocate a fresh bus for each subtest.
Fixes #17197
Change-Id: I53f285ebed8da82e72a2ed136a61884667ef9a5e
Signed-off-by: M. J. Fromberger <fromberger@tailscale.com>
When developing (and debugging) tests, it is useful to be able to see all the
traffic that transits the event bus during the execution of a test.
Updates #15160
Change-Id: I929aee62ccf13bdd4bd07d786924ce9a74acd17a
Signed-off-by: M. J. Fromberger <fromberger@tailscale.com>
For a common case of events being simple struct types with some exported
fields, add a helper to check (reflectively) for equal values using cmp.Diff so
that a failed comparison gives a useful diff in the test output.
More complex uses will still want to provide their own comparisons; this
(intentionally) does not export diff options or other hooks from the cmp
package.
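Roughly what such a helper looks like (a sketch; the real helper's name and
signature may differ; uses "fmt" and "github.com/google/go-cmp/cmp"):
    func eventEquals[T any](want T) func(T) error {
        return func(got T) error {
            if diff := cmp.Diff(want, got); diff != "" {
                return fmt.Errorf("unexpected event (-want +got):\n%s", diff)
            }
            return nil
        }
    }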
Updates #15160
Change-Id: I86bee1771cad7debd9e3491aa6713afe6fd577a6
Signed-off-by: M. J. Fromberger <fromberger@tailscale.com>
Extend the Expect method of a Watcher to allow filter functions that report
only an error value, and which "pass" when the reported error is nil.
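A hedged usage sketch (the event type is illustrative, and the exact Expect
call shape may differ):
    err := eventbustest.Expect(tw, func(e MyEvent) error {
        if e.Count != 3 {
            return fmt.Errorf("got Count=%d, want 3", e.Count)
        }
        return nil // nil means this event "passes"
    })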
Updates #15160
Change-Id: I582d804554bd1066a9e499c1f3992d068c9e8148
Signed-off-by: M. J. Fromberger <fromberger@tailscale.com>
The Tracker was using direct callbacks to ipnlocal. This PR moves those
to be triggered via the eventbus.
Additionally, the eventbus is now explicitly closed when tailscaled exits,
and health is now a SubSystem in tsd.
Updates #15160
Signed-off-by: Claus Lensbøl <claus@tailscale.com>
Subscribers already have a Done channel that the caller can use to detect when
the subscriber has been closed. Typically this happens when the governing
Client closes, which in turn is typically because the Bus closed.
But clients and subscribers can stop at other times too, and a caller has no
good way to tell the difference between "this subscriber closed but the rest
are OK" and "the client closed and all these subscribers are finished".
We've worked around this in practice by knowing the closure of one subscriber
implies the fate of the rest, but we can do better: Add a Done method to the
Client that lets us tell when the client itself has been closed, after all
the publishers and subscribers associated with it have been closed.
This allows the caller to be sure that, by the time that occurs, no further
pending events are forthcoming on that client.
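Sketch of the resulting shutdown sequence (illustrative):
    <-sub.Done() // this subscriber has closed; others may still be live
    <-cli.Done() // once this unblocks, the client and all of its
                 // publishers/subscribers are closed: no more events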
Updates #15160
Change-Id: Id601a79ba043365ecdb47dd035f1fdadd984f303
Signed-off-by: M. J. Fromberger <fromberger@tailscale.com>
This is a small introduction of the eventbus into controlclient, communicating
mainly with ipnlocal. While ipnlocal is a complicated part of the codebase, the
subscribers here are, from ipnlocal's perspective, already called
asynchronously.
Updates #15160
Signed-off-by: Claus Lensbøl <claus@tailscale.com>
Adds the eventbus to the router subsystem.
The event is currently only used on Linux.
Also includes facilities to inject events into the bus.
Updates #15160
Signed-off-by: Claus Lensbøl <claus@tailscale.com>
Make it possible to dump the eventbus graph as JSON or DOT to both debug
and document what is communicated via the bus.
Updates #15160
Signed-off-by: Claus Lensbøl <claus@tailscale.com>
Instead of every module having to come up with its own set of test methods for
the event bus, this handful of test helpers hides much of the setup needed to
test against the event bus.
The tests in portmapper are also ported over to the new helpers.
Updates #15160
Signed-off-by: Claus Lensbøl <claus@tailscale.com>
eventbus.Publish() calls newPublisher(), which in turn invokes (*Client).addPublisher().
That method adds the new publisher to c.pub, so we don’t need to add it again in eventbus.Publish.
Updates #cleanup
Signed-off-by: Nick Khyl <nickk@tailscale.com>
The use of html/template causes reflect-based linker bloat. Longer
term we have options to bring the UI back to iOS, but for now, cut
it out.
Updates #15297
Signed-off-by: David Anderson <dave@tailscale.com>
Shovel small events through the pipeline as fast as possible in a few basic
configurations, to establish some baseline performance numbers.
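The benchmarks are shaped roughly like this (a simplified sketch; names are
illustrative and imports are elided):
    type Event struct{}

    func BenchmarkPublish(b *testing.B) {
        bus := eventbus.New()
        defer bus.Close()
        cli := bus.Client("bench")
        pub := eventbus.Publish[Event](cli)
        sub := eventbus.Subscribe[Event](cli)
        go func() {
            for {
                select {
                case <-sub.Done():
                    return
                case <-sub.Events(): // drain as fast as possible
                }
            }
        }()
        b.ResetTimer()
        for i := 0; i < b.N; i++ {
            pub.Publish(Event{})
        }
    }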
Updates #15160
Change-Id: I1dcbbd1109abb7b93aa4dcb70da57f183eb0e60e
Signed-off-by: M. J. Fromberger <fromberger@tailscale.com>
The demo program generates a stream of made-up bus events among a number of
bus actors, as a way to generate some interesting activity to show on the
bus debug page.
Signed-off-by: David Anderson <dave@tailscale.com>
This lets debug tools list the types that clients are wielding, so
that they can build a dataflow graph and other debugging views.
Updates #15160
Signed-off-by: David Anderson <dave@tailscale.com>
If any debugging hook might see an event, Publisher.ShouldPublish should
tell its caller to publish even if there are no ordinary subscribers.
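Callers that construct events lazily can use it as a guard (a sketch;
buildExpensiveEvent is illustrative):
    if pub.ShouldPublish() { // true if any subscriber or debug hook listens
        pub.Publish(buildExpensiveEvent())
    }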
Updates #15160
Signed-off-by: David Anderson <dave@tailscale.com>
Enables monitoring events as they flow, listing bus clients, and
snapshotting internal queues to troubleshoot stalls.
Updates #15160
Signed-off-by: David Anderson <dave@tailscale.com>
Publicly exposed debugging functions will use these hooks to
observe dataflow in the bus.
Updates #15160
Signed-off-by: David Anderson <dave@tailscale.com>
This makes the helpers closer in behavior to cancelable contexts and
taskgroup.Single, and makes the worker code use a more conventional,
easier-to-reason-about context.Context for shutdown.
Updates #15160
Signed-off-by: David Anderson <dave@tailscale.com>
The Client carries both publishers and subscribers for a single
actor. This makes the APIs for publish and subscribe look more
similar, and this structure is a better fit for upcoming debug
facilities.
Updates #15160
Signed-off-by: David Anderson <dave@tailscale.com>