* add shutdown that asserts if headscale had panics
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
* add test case producing 2118 panic
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
* make stream shutdown if self-node has been removed
Currently we will read the node from database, and since it is
deleted, the id might be set to nil. Keep the node around and
just shutdown, so it is cleanly removed from notifier.
Fixes#2118
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
---------
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
* replace old suite approved routes test with table driven
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
* add test to reproduce issue
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
* add integration test for 2068
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
---------
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
* reformat code
This is mostly an automated change with `make lint`.
I had to manually please golangci-lint in routes_test because of a short
variable name.
* fix start -> strategy which was wrongly corrected by linter
* replace ephemeral deletion logic
this commit replaces the way we remove ephemeral nodes,
currently they are deleted in a loop and we look at last seen
time. This time is now only set when a node disconnects and
there was a bug (#2006) where nodes that had never disconnected
was deleted since they did not have a last seen.
The new logic will start an expiry timer when the node disconnects
and delete the node from the database when the timer is up.
If the node reconnects within the expiry, the timer is cancelled.
Fixes#2006
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
* use uint64 as authekyid and ptr helper in tests
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
* add test db helper
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
* add list ephemeral node func
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
* schedule ephemeral nodes for removal on startup
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
* fix gorm query for postgres
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
* add godoc
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
---------
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
This PR removes the complicated session management introduced in https://github.com/juanfont/headscale/pull/1791 which kept track of the sessions in a map, in addition to the channel already kept track of in the notifier.
Instead of trying to close the mapsession, it will now be replaced by the new one and closed after so all new updates goes to the right place.
The map session serve function is also split into a streaming and a non-streaming version for better readability.
RemoveNode in the notifier will not remove a node if the channel is not matching the one that has been passed (e.g. it has been replaced with a new one).
A new tuning parameter has been added to added to set timeout before the notifier gives up to send an update to a node.
Add a keep alive resetter so we wait with sending keep alives if a node has just received an update.
In addition it adds a bunch of env debug flags that can be set:
- `HEADSCALE_DEBUG_HIGH_CARDINALITY_METRICS`: make certain metrics include per node.id, not recommended to use in prod.
- `HEADSCALE_DEBUG_PROFILING_ENABLED`: activate tracing
- `HEADSCALE_DEBUG_PROFILING_PATH`: where to store traces
- `HEADSCALE_DEBUG_DUMP_CONFIG`: calls `spew.Dump` on the config object startup
- `HEADSCALE_DEBUG_DEADLOCK`: enable go-deadlock to dump goroutines if it looks like a deadlock has occured, enabled in integration tests.
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
This commit restructures the map session in to a struct
holding the state of what is needed during its lifetime.
For streaming sessions, the event loop is structured a
bit differently not hammering the clients with updates
but rather batching them over a short, configurable time
which should significantly improve cpu usage, and potentially
flakyness.
The use of Patch updates has been dialed back a little as
it does not look like its a 100% ready for prime time. Nodes
are now updated with full changes, except for a few things
like online status.
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
* create channel before sending first update
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
* do not notify on register, wait for connect
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
---------
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
This commits removes the locks used to guard data integrity for the
database and replaces them with Transactions, turns out that SQL had
a way to deal with this all along.
This reduces the complexity we had with multiple locks that might stack
or recurse (database, nofitifer, mapper). All notifications and state
updates are now triggered _after_ a database change.
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
* upgrade tailscale
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
* make Node object use actualy tailscale key types
This commit changes the Node struct to have both a field for strings
to store the keys in the database and a dedicated Key for each type
of key.
The keys are populated and stored with Gorm hooks to ensure the data
is stored in the db.
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
* use key types throughout the code
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
* make sure machinekey is concistently used
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
* use machine key in auth url
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
* fix web register
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
* use key type in notifier
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
* fix relogin with webauth
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
---------
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
This commit rearranges the poll handler to immediatly accept
updates and notify its peers and return, not travel down the
function for a bit. This reduces the DB calls and other
holdups that isnt necessary to send a "lite response", a
map response without peers, or accepting an endpoint update.
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
This commit changes the internals of the mapper to
track all the changes to peers over its lifetime.
This means that it no longer depends on the database
and this should hopefully help with locks and timing issues.
When the mapper is created, it needs the current list of peers,
the world view, when the polling session was started. Then as
update changes are called, it tracks the changes and generates
responses based on its internal list.
As a side, the types.Machines and types.MachinesP, as well as
types.Machine being passed as a full struct and pointer has been
changed to always be pointers, everywhere.
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
This commits extends the mapper with functions for creating "delta"
MapResponses for different purposes (peer changed, peer removed, derp).
This wires up the new state management with a new StateUpdate struct
letting the poll worker know what kind of update to send to the
connected nodes.
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
This commit replaces the timestamp based state system with a new
one that has update channels directly to the connected nodes. It
will send an update to all listening clients via the polling
mechanism.
It introduces a new package notifier, which has a concurrency safe
manager for all our channels to the connected nodes.
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
This commit makes a wrapper function round the normalisation requiring
"stripEmailDomain" which has to be passed in almost all functions of
headscale by loading it from Viper instead.
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
This commit renames a bunch of files to try to make it a bit less confusing;
protocol_ is now auth as they contained registration, auth and login/out flow
protocol_.*_poll is now poll.go
api.go and other generic handlers are now handlers.go
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>