This commit replaces the ChangeSet with a simpler bool based
change model that can be directly used in the map builder to
build the appropriate map response based on the change that
has occured. Previously, we fell back to sending full maps
for a lot of changes as that was consider "the safe" thing to
do to ensure no updates were missed.
This was slightly problematic as a node that already has a list
of peers will only do full replacement of the peers if the list
is non-empty, meaning that it was not possible to remove all
nodes (if for example policy changed).
Now we will keep track of last seen nodes, so we can send remove
ids, but also we are much smarter on how we send smaller, partial
maps when needed.
Fixes#2389
Signed-off-by: Kristoffer Dalby <kristoffer@dalby.cc>
This PR investigates, adds tests and aims to correctly implement Tailscale's model for how Tags should be accepted, assigned and used to identify nodes in the Tailscale access and ownership model.
When evaluating in Headscale's policy, Tags are now only checked against a nodes "tags" list, which defines the source of truth for all tags for a given node. This simplifies the code for dealing with tags greatly, and should help us have less access bugs related to nodes belonging to tags or users.
A node can either be owned by a user, or a tag.
Next, to ensure the tags list on the node is correctly implemented, we first add tests for every registration scenario and combination of user, pre auth key and pre auth key with tags with the same registration expectation as observed by trying them all with the Tailscale control server. This should ensure that we implement the correct behaviour and that it does not change or break over time.
Lastly, the missing parts of the auth has been added, or changed in the cases where it was wrong. This has in large parts allowed us to delete and simplify a lot of code.
Now, tags can only be changed when a node authenticates or if set via the CLI/API. Tags can only be fully overwritten/replaced and any use of either auth or CLI will replace the current set if different.
A user owned device can be converted to a tagged device, but it cannot be changed back. A tagged device can never remove the last tag either, it has to have a minimum of one.
This PR changes tags to be something that exists on nodes in addition to users, to being its own thing. It is part of moving our tags support towards the correct tailscale compatible implementation.
There are probably rough edges in this PR, but the intention is to get it in, and then start fixing bugs from 0.28.0 milestone (long standing tags issue) to discover what works and what doesnt.
Updates #2417Closes#2619
This PR addresses some consistency issues that was introduced or discovered with the nodestore.
nodestore:
Now returns the node that is being put or updated when it is finished. This closes a race condition where when we read it back, we do not necessarily get the node with the given change and it ensures we get all the other updates from that batch write.
auth:
Authentication paths have been unified and simplified. It removes a lot of bad branches and ensures we only do the minimal work.
A comprehensive auth test set has been created so we do not have to run integration tests to validate auth and it has allowed us to generate test cases for all the branches we currently know of.
integration:
added a lot more tooling and checks to validate that nodes reach the expected state when they come up and down. Standardised between the different auth models. A lot of this is to support or detect issues in the changes to nodestore (races) and auth (inconsistencies after login and reaching correct state)
This PR was assisted, particularly tests, by claude code.
- tailscale client gets a new AuthUrl and sets entry in the regcache
- regcache entry expires
- client doesn't know about that
- client always polls followup request а gets error
When user clicks "Login" in the app (after cache expiry), they visit
invalid URL and get "node not found in registration cache". Some clients
on Windows for e.g. can't get a new AuthUrl without restart the app.
To fix that we can issue a new reg id and return user a new valid
AuthUrl.
RegisterNode is refactored to be created with NewRegisterNode() to
autocreate channel and other stuff.
Initial work on a nodestore which stores all of the nodes
and their relations in memory with relationship for peers
precalculated.
It is a copy-on-write structure, replacing the "snapshot"
when a change to the structure occurs. It is optimised for reads,
and while batches are not fast, they are grouped together
to do less of the expensive peer calculation if there are many
changes rapidly.
Writes will block until commited, while reads are never
blocked.
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
This commit changes most of our (*)types.Node to
types.NodeView, which is a readonly version of the
underlying node ensuring that there is no mutations
happening in the read path.
Based on the migration, there didnt seem to be any, but the
idea here is to prevent it in the future and simplify other
new implementations.
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
this commit moves all of the read and write logic, and all different parts
of headscale that manages some sort of persistent and in memory state into
a separate package.
The goal of this is to clearly define the boundry between parts of the app
which accesses and modifies data, and where it happens. Previously, different
state (routes, policy, db and so on) was used directly, and sometime passed to
functions as pointers.
Now all access has to go through state. In the initial implementation,
most of the same functions exists and have just been moved. In the future
centralising this will allow us to optimise bottle necks with the database
(in memory state) and make the different parts talking to eachother do so
in the same way across headscale components.
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
* integration: ensure route is set before node joins, reproduce
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
* auth: ensure that routes are autoapproved when the node is stored
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
---------
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
* fix issue auto approve route on register bug
This commit fixes an issue where routes where not approved
on a node during registration. This cause the auto approval
to require the node to readvertise the routes.
Fixes#2497Fixes#2485
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
* hsic: only set db policy if exist
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
* policy: calculate changed based on policy and filter
v1 is a bit simpler than v2, it does not pre calculate the auto approver map
and we cannot tell if it is changed.
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
---------
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
* populate serving from primary routes
Depends on #2464Fixes#2480
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
* also exit
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
* fix route update outside of connection
there was a bug where routes would not be updated if
they changed while a node was connected and it was not part of an
autoapprove.
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
* update expected test output, cli only shows service node
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
---------
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
* utility iterator for ipset
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
* split policy -> policy and v1
This commit split out the common policy logic and policy implementation
into separate packages.
policy contains functions that are independent of the policy implementation,
this typically means logic that works on tailcfg types and generic formats.
In addition, it defines the PolicyManager interface which the v1 implements.
v1 is a subpackage which implements the PolicyManager using the "original"
policy implementation.
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
* use polivyv1 definitions in integration tests
These can be marshalled back into JSON, which the
new format might not be able to.
Also, just dont change it all to JSON strings for now.
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
* formatter: breaks lines
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
* remove compareprefix, use tsaddr version
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
* remove getacl test, add back autoapprover
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
* use policy manager tag handling
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
* rename display helper for user
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
* introduce policy v2 package
policy v2 is built from the ground up to be stricter
and follow the same pattern for all types of resolvers.
TODO introduce
aliass
resolver
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
* wire up policyv2 in integration testing
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
* split policy v2 tests into seperate workflow to work around github limit
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
* add policy manager output to /debug
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
* update changelog
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
---------
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
This helps preventing messages being sent with the wrong update type
and payload combination, and it is shorter/neater.
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
* add dedicated http error to propagate to user
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
* classify user errors in http handlers
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
* move validation of pre auth key out of db
This move separates the logic a bit and allow us to
write specific errors for the caller, in this case the web
layer so we can present the user with the correct error
codes without bleeding web stuff into a generic validate.
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
* update changelog
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
---------
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
* ensure valid tags is populated on user gets too
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
* ensure forced tags are added
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
* remove unused envvar in test
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
* debug log auth/unauth tags in policy man
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
* defer shutdown in tags test
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
* add tag test with groups
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
* add email, display name, picture to create user
Updates #2166
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
* add ability to set display and email to cli
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
* add email to test users in integration
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
* fix issue where tags were only assigned to email, not username
Fixes#2300Fixes#2307
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
* expand principles to correct login name
and if fix an issue where nodeip principles might not expand to all
relevant IPs instead of taking the first in a prefix.
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
* fix ssh unit test
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
* update cli and oauth tests for users with email
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
* index by test email
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
* fix last test
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
---------
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
expand user, add claims to user
This commit expands the user table with additional fields that
can be retrieved from OIDC providers (and other places) and
uses this data in various tailscale response objects if it is
available.
This is the beginning of implementing
https://docs.google.com/document/d/1X85PMxIaVWDF6T_UPji3OeeUqVBcGj_uHRM5CI-AwlY/edit
trying to make OIDC more coherant and maintainable in addition
to giving the user a better experience and integration with a
provider.
remove usernames in magic dns, normalisation of emails
this commit removes the option to have usernames as part of MagicDNS
domains and headscale will now align with Tailscale, where there is a
root domain, and the machine name.
In addition, the various normalisation functions for dns names has been
made lighter not caring about username and special character that wont
occur.
Email are no longer normalised as part of the policy processing.
untagle oidc and regcache, use typed cache
This commits stops reusing the registration cache for oidc
purposes and switches the cache to be types and not use any
allowing the removal of a bunch of casting.
try to make reauth/register branches clearer in oidc
Currently there was a function that did a bunch of stuff,
finding the machine key, trying to find the node, reauthing
the node, returning some status, and it was called validate
which was very confusing.
This commit tries to split this into what to do if the node
exists, if it needs to register etc.
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
* move logic for validating node names
this commits moves the generation of "given names" of nodes
into the registration function, and adds validation of renames
to RenameNode using the same logic.
Fixes#2121
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
* fix double arg
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
---------
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
* validate policy against nodes, error if not valid
this commit aims to improve the feedback of "runtime" policy
errors which would only manifest when the rules are compiled to
filter rules with nodes.
this change will in;
file-based mode load the nodes from the db and try to compile the rules on
start up and return an error if they would not work as intended.
database-based mode prevent a new ACL being written to the database if
it does not compile with the current set of node.
Fixes#2073Fixes#2044
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
* ensure stderr can be used in err checks
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
* test policy set validation
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
* add new integration test to ghaction
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
* add back defer for cli tst
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
---------
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
This commit restructures the map session in to a struct
holding the state of what is needed during its lifetime.
For streaming sessions, the event loop is structured a
bit differently not hammering the clients with updates
but rather batching them over a short, configurable time
which should significantly improve cpu usage, and potentially
flakyness.
The use of Patch updates has been dialed back a little as
it does not look like its a 100% ready for prime time. Nodes
are now updated with full changes, except for a few things
like online status.
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
We currently do not have a way to clean up api keys. There may be cases
where users of headscale may generate a lot of api keys and these may
end up accumulating in the database. This commit adds the command to
delete an api key given a prefix.
This commits removes the locks used to guard data integrity for the
database and replaces them with Transactions, turns out that SQL had
a way to deal with this all along.
This reduces the complexity we had with multiple locks that might stack
or recurse (database, nofitifer, mapper). All notifications and state
updates are now triggered _after_ a database change.
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>