This PR overhauls our event projection system to make it more robust and
prevent skipped events under high load. The core change replaces our
custom, transaction-based locking with standard PostgreSQL advisory
locks. We also introduce a worker pool to manage concurrency and prevent
database connection exhaustion.
### Key Changes
* **Advisory Locks for Projections:** Replaces exclusive row locks and
inspection of `pg_stat_activity` with PostgreSQL advisory locks for
managing projection state. This is a more reliable and standard approach
to distributed locking.
* **Simplified Await Logic:** Removes the complex logic for awaiting
open transactions, simplifying it to a more straightforward time-based
filtering of events.
* **Projection Worker Pool:** Implements a worker pool to limit
concurrent projection triggers, preventing connection exhaustion and
improving stability under load. A new `MaxParallelTriggers`
configuration option is introduced.
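For illustration, such a pool can be as simple as a buffered-channel semaphore sized by the new option; the names and wiring below are hypothetical, not the actual ZITADEL implementation:
```go
// Hypothetical sketch: bound concurrent projection triggers with a
// buffered channel acting as a semaphore, sized by MaxParallelTriggers.
package projection

import "context"

type triggerPool struct {
	sem chan struct{}
}

func newTriggerPool(maxParallelTriggers int) *triggerPool {
	return &triggerPool{sem: make(chan struct{}, maxParallelTriggers)}
}

// Trigger runs fn only when a slot is free, so the number of concurrent
// projection triggers (and the database connections they use) stays bounded.
func (p *triggerPool) Trigger(ctx context.Context, fn func(context.Context) error) error {
	select {
	case p.sem <- struct{}{}:
		defer func() { <-p.sem }()
		return fn(ctx)
	case <-ctx.Done():
		return ctx.Err()
	}
}
```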
### Problem Solved
Under high throughput, a race condition could cause projections to miss
events from the eventstore. This led to inconsistent data in projection
tables (e.g., a user grant might be missing). This PR fixes the
underlying locking and concurrency issues to ensure all events are
processed reliably.
### How it Works
1. **Event Writing:** When writing events, a *shared* advisory lock is
taken. This signals that a write is in progress.
2. **Event Handling (Projections):**
* A projection worker attempts to acquire an *exclusive* advisory lock
for that specific projection. If the lock is already held, it means
another worker is on the job, so the current one backs off.
* Once the lock is acquired, the worker briefly acquires and releases
the same *shared* lock used by event writers. This acts as a barrier,
ensuring it waits for any in-flight writes to complete.
* Finally, it processes all events that occurred before its transaction
began.
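A minimal sketch of this sequence with plain `database/sql` and PostgreSQL's built-in advisory lock functions; the lock keys, helper names, and commented queries are assumptions for illustration rather than the exact ZITADEL code:
```go
// Illustrative sketch of the write/handle locking sequence described above.
package projection

import (
	"context"
	"database/sql"
)

// eventWriteLockKey is an assumed shared advisory lock key used by all event writers.
const eventWriteLockKey = int64(4711)

// writeEvents holds the shared advisory lock for the duration of the write
// transaction, signalling projection workers that a write is in flight.
func writeEvents(ctx context.Context, tx *sql.Tx) error {
	if _, err := tx.ExecContext(ctx,
		"SELECT pg_advisory_xact_lock_shared($1)", eventWriteLockKey); err != nil {
		return err
	}
	// ... INSERT the events within the same transaction ...
	return nil
}

// handleProjection is run by a worker from the pool.
func handleProjection(ctx context.Context, tx *sql.Tx, projectionLockKey int64) error {
	// Exclusive lock per projection: back off if another worker already holds it.
	var acquired bool
	if err := tx.QueryRowContext(ctx,
		"SELECT pg_try_advisory_xact_lock($1)", projectionLockKey).Scan(&acquired); err != nil {
		return err
	}
	if !acquired {
		return nil // another worker is on the job
	}
	// Barrier: take and immediately release the writers' lock key exclusively;
	// this blocks until all in-flight (shared-locked) writes have committed.
	if _, err := tx.ExecContext(ctx, "SELECT pg_advisory_lock($1)", eventWriteLockKey); err != nil {
		return err
	}
	if _, err := tx.ExecContext(ctx, "SELECT pg_advisory_unlock($1)", eventWriteLockKey); err != nil {
		return err
	}
	// Finally, process all events with a position earlier than this transaction's start.
	return nil
}
```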
### Additional Information
* ZITADEL no longer modifies the `application_name` PostgreSQL variable
during event writes.
* The lock on the `current_states` table is now `FOR NO KEY UPDATE`.
* Fixes https://github.com/zitadel/zitadel/issues/8509
---------
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Tim Möhlmann <tim+github@zitadel.com>
(cherry picked from commit 0575f67e94)
# Which Problems Are Solved
There was a left-behind index, introduced to optimize the old and since-removed event execution handler. The index confuses Postgres, which sometimes picks it in favor of the projection-specific index. This sometimes leads to bad query performance in the projection handlers.
# How the Problems Are Solved
Drop the index
# Additional Changes
- none
# Additional Context
- Forgotten in https://github.com/zitadel/zitadel/pull/10564
(cherry picked from commit 54554b8fb9)
# Which Problems Are Solved
I noticed some outdated / misleading logs when starting zitadel:
- The `init-projections` have not been in beta for a long time.
- The LRU auth request cache is disabled by default, which results in
the following message, which has caused confusion by customers:
```level=info msg="auth request cache disabled" error="must provide a positive size"```
# How the Problems Are Solved
- Removed the beta info
- Disable cache initialization if possible
# Additional Changes
None
# Additional Context
- noticed internally
- backport to v4.x
(cherry picked from commit a1ad87387d)
# Which Problems Are Solved
The event execution system currently uses a projection handler that
subscribes to and processes all events for all instances. This creates a
high static cost because the system over-fetches event data, handling
many events that are not needed by most instances. This inefficiency is
also reflected in high "rows returned" metrics in the database.
# How the Problems Are Solved
Eliminate the use of a projection handler. Instead, events for which "execution targets" are defined are directly pushed to the queue by the eventstore. A Router is populated in the Instance object in the authz middleware.
- By joining the execution targets to the instance, no additional
queries are needed anymore.
- As part of the instance object, execution targets are now cached as
well.
- Events are queued within the same transaction, giving transactional
guarantees on delivery.
- Uses the "insert many fast` variant of River. Multiple jobs are queued
in a single round-trip to the database.
- Fix compatibility with PostgreSQL 15
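A conceptual sketch of queuing jobs inside the event-push transaction; the `Queue` interface and types below are hypothetical stand-ins for the River-backed queue, not its real API:
```go
// Conceptual sketch: jobs for matching execution targets are queued within
// the same transaction that pushes the events, so delivery inherits the
// transactional guarantees. The Queue interface is a hypothetical stand-in.
package eventstore

import (
	"context"
	"database/sql"
)

type Event struct {
	Type    string
	Payload []byte
}

type ExecutionJob struct {
	Target  string
	Payload []byte
}

type Queue interface {
	// InsertManyTx queues multiple jobs in a single round-trip, within tx.
	InsertManyTx(ctx context.Context, tx *sql.Tx, jobs []ExecutionJob) error
}

// pushWithExecutions stores the events and, for event types that have
// execution targets configured (resolved from the Instance's cached router),
// queues the corresponding jobs in the same transaction.
func pushWithExecutions(ctx context.Context, tx *sql.Tx, q Queue, events []Event, targets map[string][]string) error {
	// ... INSERT events using tx ...
	var jobs []ExecutionJob
	for _, e := range events {
		for _, target := range targets[e.Type] {
			jobs = append(jobs, ExecutionJob{Target: target, Payload: e.Payload})
		}
	}
	if len(jobs) == 0 {
		return nil
	}
	return q.InsertManyTx(ctx, tx, jobs)
}
```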
# Additional Changes
- The signing key was stored as plain-text in the river job payload in
the DB. This violated our [Secrets
Storage](https://zitadel.com/docs/concepts/architecture/secrets#secrets-storage)
principle. This change removes the field and only uses the encrypted version of the signing key.
- Fixed the target ordering from descending to ascending.
- Some minor linter warnings on the use of `io.WriteString()`.
# Additional Context
- Introduced in https://github.com/zitadel/zitadel/pull/9249
- Closes https://github.com/zitadel/zitadel/issues/10553
- Closes https://github.com/zitadel/zitadel/issues/9832
- Closes https://github.com/zitadel/zitadel/issues/10372
- Closes https://github.com/zitadel/zitadel/issues/10492
---------
Co-authored-by: Stefan Benz <46600784+stebenz@users.noreply.github.com>
(cherry picked from commit a9ebc06c77)
# Which Problems Are Solved
When the webkey feature flag was not enabled before an upgrade to v4,
all JWT tokens became invalid.
This created a couple of issues:
- All users with JWT access tokens are logged out
- Clients that are unable to refresh keys based on key ID break
- id_token_hint could no longer be validated.
# How the Problems Are Solved
Force-enable the webkey feature on the v3 version, so that the upgrade
path is cleaner. Sessions now have time to roll over to the new keys
before initiating the upgrade to v4.
# Additional Changes
- none
# Additional Context
- Related https://github.com/zitadel/zitadel/issues/10673
---------
Co-authored-by: Livio Spring <livio.a@gmail.com>
# Which Problems Are Solved
We are preparing to roll-out and stabilize webkeys in the next version
of Zitadel. Before removing legacy signing-key code, we must ensure all
existing instances have their webkeys generated.
# How the Problems Are Solved
Add a setup step which generates 2 webkeys for each existing instance
that didn't have webkeys yet.
# Additional Changes
Return an error from the config type-switch, when the type is unknown.
# Additional Context
- Part 1/2 of https://github.com/zitadel/zitadel/issues/10029
- Should be back-ported to v3
(cherry picked from commit fa9de9a0f1)
# Which Problems Are Solved
The session API allowed any authenticated user to update sessions by their ID without any further check.
This was unintentionally introduced with version 2.53.0, when the requirement of providing the latest session token on every session update was removed and no other permission check (e.g. `session.write`) was ensured.
# How the Problems Are Solved
- Granted `session.write` to `IAM_OWNER` and `IAM_LOGIN_CLIENT` in the defaults.yaml
- Granted `session.read` to `IAM_ORG_MANAGER`, `IAM_USER_MANAGER` and `ORG_OWNER` in the defaults.yaml
- Pass the session token to the UpdateSession command.
- Check for `session.write` permission on session creation and update.
- Alternatively, the (latest) sessionToken can be used to update the session.
- Setting an auth request to failed on the OIDC Service `CreateCallback` endpoint now ensures it's either the same user as used to create the auth request (for backwards compatibility) or requires `session.link` permission.
- Setting a device auth request to failed on the OIDC Service `AuthorizeOrDenyDeviceAuthorization` endpoint now requires `session.link` permission.
- Setting an auth request to failed on the SAML Service `CreateResponse` endpoint now requires `session.link` permission.
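A rough sketch of the resulting authorization rule for session updates (permission check with session-token fallback); the helper names are illustrative, not the actual command code:
```go
// Illustrative sketch: a session update is allowed if the caller has the
// session.write permission or presents the latest session token.
package command

import (
	"context"
	"crypto/subtle"
	"errors"
)

var errSessionWriteDenied = errors.New("missing session.write permission or valid session token")

type permissionCheck func(ctx context.Context, permission, sessionID string) error

func checkSessionWriteAccess(ctx context.Context, check permissionCheck, sessionID, providedToken, latestToken string) error {
	if err := check(ctx, "session.write", sessionID); err == nil {
		return nil
	}
	if providedToken != "" &&
		subtle.ConstantTimeCompare([]byte(providedToken), []byte(latestToken)) == 1 {
		return nil // alternatively, the latest session token authorizes the update
	}
	return errSessionWriteDenied
}
```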
# Additional Changes
none
# Additional Context
none
(cherry picked from commit 4c942f3477)
# Which Problems Are Solved
The resource usage to query user(s) on the database was high and therefore could have a performance impact.
# How the Problems Are Solved
Database queries involving the users and loginnames tables were improved, and an index was added for the user-by-email query.
# Additional Changes
- spellchecks
- updated apis on load tests
# Additional Context
Needs cherry-pick to v3
(cherry picked from commit 4df138286b)
# Which Problems Are Solved
If Zitadel was started using `start-from-init` or `start-from-setup`, there were rare cases where a panic occurred when `Notifications.LegacyEnabled` was set to false. The cause was a list which was not reset before refilling.
# How the Problems Are Solved
The list is now reset each time before it gets filled.
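For illustration, the pattern of the fix looks roughly like this (names are made up):
```go
// Hypothetical example of the fix: reset the slice before refilling it, so
// entries from a previous fill cannot linger and cause the described panic.
package notification

type workers struct {
	list []string
}

func (w *workers) refill(entries []string) {
	w.list = w.list[:0] // reset before each fill
	w.list = append(w.list, entries...)
}
```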
# Additional Changes
Ensure all contexts are canceled for the init and setup functions of the `start-from-init` or `start-from-setup` commands.
# Additional Context
none
# Eventstore fixes
- `event.Position` used float64 before, which can lead to [precision loss](https://github.com/golang/go/issues/47300). The type was replaced by [a type without precision loss](https://github.com/jackc/pgx-shopspring-decimal).
- The handler reported the wrong error if the current state was updated and therefore took longer to retry failed events.
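A minimal sketch of wiring that decimal codec into a pgx pool so positions are scanned into `decimal.Decimal` instead of `float64`; the table and function names are illustrative, and the exact integration in ZITADEL may differ:
```go
// Sketch: register the shopspring decimal codec on every pgx connection so
// event positions are read without float64 precision loss.
package database

import (
	"context"

	pgxdecimal "github.com/jackc/pgx-shopspring-decimal"
	"github.com/jackc/pgx/v5"
	"github.com/jackc/pgx/v5/pgxpool"
	"github.com/shopspring/decimal"
)

func newPool(ctx context.Context, connString string) (*pgxpool.Pool, error) {
	cfg, err := pgxpool.ParseConfig(connString)
	if err != nil {
		return nil, err
	}
	cfg.AfterConnect = func(ctx context.Context, conn *pgx.Conn) error {
		pgxdecimal.Register(conn.TypeMap())
		return nil
	}
	return pgxpool.NewWithConfig(ctx, cfg)
}

// lastPosition scans the position into a decimal.Decimal instead of float64.
func lastPosition(ctx context.Context, pool *pgxpool.Pool) (decimal.Decimal, error) {
	var pos decimal.Decimal
	err := pool.QueryRow(ctx,
		"SELECT position FROM eventstore.events2 ORDER BY position DESC LIMIT 1").Scan(&pos)
	return pos, err
}
```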
# Mirror fixes
- The max age of auth requests can be configured to speed up copying data from the `auth.auth_requests` table. Auth requests last updated before the set age will be ignored. The default is 1 month.
- Notification projections are skipped because notifications should be sent by the source system. The projections are set to the latest position.
- Ensure that mirror can be executed multiple times.
# Which Problems Are Solved
Currently, if a user signs in using an IdP, the corresponding IdP session is not terminated once they sign out of Zitadel. This can be the desired behavior. In some cases, e.g. when using a shared computer, it results in a potential security risk, since a following user might be able to sign in as the previous one using the still open IdP session.
# How the Problems Are Solved
- Admins can enable a federated logout option on SAML IdPs through the Admin and Management APIs.
- During the termination of a login V1 session using OIDC end_session
endpoint, Zitadel will check if an IdP was used to authenticate that
session.
- In case there was a SAML IdP used with Federated Logout enabled, it
will intercept the logout process, store the information into the shared
cache and redirect to the federated logout endpoint in the V1 login.
- The V1 login federated logout endpoint checks every request for an existing cache entry. On success, it will create a SAML logout request for the used IdP and either redirect or POST to the configured SLO endpoint. The cache entry is updated with a `redirected` state.
- An SLO endpoint is added to the `/idp` handlers, which will handle the SAML logout responses. At the moment, it will check again for an existing federated logout entry (with state `redirected`) in the cache. On success, the user is redirected to the initially provided `post_logout_redirect_uri` from the end_session request.
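A rough sketch of the cache entry and states this flow relies on; the field names and state values are illustrative:
```go
// Illustrative sketch of the shared-cache entry driving the federated logout
// flow; the real structure and cache keys may differ.
package federatedlogout

type State int

const (
	// StateCreated: end_session was intercepted and the entry was stored.
	StateCreated State = iota
	// StateRedirected: the V1 login sent the SAML logout request to the IdP's SLO endpoint.
	StateRedirected
)

type Entry struct {
	InstanceID            string
	SessionID             string
	IDPID                 string
	State                 State
	PostLogoutRedirectURI string // returned to the user once the SLO response arrives
}
```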
# Additional Changes
None
# Additional Context
- This PR merges the https://github.com/zitadel/zitadel/pull/9841 and
https://github.com/zitadel/zitadel/pull/9854 to main, additionally
updating the docs on Entra ID SAML.
- closes #9228
- backport to 3.x
---------
Co-authored-by: Silvan <27845747+adlerhurst@users.noreply.github.com>
Co-authored-by: Zach Hirschtritt <zachary.hirschtritt@klaviyo.com>
(cherry picked from commit 2cf3ef4de4)
# Which Problems Are Solved
- Allow users to use SHA-256 and SHA-512 hashing algorithms. These
algorithms are used by Linux's crypt(3) function.
- Allow users to import passwords using the PHPass algorithm. This
algorithm is used by older PHP systems, WordPress in particular.
# How the Problems Are Solved
- Upgrade passwap to
[v0.9.0](https://github.com/zitadel/passwap/releases/tag/v0.9.0)
- Add sha2 and phpass as a new verifier option in defaults.yaml
# Additional Changes
- Updated docs to explain the two algorithms
# Additional Context
Implements the changes in the passwap library from
https://github.com/zitadel/passwap/pull/59 and
https://github.com/zitadel/passwap/pull/60
(cherry picked from commit 38013d0e84)
# Which Problems Are Solved
We saw high CPU usage if many events were created on the database. This
was caused by the new actions which query for all event types and
aggregate types.
# How the Problems Are Solved
- The action execution handler no longer filters for aggregate and event types.
- The index for `instance_id` and `position` is re-enabled.
# Additional Changes
none
# Additional Context
none
(cherry picked from commit 60ce32ca4f)
# Which Problems Are Solved
#9837 added a new index `es_instance_position` on the events table with
the idea to improve performance for some projections. Unfortunately, it
makes it worse for almost all projections and would only improve the
situation for the events handler of the actions V2 subscriptions.
# How the Problems Are Solved
Remove the index again.
# Additional Changes
None
# Additional Context
relates to #9837
relates to #9863
(cherry picked from commit d71795c433)
# Which Problems Are Solved
The execution handler projection handles all events to check whether an execution has to be provided to the worker.
With this logic, all events would be processed from the beginning, which is not necessary.
# How the Problems Are Solved
Add the current state to the execution handler projection, to avoid
processing all existing events.
# Additional Changes
Add a custom configuration to the defaults, so that transactions are limited to a certain number of events.
# Additional Context
None
(cherry picked from commit 21167a4bba)
# Which Problems Are Solved
Step 54 was not executed during setup.
# How the Problems Are Solved
Added the step to setup jobs
# Additional Changes
none
# Additional Context
- the step was added in https://github.com/zitadel/zitadel/pull/9837
- thanks to @zhirschtritt for raising this.
(cherry picked from commit a626678004)
# Which Problems Are Solved
Some projection queries took a long time to run. It seems that one or more queries couldn't make proper use of the `es_projection` index. This might be because of the specific complexity of the aggregate_type and event_type arguments, making the index unfeasible for Postgres.
# How the Problems Are Solved
Following the index recommendation, add an index that covers just instance_id and position.
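For illustration, this amounts to a setup step executing DDL along these lines (the step wiring is hypothetical; the index name `es_instance_position` is the one referenced by the follow-up entries):
```go
// Hypothetical setup step creating the recommended covering index.
package setup

import (
	"context"
	"database/sql"
)

// CREATE INDEX CONCURRENTLY must run outside an explicit transaction.
const createInstancePositionIndex = `
CREATE INDEX CONCURRENTLY IF NOT EXISTS es_instance_position
    ON eventstore.events2 (instance_id, position);`

func addInstancePositionIndex(ctx context.Context, db *sql.DB) error {
	_, err := db.ExecContext(ctx, createInstancePositionIndex)
	return err
}
```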
# Additional Changes
- none
# Additional Context
- Related to https://github.com/zitadel/zitadel/issues/9832
(cherry picked from commit bb56b362a7)
# Which Problems Are Solved
Instances that had improved performance flags set got event errors when getting instance features. This is because the improved performance flags were marshalled using the enumerated integers, but now needed to be unmarshalled using the added UnmarshalText method.
# How the Problems Are Solved
- Remove enumer generation
# Additional Changes
- none
# Additional Context
- reported on QA
- Backport to next-rc / v3
(cherry picked from commit 0465d5093e)
# Which Problems Are Solved
The `auth.auth_requests` table is not cleaned up, so long-running Zitadel installations can contain many rows.
The mirror command can take long because the data is first copied into memory (or disk) on CockroachDB and users do not get any output from mirror. This is unfortunate because people don't know if Zitadel got stuck.
# How the Problems Are Solved
Enhance logging throughout the projection processes and introduce a
configuration option for the maximum age of authentication requests.
# Additional Changes
None
# Additional Context
closes https://github.com/zitadel/zitadel/issues/9764
---------
Co-authored-by: Livio Spring <livio.a@gmail.com>
(cherry picked from commit 181186e477)
# Which Problems Are Solved
Webkeys were not generated with new instances when the webkey feature
flag was enabled for instance defaults. This would cause a redirect loop
with console for new instances on QA / cloud.
# How the Problems Are Solved
- uncomment the webkeys section on defaults.yaml
- Fix field naming of webkey config
# Additional Changes
- Add all available features as comments.
- Make the improved performance type enum parsable from the config; until now they were just ints.
- Running the enumer command created missing enum entries for feature keys.
# Additional Context
- Needs to be back-ported to v3 / next-rc
Co-authored-by: Livio Spring <livio.a@gmail.com>
(cherry picked from commit 91bc71db74)
# Which Problems Are Solved
A customer reached out that after an upgrade, actions would always fail
with the error "host is denied" when calling an external API.
This is due to a security fix
(https://github.com/zitadel/zitadel/security/advisories/GHSA-6cf5-w9h3-4rqv),
where a DNS lookup was added to check whether the host name resolves to
a denied IP or subnet.
If the lookup fails due to the internal DNS setup, the action fails as
well. Additionally, the lookup was also performed when the deny list was
empty.
# How the Problems Are Solved
- Prevent DNS lookup when deny list is empty
- Properly initialize the deny list and prevent empty entries
# Additional Changes
- Log the reason for blocked address (domain, IP, subnet)
# Additional Context
- reported by a customer
- needs backport to 2.70.x, 2.71.x and 3.0.0 rc
(cherry picked from commit 4ffd4ef381)
# Which Problems Are Solved
When running a long-running Zitadel Setup, Kubernetes might decide to
move a pod to a new node automatically. Currently, this puts any
migrations into a broken state that an operator needs to manually run
the "cleanup" command on - assuming they catch the error.
The only super long running commands are typically projection pre-fill
operations, which depending on the size of the event table for that
projection, can take many hours - plenty of time for Kubernetes to make
unexpected decisions, especially in a busy cluster.
# How the Problems Are Solved
This change listens on `os.Interrupt` and `syscall.SIGTERM`, cancels the
current Setup context, and runs the `Cleanup` command. The logs then
look something like this:
```shell
...
INFO[0000] verify migration caller="/Users/zach/src/zitadel/internal/migration/migration.go:43" name=repeatable_delete_stale_org_fields
INFO[0000] starting migration caller="/Users/zach/src/zitadel/internal/migration/migration.go:66" name=repeatable_delete_stale_org_fields
INFO[0000] execute delete query caller="/Users/zach/src/zitadel/cmd/setup/39.go:37" instance_id=281297936179003398 migration=repeatable_delete_stale_org_fields progress=1/1
INFO[0000] verify migration caller="/Users/zach/src/zitadel/internal/migration/migration.go:43" name=repeatable_fill_fields_for_instance_domains
INFO[0000] starting migration caller="/Users/zach/src/zitadel/internal/migration/migration.go:66" name=repeatable_fill_fields_for_instance_domains
----- SIGTERM signal issued -----
INFO[0000] received interrupt signal, shutting down: interrupt caller="/Users/zach/src/zitadel/cmd/setup/setup.go:121"
INFO[0000] query failed caller="/Users/zach/src/zitadel/internal/eventstore/repository/sql/query.go:135" error="timeout: context already done: context canceled"
DEBU[0000] filter eventstore failed caller="/Users/zach/src/zitadel/internal/eventstore/handler/v2/field_handler.go:155" error="ID=SQL-KyeAx Message=unable to filter events Parent=(timeout: context already done: context canceled)" projection=instance_domain_fields
DEBU[0000] unable to rollback tx caller="/Users/zach/src/zitadel/internal/eventstore/handler/v2/field_handler.go:110" error="sql: transaction has already been committed or rolled back" projection=instance_domain_fields
INFO[0000] process events failed caller="/Users/zach/src/zitadel/internal/eventstore/handler/v2/field_handler.go:72" error="ID=SQL-KyeAx Message=unable to filter events Parent=(timeout: context already done: context canceled)" projection=instance_domain_fields
DEBU[0000] trigger iteration caller="/Users/zach/src/zitadel/internal/eventstore/handler/v2/field_handler.go:73" iteration=0 projection=instance_domain_fields
ERRO[0000] migration failed caller="/Users/zach/src/zitadel/internal/migration/migration.go:68" error="ID=SQL-KyeAx Message=unable to filter events Parent=(timeout: context already done: context canceled)" name=repeatable_fill_fields_for_instance_domains
ERRO[0000] migration finish failed caller="/Users/zach/src/zitadel/internal/migration/migration.go:71" error="context canceled" name=repeatable_fill_fields_for_instance_domains
----- Cleanup before exiting -----
INFO[0000] cleanup started caller="/Users/zach/src/zitadel/cmd/setup/cleanup.go:30"
INFO[0000] cleanup migration caller="/Users/zach/src/zitadel/cmd/setup/cleanup.go:47" name=repeatable_fill_fields_for_instance_domains
```
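A condensed sketch of that shutdown path using `signal.NotifyContext`; `runSetup` and `runCleanup` stand in for the actual setup and cleanup commands:
```go
// Condensed sketch: cancel the setup context on SIGINT/SIGTERM and run the
// cleanup before exiting. runSetup and runCleanup are placeholders.
package main

import (
	"context"
	"log"
	"os"
	"os/signal"
	"syscall"
)

func main() {
	ctx, stop := signal.NotifyContext(context.Background(), os.Interrupt, syscall.SIGTERM)
	defer stop()

	var setupErr error
	defer func() {
		// Instead of os.Exit via a Fatal log, the shared error (or the
		// canceled context) lets this deferred cleanup run.
		if setupErr != nil || ctx.Err() != nil {
			runCleanup(context.Background())
		}
	}()

	if setupErr = runSetup(ctx); setupErr != nil {
		log.Printf("setup failed: %v", setupErr)
	}
}

// runSetup would execute the migrations, honoring ctx cancellation.
func runSetup(ctx context.Context) error { return ctx.Err() }

// runCleanup resets any started-but-unfinished migrations.
func runCleanup(ctx context.Context) { log.Println("cleanup started") }
```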
# Additional Changes
* `mustExecuteMigration` -> `executeMigration`: **must**Execute previously logged a Fatal error, which calls os.Exit, so no cleanup was possible. Instead, this PR returns an error and assigns it to a shared error in the Setup closure that defer can check.
* `initProjections` now returns an error instead of exiting
# Additional Context
This behavior might be unwelcome or at least unexpected in some cases.
Putting it behind a feature flag or config setting is likely a good
followup.
---------
Co-authored-by: Silvan <27845747+adlerhurst@users.noreply.github.com>
(cherry picked from commit aa9ef8b49e)
# Which Problems Are Solved
With the change of #9561, the `mirror` command panics as there's no
metrics provider configured.
# How the Problems Are Solved
Correctly initialize the provider (no-op by default) for the mirror
command.
# Additional Changes
None
# Additional Context
relates to #9561 -> needs backports to 2.66.x - 2.71.x and 3.0.0-rc
Co-authored-by: Stefan Benz <46600784+stebenz@users.noreply.github.com>
# Which Problems Are Solved
With v2.71.0 the `idp_templates6_ldap3` projection was created but never
filled, as it was a subtable. To fix this we altered the
`idp_templates6_ldap3` to `idp_templates6_ldap2` with v2.71.5.
Unfortunately, this was done without a check that `idp_templates6_ldap2` already existed, which resulted in an error in the setup step.
# How the Problems Are Solved
Add a check whether `idp_templates6_ldap2` already exists before renaming `idp_templates6_ldap3` -> `idp_templates6_ldap2`.
# Additional Changes
None
# Additional Context
Closes #9669
(cherry picked from commit 2eb187f141)
# Which Problems Are Solved
With the currently provided telemetry, it's difficult to predict when a
projection handler is under increased load until it's too late and
causes downstream issues. Importantly, projection updating is in the
critical path for many login flows and increased latency there can
result in system downtime for users.
# How the Problems Are Solved
This PR adds three new prometheus-style metrics:
1. **projection_events_processed** (_labels: projection, success_) - This metric gives us a counter of the number of events processed per projection update run and whether they were processed without error. A high number of events being processed can let us know how busy a particular projection handler is.
2. **projection_handle_timer** _(labels: projection)_ - This is the time it takes to process a projection update given a batch of events - time to take the current_states lock, query for new events, reduce, update the projection, and update current_states.
3. **projection_state_latency** _(labels: projection)_ - This is the time since the last event processed in the current_states table for a given projection. It tells us how old the last processed event was, i.e. how far behind the projection is running. Higher latencies could mean high load or stalled projection handling.
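A sketch of recording these instruments with the OpenTelemetry metrics API (which ZITADEL's `metrics` package wraps); the handler wiring is illustrative, only the instrument names come from this PR:
```go
// Illustrative sketch: record the three new instruments with the
// OpenTelemetry metrics API. Only the metric names are taken from this PR.
package handler

import (
	"context"
	"time"

	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/attribute"
	"go.opentelemetry.io/otel/metric"
)

type projectionMetrics struct {
	processed    metric.Int64Counter
	handleTimer  metric.Float64Histogram
	stateLatency metric.Float64Histogram
}

func newProjectionMetrics() (*projectionMetrics, error) {
	meter := otel.Meter("projection")
	processed, err := meter.Int64Counter("projection_events_processed")
	if err != nil {
		return nil, err
	}
	handleTimer, err := meter.Float64Histogram("projection_handle_timer", metric.WithUnit("s"))
	if err != nil {
		return nil, err
	}
	stateLatency, err := meter.Float64Histogram("projection_state_latency", metric.WithUnit("s"))
	if err != nil {
		return nil, err
	}
	return &projectionMetrics{processed, handleTimer, stateLatency}, nil
}

// report records one projection update run: events processed (with success
// label), the processing duration, and the age of the newest handled event.
func (m *projectionMetrics) report(ctx context.Context, projection string, events int, handleErr error, started, lastEvent time.Time) {
	m.processed.Add(ctx, int64(events), metric.WithAttributes(
		attribute.String("projection", projection),
		attribute.Bool("success", handleErr == nil),
	))
	m.handleTimer.Record(ctx, time.Since(started).Seconds(),
		metric.WithAttributes(attribute.String("projection", projection)))
	m.stateLatency.Record(ctx, time.Since(lastEvent).Seconds(),
		metric.WithAttributes(attribute.String("projection", projection)))
}
```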
# Additional Changes
I also had to initialize the global otel metrics provider (`metrics.M`) in the `setup` step in addition to `start`, since projection handlers are initialized at setup. The initialization checks if a metrics provider is already set (in case of `start-from-setup` or `start-from-init`) to prevent overwriting, which would cause the otel metrics provider to stop working.
# Additional Context
## Example Dashboards


---------
Co-authored-by: Silvan <27845747+adlerhurst@users.noreply.github.com>
Co-authored-by: Livio Spring <livio.a@gmail.com>
(cherry picked from commit c1535b7b49)
# Which Problems Are Solved
Zitadel setup with v2.71.0 could result in errors regarding the
idp_templates6_ldap3 subtable.
# How the Problems Are Solved
Rename the subtable idp_templates6_ldap3 to idp_templates6_ldap2 if no idp_templates6_ldap2 exists, and rename the column `rootCA` to `root_ca`.
# Additional Changes
None
# Additional Context
Related PR #9292
---------
Co-authored-by: Silvan <27845747+adlerhurst@users.noreply.github.com>
(cherry picked from commit 6b23c33cb6)
# Which Problems Are Solved
The service name is hardcoded in the metrics code. Making the service name configurable helps when running multiple instances of Zitadel.
The defaults remain unchanged; the service name still defaults to ZITADEL.
# How the Problems Are Solved
Add a config option to override the name in defaults.yaml and pass it
down to the corresponding metrics or tracing module (google or otel)
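A sketch of how the configured name can be fed into the OpenTelemetry resource used by such a provider; the config plumbing is illustrative:
```go
// Sketch: build the otel resource from a configurable service name
// (still defaulting to ZITADEL) and hand it to the meter provider.
package telemetry

import (
	"context"

	sdkmetric "go.opentelemetry.io/otel/sdk/metric"
	"go.opentelemetry.io/otel/sdk/resource"
	semconv "go.opentelemetry.io/otel/semconv/v1.24.0"
)

func newMeterProvider(ctx context.Context, serviceName string) (*sdkmetric.MeterProvider, error) {
	if serviceName == "" {
		serviceName = "ZITADEL" // default remains unchanged
	}
	res, err := resource.New(ctx,
		resource.WithAttributes(semconv.ServiceNameKey.String(serviceName)),
	)
	if err != nil {
		return nil, err
	}
	return sdkmetric.NewMeterProvider(sdkmetric.WithResource(res)), nil
}
```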
# Additional Changes
NA
# Additional Context
NA
(cherry picked from commit dc64e35128)
# Which Problems Are Solved
Allow verification of imported salted passwords hashed with plain md5.
# How the Problems Are Solved
- Upgrade passwap to
[v0.7.0](https://github.com/zitadel/passwap/releases/tag/v0.7.0)
- Add md5salted as a new verifier option in `defaults.yaml`
# Additional Changes
- go version and libraries updated (required by passwap v0.7.0)
- secrets.md verifiers updated
- configuration verifiers updated
- added MD5salted and missing MD5Plain to test cases
# Which Problems Are Solved
#9292 did not correctly change the projection table to list IdPs for existing ZITADEL setups.
# How the Problems Are Solved
Fixed the projection table with an explicit setup step.
# Additional Changes
To prevent user-facing errors when using LDAP with a custom root CA as much as possible, the certificate is parsed when it is passed to the API.
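For illustration, a standard-library check of the provided root CA at API time could look like this (the actual validation in the PR may differ):
```go
// Illustrative validation of a custom LDAP root CA: decode the PEM block and
// ensure it parses as an X.509 certificate before accepting the config.
package ldap

import (
	"crypto/x509"
	"encoding/pem"
	"errors"
	"fmt"
)

func validateRootCA(pemCert []byte) error {
	block, _ := pem.Decode(pemCert)
	if block == nil || block.Type != "CERTIFICATE" {
		return errors.New("root CA is not PEM-encoded certificate data")
	}
	if _, err := x509.ParseCertificate(block.Bytes); err != nil {
		return fmt.Errorf("root CA could not be parsed: %w", err)
	}
	return nil
}
```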
# Additional Context
- Closes https://github.com/zitadel/zitadel/issues/9514
---------
Co-authored-by: Iraq Jaber <IraqJaber@gmail.com>
(cherry picked from commit 11c9be3b8d)
# Which Problems Are Solved
If the configuration `notifications.LegacyEnabled` is set to false when using CockroachDB as a database, Zitadel does not start and prints the following error: `level=fatal msg="unable to start zitadel" caller="github.com/zitadel/zitadel/cmd/start/start_from_init.go:44" error="can't scan into dest[0]: cannot scan NULL into *string"`
# How the Problems Are Solved
The combination of the setting and CockroachDB is checked and a better error is provided to the user.
# Additional Context
- introduced with https://github.com/zitadel/zitadel/pull/9321
(cherry picked from commit 92f0cf018f)
# Which Problems Are Solved
SQL error in `cmd/setup/49/01-permitted_orgs_function.sql`
# How the Problems Are Solved
Updating `cmd/setup/49/01-permitted_orgs_function.sql`
# Additional Context
- Closes https://github.com/zitadel/zitadel/issues/9461
Co-authored-by: Iraq Jaber <IraqJaber@gmail.com>
(cherry picked from commit 3c57e325f7)
# Which Problems Are Solved
Actions v2 are not executed in the different functions that were provided by actions v1.
# How the Problems Are Solved
Add functionality to call actions v2 through OIDC and SAML logic to
complement tokens and SAMLResponses.
# Additional Changes
- Corrected testing for retrieved intent information
- Added testing for IDP types
- Corrected handling of context for issuer in SAML logic
# Additional Context
- Closes #7247
- Dependent on https://github.com/zitadel/saml/pull/97
- docs for migration are done in separate issue:
https://github.com/zitadel/zitadel/issues/9456
---------
Co-authored-by: Silvan <27845747+adlerhurst@users.noreply.github.com>
# Which Problems Are Solved
Currently, I am not able to run the new login with a service account with an IAM_OWNER role, as the role is missing some permissions which the LOGIN_CLIENT role does have.
# How the Problems Are Solved
Added session permissions to the IAM_OWNER
---------
Co-authored-by: Livio Spring <livio.a@gmail.com>
# Which Problems Are Solved
The recently introduced notification queue has potential race conditions.
# How the Problems Are Solved
The current code is refactored to use the queue package, which is safe with regard to concurrency.
# Additional Changes
- the queue is included in startup
- improved code quality of queue
# Additional Context
- closes https://github.com/zitadel/zitadel/issues/9278
# Which Problems Are Solved
Setup fails to push all role permission events when running Zitadel with CockroachDB. `TransactionRetryError`s were visible in the logs, and the setup job finally times out with `timeout: context deadline exceeded`.
# How the Problems Are Solved
As suggested in the CockroachDB documentation, _"break down larger transactions"_. The commands to be pushed for the role permissions are chunked into 50 events per push. This chunking is only done with CockroachDB.
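A sketch of the chunking described above; the command and push types are simplified stand-ins:
```go
// Sketch: push role-permission commands in chunks of 50 to keep CockroachDB
// transactions small enough to avoid TransactionRetryError.
package setup

import "context"

const chunkSize = 50

type pushFunc func(ctx context.Context, commands []string) error

func pushChunked(ctx context.Context, push pushFunc, commands []string) error {
	for start := 0; start < len(commands); start += chunkSize {
		end := start + chunkSize
		if end > len(commands) {
			end = len(commands)
		}
		if err := push(ctx, commands[start:end]); err != nil {
			return err
		}
	}
	return nil
}
```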
# Additional Changes
- gci run fixed some unrelated imports
- access to `command.Commands` for the setup job, so we can reuse the
sync logic.
# Additional Context
Closes #9293
---------
Co-authored-by: Silvan <27845747+adlerhurst@users.noreply.github.com>
# Which Problems Are Solved
Some OAuth2 and OIDC providers require the use of PKCE for all their
clients. While ZITADEL already recommended the same for its clients, it
did not yet support the option on the IdP configuration.
# How the Problems Are Solved
- A new boolean `use_pkce` is added to the add/update generic OAuth/OIDC
endpoints.
- A new checkbox is added to the generic OAuth and OIDC provider
templates.
- The `rp.WithPKCE` option is added to the provider if the use of PKCE
has been set.
- The `rp.WithCodeChallenge` and `rp.WithCodeVerifier` options are added to the OIDC/Auth BeginAuth and CodeExchange functions.
- Store verifier or any other persistent argument in the intent or auth
request.
- Create corresponding session object before creating the intent, to be
able to store the information.
- (refactored session structs to use a constructor for unified creation
and better overview of actual usage)
Here's a screenshot showing the URI including the PKCE params:

# Additional Changes
None.
# Additional Context
- Closes #6449
- This PR replaces the existing PR (#8228) of @doncicuto. The base he
did was cherry picked. Thank you very much for that!
---------
Co-authored-by: Miguel Cabrerizo <doncicuto@gmail.com>
Co-authored-by: Stefan Benz <46600784+stebenz@users.noreply.github.com>