# Which Problems Are Solved
If Postgres was not ready when the API started, the API failed
immediately.
This made task orchestration hard, especially in a platform-agnostic
way:
- The current health check in the Nx target `@zitadel/api:prod` uses the
timeout command, which is not installed on all platforms and behaves
unpredictably
- The current health check in the Nx target `@zitadel/api:prod` requires
the DB to have been started using `@zitadel/zitadel:db`
# How the Problems Are Solved
- Additional configuration option `Database.Postgres.AwaitInitialConn`
is added and defaults to *0m* for backwards compatibility.
- If a duration is configured, the API retries pinging the database
until it succeeds (see the sketch below).
- The API sleeps for a second between pings.
- It emits an info-level log with the error on each try.
- When the configured duration runs out before a ping succeeds, the
error is returned and the command exits with a failure code.
- When a ping succeeds within the configured duration, the API proceeds
with the init, setup or start phase.
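A minimal sketch of this retry loop, using plain `database/sql`; the function name and logging are illustrative assumptions, not the actual Zitadel code:
```go
package main

import (
	"context"
	"database/sql"
	"log"
	"time"
)

// awaitInitialConn pings the database once per second until it succeeds or
// the configured duration elapses (names are placeholders, not Zitadel code).
func awaitInitialConn(ctx context.Context, db *sql.DB, await time.Duration) error {
	if await == 0 {
		// Backwards-compatible default: a single ping, failing immediately.
		return db.PingContext(ctx)
	}
	ctx, cancel := context.WithTimeout(ctx, await)
	defer cancel()
	for {
		err := db.PingContext(ctx)
		if err == nil {
			return nil // DB is reachable: continue with init, setup or start
		}
		log.Printf("ping failed, retrying: %v", err) // info-level log per try
		select {
		case <-ctx.Done():
			return err // configured duration exceeded: exit with failure
		case <-time.After(time.Second):
		}
	}
}
```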
# Additional Context
- Relates to internally reported problems with the current DB health
check command
[here](https://zitadel.slack.com/archives/C07EUL5H83A/p1759915009839269?thread_ts=1759912259.410789&cid=C07EUL5H83A)
and
[here](https://zitadel.slack.com/archives/C07EUL5H83A/p1759918324246249?thread_ts=1759912259.410789&cid=C07EUL5H83A).
(cherry picked from commit 7ba6870baf)
# Which Problems Are Solved
The current service ping reports can run into body size limit errors,
and there's no way of knowing how big the reports currently are.
# How the Problems Are Solved
Log the current size to get at least some insight and possibly adjust
the bulk size.
# Additional Changes
None
# Additional Context
- noticed internally
- backport to v4.x
(cherry picked from commit bc471b4f78)
This PR overhauls our event projection system to make it more robust and
prevent skipped events under high load. The core change replaces our
custom, transaction-based locking with standard PostgreSQL advisory
locks. We also introduce a worker pool to manage concurrency and prevent
database connection exhaustion.
### Key Changes
* **Advisory Locks for Projections:** Replaces exclusive row locks and
inspection of `pg_stat_activity` with PostgreSQL advisory locks for
managing projection state. This is a more reliable and standard approach
to distributed locking.
* **Simplified Await Logic:** Removes the complex logic for awaiting
open transactions, simplifying it to a more straightforward time-based
filtering of events.
* **Projection Worker Pool:** Implements a worker pool to limit
concurrent projection triggers, preventing connection exhaustion and
improving stability under load. A new `MaxParallelTriggers`
configuration option is introduced.
### Problem Solved
Under high throughput, a race condition could cause projections to miss
events from the eventstore. This led to inconsistent data in projection
tables (e.g., a user grant might be missing). This PR fixes the
underlying locking and concurrency issues to ensure all events are
processed reliably.
### How it Works
1. **Event Writing:** When writing events, a *shared* advisory lock is
taken. This signals that a write is in progress.
2. **Event Handling (Projections):**
* A projection worker attempts to acquire an *exclusive* advisory lock
for that specific projection. If the lock is already held, it means
another worker is on the job, so the current one backs off.
* Once the lock is acquired, the worker briefly acquires and releases
the same *shared* lock used by event writers. This acts as a barrier,
ensuring it waits for any in-flight writes to complete.
* Finally, it processes all events that occurred before its transaction
began.
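The locking pattern can be sketched with plain SQL issued from Go; the advisory-lock keys, function names and statements below are illustrative placeholders, not the actual Zitadel implementation:
```go
package es

import (
	"context"
	"database/sql"
)

// writeBarrierKey is a placeholder advisory-lock key shared by all event writers.
const writeBarrierKey = 42

// writeEvents holds the barrier lock in shared mode for the duration of the
// write transaction, signaling that a write is in flight.
func writeEvents(ctx context.Context, tx *sql.Tx) error {
	if _, err := tx.ExecContext(ctx,
		"SELECT pg_advisory_xact_lock_shared($1)", writeBarrierKey); err != nil {
		return err
	}
	// ... INSERT the events ...
	return nil
}

// triggerProjection takes the per-projection exclusive lock, waits out
// in-flight writers via the barrier, then processes events.
func triggerProjection(ctx context.Context, tx *sql.Tx, projectionKey int64) error {
	var acquired bool
	// Back off if another worker is already handling this projection.
	if err := tx.QueryRowContext(ctx,
		"SELECT pg_try_advisory_xact_lock($1)", projectionKey).Scan(&acquired); err != nil {
		return err
	}
	if !acquired {
		return nil
	}
	// Barrier: acquiring the writers' lock exclusively blocks until all
	// shared holders (in-flight writes) have released it; session-level
	// locks can be released right away, even inside a transaction.
	if _, err := tx.ExecContext(ctx, "SELECT pg_advisory_lock($1)", writeBarrierKey); err != nil {
		return err
	}
	if _, err := tx.ExecContext(ctx, "SELECT pg_advisory_unlock($1)", writeBarrierKey); err != nil {
		return err
	}
	// ... process all events that occurred before this transaction began ...
	return nil
}
```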
### Additional Information
* ZITADEL no longer modifies the `application_name` PostgreSQL variable
during event writes.
* The lock on the `current_states` table is now `FOR NO KEY UPDATE`.
* Fixes https://github.com/zitadel/zitadel/issues/8509
---------
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Tim Möhlmann <tim+github@zitadel.com>
(cherry picked from commit 0575f67e94)
# Which Problems Are Solved
Starting with Zitadel v4, the new login UI is enabled by default (for
new instances) through the corresponding feature flag.
There's an additional flag to use the V2 API in console, which is
generally required for the login V2 to work properly, but it was not yet
activated by default (for new instances).
# How the Problems Are Solved
- Enabled the `ConsoleUseV2UserApi` feature flag on the
`defaultInstance`
# Additional Changes
- Cleaned up removed flags on the `defaultInstance`
# Additional Context
- noticed internally
- backport to v4.x
(cherry picked from commit 98bf8359c5)
# Which Problems Are Solved
Emails are still sent only with URLs pointing to login v1.
# How the Problems Are Solved
Add configuration for URLs as URL templates, so that links can point at
Login v2 (see the sketch below).
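For illustration, a URL template rendered with Go's `text/template`; the template string and variable names are assumptions, not Zitadel's actual configuration keys or placeholders:
```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

func main() {
	// A configurable URL template pointing at a Login v2 deployment.
	t := template.Must(template.New("verifyEmail").Parse(
		"https://login.example.com/verify?userID={{.UserID}}&code={{.Code}}"))

	var url strings.Builder
	if err := t.Execute(&url, struct{ UserID, Code string }{"253", "abc"}); err != nil {
		panic(err)
	}
	fmt.Println(url.String()) // the rendered link is embedded in the email
}
```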
# Additional Changes
None
# Additional Context
Closes #10236
---------
Co-authored-by: Marco A. <marco@zitadel.com>
(cherry picked from commit 0a14c01412)
# Which Problems Are Solved
Typo in the environment variable reference for the
OIDC:DeviceAuth:UserCode:CharAmount config:
`ZITADEL_OIDC_DEVICEAUTH_USERCODE_CHARARMOUNT` - _CHARA_**~~R~~**_MOUNT_
# How the Problems Are Solved
Fixed the typo; the variable is now
`ZITADEL_OIDC_DEVICEAUTH_USERCODE_CHARAMOUNT`
Co-authored-by: Stefan Benz <46600784+stebenz@users.noreply.github.com>
(cherry picked from commit 30175041c1)
# Which Problems Are Solved
Some events that are now unused are clogging the event queue from time
to time.
# How the Problems Are Solved
Remove the events described in #10458
# Additional Changes
- Updated `stringer` and `enumer` in Makefile target `core_generate_all`
to resolve generated files compilation issues
# Notes
It looks like there are a lot of changes, but most of them are fixes to
translation files. I suggest reviewing commit by commit.
# Additional Context
- Closes #10458
- Depends on https://github.com/zitadel/zitadel/pull/10513
(cherry picked from commit e8a9cd6964)
# Which Problems Are Solved
The session API allowed any authenticated user to update sessions by their ID without any further check.
This was unintentionally introduced with version 2.53.0, when the requirement to provide the latest session token on every session update was removed without ensuring another permission check (e.g. `session.write`).
# How the Problems Are Solved
- Granted `session.write` to `IAM_OWNER` and `IAM_LOGIN_CLIENT` in the defaults.yaml
- Granted `session.read` to `IAM_ORG_MANAGER`, `IAM_USER_MANAGER` and `ORG_OWNER` in the defaults.yaml
- Pass the session token to the UpdateSession command.
- Check for `session.write` permission on session creation and update.
- Alternatively, the (latest) sessionToken can be used to update the session.
- Setting an auth request to failed on the OIDC Service `CreateCallback` endpoint now ensures it's either the same user that created the auth request (for backwards compatibility) or requires `session.link` permission.
- Setting a device auth request to failed on the OIDC Service `AuthorizeOrDenyDeviceAuthorization` endpoint now requires `session.link` permission.
- Setting an auth request to failed on the SAML Service `CreateResponse` endpoint now requires `session.link` permission.
# Additional Changes
none
# Additional Context
none
(cherry picked from commit 4c942f3477)
# Which Problems Are Solved
The production endpoint of the service ping was wrong.
Additionally we discussed in the sprint review, that we could randomize
the default interval to prevent all systems to report data at the very
same time and also require a minimal interval.
# How the Problems Are Solved
- fixed the endpoint
- If the interval is set to @daily (default), we generate a random time
(minute, hour) as a cron expression (see the sketch below).
- Check if the interval is more than 30min and return an error if not.
- Fixed yaml indent on `ResourceCount`
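A minimal sketch of the randomization and the minimum-interval check, assuming standard five-field cron syntax; the function names are illustrative:
```go
package main

import (
	"errors"
	"fmt"
	"math/rand"
	"time"
)

// randomDailyCron spreads @daily reports over the day by picking a random
// minute and hour once at startup.
func randomDailyCron() string {
	return fmt.Sprintf("%d %d * * *", rand.Intn(60), rand.Intn(24))
}

// validateInterval rejects intervals below the required minimum.
func validateInterval(interval time.Duration) error {
	if interval < 30*time.Minute {
		return errors.New("service ping interval must be at least 30 minutes")
	}
	return nil
}

func main() {
	fmt.Println(randomDailyCron()) // e.g. "17 4 * * *"
}
```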
# Additional Changes
None
# Additional Context
as discussed internally
This PR is still WIP and needs changes to at least the tests.
# Which Problems Are Solved
To be able to report analytical / telemetry data from deployed Zitadel
systems back to a central endpoint, we designed a "service ping"
functionality. See also https://github.com/zitadel/zitadel/issues/9706.
This PR adds the first implementation to allow collecting base data as
well as reporting the amount of resources, such as organizations, users
per organization and more.
# How the Problems Are Solved
- Added a worker to handle the different `ReportType` variations.
- Schedule a periodic job to start a `ServicePingReport`
- Configuration added to allow customization of what data will be
reported
- Setup step to generate and store a `systemID`
# Additional Changes
None
# Additional Context
relates to #9869
# Which Problems Are Solved
We want to provide a seamless way to initialize Zitadel and the login
together.
# How the Problems Are Solved
In addition to the `IAM_OWNER` role, an admin user created during setup
also gets the `IAM_LOGIN_CLIENT` role if it is a machine user with a
PAT.
# Additional Changes
- Simplifies the load balancing example, as the intermediate
configuration step is not needed anymore.
# Additional Context
- Depends on #10116
- Contributes to https://github.com/zitadel/zitadel-charts/issues/332
- Contributes to https://github.com/zitadel/zitadel/issues/10016
---------
Co-authored-by: Stefan Benz <46600784+stebenz@users.noreply.github.com>
# Which Problems Are Solved
Currently, if a user signs in using an IdP, the corresponding IdP
session is not terminated once they sign out of Zitadel. This can be
the desired behavior. In some cases, e.g. when using a shared computer,
it results in a potential security risk, since a following user might be
able to sign in as the previous one using the still open IdP session.
# How the Problems Are Solved
- Admins can enable a federated logout option on SAML IdPs through the
Admin and Management APIs.
- During the termination of a login V1 session using the OIDC
end_session endpoint, Zitadel will check if an IdP was used to
authenticate that session.
- In case there was a SAML IdP used with Federated Logout enabled, it
will intercept the logout process, store the information into the shared
cache and redirect to the federated logout endpoint in the V1 login.
- The V1 login federated logout endpoint checks every request for an
existing cache entry. On success it will create a SAML logout request
for the used IdP and either redirect or POST to the configured SLO
endpoint. The cache entry is updated with a `redirected` state.
- An SLO endpoint is added to the `/idp` handlers, which will handle the
SAML logout responses. At the moment it will check again for an existing
federated logout entry (with state `redirected`) in the cache. On
success, the user is redirected to the initially provided
`post_logout_redirect_uri` from the end_session request.
# Additional Changes
None
# Additional Context
- This PR merges the https://github.com/zitadel/zitadel/pull/9841 and
https://github.com/zitadel/zitadel/pull/9854 to main, additionally
updating the docs on Entra ID SAML.
- closes #9228
- backport to 3.x
---------
Co-authored-by: Silvan <27845747+adlerhurst@users.noreply.github.com>
Co-authored-by: Zach Hirschtritt <zachary.hirschtritt@klaviyo.com>
# Which Problems Are Solved
- Allow users to use SHA-256 and SHA-512 hashing algorithms. These
algorithms are used by Linux's crypt(3) function.
- Allow users to import passwords using the PHPass algorithm. This
algorithm is used by older PHP systems, WordPress in particular.
# How the Problems Are Solved
- Upgrade passwap to
[v0.9.0](https://github.com/zitadel/passwap/releases/tag/v0.9.0)
- Add sha2 and phpass as a new verifier option in defaults.yaml
# Additional Changes
- Updated docs to explain the two algorithms
# Additional Context
Implements the changes in the passwap library from
https://github.com/zitadel/passwap/pull/59 and
https://github.com/zitadel/passwap/pull/60
# Which Problems Are Solved
The execution handler projection handles all events to check whether an
execution has to be provided to the worker to execute.
With this logic, all events would be processed from the beginning, which
is not necessary.
# How the Problems Are Solved
Add the current state to the execution handler projection, to avoid
processing all existing events.
# Additional Changes
Add custom configuration to the defaults, so that transactions are
limited to a certain number of events.
# Additional Context
None
# Which Problems Are Solved
Instances that had improved performance flags set got event errors when
getting instance features. This is because the improved performance
flags were marshalled using the enumerated integers, but now needed to
be unmarshalled using the added UnmarshalText method.
# How the Problems Are Solved
- Remove enumer generation
# Additional Changes
- none
# Additional Context
- reported on QA
- Backport to next-rc / v3
# Which Problems Are Solved
Webkeys were not generated for new instances when the webkey feature
flag was enabled in the instance defaults. This would cause a redirect
loop with console for new instances on QA / cloud.
# How the Problems Are Solved
- uncomment the webkeys section on defaults.yaml
- Fix field naming of webkey config
# Additional Changes
- Add all available features as comments.
- Make the improved performance type enum parsable from the config;
until now they were just ints.
- Running the enumer command created the missing enum entries for
feature keys.
# Additional Context
- Needs to be back-ported to v3 / next-rc
Co-authored-by: Livio Spring <livio.a@gmail.com>
# Which Problems Are Solved
If I start a fresh instance and do not overwrite `SystemAPIUsers` I get
an error during startup `error="decoding failed due to the following
error(s):\n\n'SystemAPIUsers[0][path]' expected a map, got
'string'\n'SystemAPIUsers[0][memberships]' expected a map, got 'slice'"`
# How the Problems Are Solved
The configuration is commented out so that the example is still there.
# Additional Changes
-
# Additional Context
was added in https://github.com/zitadel/zitadel/pull/9757
# Which Problems Are Solved
Add the possibility to filter project resources based on project member
roles.
# How the Problems Are Solved
Extend and refactor existing Pl/PgSQL functions to implement the
following:
- Solve O(n) complexity in returned resource IDs by returning a boolean
filter for instance level permissions.
- Individually permitted orgs are returned only if there was no instance
permission
- Individually permitted projects are returned only if there was no
instance permission
- Because of the multiple filter terms, use `INNER JOIN`s instead of
`WHERE` clauses.
# Additional Changes
- The system permission function no longer queries the organization view
and therefore can be `immutable`, giving big performance benefits for
frequently reused system users (like our hosted login in Zitadel Cloud).
- The permitted org and project functions are now defined as `stable`
because they don't modify on-disk data. This might give a small
performance gain.
- The Pl/PgSQL functions are now tested using Go unit tests.
# Additional Context
- Depends on https://github.com/zitadel/zitadel/pull/9677
- Part of https://github.com/zitadel/zitadel/issues/9188
- Closes https://github.com/zitadel/zitadel/issues/9190
# Which Problems Are Solved
Allow verification of imported salted passwords hashed with plain md5.
# How the Problems Are Solved
- Upgrade passwap to
[v0.7.0](https://github.com/zitadel/passwap/releases/tag/v0.7.0)
- Add md5salted as a new verifier option in `defaults.yaml`
# Additional Changes
- go version and libraries updated (required by passwap v0.7.0)
- secrets.md verifiers updated
- configuration verifiers updated
- added MD5salted and missing MD5Plain to test cases
# Which Problems Are Solved
The service name is hardcoded in the metrics code. Making the service
name configurable helps when running multiple instances of Zitadel.
The defaults remain unchanged; the service name will default to
ZITADEL.
# How the Problems Are Solved
Add a config option to override the name in defaults.yaml and pass it
down to the corresponding metrics or tracing module (google or otel)
# Additional Changes
NA
# Additional Context
NA
# Which Problems Are Solved
Currently I am not able to run the new login with a service account with
an IAM_OWNER role, as the role is missing some permissions which the
LOGIN_CLIENT role does have.
# How the Problems Are Solved
Added session permissions to the IAM_OWNER
---------
Co-authored-by: Livio Spring <livio.a@gmail.com>
# Which Problems Are Solved
The recently introduced notification queue has potential race conditions.
# How the Problems Are Solved
The current code is refactored to use the queue package, which is safe
with regard to concurrency.
# Additional Changes
- the queue is included in startup
- improved code quality of queue
# Additional Context
- closes https://github.com/zitadel/zitadel/issues/9278
# Which Problems Are Solved
* Adds support for the service provider configuration SCIM v2 endpoints
# How the Problems Are Solved
* Adds support for the service provider configuration SCIM v2 endpoints
* `GET /scim/v2/{orgId}/ServiceProviderConfig`
* `GET /scim/v2/{orgId}/ResourceTypes`
* `GET /scim/v2/{orgId}/ResourceTypes/{name}`
* `GET /scim/v2/{orgId}/Schemas`
* `GET /scim/v2/{orgId}/Schemas/{id}`
# Additional Context
Part of #8140
Co-authored-by: Stefan Benz <46600784+stebenz@users.noreply.github.com>
# Which Problems Are Solved
* Adds support for the bulk SCIM v2 endpoint
# How the Problems Are Solved
* Adds support for the bulk SCIM v2 endpoint under `POST
/scim/v2/{orgID}/Bulk`
# Additional Context
Part of #8140
Co-authored-by: Stefan Benz <46600784+stebenz@users.noreply.github.com>
# Which Problems Are Solved
Zitadel currently uses 3 database pools: 1 for queries, 1 for pushing
events and 1 for scheduled projection updates. This defeats the purpose
of a connection pool, which already handles multiple connections.
During load tests we found that the current structure of connection
pools consumes a lot of database resources. The resource usage dropped
after we reduced the number of database pools to 1, because existing
connections can be used more efficiently.
# How the Problems Are Solved
Removed the logic for handling multiple connection pools; a single pool
is used instead.
# Additional Changes
none
# Additional Context
part of https://github.com/zitadel/zitadel/issues/8352
# Which Problems Are Solved
Currently ZITADEL defines organization and instance member roles and
permissions in defaults.yaml. The permission check is done on the API
call level. For example: "is this user allowed to make this call on this
org". This makes sense for the V1 API, where the API is shaped around
permission levels. For example, a search for users always happens in the
context of an organization (either the organization the calling user
belongs to, or one determined through membership and the
x-zitadel-orgid header).
However, for resource based APIs we must be able to resolve permissions
by object. For example, an IAM_OWNER listing users should be able to get
all users in an instance based on the query filters. Alternatively a
user may have user.read permissions on one or more orgs. They should be
able to read just those users.
# How the Problems Are Solved
## Role permission mapping
The role permission mappings defined from `defaults.yaml` or local
config override are synchronized to the database on every run of
`zitadel setup`:
- A single query per **aggregate** builds a list of `add` and `remove`
actions needed to reach the desired state or role permission mappings
from the config.
- The required events based on the actions are pushed to the event
store.
- Events define search fields so that permission checking can use the
indices and is strongly consistent for both query and command sides.
The migration is split into the following aggregates:
- System aggregate for roles prefixed with `SYSTEM`
- Each instance for roles not prefixed with `SYSTEM`. This is in
anticipation of instance-level management over the API.
## Membership
Current instance / org / project membership events now have field table
definitions. Like the role permissions this ensures strong consistency
while still being able to use the indices of the fields table. A
migration is provided to fill the membership fields.
## Permission check
I aimed to keep the mental overhead for developers to a minimum. The
provided implementation only provides a permission check for list
queries for org level resources, for example users. In the `query`
package there is a simple helper function `wherePermittedOrgs` which
makes sure the underlying database function is called as part of the
`SELECT` query and the permitted organizations are part of the `WHERE`
clause. This makes sure results from non-permitted organizations are
omitted (see the sketch below). Under the hood:
- A Pg/PlSQL function searches for the list of organization IDs for
which the passed user has the passed permission.
- When the user has the permission on instance level, it returns early
with all organizations.
- The function uses a number of views. The views help map the fields
entries into relational data and simplify the code used in the function.
The views provide some pre-filters which allow proper index usage once
the final `WHERE` clauses are set by the function.
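A hedged sketch of how such a helper shapes the final query; the SQL function `permitted_orgs`, its signature, and the table and column names are illustrative assumptions, not the actual schema:
```go
package query

import (
	"context"
	"database/sql"
)

// listUsersPermitted sketches a list query whose WHERE clause is restricted
// to the orgs on which userID holds the given permission.
func listUsersPermitted(ctx context.Context, db *sql.DB, instanceID, userID, permission string) (*sql.Rows, error) {
	const stmt = `
		SELECT u.id, u.username
		FROM projections.users u
		-- hypothetical Pg/PlSQL function returning an array of org IDs;
		-- results from non-permitted organizations are omitted
		WHERE u.resource_owner = ANY (permitted_orgs($1, $2, $3))`
	return db.QueryContext(ctx, stmt, instanceID, userID, permission)
}
```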
# Additional Changes
# Additional Context
Closes #9032
Closes https://github.com/zitadel/zitadel/issues/9014
https://github.com/zitadel/zitadel/issues/9188 defines follow-ups for
the new permission framework based on this concept.
# Which Problems Are Solved
- Adds infrastructure code (basic implementation, error handling,
middlewares, ...) to implement the SCIM v2 interface
- Adds support for the user create SCIM v2 endpoint
# How the Problems Are Solved
- Adds support for the user create SCIM v2 endpoint under `POST
/scim/v2/{orgID}/Users`
# Additional Context
Part of #8140
# Which Problems Are Solved
It was possible to set a different algorithm for the legacy signer. This
is not supported however and breaks the token endpoint.
# How the Problems Are Solved
Remove the OIDC.SigningKeyAlgorithm config option and hard-code RS256
for the legacy signer.
# Additional Changes
- none
# Additional Context
Only RS256 is supported by the legacy signer. It was mentioned in the
comment of the config not to use it and use the webkeys resource
instead.
- closes #9121
# Which Problems Are Solved
The console has no information about where and how to send PostHog
events.
# How the Problems Are Solved
A PostHog API URL and token are passed through as plain text from the
Zitadel runtime config to the environment.json. By default, no values
are configured and the keys in the environment.json are omitted.
# Additional Context
- Closes https://github.com/zitadel/zitadel/issues/9070
- Complements https://github.com/zitadel/zitadel/pull/9077
# Which Problems Are Solved
Some IdPs use an HTTP form POST to return their data on callbacks. For
handling CSRF in the login after such calls, a 302 Found to the
corresponding non-form callback (in ZITADEL) is sent. Depending on the
size of the initial form body, this could lead to ZITADEL terminating
the connection, resulting in the user not getting a response, or in an
intermediate proxy returning them an HTTP 502.
# How the Problems Are Solved
- the form body is parsed and stored into the ZITADEL cache (using the
configured database by default)
- the redirect (302 Found) is performed with the request id
- the callback retrieves the data from the cache instead of the query
parameters (falling back to the latter to handle open, uncached
requests); see the sketch below
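A minimal sketch of this flow; the in-memory map stands in for the configured ZITADEL cache, and the request-id parameter name is hypothetical:
```go
package idp

import (
	"net/http"
	"net/url"
)

// formCache stands in for the configured ZITADEL cache (database-backed by default).
var formCache = map[string]url.Values{}

// handleFormPost stores the (potentially large) form body and redirects with
// only the request id, keeping the 302 Location header small.
func handleFormPost(w http.ResponseWriter, r *http.Request) {
	if err := r.ParseForm(); err != nil {
		http.Error(w, err.Error(), http.StatusBadRequest)
		return
	}
	id := r.PostForm.Get("RelayState") // hypothetical request identifier
	formCache[id] = r.PostForm
	http.Redirect(w, r, "/idp/callback?id="+url.QueryEscape(id), http.StatusFound)
}

// handleCallback restores the form data from the cache, falling back to the
// query parameters for requests that were never cached.
func handleCallback(w http.ResponseWriter, r *http.Request) {
	data, ok := formCache[r.URL.Query().Get("id")]
	if !ok {
		data = r.URL.Query() // fallback for open, uncached requests
	}
	_ = data // ... continue the IdP callback handling ...
}
```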
# Additional Changes
- fixed a typo in the default (cache) configuration: `LastUsage` ->
`LastUseAge`
# Additional Context
- reported by a customer
- needs to be backported to current cloud version (2.66.x)
---------
Co-authored-by: Silvan <27845747+adlerhurst@users.noreply.github.com>
# Which Problems Are Solved
It is currently not possible to use SAML with the Session API.
# How the Problems Are Solved
Add SAML service, to get and resolve SAML requests.
Add SAML session and SAML request aggregate, which can be linked to the
Session to get back a SAMLResponse from the API directly.
# Additional Changes
Update of dependency zitadel/saml to provide all functionality for
handling of SAML requests and responses.
# Additional Context
Closes #6053
---------
Co-authored-by: Livio Spring <livio.a@gmail.com>
# Which Problems Are Solved
To be able to migrate or test the new login UI, admins might want to
(temporarily) switch individual apps.
At a later point, admins might want to make sure all applications use
the new login UI.
# How the Problems Are Solved
- Added a feature flag `` on instance level to require all apps to use
the new login and provide an optional base url.
- if the flag is enabled, all (OIDC) applications will automatically use
the v2 login.
- if disabled, applications can decide based on their configuration
- Added an option on OIDC apps to use the new login UI and an optional
base url.
- Removed the requirement to use `x-zitadel-login-client` to be
redirected to the login V2, retrieve created auth requests and link
them to SSO sessions.
- Added a new "IAM_LOGIN_CLIENT" role to allow management of users,
sessions, grants and more without `x-zitadel-login-client`.
# Additional Changes
None
# Additional Context
closes https://github.com/zitadel/zitadel/issues/8702
# Which Problems Are Solved
Scheduled handlers use `eventstore.InstanceIDs` to get all active
instances within a given timeframe. This function scrapes through all
events written within that timeframe, which can cause heavy load on the
database.
# How the Problems Are Solved
A new query cache `activeInstances` is introduced which caches the ids
of all instances queried by id or host within the configured timeframe.
# Additional Changes
- Changed `default.yaml`
- Removed `HandleActiveInstances` from custom handler configs
- Added `MaxActiveInstances` to define the maximal amount of cached
instance ids
- fixed start-from-init and start-from-setup, which started the auth
and admin projections twice
- fixed org cache invalidation to use correct index
# Additional Context
- part of #8999
# Which Problems Are Solved
There are some problems related to the use of CockroachDB with the new
notification handling (#8931).
See #9002 for details.
# How the Problems Are Solved
- Brought back the previous notification handler as legacy mode.
- Added a configuration to choose between legacy mode and new parallel
workers.
- Enabled legacy mode by default to prevent issues.
# Additional Changes
None
# Additional Context
- closes https://github.com/zitadel/zitadel/issues/9002
- relates to #8931
# Which Problems Are Solved
While running the latest RC / main, we noticed some errors including
context timeouts and rollback issues.
# How the Problems Are Solved
- The transaction context is passed and used for any event being written
and for handling savepoints to be able to handle context timeouts.
- The user projection is not triggered anymore. This will reduce
unnecessary load and potential timeouts if lots of workers are running.
In case a user is not yet projected, the request event will log an
error and then be skipped / retried on the next run.
- Additionally, the context is checked for cancellation after each event
is processed.
- `latestRetries` now correctly only returns the latest retry events to
be processed
- Default values for notifications have been changed to run workers
less often, with more retry delay but a shorter transaction duration.
# Additional Changes
None
# Additional Context
relates to #8931
---------
Co-authored-by: Tim Möhlmann <tim+github@zitadel.com>
# Which Problems Are Solved
The action v2 messages didn't contain anything providing security for
the sent content.
# How the Problems Are Solved
Each Target now has a SigningKey, which can also be newly generated
through the API and is returned at creation and through the Get
endpoints.
There is now an HTTP header "Zitadel-Signature", which is generated from
the SigningKey and payload, and also contains a timestamp to check with
a tolerance whether the message took too long to send (see the sketch
below).
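A hedged sketch of such a timestamped HMAC signature; the header format and helper names are assumptions, not the exact implementation in pkg/actions:
```go
package actions

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"time"
)

// sign computes an HMAC-SHA256 over the timestamp and payload, so the
// receiver can verify both integrity and freshness.
func sign(signingKey, payload []byte, now time.Time) string {
	mac := hmac.New(sha256.New, signingKey)
	fmt.Fprintf(mac, "%d.%s", now.Unix(), payload)
	return fmt.Sprintf("t=%d,v1=%s", now.Unix(), hex.EncodeToString(mac.Sum(nil)))
}

// verify recomputes the signature and rejects messages older than the
// given tolerance.
func verify(signingKey, payload []byte, header string, tolerance time.Duration) bool {
	var ts int64
	var sig string
	if _, err := fmt.Sscanf(header, "t=%d,v1=%s", &ts, &sig); err != nil {
		return false
	}
	if time.Since(time.Unix(ts, 0)) > tolerance {
		return false // message took too long to arrive
	}
	expected := sign(signingKey, payload, time.Unix(ts, 0))
	return hmac.Equal([]byte(expected), []byte(header))
}
```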
# Additional Changes
The functionality to create and check the signature is provided in the
pkg/actions package, and can be reused in the SDK.
# Additional Context
Closes #7924
---------
Co-authored-by: Livio Spring <livio.a@gmail.com>
# Which Problems Are Solved
The current handling of notification follows the same pattern as all
other projections:
Created events are handled sequentially (based on "position") by a
handler. During the process, a lot of information is aggregated (user,
texts, templates, ...).
This leads to back pressure on the projection since the handling of
events might take longer than the time before a new event (to be
handled) is created.
# How the Problems Are Solved
- The current user notification handler creates separate notification
events based on the user / session events.
- These events contain all the present and required information
including the userID.
- These notification events get processed by notification workers, which
gather the necessary information (recipient address, texts, templates)
to send out these notifications.
- If a notification fails, a retry event is created based on the current
notification request including the current state of the user (this
prevents race conditions, where a user is changed in the meantime and
the notification already gets the new state).
- The retry event will be handled after a backoff delay. This delay
increases with every attempt (see the sketch after the configuration
block).
- If the configured amount of attempts is reached or the message expired
(based on config), a cancel event is created, letting the workers know
that the notification must no longer be handled.
- In case of a successful send, a sent event is created for the
notification aggregate and the existing "sent" events for the user /
session object are stored.
- The following is added to the defaults.yaml to allow configuration of
the notification workers:
```yaml
Notifications:
# The amount of workers processing the notification request events.
# If set to 0, no notification request events will be handled. This can be useful when running in
# multi binary / pod setup and allowing only certain executables to process the events.
Workers: 1 # ZITADEL_NOTIFIACATIONS_WORKERS
# The amount of events a single worker will process in a run.
BulkLimit: 10 # ZITADEL_NOTIFIACATIONS_BULKLIMIT
# Time interval between scheduled notifications for request events
RequeueEvery: 2s # ZITADEL_NOTIFIACATIONS_REQUEUEEVERY
# The amount of workers processing the notification retry events.
# If set to 0, no notification retry events will be handled. This can be useful when running in
# multi binary / pod setup and allowing only certain executables to process the events.
RetryWorkers: 1 # ZITADEL_NOTIFIACATIONS_RETRYWORKERS
# Time interval between scheduled notifications for retry events
RetryRequeueEvery: 2s # ZITADEL_NOTIFIACATIONS_RETRYREQUEUEEVERY
# Only instances are projected, for which at least a projection-relevant event exists within the timeframe
# from HandleActiveInstances duration in the past until the projection's current time
# If set to 0 (default), every instance is always considered active
HandleActiveInstances: 0s # ZITADEL_NOTIFIACATIONS_HANDLEACTIVEINSTANCES
# The maximum duration a transaction remains open
# before it stops left-folding additional events
# and updates the table.
TransactionDuration: 1m # ZITADEL_NOTIFIACATIONS_TRANSACTIONDURATION
# Automatically cancel the notification after the amount of failed attempts
MaxAttempts: 3 # ZITADEL_NOTIFIACATIONS_MAXATTEMPTS
# Automatically cancel the notification if it cannot be handled within a specific time
MaxTtl: 5m # ZITADEL_NOTIFIACATIONS_MAXTTL
# Failed attempts are retried after a configured delay (with exponential backoff).
# Set a minimum and maximum delay and a factor for the backoff
MinRetryDelay: 1s # ZITADEL_NOTIFIACATIONS_MINRETRYDELAY
MaxRetryDelay: 20s # ZITADEL_NOTIFIACATIONS_MAXRETRYDELAY
# Any factor below 1 will be set to 1
RetryDelayFactor: 1.5 # ZITADEL_NOTIFIACATIONS_RETRYDELAYFACTOR
```
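For illustration, the backoff these settings imply can be computed like this (a sketch; the function name is not the actual implementation):
```go
package notification

import (
	"math"
	"time"
)

// retryDelay returns the delay before the given retry attempt (starting at 0),
// growing exponentially from min by factor and capped at max.
func retryDelay(min, max time.Duration, factor float64, attempt int) time.Duration {
	if factor < 1 {
		factor = 1 // any factor below 1 is set to 1, as documented above
	}
	d := time.Duration(float64(min) * math.Pow(factor, float64(attempt)))
	if d > max {
		return max
	}
	return d
}
```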
# Additional Changes
None
# Additional Context
- closes #8931
# Which Problems Are Solved
Organizations are often searched for by ID or primary domain. This
results in many redundant queries, causing a performance impact.
# How the Problems Are Solved
Cache Organization objects by ID and primary domain.
# Additional Changes
- Adjust integration test config to use all types of cache.
- Adjust integration test lifetimes so the pruner has something to do
while the tests run.
# Additional Context
- Closes #8865
- After #8902
# Which Problems Are Solved
By having default entries in the `Username` and `ClientName` fields, it
was not possible to unset these parameters. Unsetting them is required
for GCP connections.
# How the Problems Are Solved
Set the fields to empty strings.
# Additional Changes
- none
# Additional Context
- none
# Which Problems Are Solved
If a Redis cache has connection issues or any other type of permanent
error, it tanks the responsiveness of ZITADEL.
We currently do not support things like Redis Cluster or Sentinel, so
adding a simple Redis cache improves performance but introduces a single
point of failure.
# How the Problems Are Solved
Implement a [circuit
breaker](https://learn.microsoft.com/en-us/previous-versions/msp-n-p/dn589784(v=pandp.10)?redirectedfrom=MSDN)
as
[`redis.Limiter`](https://pkg.go.dev/github.com/redis/go-redis/v9#Limiter)
by wrapping sony's [gobreaker](https://github.com/sony/gobreaker)
package. This package is picked as it seems well maintained, and we
already use their `sonyflake` package (see the sketch below).
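A hedged sketch of the adapter, using go-redis v9's `Limiter` interface and gobreaker's two-step breaker; it assumes `Allow`/`ReportResult` calls are paired in order, which simplifies the real implementation:
```go
package cache

import (
	"errors"

	"github.com/redis/go-redis/v9"
	"github.com/sony/gobreaker"
)

// limiter adapts a gobreaker.TwoStepCircuitBreaker to redis.Limiter.
type limiter struct {
	cb   *gobreaker.TwoStepCircuitBreaker
	done chan func(success bool)
}

func newLimiter() *limiter {
	return &limiter{
		cb:   gobreaker.NewTwoStepCircuitBreaker(gobreaker.Settings{Name: "redis-cache"}),
		done: make(chan func(success bool), 1024),
	}
}

// Allow fails fast while the breaker is open, so a broken Redis no longer
// tanks ZITADEL's responsiveness.
func (l *limiter) Allow() error {
	done, err := l.cb.Allow()
	if err != nil {
		return err
	}
	l.done <- done
	return nil
}

// ReportResult feeds the command's outcome back into the breaker.
// A cache miss (redis.Nil) is not a failure.
func (l *limiter) ReportResult(result error) {
	done := <-l.done
	done(result == nil || errors.Is(result, redis.Nil))
}
```
The adapter is then set as the `Limiter` on the Redis client options, so every command passes through the breaker.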
# Additional Changes
- The unit tests constructed an unused `redis.Client` and didn't clean
up the connector. This is now fixed.
# Additional Context
Closes#8864
# Which Problems Are Solved
Add a cache implementation using Redis single mode. This does not add
support for Redis Cluster or Sentinel.
# How the Problems Are Solved
Added the `internal/cache/redis` package. All operations occur
atomically, including the setting of secondary indexes, using Lua
scripts where needed (see the sketch below).
The [`miniredis`](https://github.com/alicebob/miniredis) package is used
to run unit tests.
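A minimal sketch of such an atomic write with a secondary index, using go-redis; the key layout is an assumption, not the package's actual schema:
```go
package redis

import (
	"context"

	"github.com/redis/go-redis/v9"
)

// setWithIndex writes the object and its secondary index in one atomic step:
// KEYS[1] is the primary key, KEYS[2] the index key pointing back to it.
var setWithIndex = redis.NewScript(`
redis.call("SET", KEYS[1], ARGV[1])
redis.call("SET", KEYS[2], KEYS[1])
return 1
`)

// Set stores an org payload under its ID and indexes it by primary domain.
func Set(ctx context.Context, rdb *redis.Client, orgID, domain, payload string) error {
	return setWithIndex.Run(ctx, rdb,
		[]string{"cache:org:id:" + orgID, "cache:org:domain:" + domain},
		payload,
	).Err()
}
```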
# Additional Changes
- Move connector code to `internal/cache/connector/...` and remove
duplicate code from `query` and `command` packages.
- Fix a missed invalidation on the restrictions projection
# Additional Context
Closes #8130
# Which Problems Are Solved
Currently ZITADEL supports RP-initiated logout for clients.
Back-channel logout ensures that user sessions are terminated across all
connected applications, even if the user closes their browser or loses
connectivity, providing a more secure alternative for certain use cases.
# How the Problems Are Solved
If the feature is activated and the client used for the authentication
has a back_channel_logout_uri configured, a
`session_logout.back_channel` will be registered. Once a user terminates
their session, a (notification) handler will send a SET (form POST) to
the registered URI containing a logout_token with the user's ID and
session ID (see the illustrative claims after the list below).
- A new feature "back_channel_logout" is added on system and instance
level
- A `back_channel_logout_uri` can be managed on OIDC applications
- Added a `session_logout` aggregate to register and inform about sent
`back_channel` notifications
- Added a `SecurityEventToken` channel and a `Form` message type in the
notification handlers
- Added `TriggeredAtOrigin` fields to `HumanSignedOut` and
`TerminateSession` events for notification handling
- Exported various functions and types in the `oidc` package to be able
to reuse for token signing in the back_channel notifier.
- To prevent currently existing session termination events from being
handled, a setup step is added to set the `current_states` for the
`projections.notifications_back_channel_logout` to the current position
- [x] requires https://github.com/zitadel/oidc/pull/671
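For illustration, the claims of such a logout_token as defined by the OIDC Back-Channel Logout specification (a sketch; the actual token is built and signed by the back-channel notifier using the exported `oidc` helpers):
```go
package notifier

import "time"

// logoutTokenClaims sketches the payload of the SET sent to the
// back_channel_logout_uri; all values are placeholders.
func logoutTokenClaims(issuer, clientID, userID, sessionID string) map[string]any {
	return map[string]any{
		"iss": issuer,    // the instance's issuer URL
		"aud": clientID,  // the OIDC client with the configured URI
		"sub": userID,    // the user whose session was terminated
		"sid": sessionID, // the terminated session
		"iat": time.Now().Unix(),
		"jti": "unique-token-id", // placeholder
		"events": map[string]any{
			"http://schemas.openid.net/event/backchannel-logout": struct{}{},
		},
	}
}
```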
# Additional Changes
- Updated all OTEL dependencies to v1.29.0, since OIDC already updated
some of them to that version.
- Single Session Termination feature is correctly checked (fixed feature
mapping)
# Additional Context
- closes https://github.com/zitadel/zitadel/issues/8467
- TODO:
- Documentation
- UI to be done: https://github.com/zitadel/zitadel/issues/8469
---------
Co-authored-by: Hidde Wieringa <hidde@hiddewieringa.nl>
# Which Problems Are Solved
System administrators can block hosts and IPs for HTTP calls in actions.
By using DNS names, blocked IPs could be bypassed.
# How the Problems Are Solved
- Hosts are resolved (DNS lookup) to check whether their corresponding
IP is blocked.
# Additional Changes
- Added the complete loopback IP address range and the "unspecified"
address to the default `DenyList`
# Which Problems Are Solved
The primary issue addressed in this PR is that the defaults.yaml file
contains escaped characters (like `&lt;` for `<` and `&gt;` for `>`) in
message texts, which prevents valid HTML rendering in certain parts of
the Zitadel platform.
These escaped characters are used in user-facing content (e.g., email
templates or notifications), resulting in improperly displayed text,
where HTML elements like line breaks or bold text don't render
correctly.
# How the Problems Are Solved
The solution involves replacing the escaped characters with their
corresponding HTML tags in the defaults.yaml file, ensuring that the
HTML renders correctly in the emails or user interfaces where these
messages are displayed.
This update ensures that:
- The HTML in these message templates is rendered properly, improving
the user experience.
- The content looks professional and adheres to web standards for
displaying HTML content.
# Additional Changes
N/A
# Additional Context
N/A
- Closes #8531
Co-authored-by: Max Peintner <max@caos.ch>
# Which Problems Are Solved
We identified the need for caching.
Currently we have a number of places where we use different ways of
caching, like Go maps or LRU.
We might also want shared caches in the future, like Redis-based caches
or caches in special SQL tables.
# How the Problems Are Solved
Define a generic Cache interface which allows different implementations
(sketched below).
- A noop implementation is provided and enabled as the default.
- An implementation using Go maps is provided
  - disabled in defaults.yaml
  - enabled in integration tests
- Authz middleware instance objects are cached using the interface.
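A hedged sketch of what such a generic interface can look like; names and type parameters are illustrative, not the exact interface introduced here:
```go
package cache

import "context"

// Cache is a generic cache for objects of type V, addressable by one or more
// indexes of type I with keys of type K (e.g. an org by ID or primary domain).
type Cache[I, K comparable, V any] interface {
	// Get returns the object stored under the given index and key.
	Get(ctx context.Context, index I, key K) (V, bool)
	// Set stores the object; implementations may update secondary indexes.
	Set(ctx context.Context, value V)
	// Invalidate drops the entries for the given keys on the given index.
	Invalidate(ctx context.Context, index I, keys ...K) error
}
```
With this shape, the noop implementation simply reports a miss on every `Get`, so call sites don't need feature checks.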
# Additional Changes
- Enabled the race flag for the integration test command
- Fix a race condition in the limits integration test client
- Fix a number of flaky integration tests. (Because zitadel is super
fast now!) 🎸🚀
# Additional Context
Related to https://github.com/zitadel/zitadel/issues/8648