# Which Problems Are Solved
1. The projection handler reported no error if an error happened but
updating the current state was successful. This can lead to skipped
projections during setup as soon as the projection has an error but does
not correctly report if to the caller.
2. Mirror projections skipped as soon as an error occures, this leads to
unprojected projections.
3. Mirror checked position wrongly in some cases
# How the Problems Are Solved
1. the error returned by the `Trigger` method will will only be set to
the error of updating current states if there occured an error.
2. triggering projections checks for the error type returned and retries
if the error had code `23505`
3. Corrected to use the `Equal` method
# Additional Changes
unify logging on mirror projections
1. After second execution, mirror starts to fail because of Primary key
constraints on the events table. Because mirror always took the the
first `system.mirror.succeeded` instead of the newest one
2. Mirror panicked during migration of fields tables
1. Adjusted the database query to order descending and limit 1
2. added missing assignment
- detailed logging if the copy from statement failed.
# Which Problems Are Solved
- fields projections were not projected during mirror
# How the Problems Are Solved
- an extra step during projections was added to mirror the fields
# Additional Changes
none
# Additional Context
none
Float64 which was used for the event.Position field is [not precise in
go and gets rounded](https://github.com/golang/go/issues/47300). This
can lead to unprecies position tracking of events and therefore
projections especially on cockcoachdb as the position used there is a
big number.
example of a unprecies position:
exact: 1725257931223002628
float64: 1725257931223002624.000000
The float64 was replaced by
[github.com/jackc/pgx-shopspring-decimal](https://github.com/jackc/pgx-shopspring-decimal).
Rename `latestSequence`-queries to `latestPosition`
closes https://github.com/zitadel/zitadel/issues/8863
The `auth.auth_requests` table is not cleaned up so long running Zitadel
installations can contain many rows.
The mirror command can take long because a the data are first copied
into memory (or disk) on cockroach and users do not get any output from
mirror. This is unfortunate because people don't know if Zitadel got
stuck.
Enhance logging throughout the projection processes and introduce a
configuration option for the maximum age of authentication requests.
None
closes https://github.com/zitadel/zitadel/issues/9764
---------
Co-authored-by: Livio Spring <livio.a@gmail.com>
# Which Problems Are Solved
ZITADEL allows the use of JSON Web Token (JWT) Profile OAuth 2.0 for Authorization Grants in machine-to-machine (M2M) authentication. Multiple keys can be managed for a single machine account (service user), each with an individual expiry.
A vulnerability existed where expired keys can be used to retrieve tokens. Specifically, ZITADEL fails to properly check the expiration date of the JWT key when used for Authorization Grants. This allows an attacker with an expired key to obtain valid access tokens.
This vulnerability does not affect the use of JWT Profile for OAuth 2.0 Client Authentication on the Token and Introspection endpoints, which correctly reject expired keys.
# How the Problems Are Solved
Added proper validation of the expiry of the stored public key.
# Additional Changes
None
# Additional Context
None
(cherry picked from commit 315503beab)
# Which Problems Are Solved
The username entered by the user was resp. replaced by the stored user's username. This provided a possibility to enumerate usernames as unknown usernames were not normalized.
# How the Problems Are Solved
- Store and display the username as entered by the user.
- Removed the part where the loginname was always set to the user's loginname when retrieving the `nextSteps`
# Additional Changes
None
# Additional Context
None
(cherry picked from commit 14de8ecac2)
# Which Problems Are Solved
With current provided telemetry it's difficult to predict when a
projection handler is under increased load until it's too late and
causes downstream issues. Importantly, projection updating is in the
critical path for many login flows and increased latency there can
result in system downtime for users.
# How the Problems Are Solved
This PR adds three new prometheus-style metrics:
1. **projection_events_processed** (_labels: projection, success_) -
This metric gives us a counter of the number of events processed per
projection update run and whether they we're processed without error. A
high number of events being processed can let us know how busy a
particular projection handler is.
2. **projection_handle_timer** _(labels: projection)_ - This is the time
it takes to process a projection update given a batch of events - time
to take the current_states lock, query for new events, reduce,
update_the projection, and update current_states.
3. **projection_state_latency** _(labels: projection)_ - This is the
time from the last event processed in the current_states table for a
given projection. It tells us how old was the last event you processed?
Or, how far behind are you running for this projection? Higher latencies
could mean high load or stalled projection handling.
# Additional Changes
I also had to initialize the global otel metrics provider (`metrics.M`)
in the `setup` step additionally to `start` since projection handlers
are initialized at setup. The initialization checks if a metrics
provider is already set (in case of `start-from-setup` or
`start-from-init` to prevent overwriting, which causes the otel metrics
provider to stop working.
# Additional Context
## Example Dashboards


---------
Co-authored-by: Silvan <27845747+adlerhurst@users.noreply.github.com>
Co-authored-by: Livio Spring <livio.a@gmail.com>
(cherry picked from commit c1535b7b49)
# Which Problems Are Solved
The service name is hardcoded in the metrics code. Making the service
name to be configurable helps when running multiple instances of
Zitadel.
The defaults remain unchanged, the service name will be defaulted to
ZITADEL.
# How the Problems Are Solved
Add a config option to override the name in defaults.yaml and pass it
down to the corresponding metrics or tracing module (google or otel)
# Additional Changes
NA
# Additional Context
NA
(cherry picked from commit dc64e35128)
# Which Problems Are Solved
Zitadel should not record 404 response counts of unknown paths (check
`/debug/metrics`).
This can lead to high cardinality on metrics endpoint and in traces.
```
GOOD http_server_return_code_counter_total{method="GET",otel_scope_name="",otel_scope_version="",return_code="200",uri="/.well-known/openid-configuration"} 2
GOOD http_server_return_code_counter_total{method="GET",otel_scope_name="",otel_scope_version="",return_code="200",uri="/oauth/v2/keys"} 2
BAD http_server_return_code_counter_total{method="GET",otel_scope_name="",otel_scope_version="",return_code="404",uri="/junk"} 2000
```
After
```
GOOD http_server_return_code_counter_total{method="GET",otel_scope_name="",otel_scope_version="",return_code="200",uri="/.well-known/openid-configuration"} 2
GOOD http_server_return_code_counter_total{method="GET",otel_scope_name="",otel_scope_version="",return_code="200",uri="/oauth/v2/keys"} 2
```
# How the Problems Are Solved
This PR makes sure, that any unknown path is recorded as `UNKNOWN_PATH`
instead of the actual path.
# Additional Changes
N/A
# Additional Context
On our production instance, when a penetration test was run, it caused
our metric count to blow up to many thousands due to Zitadel recording
404 response counts.
Next nice to have steps, remove 404 timer recordings which serve no
purpose
---------
Co-authored-by: Livio Spring <livio.a@gmail.com>
Co-authored-by: Silvan <27845747+adlerhurst@users.noreply.github.com>
Co-authored-by: Livio Spring <livio@zitadel.com>
(cherry picked from commit 599850e7e8)
# Which Problems Are Solved
When using a custom / new login UI and an OIDC application with
registered BackChannelLogoutUI, no logout requests were sent to the URI
when the user signed out.
Additionally, as described in #9427, an error was logged:
`level=error msg="event of type *session.TerminateEvent doesn't
implement OriginEvent"
caller="/home/runner/work/zitadel/zitadel/internal/notification/handlers/origin.go:24"`
# How the Problems Are Solved
- Properly pass `TriggerOrigin` information to session.TerminateEvent
creation and implement `OriginEvent` interface.
- Implemented `RegisterLogout` in `CreateOIDCSessionFromAuthRequest` and
`CreateOIDCSessionFromDeviceAuth`, both used when interacting with the
OIDC v2 API.
- Both functions now receive the `BackChannelLogoutURI` of the client
from the OIDC layer.
# Additional Changes
None
# Additional Context
- closes#9427
(cherry picked from commit ed697bbd69)
# Which Problems Are Solved
When requesting a JWT (`urn:ietf:params:oauth:token-type:jwt`) to be
returned in a Token Exchange request, ZITADEL would panic if the `actor`
was not granted the necessary permission.
# How the Problems Are Solved
Properly check the error and return it.
# Additional Changes
None
# Additional Context
- closes#9436
(cherry picked from commit e6ce1af003)
# Which Problems Are Solved
#9110 introduced more possibilities to search for "own" sessions. Due to
this the permission checks for retrieving a session had to be updated
accordingly. Internal calls, such as retrieving them for sending
notifications do not require a permission, but the code was not properly
adjusted and thus could lead to panics.
# How the Problems Are Solved
- Properly handled (do not require) permission check for internal only
calls when retrieving the session by id.
# Additional Changes
None
# Additional Context
- needs backports to 2.68.x, 2.69.x, 2.70.x
- closeszitadel/devops#117
(cherry picked from commit 4e1868e9bb)
# Which Problems Are Solved
As reported in #9311, even when providing a `x-zitadel-login-client`
header, the auth request would be created as hosted login UI / V1
request.
This is due to a change introduced with #9071, where the login UI
version can be specified using the app configuration.
The configuration set to V1 was not considering if the header was sent.
# How the Problems Are Solved
- Check presence of `x-zitadel-login-client` before the configuration.
Use later only if no header is set.
# Additional Changes
None
# Additional Context
- closes#9311
- needs back ports to 2.67.x, 2.68.x and 2.69.x
(cherry picked from commit e7a73eb6b1)
# Which Problems Are Solved
- SCIM tests are flaky due to metadata being set by the tests while
shortly after being read by the application, resulting in a race
condition
# How the Problems Are Solved
- whenever metadata is set, the projection is awaited
# Additional Context
Part of #8140
Co-authored-by: Stefan Benz <46600784+stebenz@users.noreply.github.com>
(cherry picked from commit 4dc7a58a25)
# Which Problems Are Solved
- Microsoft Entra invokes the user patch endpoint with `"active":
"True"` / `"active": "False"` when patching a user. This is a well-known
bug in MS Entra (see
[here](https://learn.microsoft.com/en-us/entra/identity/app-provisioning/application-provisioning-config-problem-scim-compatibility)),
but the bug fix has not landed yet and/or the feature flag does not
work.
# How the Problems Are Solved
- To ensure compatibility with MS Entra, the parsing of the the boolean
active flag of the scim user is relaxed and accepts strings in any
casing that resolve to `true` or `false` as well as raw boolean values.
# Additional Context
Part of https://github.com/zitadel/zitadel/issues/8140
(cherry picked from commit 361f7a2edc)
# Which Problems Are Solved
There were multiple issues in the OpenTelemetry (OTEL) implementation
and usage for tracing and metrics, which lead to high cardinality and
potential memory leaks:
- wrongly initiated tracing interceptors
- high cardinality in traces:
- HTTP/1.1 endpoints containing host names
- HTTP/1.1 endpoints containing object IDs like userID (e.g.
`/management/v1/users/2352839823/`)
- high amount of traces from internal processes (spooler)
- high cardinality in metrics endpoint:
- GRPC entries containing host names
- notification metrics containing instanceIDs and error messages
# How the Problems Are Solved
- Properly initialize the interceptors once and update them to use the
grpc stats handler (unary interceptors were deprecated).
- Remove host names from HTTP/1.1 span names and use path as default.
- Set / overwrite the uri for spans on the grpc-gateway with the uri
pattern (`/management/v1/users/{user_id}`). This is used for spans in
traces and metric entries.
- Created a new sampler which will only sample spans in the following
cases:
- remote was already sampled
- remote was not sampled, root span is of kind `Server` and based on
fraction set in the runtime configuration
- This will prevent having a lot of spans from the spooler back ground
jobs if they were not started by a client call querying an object (e.g.
UserByID).
- Filter out host names and alike from OTEL generated metrics (using a
`view`).
- Removed instance and error messages from notification metrics.
# Additional Changes
Fixed the middleware handling for serving Console. Telemetry and
instance selection are only used for the environment.json, but not on
statically served files.
# Additional Context
- closes#8096
- relates to #9074
- back ports to at least 2.66.x, 2.67.x and 2.68.x
(cherry picked from commit 990e1982c7)
# Which Problems Are Solved
- when a scim user is provisioned, a init email could be sent
# How the Problems Are Solved
- no init email should be sent => hard code false for the email init
param
# Additional Context
Related to https://github.com/zitadel/zitadel/issues/8140
Co-authored-by: Fabienne Bühler <fabienne@zitadel.com>
# Which Problems Are Solved
- Some SCIM clients send "op" of a patch operation in PascalCase
# How the Problems Are Solved
- Well known "op" values of patch operations are matched
case-insensitive.
# Additional Context
Related to #8140
# Which Problems Are Solved
- If a SCIM endpoint is called with an orgID in the URL that is not the
resource owner, no error is returned, and the action is executed.
# How the Problems Are Solved
- The orgID provided in the SCIM URL path must match the resource owner
of the target user. Otherwise, an error will be returned.
# Additional Context
Part of https://github.com/zitadel/zitadel/issues/8140
# Which Problems Are Solved
* Adds support for the service provider configuration SCIM v2 endpoints
# How the Problems Are Solved
* Adds support for the service provider configuration SCIM v2 endpoints
* `GET /scim/v2/{orgId}/ServiceProviderConfig`
* `GET /scim/v2/{orgId}/ResourceTypes`
* `GET /scim/v2/{orgId}/ResourceTypes/{name}`
* `GET /scim/v2/{orgId}/Schemas`
* `GET /scim/v2/{orgId}/Schemas/{id}`
# Additional Context
Part of #8140
Co-authored-by: Stefan Benz <46600784+stebenz@users.noreply.github.com>
# Which Problems Are Solved
Add the ability to update the timestamp when MFA initialization was last
skipped.
Get User By ID now also returns the timestamps when MFA setup was last
skipped.
# How the Problems Are Solved
- Add a `HumanMFAInitSkipped` method to the `users/v2` API.
- MFA skipped was already projected in the `auth.users3` table. In this
PR the same column is added to the users projection. Event handling is
kept the same as in the `UserView`:
<details>
62804ca45f/internal/user/repository/view/model/user.go (L243-L377)
</details>
# Additional Changes
- none
# Additional Context
- Closes https://github.com/zitadel/zitadel/issues/9197
# Which Problems Are Solved
* Adds support for the bulk SCIM v2 endpoint
# How the Problems Are Solved
* Adds support for the bulk SCIM v2 endpoint under `POST
/scim/v2/{orgID}/Bulk`
# Additional Context
Part of #8140
Co-authored-by: Stefan Benz <46600784+stebenz@users.noreply.github.com>
# Which Problems Are Solved
- when listing users via scim v2.0 filters applied to the username are
applied case-sensitive
# How the Problems Are Solved
- when a query filter is appleid on the username it is applied
case-insensitive
# Additional Context
Part of https://github.com/zitadel/zitadel/issues/8140
# Which Problems Are Solved
Adds `failed attempts` field to the grpc response when a user enters
wrong password when logging in
FYI:
this only covers the senario above; other senarios where this is not
applied are:
SetPasswordWithVerifyCode
setPassword
ChangPassword
setPasswordWithPermission
# How the Problems Are Solved
Created new grpc message `CredentialsCheckError` -
`proto/zitadel/message.proto` to include `failed_attempts` field.
Had to create a new package -
`github.com/zitadel/zitadel/internal/command/errors` to resolve cycle
dependency between `github.com/zitadel/zitadel/internal/command` and
`github.com/zitadel/zitadel/internal/command`.
# Additional Changes
- none
# Additional Context
- Closes https://github.com/zitadel/zitadel/issues/9198
---------
Co-authored-by: Iraq Jaber <IraqJaber@gmail.com>
# Which Problems Are Solved
- scim v2 only maps the primary phone/email to the zitadel user, this
does not work if no primary is set
# How the Problems Are Solved
- the first phone / email is mapped if no primary is available
# Additional Context
Part of #8140
Co-authored-by: Stefan Benz <46600784+stebenz@users.noreply.github.com>
# Which Problems Are Solved
#9185 changed that if a notification channel was not present,
notification workers would no longer retry to send the notification and
would also cancel in case Twilio would return a 4xx error.
However, this would not affect the "legacy" mode.
# How the Problems Are Solved
- Handle `CancelError` in legacy notifier as not failed (event).
# Additional Changes
None
# Additional Context
- relates to #9185
- requires back port to 2.66.x and 2.67.x
(cherry picked from commit 3fc68e5d60)
# Which Problems Are Solved
#9185 changed that if a notification channel was not present,
notification workers would no longer retry to send the notification and
would also cancel in case Twilio would return a 4xx error.
However, this would not affect the "legacy" mode.
# How the Problems Are Solved
- Handle `CancelError` in legacy notifier as not failed (event).
# Additional Changes
None
# Additional Context
- relates to #9185
- requires back port to 2.66.x and 2.67.x
# Which Problems Are Solved
- scim list users endpoint (`GET /scim/v2/{orgId}/Users`): handle
unsupported `SortBy` columns correctly
# How the Problems Are Solved
- throw an error if sorting by an unsupported column is requested
# Additional Context
Part of #8140
# Which Problems Are Solved
- requests to the scim interface with content type `*/*` are rejected
# How the Problems Are Solved
- `*/*` is accepted as content type
# Additional Context
Part of #8140
# Which Problems Are Solved
- SCIM user metadata mapping keys have differing case styles.
# How the Problems Are Solved
- key casing style is unified to strict camelCase
# Additional Context
Part of #8140
Although this is technically a breaking change, it is considered
acceptable because the SCIM feature is still in the preview stage and
not fully implemented yet.
Co-authored-by: Stefan Benz <46600784+stebenz@users.noreply.github.com>
# Which Problems Are Solved
* Adds support for the patch user SCIM v2 endpoint
# How the Problems Are Solved
* Adds support for the patch user SCIM v2 endpoint under `PATCH
/scim/v2/{orgID}/Users/{id}`
# Additional Context
Part of #8140