Commit Graph

2035 Commits

Author SHA1 Message Date
Marco A
e6b5f559f0 fix: Force v2 permission checks on user listing
# Which Problems Are Solved

When the feature flag for enabling permission checks v2 is disabled, a user without permission could list users across instances and get the total number of users available.

# How the Problems Are Solved

Disregard the state of the feature flag and always enforce permission checks v2 on v2 APIs.
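A minimal, illustrative sketch of the idea with hypothetical names (not the actual Zitadel code): the v2 listing handler runs the v2 permission check unconditionally instead of branching on the feature flag.

```go
package example

import "context"

// PermissionCheckerV2 is a hypothetical stand-in for the v2 permission check.
type PermissionCheckerV2 func(ctx context.Context, permission, orgID string) error

// listUsers always enforces the v2 permission check on the v2 API,
// regardless of whether the permission-check-v2 feature flag is enabled.
func listUsers(ctx context.Context, check PermissionCheckerV2, featureFlagEnabled bool, orgID string) ([]string, error) {
	// Previously the check only ran when the feature flag was enabled;
	// the flag is now ignored for v2 APIs.
	_ = featureFlagEnabled
	if err := check(ctx, "user.read", orgID); err != nil {
		return nil, err
	}
	return []string{"user-1", "user-2"}, nil // placeholder result
}
```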

---------

Co-authored-by: Livio Spring <livio.a@gmail.com>
(cherry picked from commit 826039c620)

(cherry picked from commit 0e17d0005a)
2025-12-10 16:31:12 +01:00
Livio Spring
4da0c1c1ca fix: validate IDP linking conditions
# Which Problems Are Solved

When auto-linking was enabled on an IdP, there was no check if linking to the found user is allowed, i.e. if the corresponding IdP is active in the user's organization or if external authentication in general was allowed.

# How the Problems Are Solved

- (Re)Check the login policy of the user's organization before linking the external identity (see the sketch below).
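A minimal sketch of that re-check with hypothetical types and field names (the actual command logic differs):

```go
package example

import (
	"errors"
	"slices"
)

// LoginPolicy is a hypothetical, simplified view of an organization's login policy.
type LoginPolicy struct {
	AllowExternalIDP bool     // is external authentication allowed at all?
	ActiveIDPIDs     []string // IdPs activated for this organization
}

// checkLinkingAllowed re-validates the user's organization policy before the
// external identity is linked to the found user.
func checkLinkingAllowed(policy LoginPolicy, idpID string) error {
	if !policy.AllowExternalIDP {
		return errors.New("external authentication is not allowed for this organization")
	}
	if !slices.Contains(policy.ActiveIDPIDs, idpID) {
		return errors.New("the IdP is not active in the user's organization")
	}
	return nil
}
```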

# Additional Changes

None

# Additional Context

None
---------

Co-authored-by: Max Peintner <peintnerm@gmail.com>

(cherry picked from commit 33c51deb20)
2025-11-12 15:05:36 +01:00
Livio Spring
54a395f551 fix(authz): ignore unready auth methods for mfa requirement check (#11056)
# Which Problems Are Solved

The recent
[fix](2a7db64881)
made sure the Zitadel API always requires MFA if a user has set it up,
even if not required by the login policy. After the deployment,
multiple users reported that users without any MFA set up also got
the corresponding `[permission_denied] mfa required (AUTHZ-KI3p0)` error.

# How the Problems Are Solved

- Only check the set up factors which are verified and ready to use.
Ignore all unready auth methods (see the sketch below).
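A minimal sketch of the filtering, assuming a hypothetical `AuthMethod` type with a readiness state:

```go
package example

// AuthMethodState is a hypothetical readiness state of a configured auth factor.
type AuthMethodState int

const (
	AuthMethodStateNotReady AuthMethodState = iota
	AuthMethodStateReady
)

type AuthMethod struct {
	Type  string // e.g. "otp", "u2f", "passkey"
	State AuthMethodState
}

// requiresMFA only considers factors that are verified and ready to use;
// unready (e.g. never verified) auth methods are ignored.
func requiresMFA(methods []AuthMethod) bool {
	for _, m := range methods {
		if m.State == AuthMethodStateReady {
			return true
		}
	}
	return false
}
```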

# Additional Changes

None

# Additional Context

- relates to
2a7db64881
- closes https://github.com/zitadel/zitadel/issues/11055
- requires backport to v2.71.x, v3.x and v4.x

(cherry picked from commit e4a959c321)
(cherry picked from commit 8d4f6082ca)
2025-11-11 10:48:32 +01:00
Livio Spring
921ad56a21 fix integration test 2025-10-29 13:19:05 +01:00
Livio Spring
56b8e52c74 chore: use postgres 17 (#10797)
# Which Problems Are Solved

The current cache interface implementation for Postgres is not
compatible with Postgres 18, since we rely on partitioned unlogged
tables, which are no longer supported.

# How the Problems Are Solved

Use Postgres 17 and update the compatibility information in the docs.

# Additional Changes

None

# Additional Context

- requires backport to v3.x, v4.x

(cherry picked from commit f7fbd0cdfd)
2025-10-29 12:56:04 +01:00
Livio Spring
2a7db64881 fix: check for 2fa even if not enforced
# Which Problems Are Solved

Zitadel enforces MFA if required by the organization's policy, but did not ensure it in all cases where a user had voluntarily set it up.

# How the Problems Are Solved

Ensure 2FA/MFA is required in any call to Zitadel if set up by the user, even if the policy does not require it.

# Additional Changes

None

# Additional Context

- requires backports

(cherry picked from commit b284f8474e)
(cherry picked from commit f7309f8295)
2025-10-29 10:38:10 +01:00
Livio Spring
8529ebdabc fix: respect lockout policy on password change (with old password) and add tar pit for checks
# Which Problems Are Solved

While the lockout policy was correctly applied on the session API and other authentication and management endpoints, it had no effect on the user service v2 endpoints.

# How the Problems Are Solved

- Correctly apply lockout policy on the user service v2 endpoints.
- Added tar pitting to auth factor checks (authentication and management API) to prevent brute-force attacks or denial of service because of user lockouts.
- Tar pitting is not active if the `IgnoreUnknownUsername` option is active, to prevent leaking information about whether a user exists or not (see the sketch below).
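A minimal sketch of the tar-pit idea from the list above (hypothetical helper, fixed delay purely for illustration):

```go
package example

import (
	"context"
	"time"
)

// tarPit artificially delays the response after a failed auth factor check to
// slow down brute-force attempts and lockout-based denial of service. The
// delay is skipped when IgnoreUnknownUsername is active, so response timing
// does not leak whether a user exists.
func tarPit(ctx context.Context, checkErr error, ignoreUnknownUsername bool) error {
	if checkErr == nil || ignoreUnknownUsername {
		return checkErr
	}
	select {
	case <-time.After(3 * time.Second): // illustrative fixed delay
	case <-ctx.Done():
		return ctx.Err()
	}
	return checkErr
}
```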

# Additional Changes

None

# Additional Context

- requires backports

* cleanup

(cherry picked from commit b8db8cdf9c)

(cherry picked from commit d3713dfaed)
2025-10-29 10:34:23 +01:00
Livio Spring
d9cd4e9892 fix: sanitize host headers before use
# Which Problems Are Solved

Host headers, which are used to identify the instance and are further used in public responses such as OIDC discovery endpoints and email links, were not correctly handled. While they were matched against existing instances, they were not properly sanitized.

# How the Problems Are Solved

Sanitize the host header, including port validation (if a port is provided), as sketched below.
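A minimal, self-contained sketch of such sanitization (not the actual implementation): split off an optional port, validate it, and validate the host name characters.

```go
package example

import (
	"fmt"
	"net"
	"regexp"
	"strconv"
)

var hostNameRegex = regexp.MustCompile(`^[a-zA-Z0-9.-]+$`) // illustrative allow-list

// sanitizeHostHeader validates the Host header value, including the port if one
// is provided, and returns the cleaned host[:port].
func sanitizeHostHeader(hostHeader string) (string, error) {
	host, port := hostHeader, ""
	if h, p, err := net.SplitHostPort(hostHeader); err == nil {
		host, port = h, p
		if n, err := strconv.Atoi(p); err != nil || n < 1 || n > 65535 {
			return "", fmt.Errorf("invalid port %q in host header", p)
		}
	}
	if !hostNameRegex.MatchString(host) {
		return "", fmt.Errorf("invalid host %q", host)
	}
	if port != "" {
		return net.JoinHostPort(host, port), nil
	}
	return host, nil
}
```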

# Additional Changes

None

# Additional Context

- requires backports

(cherry picked from commit 72a5c33e6a)

(cherry picked from commit 7520450e11)
2025-10-29 10:24:31 +01:00
Livio Spring
8c81a10f0a fix integration test
(partial cherry pick of 492f1826ee)
2025-09-16 14:28:08 +02:00
Livio Spring
690686a8ea chore(integration test): prevent eventual consistency issue in TestServer_Limits_AuditLogRetention (#10608)
# Which Problems Are Solved

The TestServer_Limits_AuditLogRetention test is too reliant on time
constraints when checking that a limit is correctly applied. In case it
takes too long to do all the preparation, there won't be any events to
read and the test will fail.

# How the Problems Are Solved

Don't require any events to be returned.

# Additional Changes

None

# Additional Context

- Noticed a lot of pipelines failing on this step.
- requires backport to at least v4.x

(cherry picked from commit 8574d6fbab)
2025-09-16 11:22:15 +02:00
Livio Spring
bc2a565da2 fix(projections): handle reduce error by updating failed events (#10726)
# Which Problems Are Solved

I noticed that a failure in the projection handler's `reduce` function
(e.g. creating the statement or checking preconditions for the
statement) would not update the `failed_events2` table.
This was due to incorrect error handling: as long as the
`maxFailureCount` was not reached, the error was returned after updating
the `failed_events2` table, which caused the transaction to be rolled
back and thus lost the update.

# How the Problems Are Solved

Wrap the error into an `executionError`, so the transaction is not
rolled back.
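A minimal sketch of the pattern with hypothetical names: wrapping the reduce error marks the failure as already recorded, so the caller commits the transaction (keeping the `failed_events2` update) instead of rolling it back.

```go
package example

import (
	"database/sql"
	"errors"
	"fmt"
)

// executionError marks an error whose failure was already recorded in
// failed_events2, so the surrounding transaction must not be rolled back.
type executionError struct{ cause error }

func (e *executionError) Error() string { return fmt.Sprintf("execution failed: %v", e.cause) }
func (e *executionError) Unwrap() error { return e.cause }

// handleEvent records a reduce failure inside the open transaction and wraps
// the error so the caller knows the update must be kept.
func handleEvent(tx *sql.Tx, reduce func() error, recordFailure func(*sql.Tx, error) error) error {
	if err := reduce(); err != nil {
		if recErr := recordFailure(tx, err); recErr != nil {
			return recErr // recording itself failed: let the caller roll back
		}
		return &executionError{cause: err} // keep the failed_events2 update
	}
	return nil
}

// commitOrRollback commits when processing succeeded or when the failure was
// already recorded, and only rolls back on other errors.
func commitOrRollback(tx *sql.Tx, err error) error {
	var execErr *executionError
	if err == nil || errors.As(err, &execErr) {
		return tx.Commit()
	}
	return errors.Join(err, tx.Rollback())
}
```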

# Additional Changes

none

# Additional Context

- noticed internally
- requires backport to v3.x and v4.x

(cherry picked from commit ee92560f32)
2025-09-16 08:44:12 +02:00
Silvan
358a0d93d0 fix(projection): prevent skipped events written within the same microsecond (#10710)
This PR fixes a bug where projections could skip events if they were
written within the same microsecond, which can occur during high load on
different transactions.

## Problem

The event query ordering was not fully deterministic. Events created at
the exact same time (same `position`) and in the same transaction
(`in_tx_order`) were not guaranteed to be returned in the same order on
subsequent queries. This could lead to some events being skipped by the
projection logic.

## Solution

To solve this, the `ORDER BY` clause for event queries has been extended
to include `instance_id`, `aggregate_type`, and `aggregate_id`. This
ensures a stable and deterministic ordering for all events, even if they
share the same timestamp.
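The extended ordering as an embedded SQL snippet (column names as described above; the real query is more involved):

```go
package example

// Tie-breaking columns make the ordering deterministic even when multiple
// events share the same position and in_tx_order.
const eventOrderBy = `
ORDER BY
	position,
	in_tx_order,
	instance_id,
	aggregate_type,
	aggregate_id`
```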

## Additional changes:

* Replaced a manual slice search with the more idiomatic
`slices.Contains` to skip already projected instances.
* Changed the handling of already locked projections to log a debug
message and skip execution instead of returning an error.
* Ensures the database transaction is explicitly committed.

(cherry picked from commit 25ab6b2397)
2025-09-15 09:52:29 +02:00
Silvan
47c87b3e55 fix(projections): overhaul the event projection system (#10560)
This PR overhauls our event projection system to make it more robust and
prevent skipped events under high load. The core change replaces our
custom, transaction-based locking with standard PostgreSQL advisory
locks. We also introduce a worker pool to manage concurrency and prevent
database connection exhaustion.

### Key Changes

* **Advisory Locks for Projections:** Replaces exclusive row locks and
inspection of `pg_stat_activity` with PostgreSQL advisory locks for
managing projection state. This is a more reliable and standard approach
to distributed locking.
* **Simplified Await Logic:** Removes the complex logic for awaiting
open transactions, simplifying it to a more straightforward time-based
filtering of events.
* **Projection Worker Pool:** Implements a worker pool to limit
concurrent projection triggers, preventing connection exhaustion and
improving stability under load. A new `MaxParallelTriggers`
configuration option is introduced.

### Problem Solved

Under high throughput, a race condition could cause projections to miss
events from the eventstore. This led to inconsistent data in projection
tables (e.g., a user grant might be missing). This PR fixes the
underlying locking and concurrency issues to ensure all events are
processed reliably.

### How it Works

1. **Event Writing:** When writing events, a *shared* advisory lock is
taken. This signals that a write is in progress.
2.  **Event Handling (Projections):**
* A projection worker attempts to acquire an *exclusive* advisory lock
for that specific projection. If the lock is already held, it means
another worker is on the job, so the current one backs off.
* Once the lock is acquired, the worker briefly acquires and releases
the same *shared* lock used by event writers. This acts as a barrier,
ensuring it waits for any in-flight writes to complete (see the sketch below).
* Finally, it processes all events that occurred before its transaction
began.
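A minimal sketch of one way to realize the barrier described above with PostgreSQL advisory locks (lock keys and exact lock modes are illustrative assumptions, not the actual implementation):

```go
package example

import (
	"context"
	"database/sql"
)

// writeEvents sketches the writer side: a shared advisory lock is held for the
// duration of the transaction to signal that a write is in progress.
func writeEvents(ctx context.Context, tx *sql.Tx, writerKey int64) error {
	_, err := tx.ExecContext(ctx, "SELECT pg_advisory_xact_lock_shared($1)", writerKey)
	// ... insert the events within the same transaction ...
	return err
}

// runProjection sketches the handler side of the flow above.
func runProjection(ctx context.Context, db *sql.DB, projectionKey, writerKey int64) error {
	tx, err := db.BeginTx(ctx, nil)
	if err != nil {
		return err
	}
	defer tx.Rollback()

	// Exclusive per-projection lock: if another worker holds it, back off.
	var locked bool
	if err := tx.QueryRowContext(ctx, "SELECT pg_try_advisory_xact_lock($1)", projectionKey).Scan(&locked); err != nil || !locked {
		return err // err is nil when another worker simply holds the lock
	}

	// Barrier: briefly take the writers' lock in a conflicting mode and release
	// it again, so in-flight writes have committed before events are read.
	// Session-level advisory locks are bound to a single connection.
	conn, err := db.Conn(ctx)
	if err != nil {
		return err
	}
	defer conn.Close()
	if _, err := conn.ExecContext(ctx, "SELECT pg_advisory_lock($1)", writerKey); err != nil {
		return err
	}
	if _, err := conn.ExecContext(ctx, "SELECT pg_advisory_unlock($1)", writerKey); err != nil {
		return err
	}

	// ... process all events that occurred before this transaction began ...
	return tx.Commit()
}
```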

### Additional Information

* ZITADEL no longer modifies the `application_name` PostgreSQL variable
during event writes.
*   The lock on the `current_states` table is now `FOR NO KEY UPDATE`.
*   Fixes https://github.com/zitadel/zitadel/issues/8509

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Tim Möhlmann <tim+github@zitadel.com>

(cherry picked from commit 0575f67e94)
2025-09-15 09:50:26 +02:00
kkrime
4512aa52a6 fix(projections): added check to make sure there cannot be 2 projections for the same table (#10439)
# Which Problems Are Solved

It should not be possible to start 2 projections with the same name.

If this happens, it can cause issues with the event store such as events
being skipped/unprocessed and can be very hard/time-consuming to
diagnose.

# How the Problems Are Solved

A check was added to make sure no two projections use the same table (see the sketch below).

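A minimal sketch of such a duplicate check at registration time (hypothetical registry; the real check lives in the projection setup):

```go
package example

import (
	"fmt"
	"sync"
)

var (
	projectionTablesMu sync.Mutex
	projectionTables   = map[string]string{} // table name -> projection name
)

// registerProjection fails fast if a second projection claims a table that is
// already owned by another projection.
func registerProjection(name, table string) error {
	projectionTablesMu.Lock()
	defer projectionTablesMu.Unlock()
	if owner, ok := projectionTables[table]; ok {
		return fmt.Errorf("projection %q uses table %q, which is already used by projection %q", name, table, owner)
	}
	projectionTables[table] = name
	return nil
}
```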
Closes https://github.com/zitadel/zitadel/issues/10453

---------

Co-authored-by: Silvan <27845747+adlerhurst@users.noreply.github.com>
(cherry picked from commit 10bd747105)
2025-09-15 09:49:51 +02:00
Livio Spring
6eab962313 fix(oidc): ignore invalid id_token_hints (#10682)
# Which Problems Are Solved

Invalid id_tokens used as `id_token_hint` on the authorization endpoints
currently return an error, resp. get displayed on the endpoint itself.

# How the Problems Are Solved

Ignore invalid id_token_hint errors and just log them.

# Additional Changes

None

# Additional Context

- closes https://github.com/zitadel/zitadel/issues/10673
- backport to v4.x

(cherry picked from commit e158f9447e)
2025-09-11 15:07:31 +02:00
Livio Spring
26ae44b3fd fix(actions v2): send event_payload on event executions again (#10651)
# Which Problems Are Solved

It was noticed that on actions v2 when subscribing to events, the
webhook would always receive an empty `event_payload`:
```
{
    "aggregateID": "336494809936035843",
    "aggregateType": "user",
    "resourceOwner": "336392597046099971",
    "instanceID": "336392597046034435",
    "version": "v2",
    "sequence": 1,
    "event_type": "user.human.added",
    "created_at": "2025-09-05T08:55:36.156333Z",
    "userID": "336392597046755331",
    "event_payload":
    {}
}
```

The problem was due to using `json.Marshal` on the `Event` interface,
where the underlying `BaseEvent` prevents the data from being marshalled:

131f70db34/internal/eventstore/event_base.go (L38)

# How the Problems Are Solved

The `Event`'s `Unmarshal` function is used with a `json.RawMessage` (see the sketch below).
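A small, self-contained illustration of the difference: marshalling an event whose payload sits behind unexported fields yields `{}`, while extracting the stored payload into a `json.RawMessage` keeps the data.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// event mimics an eventstore event whose payload is not an exported field,
// so json.Marshal on the event itself produces an empty object.
type event struct {
	payload []byte
}

// Unmarshal decodes the stored payload into the given target.
func (e *event) Unmarshal(target any) error {
	return json.Unmarshal(e.payload, target)
}

func buildWebhookPayload(e *event) ([]byte, error) {
	// Instead of json.Marshal(e) (which would yield "{}"), extract the raw
	// payload via Unmarshal into a json.RawMessage and embed that.
	var raw json.RawMessage
	if err := e.Unmarshal(&raw); err != nil {
		return nil, err
	}
	return json.Marshal(map[string]any{
		"event_type":    "user.human.added",
		"event_payload": raw,
	})
}

func main() {
	out, _ := buildWebhookPayload(&event{payload: []byte(`{"userName":"gigi"}`)})
	fmt.Println(string(out)) // {"event_payload":{"userName":"gigi"},"event_type":"user.human.added"}
}
```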

# Additional Changes

none

# Additional Context

- closes https://github.com/zitadel/zitadel/issues/10650
- requires backport to v4.x

(cherry picked from commit 79809d0199)
2025-09-11 15:07:13 +02:00
Zach Hirschtritt
6d83c5b534 fix: correct river otel metrics units (#10425)
# Which Problems Are Solved

The
[otelriver](https://github.com/riverqueue/rivercontrib/tree/master/otelriver)
package uses default otel histogram buckets that are designed for
millisecond measurements. OTEL docs also suggest standardizing on using
seconds as the measurement unit. However, the default buckets from
opentelemetry-go are more or less useless when used with seconds, as the
smallest bucket then represents 5 seconds and the largest nearly 3 hours.
Example:
```
river_work_duration_histogram_seconds_bucket{attempt="1",kind="notification_request",otel_scope_name="github.com/riverqueue/rivercontrib/otelriver",otel_scope_version="",priority="1",queue="notification",status="ok",tag="[]",le="0"} 0
river_work_duration_histogram_seconds_bucket{attempt="1",kind="notification_request",otel_scope_name="github.com/riverqueue/rivercontrib/otelriver",otel_scope_version="",priority="1",queue="notification",status="ok",tag="[]",le="5"} 1144
river_work_duration_histogram_seconds_bucket{attempt="1",kind="notification_request",otel_scope_name="github.com/riverqueue/rivercontrib/otelriver",otel_scope_version="",priority="1",queue="notification",status="ok",tag="[]",le="10"} 1144
<...more buckets here...>
river_work_duration_histogram_seconds_bucket{attempt="1",kind="notification_request",otel_scope_name="github.com/riverqueue/rivercontrib/otelriver",otel_scope_version="",priority="1",queue="notification",status="ok",tag="[]",le="7500"} 1144
river_work_duration_histogram_seconds_bucket{attempt="1",kind="notification_request",otel_scope_name="github.com/riverqueue/rivercontrib/otelriver",otel_scope_version="",priority="1",queue="notification",status="ok",tag="[]",le="10000"} 1144
river_work_duration_histogram_seconds_bucket{attempt="1",kind="notification_request",otel_scope_name="github.com/riverqueue/rivercontrib/otelriver",otel_scope_version="",priority="1",queue="notification",status="ok",tag="[]",le="+Inf"} 1144
```

# How the Problems Are Solved

Change the default unit from "s" to "ms", as supported by the middleware
API:
https://riverqueue.com/docs/open-telemetry#list-of-middleware-options

# Additional Changes

None

# Additional Context

None

Co-authored-by: Tim Möhlmann <tim+github@zitadel.com>
(cherry picked from commit fcdc598320)
2025-09-11 15:05:55 +02:00
Livio Spring
5105cf63de fix: cleanup information in logs (#10634)
# Which Problems Are Solved

I noticed some outdated / misleading logs when starting zitadel:
- The `init-projections` have not been in beta for a long time.
- The LRU auth request cache is disabled by default, which results in
the following message, which has caused confusion among customers:
```level=info msg="auth request cache disabled" error="must provide a positive size"```

# How the Problems Are Solved

- Removed the beta info
- Disable cache initialization if possible

# Additional Changes

None

# Additional Context

- noticed internally
- backport to v4.x

(cherry picked from commit a1ad87387d)
2025-09-11 15:03:27 +02:00
Livio Spring
f5c34a58a4 perf(actionsv2): execution target router (#10564)
# Which Problems Are Solved

The event execution system currently uses a projection handler that
subscribes to and processes all events for all instances. This creates a
high static cost because the system over-fetches event data, handling
many events that are not needed by most instances. This inefficiency is
also reflected in high "rows returned" metrics in the database.

# How the Problems Are Solved

Eliminate the use of a projection handler. Instead, events for which
"execution targets" are defined are directly pushed to the queue by the
eventstore. A Router is populated in the Instance object in the authz
middleware.

- By joining the execution targets to the instance, no additional
queries are needed anymore.
- As part of the instance object, execution targets are now cached as
well.
- Events are queued within the same transaction, giving transactional
guarantees on delivery.
- Uses the `insert many fast` variant of River. Multiple jobs are queued
in a single round-trip to the database.
- Fix compatibility with PostgreSQL 15

# Additional Changes

- The signing key was stored as plain-text in the river job payload in
the DB. This violated our [Secrets
Storage](https://zitadel.com/docs/concepts/architecture/secrets#secrets-storage)
principle. This change removes the field and only uses the encrypted
version of the signing key.
- Fixed the target ordering from descending to ascending.
- Some minor linter warnings on the use of `io.WriteString()`.

# Additional Context

- Introduced in https://github.com/zitadel/zitadel/pull/9249
- Closes https://github.com/zitadel/zitadel/issues/10553
- Closes https://github.com/zitadel/zitadel/issues/9832
- Closes https://github.com/zitadel/zitadel/issues/10372
- Closes https://github.com/zitadel/zitadel/issues/10492

---------

Co-authored-by: Stefan Benz <46600784+stebenz@users.noreply.github.com>

(cherry picked from commit a9ebc06c77)
2025-09-10 07:53:53 +02:00
Tim Möhlmann
f017649b68 perf(cache): use redis unlink for key deletion (#10658)
# Which Problems Are Solved

The usage of the Redis `DEL` command showed blocking and slowdowns
during load-tests.

# How the Problems Are Solved

Use [`UNLINK`](https://redis.io/docs/latest/commands/UNLINK/) instead.
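A minimal sketch using the go-redis client (assuming `github.com/redis/go-redis/v9`): `UNLINK` removes the keys from the keyspace immediately and reclaims the memory asynchronously, so it does not block the server the way `DEL` can.

```go
package example

import (
	"context"

	"github.com/redis/go-redis/v9"
)

// deleteCacheKeys removes cache entries without blocking the Redis server:
// UNLINK unlinks the keys immediately and frees the memory in the background.
func deleteCacheKeys(ctx context.Context, rdb *redis.Client, keys ...string) error {
	if len(keys) == 0 {
		return nil
	}
	return rdb.Unlink(ctx, keys...).Err()
}
```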

# Additional Changes

- none

# Additional Context

- closes https://github.com/zitadel/zitadel/issues/8930

(cherry picked from commit a06ae2c835)
2025-09-10 07:53:53 +02:00
Silvan
6d9ba93833 chore(queue): use schema config instead of search_path and application_name to configure the database schema (#10075)
Removes manual schema and application name setup via raw SQL and
switches to using River’s built-in schema configuration.

# Which Problems Are Solved

River provides a configuration flag to set the schema of the queue.
Zitadel previously set the schema through database statements, which is no
longer needed.

# How the Problems Are Solved

Set the schema in the river configuration and removed the old code.

(cherry picked from commit b5f97d64b0)
2025-09-10 07:53:53 +02:00
Tim Möhlmann
ca510c52dd fix(oidc): enable webkey feature by default (#10683)
# Which Problems Are Solved

When the webkey feature flag was not enabled before an upgrade to v4,
all JWT tokens became invalid.
This created a couple of issues:

- All users with JWT access tokens are logged-out
- Clients that are unable to refresh keys based on key ID break
- id_token_hint could no longer be validated.

# How the Problems Are Solved

Force-enable the webkey feature on the v3 version, so that the upgrade
path is cleaner. Sessions now have time to roll over to the new keys
before initiating the upgrade to v4.

# Additional Changes

- none

# Additional Context

- Related https://github.com/zitadel/zitadel/issues/10673

---------

Co-authored-by: Livio Spring <livio.a@gmail.com>
2025-09-10 07:53:29 +02:00
Tim Möhlmann
7083bcd6ad feat: generate webkeys setup step (#10105)
# Which Problems Are Solved

We are preparing to roll-out and stabilize webkeys in the next version
of Zitadel. Before removing legacy signing-key code, we must ensure all
existing instances have their webkeys generated.

# How the Problems Are Solved

Add a setup step which generates 2 webkeys for each existing instance
that didn't have webkeys yet.

# Additional Changes

Return an error from the config type-switch when the type is unknown.

# Additional Context

- Part 1/2 of https://github.com/zitadel/zitadel/issues/10029
- Should be back-ported to v3

(cherry picked from commit fa9de9a0f1)
2025-08-21 09:14:17 +02:00
Livio Spring
9daec3dd5e fix(login): only allow previously authenticated users on select account page
# Which Problems Are Solved

User enumeration was possible on the select account page by passing any userID as part of the form POST. Existing users could be selected even if they never authenticated on the same user agent (browser).

# How the Problems Are Solved

A check for an existing session on the same user agent was added to the select user function. It is only required for the account selection page, since in other cases there doesn't have to be an existing session and the user agent integrity is already checked.

# Additional Changes

None

# Additional Context

None

(cherry picked from commit 7abe759c95)
2025-08-21 09:04:24 +02:00
Livio Spring
8cb28190ee fix: correctly escape backslash in queries (#10522)
# Which Problems Are Solved

While investigating a support ticket, it was discovered that some
queries using equals or not equals without case matching were not
correctly escaping the value to compare. If a value contained a
backslash (`\`) the row would not match.

# How the Problems Are Solved

- Fixed the escaping for backslash for `like` operations.
- Changed equals and not equals comparison without case matching to `=`
instead of `like` (see the sketch below).
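A small, self-contained sketch of the escaping concern: for `LIKE`, a literal backslash (and the wildcards `%`/`_`) must be escaped, while plain equality can simply use `=` with a bound parameter.

```go
package example

import "strings"

// likeEscaper escapes characters with special meaning in a SQL LIKE pattern so
// a user-provided value is matched literally. strings.NewReplacer replaces in
// a single pass, so already-escaped characters are not escaped twice.
var likeEscaper = strings.NewReplacer(
	`\`, `\\`,
	`%`, `\%`,
	`_`, `\_`,
)

func escapeLikePattern(value string) string {
	return likeEscaper.Replace(value)
}

// For equality without case matching, no LIKE (and therefore no escaping) is
// needed at all:
//
//	WHERE username = $1   -- instead of: WHERE username LIKE $1
```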

# Additional Changes

None

# Additional Context

- related to a support request
- requires backport to v3.x and v4.x

(cherry picked from commit 6c8d027e72)
2025-08-21 08:32:24 +02:00
Copilot
982d7db174 perf(oidc): introspection endpoint query optimization (#10392)
The `/introspect` endpoint showed poor performance during v4 load
testing due to an inefficient database query in
`internal/query/introspection_client_by_id.sql`. This PR optimizes the
query structure to significantly improve performance.

## Query Optimizations

**UNION → UNION ALL**: Changed expensive `UNION` to `UNION ALL` since
`client_id` is unique across both API and OIDC config tables,
eliminating unnecessary deduplication overhead (30-50% improvement
expected).

**Simplified Keys CTE**: Optimized the keys lookup logic by using
`$2::text as client_id` instead of `identifier as client_id` with `group
by`, and added explicit `$3 = true` condition to the LEFT JOIN for
better query planning.

**Enhanced Readability**: Added consistent table aliases (c, a, p, o, k)
for better maintainability.

## Benefits

- **Zero-downtime deployment**: Uses existing database indexes, no
schema changes required
- **Secondary performance gains**: Other similar queries
(`oidc_client_by_id.sql`, `userinfo_client_by_id.sql`) will also benefit
from the optimizations
- **Minimal code changes**: Only 13 lines added, 9 lines removed in the
SQL query
- **Backward compatible**: Same result set and API behavior

The optimized query maintains the same functionality while providing
significant performance improvements for the introspection endpoint
under high concurrent load.

Fixes #10389.

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: muhlemmer <5411563+muhlemmer@users.noreply.github.com>
Co-authored-by: Tim Möhlmann <tim+github@zitadel.com>
(cherry picked from commit a28950661c)
2025-08-21 08:32:20 +02:00
Livio Spring
4320ae9d9e fix(idp): make external id check case insensitive (#10460)
# Which Problems Are Solved

When searching for an existing external userID from an IdP response, the
comparison is case sensitive. This can lead to issues esp. when using
SAML, since the `NameID`'s value case could change. The existing user
would not be found and the login would try to create a new one, but fail
since the uniqueness check of IdP ID and external userID is not case
sensitive.

# How the Problems Are Solved

Search case insensitively for external userIDs.

# Additional Changes

None

# Additional Context

- closes #10457, #10387
- backport to v3.x

(cherry picked from commit 4630b53313)
2025-08-15 14:08:57 +02:00
Zach Hirschtritt
fc8e15ad76 fix: drop default otel scope info from metrics (#10306)
# Which Problems Are Solved

Currently, the prometheus endpoint metrics contain otel specific labels
that increase the overall metric size to the point that the exemplar
implementation in the underlying prom exporter library throws an error,
see https://github.com/zitadel/zitadel/issues/10047. The MaxRuneSize for
metric refs in exemplars is 128 and many metrics cross this because of
`otel_scope_name`.

# How the Problems Are Solved

This change drops those otel specific labels on the prometheus exporter:
`otel_scope_name` and `otel_scope_version`

Current metrics example: 
```
http_server_duration_milliseconds_bucket{http_method="GET",http_status_code="200",otel_scope_name="go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp",otel_scope_version="0.53.0",le="0"} 0
http_server_duration_milliseconds_bucket{http_method="GET",http_status_code="200",otel_scope_name="go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp",otel_scope_version="0.53.0",le="5"} 100
http_server_duration_milliseconds_bucket{http_method="GET",http_status_code="200",otel_scope_name="go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp",otel_scope_version="0.53.0",le="10"} 100
...
grpc_server_grpc_status_code_total{grpc_method="/zitadel.admin.v1.AdminService/ListIAMMemberRoles",otel_scope_name="",otel_scope_version="",return_code="200"} 3
grpc_server_grpc_status_code_total{grpc_method="/zitadel.admin.v1.AdminService/ListIAMMembers",otel_scope_name="",otel_scope_version="",return_code="200"} 3
grpc_server_grpc_status_code_total{grpc_method="/zitadel.admin.v1.AdminService/ListMilestones",otel_scope_name="",otel_scope_version="",return_code="200"} 1
```

New example: 
```
http_server_duration_milliseconds_bucket{http_method="GET",http_status_code="200",le="10"} 8
http_server_duration_milliseconds_bucket{http_method="GET",http_status_code="200",le="25"} 8
http_server_duration_milliseconds_bucket{http_method="GET",http_status_code="200",le="50"} 9
http_server_duration_milliseconds_bucket{http_method="GET",http_status_code="200",le="75"} 9
...
grpc_server_grpc_status_code_total{grpc_method="/zitadel.admin.v1.AdminService/GetSupportedLanguages",return_code="200"} 1
grpc_server_grpc_status_code_total{grpc_method="/zitadel.admin.v1.AdminService/ListMilestones",return_code="200"} 1
grpc_server_grpc_status_code_total{grpc_method="/zitadel.auth.v1.AuthService/GetMyLabelPolicy",return_code="200"} 3
```
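A minimal sketch of dropping the scope labels on the OpenTelemetry Prometheus exporter via its `WithoutScopeInfo` option (the exact wiring in Zitadel may differ):

```go
package example

import (
	"go.opentelemetry.io/otel/exporters/prometheus"
	sdkmetric "go.opentelemetry.io/otel/sdk/metric"
)

// newMeterProvider builds a meter provider whose Prometheus exporter omits the
// otel_scope_name / otel_scope_version labels and the otel_scope_info metric.
func newMeterProvider() (*sdkmetric.MeterProvider, error) {
	exporter, err := prometheus.New(prometheus.WithoutScopeInfo())
	if err != nil {
		return nil, err
	}
	return sdkmetric.NewMeterProvider(sdkmetric.WithReader(exporter)), nil
}
```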

# Additional Changes

None

# Additional Context

From my understanding, this change is fully spec compliant with
Prometheus and Otel:
*
https://opentelemetry.io/docs/specs/otel/compatibility/prometheus_and_openmetrics/#instrumentation-scope

However, these tags were originally added as optional labels to
disambiguate metrics. But I'm not sure we need to care about that right
now? My gut feeling is that exemplar support (the ability for traces to
reference metrics) would be a preferable tradeoff to this label
standard.

Co-authored-by: Silvan <27845747+adlerhurst@users.noreply.github.com>
2025-08-15 10:32:42 +02:00
Livio Spring
2ede875d41 fix merge 2025-08-12 14:53:21 +02:00
Silvan
6602a9e6c3 fix: query organization directly from event store (#10463)
Querying an organization by ID could trigger the org projection.
This could lead to performance impacts if the projection gets triggered
too often.

Instead of executing the trigger, the organization-by-ID query is now
always executed on the eventstore and reduces all event types required
for the requested organization.

---------

Co-authored-by: Livio Spring <livio.a@gmail.com>
2025-08-12 11:46:15 +02:00
Silvan
2f79a01e86 fix(loginV1): disable org metadata triggers (#10454)
Disables the trigger of the org metadata projection in actions v1 when using `api.v1.getOrgMetadata()`.
2025-08-12 10:53:16 +02:00
Stefan Benz
c22b5aa8e8 fix: user metadata check if already existing (#10430)
# Which Problems Are Solved

When metadata is set, there is no check whether it even has to be changed.

# How the Problems Are Solved

Check if the metadata already exists, and push no event if nothing changed (see the sketch below).
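A minimal sketch of the no-op detection (hypothetical helper; the real command compares against the event-sourced write model):

```go
package example

import "bytes"

// metadataChanges returns only the key/value pairs that actually differ from
// the existing metadata; if the result is empty, no event needs to be pushed.
func metadataChanges(existing, incoming map[string][]byte) map[string][]byte {
	changes := make(map[string][]byte, len(incoming))
	for key, value := range incoming {
		if current, ok := existing[key]; !ok || !bytes.Equal(current, value) {
			changes[key] = value
		}
	}
	return changes
}
```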

# Additional Changes

Original changes under #10246 amended for v3.3.x, removed permission
check.
Fixes #10434

# Additional Context

none

---------

Co-authored-by: Gayathri Vijayan <66356931+grvijayan@users.noreply.github.com>
Co-authored-by: Marco A. <marco@zitadel.com>
Co-authored-by: Tim Möhlmann <tim+github@zitadel.com>
2025-08-11 14:38:32 +02:00
Livio Spring
0c5dbe1d2f fix merge 2025-08-08 16:07:55 +02:00
Livio Spring
c4fb0d2e62 fix: allow disabling projections for specific instances (#10421)
# Which Problems Are Solved

The current handling of event subscriptions for actions is bad, esp. on
instances with a lot of events
(https://github.com/zitadel/zitadel/issues/9832#issuecomment-2866236414).
This led to severe problems on zitadel.cloud for such instances.

# How the Problems Are Solved

As a workaround until the handling can be improved, we introduce an
option for projections to be disabled completely for specific instances:
`SkipInstanceIDs`

# Additional Changes

None

# Additional Context

- relates to https://github.com/zitadel/zitadel/issues/9832

(cherry picked from commit 67efddcbc6)
2025-08-08 15:44:34 +02:00
Stefan Benz
218556e0e1 fix: remove fields entry with instance domain remove (#10406)
# Which Problems Are Solved

The fields table entry is not removed when removing an instance domain.

# How the Problems Are Solved

Remove the fields entry, instead of setting it.

# Additional Changes

None

# Additional Context

Needs to be backported to v3.x
2025-08-08 11:00:38 +02:00
Gayathri Vijayan
f13380954f fix(sessions): add an expiration date filter to list sessions api (#10384)
# Which Problems Are Solved

The deletion of expired sessions does not go through even though a
success response is returned to the user. These expired and supposedly
deleted (to the user) sessions are then returned when the `ListSessions`
API is called.

This PR fixes this issue by:
1. Allowing deletion of expired sessions
2. Providing an `expiration_date` filter in the `ListSessions` API to filter
sessions by expiration date

# How the Problems Are Solved

1. Remove expired session check during deletion
2. Add an `expiration_date` filter to the `ListSessions` API

# Additional Changes
N/A

# Additional Context
- Closes #10045

---------

Co-authored-by: Marco A. <marco@zitadel.com>
2025-08-08 11:00:26 +02:00
Zach Hirschtritt
850bd8a813 fix: don't trigger session projection on notification handling (#10298)
# Which Problems Are Solved

There is an outstanding bug wherein a session projection can fail to
complete and a session OTP challenge is blocked because the projection
doesn't exist. Not sure why the session projection can fail to persist -
I can't find any error logs or failed events to crosscheck. However, I
can clearly see the session events persisted with user/password checks
and the OTP challenge added on the session - but no session projection
in the sessions8 table.

This only seems to come up under somewhat higher loads - about 5
logins/s and only for about 1% of cases. (where a "login" is:
authRequest, createSession, getAuthCodeWithSession, tokenExchange, and
finally, otpSmsChallenge...💥).

# How the Problems Are Solved

This is only half a fix, but an important one as it can block login for
affected users. Instead of triggering and checking the session
projection on notification enqueuing, build a write model directly from
the ES.

# Additional Changes

# Additional Context

This doesn't touch the "legacy" notification handler so as to limit the
blast radius of this change. But it might be worth adding there too.

The test is difficult to update correctly, so it is somewhat incomplete. Any
suggestions for refactoring or test helpers I'm missing would be
welcome.

Co-authored-by: Silvan <27845747+adlerhurst@users.noreply.github.com>
2025-08-08 10:59:58 +02:00
Gayathri Vijayan
0bbca7fde2 fix(saml): use transient mapping attribute when nameID is missing in saml response (#10353)
# Which Problems Are Solved

In the SAML responses from some IDPs (e.g. ADFS and Shibboleth), the
`<NameID>` part could be missing in `<Subject>`, and in some cases, the
`<Subject>` part might be missing as well. This causes Zitadel to fail
the SAML login with the following error message:

```
ID=SAML-EFG32 Message=Errors.Intent.ResponseInvalid
```

# How the Problems Are Solved

This is solved by adding a workaround to accept a transient mapping
attribute when the `NameID` or the `Subject` is missing in the SAML
response. This requires setting the custom transient mapping attribute
in the SAML IDP config in Zitadel, and it should be present in the SAML
response as well.

<img width="639" height="173" alt="image"
src="https://github.com/user-attachments/assets/cbb792f1-aa6c-4b16-ad31-bd126d164eae"
/>


# Additional Changes
N/A

# Additional Context
- Closes #10251
2025-08-08 10:59:05 +02:00
Livio Spring
b76d8d37cb fix: permission checks on session API
# Which Problems Are Solved

The session API allowed any authenticated user to update sessions by their ID without any further check.
This was unintentionally introduced with version 2.53.0 when the requirement of providing the latest session token on every session update was removed and no other permission check (e.g. session.write) was ensured.

# How the Problems Are Solved

- Granted `session.write` to `IAM_OWNER` and `IAM_LOGIN_CLIENT` in the defaults.yaml
- Granted `session.read` to `IAM_ORG_MANAGER`, `IAM_USER_MANAGER` and `ORG_OWNER` in the defaults.yaml
- Pass the session token to the UpdateSession command.
- Check for `session.write` permission on session creation and update.
  - Alternatively, the (latest) sessionToken can be used to update the session.
- Setting an auth request to failed on the OIDC Service `CreateCallback` endpoint now ensures it's either the same user as used to create the auth request (for backwards compatibility) or requires `session.link` permission.
- Setting a device auth request to failed on the OIDC Service `AuthorizeOrDenyDeviceAuthorization` endpoint now requires `session.link` permission.
- Setting an auth request to failed on the SAML Service `CreateResponse` endpoint now requires `session.link` permission.

# Additional Changes

none

# Additional Context

none

(cherry picked from commit 4c942f3477)
2025-07-15 13:47:35 +02:00
Livio Spring
c787cdf7b4 fix(login v1): correctly auto-link users on organizations with suffixed usernames (#10205)
(cherry picked from commit 8f61b24532)
2025-07-11 13:26:47 +02:00
Livio Spring
42663a29bd perf: improve org and org domain creation (#10232)
# Which Problems Are Solved

When an organization domain is verified, e.g. also when creating a new
organization (incl. the generated domain), existing usernames are checked
to determine whether the domain has been claimed.
The query was not optimized for instances with many users and
organizations.

# How the Problems Are Solved

- Replace the query, which was searching over the users projection (with
computed loginnames), with a dedicated query checking the loginnames
projection directly.
- All occurrences have been updated to use the new query.

# Additional Changes

None

# Additional Context

- reported through support
- requires backport to v3.x

(cherry picked from commit fefeaea56a)
2025-07-11 08:55:24 +02:00
Livio Spring
d537e86345 fix(login v1): handle password reset when authenticating with email or phone number (#10228)
# Which Problems Are Solved

When authenticating with email or phone number in the login V1, users
were not able to request a password reset and would be given a "User not
found" error.
This was due to a check of the loginname of the auth request, which in
those cases would not match the user's stored loginname.

# How the Problems Are Solved

Switch to a check of the resolved userID in the auth request. (We still
check the user again, since the ID might be a placeholder for an unknown
user and we do not want to disclose any information by omitting a check
and reducing the response time.)

# Additional Changes

None

# Additional Context

- reported through support
- requires backport to v3.x

(cherry picked from commit ffe6d41588)
2025-07-11 08:04:46 +02:00
Livio Spring
b638ed528d fix(login v1): ensure the user's organization is always set into the token context (#10221)
# Which Problems Are Solved

Customers reported that if the session / access token in Console
expired and they re-authenticated, the user list would be empty.
While reproducing the issue, we discovered that the necessary
organization information would be missing in the access token, since
it was already missing in the OIDC session creation when using an
id_token_hint.

# How the Problems Are Solved

- Ensure the user's organization is set in the login v1 auth request.
This is used to create the OIDC and token information.

# Additional Changes

None

# Additional Context

- reported by customers
- requires backport to v3.x

(cherry picked from commit 2821f41c3a)
2025-07-11 08:04:36 +02:00
Livio Spring
1d409f7959 fix(webauthn): allow to use "old" passkeys/u2f credentials on session API (#10150)
# Which Problems Are Solved

To prevent presenting unusable WebAuthN credentials to the user /
browser, we filtered out all credentials which do not match the
requested RP ID. Since credentials set up through Login V1 and Console
do not have an RP ID stored, they never matched. This was previously
intended, since the Login V2 could be served on a separate domain.
The problem is that if it is hosted on the same domain, the credentials
would also be filtered out and the user would not be able to log in.

# How the Problems Are Solved

Change the filtering to return credentials, if no RP ID is stored and
the requested RP ID matches the instance domain.
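A minimal sketch of the adjusted filtering (hypothetical types): credentials without a stored RP ID are kept when the requested RP ID equals the instance domain.

```go
package example

// webAuthNToken is a hypothetical, reduced view of a stored WebAuthn credential.
type webAuthNToken struct {
	KeyID []byte
	RPID  string // empty for credentials registered via Login V1 / Console
}

// filterCredentials returns the credentials usable for the requested RP ID.
// Credentials without a stored RP ID are included when the request targets the
// instance domain, so "old" passkeys/U2F credentials keep working there.
func filterCredentials(creds []*webAuthNToken, requestedRPID, instanceDomain string) []*webAuthNToken {
	usable := make([]*webAuthNToken, 0, len(creds))
	for _, c := range creds {
		switch {
		case c.RPID == requestedRPID:
			usable = append(usable, c)
		case c.RPID == "" && requestedRPID == instanceDomain:
			usable = append(usable, c)
		}
	}
	return usable
}
```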

# Additional Changes

None

# Additional Context

Noted internally when testing the login v2

(cherry picked from commit 71575e8d67)
2025-07-11 08:01:06 +02:00
Trong Huu Nguyen
c2c49679cb fix(scim): add type attribute to ScimEmail (#9690)
# Which Problems Are Solved

- SCIM PATCH operations for users from Entra ID for the `emails`
attribute fail due to a missing `type` subattribute

# How the Problems Are Solved

- Adds the `type` attribute to the `ScimUser` struct and sets the
default value to `"work"` in the `mapWriteModelToScimUser()` method (see the sketch below).
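A minimal sketch of the shape (field and tag names are illustrative, not the exact Zitadel structs):

```go
package example

// ScimEmail carries the "type" sub-attribute so PATCH filters such as
// `emails[type eq "work"].value` can be resolved.
type ScimEmail struct {
	Value   string `json:"value"`
	Primary bool   `json:"primary"`
	Type    string `json:"type,omitempty"`
}

// mapEmail fills the default type "work" when mapping the write model to SCIM.
func mapEmail(address string, primary bool) *ScimEmail {
	return &ScimEmail{
		Value:   address,
		Primary: primary,
		Type:    "work",
	}
}
```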

# Additional Changes

# Additional Context

The SCIM handlers for POST and PUT ignore multiple emails and only use
the primary email for a given user, or fall back to the first email if
none are marked as primary. PATCH operations, however, will attempt to
resolve the provided filter in `operations[].path`.

Some services, such as Entra ID, only support patching emails by
filtering for `emails[type eq "(work|home|other)"].value`, which fails
with Zitadel as the ScimUser struct (and thus the generated schema)
doesn't include the `type` field.

This commit adds the `type` field to work around this issue, while still
preserving compatibility with filters such as `emails[primary eq
true].value`.

-
https://discord.com/channels/927474939156643850/927866013545025566/1356556668527448191

---------

Co-authored-by: Christer Edvartsen <christer.edvartsen@nav.no>
Co-authored-by: Thomas Siegfried Krampl <thomas.siegfried.krampl@nav.no>
(cherry picked from commit 3a4298c179)
2025-07-11 07:59:29 +02:00
Abhinav Sethi
056399bdb4 fix: enable opentelemetry metrics for river queue (#10044)
# Which Problems Are Solved

Right now we have no visibility into river queue's job processing times
and queue sizes. This makes it difficult to reliably know whether
notifications are actually being published in a reasonable time and what
the current queue size is.

# How the Problems Are Solved
Integrates River's OpenTelemetry middleware with Zitadel's metrics
system by adding the otelriver middleware to the queue configuration.

# Additional Changes
- Updated dependencies to include required `otelriver` package

# Additional Context

Example output from `/debug/metrics`

<details>
  <summary>output</summary>

# HELP failed_deliveries_json_total Failed JSON message deliveries
# TYPE failed_deliveries_json_total counter

failed_deliveries_json_total{otel_scope_name="",otel_scope_version="",triggering_event_type="user.human.phone.code.added"}
2
# HELP go_gc_duration_seconds A summary of the wall-time pause
(stop-the-world) duration in garbage collection cycles.
# TYPE go_gc_duration_seconds summary
go_gc_duration_seconds{quantile="0"} 3.8e-05
go_gc_duration_seconds{quantile="0.25"} 6.3916e-05
go_gc_duration_seconds{quantile="0.5"} 7.5584e-05
go_gc_duration_seconds{quantile="0.75"} 9.2584e-05
go_gc_duration_seconds{quantile="1"} 0.000204292
go_gc_duration_seconds_sum 0.003028502
go_gc_duration_seconds_count 34
# HELP go_gc_gogc_percent Heap size target percentage configured by the
user, otherwise 100. This value is set by the GOGC environment variable,
and the runtime/debug.SetGCPercent function. Sourced from
/gc/gogc:percent
# TYPE go_gc_gogc_percent gauge
go_gc_gogc_percent 100
# HELP go_gc_gomemlimit_bytes Go runtime memory limit configured by the
user, otherwise math.MaxInt64. This value is set by the GOMEMLIMIT
environment variable, and the runtime/debug.SetMemoryLimit function.
Sourced from /gc/gomemlimit:bytes
# TYPE go_gc_gomemlimit_bytes gauge
go_gc_gomemlimit_bytes 9.223372036854776e+18
# HELP go_goroutines Number of goroutines that currently exist.
# TYPE go_goroutines gauge
go_goroutines 231
# HELP go_info Information about the Go environment.
# TYPE go_info gauge
go_info{version="go1.24.3"} 1
# HELP go_memstats_alloc_bytes Number of bytes allocated in heap and
currently in use. Equals to /memory/classes/heap/objects:bytes.
# TYPE go_memstats_alloc_bytes gauge
go_memstats_alloc_bytes 7.7565832e+07
# HELP go_memstats_alloc_bytes_total Total number of bytes allocated in
heap until now, even if released already. Equals to
/gc/heap/allocs:bytes.
# TYPE go_memstats_alloc_bytes_total counter
go_memstats_alloc_bytes_total 7.3319844e+08
# HELP go_memstats_buck_hash_sys_bytes Number of bytes used by the
profiling bucket hash table. Equals to
/memory/classes/profiling/buckets:bytes.
# TYPE go_memstats_buck_hash_sys_bytes gauge
go_memstats_buck_hash_sys_bytes 1.63816e+06
# HELP go_memstats_frees_total Total number of heap objects frees.
Equals to /gc/heap/frees:objects + /gc/heap/tiny/allocs:objects.
# TYPE go_memstats_frees_total counter
go_memstats_frees_total 1.1496925e+07
# HELP go_memstats_gc_sys_bytes Number of bytes used for garbage
collection system metadata. Equals to
/memory/classes/metadata/other:bytes.
# TYPE go_memstats_gc_sys_bytes gauge
go_memstats_gc_sys_bytes 5.182776e+06
# HELP go_memstats_heap_alloc_bytes Number of heap bytes allocated and
currently in use, same as go_memstats_alloc_bytes. Equals to
/memory/classes/heap/objects:bytes.
# TYPE go_memstats_heap_alloc_bytes gauge
go_memstats_heap_alloc_bytes 7.7565832e+07
# HELP go_memstats_heap_idle_bytes Number of heap bytes waiting to be
used. Equals to /memory/classes/heap/released:bytes +
/memory/classes/heap/free:bytes.
# TYPE go_memstats_heap_idle_bytes gauge
go_memstats_heap_idle_bytes 5.8179584e+07
# HELP go_memstats_heap_inuse_bytes Number of heap bytes that are in
use. Equals to /memory/classes/heap/objects:bytes +
/memory/classes/heap/unused:bytes
# TYPE go_memstats_heap_inuse_bytes gauge
go_memstats_heap_inuse_bytes 8.5868544e+07
# HELP go_memstats_heap_objects Number of currently allocated objects.
Equals to /gc/heap/objects:objects.
# TYPE go_memstats_heap_objects gauge
go_memstats_heap_objects 573723
# HELP go_memstats_heap_released_bytes Number of heap bytes released to
OS. Equals to /memory/classes/heap/released:bytes.
# TYPE go_memstats_heap_released_bytes gauge
go_memstats_heap_released_bytes 7.20896e+06
# HELP go_memstats_heap_sys_bytes Number of heap bytes obtained from
system. Equals to /memory/classes/heap/objects:bytes +
/memory/classes/heap/unused:bytes + /memory/classes/heap/released:bytes
+ /memory/classes/heap/free:bytes.
# TYPE go_memstats_heap_sys_bytes gauge
go_memstats_heap_sys_bytes 1.44048128e+08
# HELP go_memstats_last_gc_time_seconds Number of seconds since 1970 of
last garbage collection.
# TYPE go_memstats_last_gc_time_seconds gauge
go_memstats_last_gc_time_seconds 1.749491558214289e+09
# HELP go_memstats_mallocs_total Total number of heap objects allocated,
both live and gc-ed. Semantically a counter version for
go_memstats_heap_objects gauge. Equals to /gc/heap/allocs:objects +
/gc/heap/tiny/allocs:objects.
# TYPE go_memstats_mallocs_total counter
go_memstats_mallocs_total 1.2070648e+07
# HELP go_memstats_mcache_inuse_bytes Number of bytes in use by mcache
structures. Equals to /memory/classes/metadata/mcache/inuse:bytes.
# TYPE go_memstats_mcache_inuse_bytes gauge
go_memstats_mcache_inuse_bytes 16912
# HELP go_memstats_mcache_sys_bytes Number of bytes used for mcache
structures obtained from system. Equals to
/memory/classes/metadata/mcache/inuse:bytes +
/memory/classes/metadata/mcache/free:bytes.
# TYPE go_memstats_mcache_sys_bytes gauge
go_memstats_mcache_sys_bytes 31408
# HELP go_memstats_mspan_inuse_bytes Number of bytes in use by mspan
structures. Equals to /memory/classes/metadata/mspan/inuse:bytes.
# TYPE go_memstats_mspan_inuse_bytes gauge
go_memstats_mspan_inuse_bytes 1.3496e+06
# HELP go_memstats_mspan_sys_bytes Number of bytes used for mspan
structures obtained from system. Equals to
/memory/classes/metadata/mspan/inuse:bytes +
/memory/classes/metadata/mspan/free:bytes.
# TYPE go_memstats_mspan_sys_bytes gauge
go_memstats_mspan_sys_bytes 2.18688e+06
# HELP go_memstats_next_gc_bytes Number of heap bytes when next garbage
collection will take place. Equals to /gc/heap/goal:bytes.
# TYPE go_memstats_next_gc_bytes gauge
go_memstats_next_gc_bytes 1.34730994e+08
# HELP go_memstats_other_sys_bytes Number of bytes used for other system
allocations. Equals to /memory/classes/other:bytes.
# TYPE go_memstats_other_sys_bytes gauge
go_memstats_other_sys_bytes 3.125168e+06
# HELP go_memstats_stack_inuse_bytes Number of bytes obtained from
system for stack allocator in non-CGO environments. Equals to
/memory/classes/heap/stacks:bytes.
# TYPE go_memstats_stack_inuse_bytes gauge
go_memstats_stack_inuse_bytes 2.752512e+06
# HELP go_memstats_stack_sys_bytes Number of bytes obtained from system
for stack allocator. Equals to /memory/classes/heap/stacks:bytes +
/memory/classes/os-stacks:bytes.
# TYPE go_memstats_stack_sys_bytes gauge
go_memstats_stack_sys_bytes 2.752512e+06
# HELP go_memstats_sys_bytes Number of bytes obtained from system.
Equals to /memory/classes/total:byte.
# TYPE go_memstats_sys_bytes gauge
go_memstats_sys_bytes 1.58965032e+08
# HELP go_sched_gomaxprocs_threads The current runtime.GOMAXPROCS
setting, or the number of operating system threads that can execute
user-level Go code simultaneously. Sourced from
/sched/gomaxprocs:threads
# TYPE go_sched_gomaxprocs_threads gauge
go_sched_gomaxprocs_threads 14
# HELP go_threads Number of OS threads created.
# TYPE go_threads gauge
go_threads 25
# HELP grpc_server_grpc_status_code_total Grpc status code counter
# TYPE grpc_server_grpc_status_code_total counter

grpc_server_grpc_status_code_total{grpc_method="/zitadel.management.v1.ManagementService/ListUserChanges",otel_scope_name="",otel_scope_version="",return_code="200"}
1

grpc_server_grpc_status_code_total{grpc_method="/zitadel.management.v1.ManagementService/ListUserMetadata",otel_scope_name="",otel_scope_version="",return_code="200"}
2

grpc_server_grpc_status_code_total{grpc_method="/zitadel.management.v1.ManagementService/ResendHumanPhoneVerification",otel_scope_name="",otel_scope_version="",return_code="200"}
1

grpc_server_grpc_status_code_total{grpc_method="/zitadel.user.v2.UserService/GetUserByID",otel_scope_name="",otel_scope_version="",return_code="200"}
1
# HELP grpc_server_request_counter_total Grpc request counter
# TYPE grpc_server_request_counter_total counter

grpc_server_request_counter_total{grpc_method="/zitadel.management.v1.ManagementService/ListUserChanges",otel_scope_name="",otel_scope_version=""}
1

grpc_server_request_counter_total{grpc_method="/zitadel.management.v1.ManagementService/ListUserMetadata",otel_scope_name="",otel_scope_version=""}
2

grpc_server_request_counter_total{grpc_method="/zitadel.management.v1.ManagementService/ResendHumanPhoneVerification",otel_scope_name="",otel_scope_version=""}
1

grpc_server_request_counter_total{grpc_method="/zitadel.user.v2.UserService/GetUserByID",otel_scope_name="",otel_scope_version=""}
1
# HELP grpc_server_total_request_counter_total Total grpc request
counter
# TYPE grpc_server_total_request_counter_total counter

grpc_server_total_request_counter_total{otel_scope_name="",otel_scope_version=""}
5
# HELP otel_scope_info Instrumentation Scope metadata
# TYPE otel_scope_info gauge
otel_scope_info{otel_scope_name="",otel_scope_version=""} 1

otel_scope_info{otel_scope_name="github.com/riverqueue/rivercontrib/otelriver",otel_scope_version=""}
1
# HELP projection_events_processed_total Number of events reduced to
process projection updates
# TYPE projection_events_processed_total counter

projection_events_processed_total{otel_scope_name="",otel_scope_version="",projection="projections.execution_handler",success="true"}
1

projection_events_processed_total{otel_scope_name="",otel_scope_version="",projection="projections.instance_features2",success="true"}
0

projection_events_processed_total{otel_scope_name="",otel_scope_version="",projection="projections.login_names3",success="true"}
0

projection_events_processed_total{otel_scope_name="",otel_scope_version="",projection="projections.notifications",success="true"}
1

projection_events_processed_total{otel_scope_name="",otel_scope_version="",projection="projections.orgs1",success="true"}
0

projection_events_processed_total{otel_scope_name="",otel_scope_version="",projection="projections.user_metadata5",success="true"}
0

projection_events_processed_total{otel_scope_name="",otel_scope_version="",projection="projections.users14",success="true"}
0
# HELP projection_handle_timer_seconds Time taken to process a
projection update
# TYPE projection_handle_timer_seconds histogram

projection_handle_timer_seconds_bucket{otel_scope_name="",otel_scope_version="",projection="projections.execution_handler",le="0.005"}
0

projection_handle_timer_seconds_bucket{otel_scope_name="",otel_scope_version="",projection="projections.execution_handler",le="0.01"}
1

projection_handle_timer_seconds_bucket{otel_scope_name="",otel_scope_version="",projection="projections.execution_handler",le="0.05"}
1

projection_handle_timer_seconds_bucket{otel_scope_name="",otel_scope_version="",projection="projections.execution_handler",le="0.1"}
1

projection_handle_timer_seconds_bucket{otel_scope_name="",otel_scope_version="",projection="projections.execution_handler",le="1"}
1

projection_handle_timer_seconds_bucket{otel_scope_name="",otel_scope_version="",projection="projections.execution_handler",le="5"}
1

projection_handle_timer_seconds_bucket{otel_scope_name="",otel_scope_version="",projection="projections.execution_handler",le="10"}
1

projection_handle_timer_seconds_bucket{otel_scope_name="",otel_scope_version="",projection="projections.execution_handler",le="30"}
1

projection_handle_timer_seconds_bucket{otel_scope_name="",otel_scope_version="",projection="projections.execution_handler",le="60"}
1

projection_handle_timer_seconds_bucket{otel_scope_name="",otel_scope_version="",projection="projections.execution_handler",le="120"}
1

projection_handle_timer_seconds_bucket{otel_scope_name="",otel_scope_version="",projection="projections.execution_handler",le="+Inf"}
1

projection_handle_timer_seconds_sum{otel_scope_name="",otel_scope_version="",projection="projections.execution_handler"}
0.007344541

projection_handle_timer_seconds_count{otel_scope_name="",otel_scope_version="",projection="projections.execution_handler"}
1

projection_handle_timer_seconds_bucket{otel_scope_name="",otel_scope_version="",projection="projections.notifications",le="0.005"}
0

projection_handle_timer_seconds_bucket{otel_scope_name="",otel_scope_version="",projection="projections.notifications",le="0.01"}
0

projection_handle_timer_seconds_bucket{otel_scope_name="",otel_scope_version="",projection="projections.notifications",le="0.05"}
1

projection_handle_timer_seconds_bucket{otel_scope_name="",otel_scope_version="",projection="projections.notifications",le="0.1"}
1

projection_handle_timer_seconds_bucket{otel_scope_name="",otel_scope_version="",projection="projections.notifications",le="1"}
1

projection_handle_timer_seconds_bucket{otel_scope_name="",otel_scope_version="",projection="projections.notifications",le="5"}
1

projection_handle_timer_seconds_bucket{otel_scope_name="",otel_scope_version="",projection="projections.notifications",le="10"}
1

projection_handle_timer_seconds_bucket{otel_scope_name="",otel_scope_version="",projection="projections.notifications",le="30"}
1

projection_handle_timer_seconds_bucket{otel_scope_name="",otel_scope_version="",projection="projections.notifications",le="60"} 1
projection_handle_timer_seconds_bucket{otel_scope_name="",otel_scope_version="",projection="projections.notifications",le="120"} 1
projection_handle_timer_seconds_bucket{otel_scope_name="",otel_scope_version="",projection="projections.notifications",le="+Inf"} 1
projection_handle_timer_seconds_sum{otel_scope_name="",otel_scope_version="",projection="projections.notifications"} 0.014258458
projection_handle_timer_seconds_count{otel_scope_name="",otel_scope_version="",projection="projections.notifications"} 1
# HELP projection_state_latency_seconds When finishing processing a batch of events, this track the age of the last events seen from current time
# TYPE projection_state_latency_seconds histogram
projection_state_latency_seconds_bucket{otel_scope_name="",otel_scope_version="",projection="projections.execution_handler",le="0.1"} 1
projection_state_latency_seconds_bucket{otel_scope_name="",otel_scope_version="",projection="projections.execution_handler",le="0.5"} 1
projection_state_latency_seconds_bucket{otel_scope_name="",otel_scope_version="",projection="projections.execution_handler",le="1"} 1
projection_state_latency_seconds_bucket{otel_scope_name="",otel_scope_version="",projection="projections.execution_handler",le="5"} 1
projection_state_latency_seconds_bucket{otel_scope_name="",otel_scope_version="",projection="projections.execution_handler",le="10"} 1
projection_state_latency_seconds_bucket{otel_scope_name="",otel_scope_version="",projection="projections.execution_handler",le="30"} 1
projection_state_latency_seconds_bucket{otel_scope_name="",otel_scope_version="",projection="projections.execution_handler",le="60"} 1
projection_state_latency_seconds_bucket{otel_scope_name="",otel_scope_version="",projection="projections.execution_handler",le="300"} 1
projection_state_latency_seconds_bucket{otel_scope_name="",otel_scope_version="",projection="projections.execution_handler",le="600"} 1
projection_state_latency_seconds_bucket{otel_scope_name="",otel_scope_version="",projection="projections.execution_handler",le="1800"} 1
projection_state_latency_seconds_bucket{otel_scope_name="",otel_scope_version="",projection="projections.execution_handler",le="+Inf"} 1
projection_state_latency_seconds_sum{otel_scope_name="",otel_scope_version="",projection="projections.execution_handler"} 0.012979
projection_state_latency_seconds_count{otel_scope_name="",otel_scope_version="",projection="projections.execution_handler"} 1
projection_state_latency_seconds_bucket{otel_scope_name="",otel_scope_version="",projection="projections.notifications",le="0.1"} 1
projection_state_latency_seconds_bucket{otel_scope_name="",otel_scope_version="",projection="projections.notifications",le="0.5"} 1
projection_state_latency_seconds_bucket{otel_scope_name="",otel_scope_version="",projection="projections.notifications",le="1"} 1
projection_state_latency_seconds_bucket{otel_scope_name="",otel_scope_version="",projection="projections.notifications",le="5"} 1
projection_state_latency_seconds_bucket{otel_scope_name="",otel_scope_version="",projection="projections.notifications",le="10"} 1
projection_state_latency_seconds_bucket{otel_scope_name="",otel_scope_version="",projection="projections.notifications",le="30"} 1
projection_state_latency_seconds_bucket{otel_scope_name="",otel_scope_version="",projection="projections.notifications",le="60"} 1
projection_state_latency_seconds_bucket{otel_scope_name="",otel_scope_version="",projection="projections.notifications",le="300"} 1
projection_state_latency_seconds_bucket{otel_scope_name="",otel_scope_version="",projection="projections.notifications",le="600"} 1
projection_state_latency_seconds_bucket{otel_scope_name="",otel_scope_version="",projection="projections.notifications",le="1800"} 1
projection_state_latency_seconds_bucket{otel_scope_name="",otel_scope_version="",projection="projections.notifications",le="+Inf"} 1
projection_state_latency_seconds_sum{otel_scope_name="",otel_scope_version="",projection="projections.notifications"} 0.0199
projection_state_latency_seconds_count{otel_scope_name="",otel_scope_version="",projection="projections.notifications"} 1
# HELP promhttp_metric_handler_requests_in_flight Current number of scrapes being served.
# TYPE promhttp_metric_handler_requests_in_flight gauge
promhttp_metric_handler_requests_in_flight 1
# HELP promhttp_metric_handler_requests_total Total number of scrapes by HTTP status code.
# TYPE promhttp_metric_handler_requests_total counter
promhttp_metric_handler_requests_total{code="200"} 1
promhttp_metric_handler_requests_total{code="500"} 0
promhttp_metric_handler_requests_total{code="503"} 0
# HELP river_insert_count_total Number of jobs inserted
# TYPE river_insert_count_total counter
river_insert_count_total{otel_scope_name="github.com/riverqueue/rivercontrib/otelriver",otel_scope_version="",status="ok"} 1
# HELP river_insert_many_count_total Number of job batches inserted (all jobs are inserted in a batch, but batches may be one job)
# TYPE river_insert_many_count_total counter
river_insert_many_count_total{otel_scope_name="github.com/riverqueue/rivercontrib/otelriver",otel_scope_version="",status="ok"} 1
# HELP river_insert_many_duration_histogram_seconds Duration of job batch insertion (histogram)
# TYPE river_insert_many_duration_histogram_seconds histogram
river_insert_many_duration_histogram_seconds_bucket{otel_scope_name="github.com/riverqueue/rivercontrib/otelriver",otel_scope_version="",status="ok",le="0"} 0
river_insert_many_duration_histogram_seconds_bucket{otel_scope_name="github.com/riverqueue/rivercontrib/otelriver",otel_scope_version="",status="ok",le="5"} 1
river_insert_many_duration_histogram_seconds_bucket{otel_scope_name="github.com/riverqueue/rivercontrib/otelriver",otel_scope_version="",status="ok",le="10"} 1
river_insert_many_duration_histogram_seconds_bucket{otel_scope_name="github.com/riverqueue/rivercontrib/otelriver",otel_scope_version="",status="ok",le="25"} 1
river_insert_many_duration_histogram_seconds_bucket{otel_scope_name="github.com/riverqueue/rivercontrib/otelriver",otel_scope_version="",status="ok",le="50"} 1
river_insert_many_duration_histogram_seconds_bucket{otel_scope_name="github.com/riverqueue/rivercontrib/otelriver",otel_scope_version="",status="ok",le="75"} 1
river_insert_many_duration_histogram_seconds_bucket{otel_scope_name="github.com/riverqueue/rivercontrib/otelriver",otel_scope_version="",status="ok",le="100"} 1
river_insert_many_duration_histogram_seconds_bucket{otel_scope_name="github.com/riverqueue/rivercontrib/otelriver",otel_scope_version="",status="ok",le="250"} 1
river_insert_many_duration_histogram_seconds_bucket{otel_scope_name="github.com/riverqueue/rivercontrib/otelriver",otel_scope_version="",status="ok",le="500"} 1
river_insert_many_duration_histogram_seconds_bucket{otel_scope_name="github.com/riverqueue/rivercontrib/otelriver",otel_scope_version="",status="ok",le="750"} 1
river_insert_many_duration_histogram_seconds_bucket{otel_scope_name="github.com/riverqueue/rivercontrib/otelriver",otel_scope_version="",status="ok",le="1000"} 1
river_insert_many_duration_histogram_seconds_bucket{otel_scope_name="github.com/riverqueue/rivercontrib/otelriver",otel_scope_version="",status="ok",le="2500"} 1
river_insert_many_duration_histogram_seconds_bucket{otel_scope_name="github.com/riverqueue/rivercontrib/otelriver",otel_scope_version="",status="ok",le="5000"} 1
river_insert_many_duration_histogram_seconds_bucket{otel_scope_name="github.com/riverqueue/rivercontrib/otelriver",otel_scope_version="",status="ok",le="7500"} 1
river_insert_many_duration_histogram_seconds_bucket{otel_scope_name="github.com/riverqueue/rivercontrib/otelriver",otel_scope_version="",status="ok",le="10000"} 1
river_insert_many_duration_histogram_seconds_bucket{otel_scope_name="github.com/riverqueue/rivercontrib/otelriver",otel_scope_version="",status="ok",le="+Inf"} 1
river_insert_many_duration_histogram_seconds_sum{otel_scope_name="github.com/riverqueue/rivercontrib/otelriver",otel_scope_version="",status="ok"} 0.002905666
river_insert_many_duration_histogram_seconds_count{otel_scope_name="github.com/riverqueue/rivercontrib/otelriver",otel_scope_version="",status="ok"} 1
# HELP river_insert_many_duration_seconds Duration of job batch insertion
# TYPE river_insert_many_duration_seconds gauge
river_insert_many_duration_seconds{otel_scope_name="github.com/riverqueue/rivercontrib/otelriver",otel_scope_version="",status="ok"} 0.002905666
# HELP river_work_count_total Number of jobs worked
# TYPE river_work_count_total counter
river_work_count_total{attempt="1",kind="notification_request",otel_scope_name="github.com/riverqueue/rivercontrib/otelriver",otel_scope_version="",priority="1",queue="notification",status="error",tag="[]"} 1
river_work_count_total{attempt="2",kind="notification_request",otel_scope_name="github.com/riverqueue/rivercontrib/otelriver",otel_scope_version="",priority="1",queue="notification",status="error",tag="[]"} 1
# HELP river_work_duration_histogram_seconds Duration of job being worked (histogram)
# TYPE river_work_duration_histogram_seconds histogram
river_work_duration_histogram_seconds_bucket{attempt="1",kind="notification_request",otel_scope_name="github.com/riverqueue/rivercontrib/otelriver",otel_scope_version="",priority="1",queue="notification",status="error",tag="[]",le="0"} 0
river_work_duration_histogram_seconds_bucket{attempt="1",kind="notification_request",otel_scope_name="github.com/riverqueue/rivercontrib/otelriver",otel_scope_version="",priority="1",queue="notification",status="error",tag="[]",le="5"} 1
river_work_duration_histogram_seconds_bucket{attempt="1",kind="notification_request",otel_scope_name="github.com/riverqueue/rivercontrib/otelriver",otel_scope_version="",priority="1",queue="notification",status="error",tag="[]",le="10"} 1
river_work_duration_histogram_seconds_bucket{attempt="1",kind="notification_request",otel_scope_name="github.com/riverqueue/rivercontrib/otelriver",otel_scope_version="",priority="1",queue="notification",status="error",tag="[]",le="25"} 1
river_work_duration_histogram_seconds_bucket{attempt="1",kind="notification_request",otel_scope_name="github.com/riverqueue/rivercontrib/otelriver",otel_scope_version="",priority="1",queue="notification",status="error",tag="[]",le="50"} 1
river_work_duration_histogram_seconds_bucket{attempt="1",kind="notification_request",otel_scope_name="github.com/riverqueue/rivercontrib/otelriver",otel_scope_version="",priority="1",queue="notification",status="error",tag="[]",le="75"} 1
river_work_duration_histogram_seconds_bucket{attempt="1",kind="notification_request",otel_scope_name="github.com/riverqueue/rivercontrib/otelriver",otel_scope_version="",priority="1",queue="notification",status="error",tag="[]",le="100"} 1
river_work_duration_histogram_seconds_bucket{attempt="1",kind="notification_request",otel_scope_name="github.com/riverqueue/rivercontrib/otelriver",otel_scope_version="",priority="1",queue="notification",status="error",tag="[]",le="250"} 1
river_work_duration_histogram_seconds_bucket{attempt="1",kind="notification_request",otel_scope_name="github.com/riverqueue/rivercontrib/otelriver",otel_scope_version="",priority="1",queue="notification",status="error",tag="[]",le="500"} 1
river_work_duration_histogram_seconds_bucket{attempt="1",kind="notification_request",otel_scope_name="github.com/riverqueue/rivercontrib/otelriver",otel_scope_version="",priority="1",queue="notification",status="error",tag="[]",le="750"} 1
river_work_duration_histogram_seconds_bucket{attempt="1",kind="notification_request",otel_scope_name="github.com/riverqueue/rivercontrib/otelriver",otel_scope_version="",priority="1",queue="notification",status="error",tag="[]",le="1000"} 1
river_work_duration_histogram_seconds_bucket{attempt="1",kind="notification_request",otel_scope_name="github.com/riverqueue/rivercontrib/otelriver",otel_scope_version="",priority="1",queue="notification",status="error",tag="[]",le="2500"} 1
river_work_duration_histogram_seconds_bucket{attempt="1",kind="notification_request",otel_scope_name="github.com/riverqueue/rivercontrib/otelriver",otel_scope_version="",priority="1",queue="notification",status="error",tag="[]",le="5000"} 1
river_work_duration_histogram_seconds_bucket{attempt="1",kind="notification_request",otel_scope_name="github.com/riverqueue/rivercontrib/otelriver",otel_scope_version="",priority="1",queue="notification",status="error",tag="[]",le="7500"} 1
river_work_duration_histogram_seconds_bucket{attempt="1",kind="notification_request",otel_scope_name="github.com/riverqueue/rivercontrib/otelriver",otel_scope_version="",priority="1",queue="notification",status="error",tag="[]",le="10000"} 1
river_work_duration_histogram_seconds_bucket{attempt="1",kind="notification_request",otel_scope_name="github.com/riverqueue/rivercontrib/otelriver",otel_scope_version="",priority="1",queue="notification",status="error",tag="[]",le="+Inf"} 1
river_work_duration_histogram_seconds_sum{attempt="1",kind="notification_request",otel_scope_name="github.com/riverqueue/rivercontrib/otelriver",otel_scope_version="",priority="1",queue="notification",status="error",tag="[]"} 0.029241083
river_work_duration_histogram_seconds_count{attempt="1",kind="notification_request",otel_scope_name="github.com/riverqueue/rivercontrib/otelriver",otel_scope_version="",priority="1",queue="notification",status="error",tag="[]"} 1
river_work_duration_histogram_seconds_bucket{attempt="2",kind="notification_request",otel_scope_name="github.com/riverqueue/rivercontrib/otelriver",otel_scope_version="",priority="1",queue="notification",status="error",tag="[]",le="0"} 0
river_work_duration_histogram_seconds_bucket{attempt="2",kind="notification_request",otel_scope_name="github.com/riverqueue/rivercontrib/otelriver",otel_scope_version="",priority="1",queue="notification",status="error",tag="[]",le="5"} 1
river_work_duration_histogram_seconds_bucket{attempt="2",kind="notification_request",otel_scope_name="github.com/riverqueue/rivercontrib/otelriver",otel_scope_version="",priority="1",queue="notification",status="error",tag="[]",le="10"} 1
river_work_duration_histogram_seconds_bucket{attempt="2",kind="notification_request",otel_scope_name="github.com/riverqueue/rivercontrib/otelriver",otel_scope_version="",priority="1",queue="notification",status="error",tag="[]",le="25"} 1
river_work_duration_histogram_seconds_bucket{attempt="2",kind="notification_request",otel_scope_name="github.com/riverqueue/rivercontrib/otelriver",otel_scope_version="",priority="1",queue="notification",status="error",tag="[]",le="50"} 1
river_work_duration_histogram_seconds_bucket{attempt="2",kind="notification_request",otel_scope_name="github.com/riverqueue/rivercontrib/otelriver",otel_scope_version="",priority="1",queue="notification",status="error",tag="[]",le="75"} 1
river_work_duration_histogram_seconds_bucket{attempt="2",kind="notification_request",otel_scope_name="github.com/riverqueue/rivercontrib/otelriver",otel_scope_version="",priority="1",queue="notification",status="error",tag="[]",le="100"} 1
river_work_duration_histogram_seconds_bucket{attempt="2",kind="notification_request",otel_scope_name="github.com/riverqueue/rivercontrib/otelriver",otel_scope_version="",priority="1",queue="notification",status="error",tag="[]",le="250"} 1
river_work_duration_histogram_seconds_bucket{attempt="2",kind="notification_request",otel_scope_name="github.com/riverqueue/rivercontrib/otelriver",otel_scope_version="",priority="1",queue="notification",status="error",tag="[]",le="500"} 1
river_work_duration_histogram_seconds_bucket{attempt="2",kind="notification_request",otel_scope_name="github.com/riverqueue/rivercontrib/otelriver",otel_scope_version="",priority="1",queue="notification",status="error",tag="[]",le="750"} 1
river_work_duration_histogram_seconds_bucket{attempt="2",kind="notification_request",otel_scope_name="github.com/riverqueue/rivercontrib/otelriver",otel_scope_version="",priority="1",queue="notification",status="error",tag="[]",le="1000"} 1
river_work_duration_histogram_seconds_bucket{attempt="2",kind="notification_request",otel_scope_name="github.com/riverqueue/rivercontrib/otelriver",otel_scope_version="",priority="1",queue="notification",status="error",tag="[]",le="2500"} 1
river_work_duration_histogram_seconds_bucket{attempt="2",kind="notification_request",otel_scope_name="github.com/riverqueue/rivercontrib/otelriver",otel_scope_version="",priority="1",queue="notification",status="error",tag="[]",le="5000"} 1
river_work_duration_histogram_seconds_bucket{attempt="2",kind="notification_request",otel_scope_name="github.com/riverqueue/rivercontrib/otelriver",otel_scope_version="",priority="1",queue="notification",status="error",tag="[]",le="7500"} 1
river_work_duration_histogram_seconds_bucket{attempt="2",kind="notification_request",otel_scope_name="github.com/riverqueue/rivercontrib/otelriver",otel_scope_version="",priority="1",queue="notification",status="error",tag="[]",le="10000"} 1
river_work_duration_histogram_seconds_bucket{attempt="2",kind="notification_request",otel_scope_name="github.com/riverqueue/rivercontrib/otelriver",otel_scope_version="",priority="1",queue="notification",status="error",tag="[]",le="+Inf"} 1
river_work_duration_histogram_seconds_sum{attempt="2",kind="notification_request",otel_scope_name="github.com/riverqueue/rivercontrib/otelriver",otel_scope_version="",priority="1",queue="notification",status="error",tag="[]"} 0.0408745
river_work_duration_histogram_seconds_count{attempt="2",kind="notification_request",otel_scope_name="github.com/riverqueue/rivercontrib/otelriver",otel_scope_version="",priority="1",queue="notification",status="error",tag="[]"} 1
# HELP river_work_duration_seconds Duration of job being worked
# TYPE river_work_duration_seconds gauge
river_work_duration_seconds{attempt="1",kind="notification_request",otel_scope_name="github.com/riverqueue/rivercontrib/otelriver",otel_scope_version="",priority="1",queue="notification",status="error",tag="[]"} 0.029241083
river_work_duration_seconds{attempt="2",kind="notification_request",otel_scope_name="github.com/riverqueue/rivercontrib/otelriver",otel_scope_version="",priority="1",queue="notification",status="error",tag="[]"} 0.0408745
# HELP target_info Target metadata
# TYPE target_info gauge
target_info{service_name="ZITADEL",service_version="2025-06-09T13:52:29-04:00",telemetry_sdk_language="go",telemetry_sdk_name="opentelemetry",telemetry_sdk_version="1.35.0"} 1

</details>
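
To spot-check the new metric families after an upgrade, a small sketch like the following can scrape the metrics endpoint and print only the projection and River job-queue metrics shown above. The address and the `/debug/metrics` path are assumptions; adjust them to your deployment.

```go
// Sketch: fetch the metrics endpoint and print the projection and River
// job-queue metric families from the sample output above.
package main

import (
	"bufio"
	"fmt"
	"log"
	"net/http"
	"strings"
)

func main() {
	// Assumption: metrics are exposed at this address and path.
	resp, err := http.Get("http://localhost:8080/debug/metrics")
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()

	scanner := bufio.NewScanner(resp.Body)
	scanner.Buffer(make([]byte, 1024*1024), 1024*1024) // label sets make lines long
	for scanner.Scan() {
		line := scanner.Text()
		if strings.HasPrefix(line, "projection_") || strings.HasPrefix(line, "river_") {
			fmt.Println(line)
		}
	}
	if err := scanner.Err(); err != nil {
		log.Fatal(err)
	}
}
```

From Prometheus, a query such as `histogram_quantile(0.95, sum by (le, projection) (rate(projection_handle_timer_seconds_bucket[5m])))` then gives a per-projection p95 handling time, which is the kind of query a dashboard like the one below can be built on.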

Example Grafana dashboard:
![Screenshot 2025-06-11 at 11 30 06 AM](https://github.com/user-attachments/assets/a2c9b377-8ddd-40b9-a506-7df3b31941da)

- Closes #10043

---------

Co-authored-by: Silvan <27845747+adlerhurst@users.noreply.github.com>
(cherry picked from commit 83839fc2ef)
2025-07-11 07:59:08 +02:00
Livio Spring
3022ca9e76 feat: JWT IdP intent (#9966)
# Which Problems Are Solved

Login V1 allowed authenticating users with a JWT through the JWT IdP type. Login V2
uses IdP intents for such cases, but these were not yet able to handle JWT IdPs.

# How the Problems Are Solved

- Added handling of JWT IdPs in `StartIdPIntent` and `RetrieveIdPIntent`.
- The redirect returned by the start uses the existing `authRequestID`
and `userAgentID` parameter names for compatibility reasons.
- Added a `/idps/jwt` endpoint to handle the proxied (callback) request,
which extracts the JWT and validates it against the configured endpoint (see the sketch after this list).
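
As a rough illustration of the callback side only, the sketch below extracts a bearer JWT and checks basic claims. The header name, claim checks, and error handling are assumptions for illustration, and signature verification against the IdP's configured endpoint, which the real handler performs, is deliberately left out here.

```go
// Sketch only: extract a JWT from the request and check basic claims.
// This is NOT ZITADEL's implementation; a real handler must also verify
// the token signature against the keys endpoint configured on the JWT IdP.
package main

import (
	"encoding/base64"
	"encoding/json"
	"errors"
	"fmt"
	"log"
	"net/http"
	"strings"
	"time"
)

type claims struct {
	Issuer  string `json:"iss"`
	Subject string `json:"sub"`
	Expiry  int64  `json:"exp"`
}

func extractClaims(r *http.Request) (*claims, error) {
	auth := r.Header.Get("Authorization") // assumption: bearer token in this header
	const prefix = "Bearer "
	if !strings.HasPrefix(auth, prefix) {
		return nil, errors.New("missing bearer token")
	}
	parts := strings.Split(strings.TrimPrefix(auth, prefix), ".")
	if len(parts) != 3 {
		return nil, errors.New("malformed JWT")
	}
	payload, err := base64.RawURLEncoding.DecodeString(parts[1])
	if err != nil {
		return nil, err
	}
	var c claims
	if err := json.Unmarshal(payload, &c); err != nil {
		return nil, err
	}
	if time.Now().Unix() >= c.Expiry {
		return nil, errors.New("token expired")
	}
	// NOTE: signature verification is omitted in this sketch but mandatory in practice.
	return &c, nil
}

func main() {
	http.HandleFunc("/idps/jwt", func(w http.ResponseWriter, r *http.Request) {
		c, err := extractClaims(r)
		if err != nil {
			http.Error(w, err.Error(), http.StatusUnauthorized)
			return
		}
		fmt.Fprintf(w, "issuer=%s subject=%s\n", c.Issuer, c.Subject)
	})
	log.Fatal(http.ListenAndServe(":8089", nil))
}
```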

# Additional Changes

None

# Additional Context

- closes #9758

(cherry picked from commit 4d66a786c8)
2025-06-12 07:08:02 +02:00
Silvan
8ac4b61ee6 perf(query): reduce user query duration (#10037)
# Which Problems Are Solved

Querying user(s) caused high resource usage on the database and could
therefore impact performance.

# How the Problems Are Solved

Database queries involving the users and loginnames tables were improved,
and an index was added for the user-by-email query (see the sketch below for the general idea).
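
Purely as an illustration of the kind of index that speeds up a user-by-email lookup on Postgres, a sketch could look like this; the table, column, index names, and connection string are hypothetical, not ZITADEL's actual schema.

```go
// Illustrative only: an expression index that serves case-insensitive
// user-by-email lookups. All names and the DSN are hypothetical.
package main

import (
	"database/sql"
	"log"

	_ "github.com/lib/pq" // Postgres driver
)

func main() {
	db, err := sql.Open("postgres", "postgres://zitadel:secret@localhost:5432/zitadel?sslmode=disable")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// The expression index matches queries filtering on lower(email).
	if _, err := db.Exec(`CREATE INDEX IF NOT EXISTS users_email_lower_idx ON users (lower(email))`); err != nil {
		log.Fatal(err)
	}

	var id string
	err = db.QueryRow(`SELECT id FROM users WHERE lower(email) = lower($1)`, "user@example.com").Scan(&id)
	switch {
	case err == sql.ErrNoRows:
		log.Println("no user found")
	case err != nil:
		log.Fatal(err)
	default:
		log.Println("found user:", id)
	}
}
```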

# Additional Changes

- spellchecks
- updated the APIs used in the load tests

# Additional Info

Needs cherry-pick to v3.

(cherry picked from commit 4df138286b)
2025-06-12 07:06:45 +02:00
Livio Spring
8b04ddf0e2 fix(cache): prevent org cache overwrite by other instances (#10012)
# Which Problems Are Solved

A customer reported that certain login flows, such as the automatic
redirect to the only configured IdP, would randomly not work. During the
investigation it was discovered that they used the same primary domain
on two different instances. As they used the domain for preselecting the
organization, one instance would always overwrite the other in the cache. Since
the organization and especially its policies could not be retrieved on
the other instance, it would fall back to the default organization
settings, where external login and the corresponding IdP were not
configured.

# How the Problems Are Solved

Include the instance ID in the cache key for organizations to prevent
overwrites (see the sketch below).
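
A minimal sketch of the idea (key layout and types are illustrative, not ZITADEL's actual cache code): composing the key from the instance ID and the domain keeps the two instances apart.

```go
// Sketch: scope cache keys per instance so two instances using the same
// primary domain never overwrite each other's cached organization.
package main

import (
	"fmt"
	"sync"
)

type orgCache struct {
	mu    sync.RWMutex
	byKey map[string]string // cache key -> organization ID
}

// key scopes the domain lookup by instance, preventing cross-instance overwrites.
func key(instanceID, primaryDomain string) string {
	return instanceID + ":" + primaryDomain
}

func (c *orgCache) set(instanceID, domain, orgID string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.byKey[key(instanceID, domain)] = orgID
}

func (c *orgCache) get(instanceID, domain string) (string, bool) {
	c.mu.RLock()
	defer c.mu.RUnlock()
	orgID, ok := c.byKey[key(instanceID, domain)]
	return orgID, ok
}

func main() {
	c := &orgCache{byKey: map[string]string{}}
	// The same primary domain on two different instances no longer collides.
	c.set("instance-1", "login.example.com", "org-A")
	c.set("instance-2", "login.example.com", "org-B")
	fmt.Println(c.get("instance-1", "login.example.com")) // org-A true
	fmt.Println(c.get("instance-2", "login.example.com")) // org-B true
}
```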

# Additional Changes

None

# Additional Context

- found because of a support request
- requires backport to 2.70.x, 2.71.x and 3.x

(cherry picked from commit 15902f5bc7)
2025-06-03 14:56:55 +02:00
Iraq
dedb923f43 fix(settings): fix for setting restricted languages (#9947)
# Which Problems Are Solved

Zitadel encounters a migration error when setting `restricted languages`
and fails to start.

# How the Problems Are Solved

The problem is a check requiring that at least one of the restricted
languages matches the `default language`; however, the default language
is never set on the `authz instance` (where it is pulled from), so the
check always fails.

I've added code to set the `default language` on the `authz instance` (see the sketch below).
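
A minimal sketch of the failing check, using hypothetical names and `golang.org/x/text/language` tags (not ZITADEL's actual types): restricting languages is only valid if the default language is among them, so an unset default language on the authz instance made every restriction fail until it was populated.

```go
// Sketch: the restriction check only passes if the default language is part
// of the allowed set, so a zero-value default language breaks every restriction.
package main

import (
	"errors"
	"fmt"

	"golang.org/x/text/language"
)

func validateRestrictedLanguages(defaultLang language.Tag, allowed []language.Tag) error {
	for _, l := range allowed {
		if l == defaultLang {
			return nil
		}
	}
	return errors.New("default language must be among the allowed languages")
}

func main() {
	allowed := []language.Tag{language.English, language.German}

	// Before the fix: the default language on the authz instance was never set,
	// so the check compares against the zero value and always fails.
	var unset language.Tag
	fmt.Println(validateRestrictedLanguages(unset, allowed)) // error

	// After the fix: the default language is populated on the authz instance.
	fmt.Println(validateRestrictedLanguages(language.English, allowed)) // <nil>
}
```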

# Additional Context

- Closes https://github.com/zitadel/zitadel/issues/9787

---------

Co-authored-by: Livio Spring <livio.a@gmail.com>
(cherry picked from commit b46c41e4bf)
2025-06-03 14:56:49 +02:00