Commit Graph

1976 Commits

Author SHA1 Message Date
Silvan
4900ac477a fix(test): increase retry tick duration in tests (#10787)
Adjust the retry tick duration in various tests to one minute to improve
reliability and reduce the frequency of retries.
2025-09-25 08:47:29 +02:00
Iraq
08f41e034e feat(idp_table_relational): adding inital idp tables for relational repository (#10334) 2025-09-24 13:19:09 +00:00
Silvan
cccfc816f6 refactor: database interaction and error handling (#10762)
This pull request introduces a significant refactoring of the database
interaction layer, focusing on improving explicitness, transactional
control, and error handling. The core change is the removal of the
stateful `QueryExecutor` from repository instances. Instead, it is now
passed as an argument to each method that interacts with the database.

This change makes transaction management more explicit and flexible, as
the same repository instance can be used with a database pool or a
specific transaction without needing to be re-instantiated.

### Key Changes

- **Explicit `QueryExecutor` Passing:**
- All repository methods (`Get`, `List`, `Create`, `Update`, `Delete`,
etc.) in `InstanceRepository`, `OrganizationRepository`,
`UserRepository`, and their sub-repositories now require a
`database.QueryExecutor` (e.g., a `*pgxpool.Pool` or `pgx.Tx`) as the
first argument.
- Repository constructors no longer accept a `QueryExecutor`. For
example, `repository.InstanceRepository(pool)` is now
`repository.InstanceRepository()`.

- **Enhanced Error Handling:**
- A new `database.MissingConditionError` is introduced to enforce
required query conditions, such as ensuring an `instance_id` is always
present in `UPDATE` and `DELETE` operations.
- The database error wrapper in the `postgres` package now correctly
identifies and wraps `pgx.ErrTooManyRows` and similar errors from the
`scany` library into a `database.MultipleRowsFoundError`.

- **Improved Database Conditions:**
- The `database.Condition` interface now includes a
`ContainsColumn(Column) bool` method. This allows for runtime checks to
ensure that critical filters (like `instance_id`) are included in a
query, preventing accidental cross-tenant data modification.
- A new `database.Exists()` condition has been added to support `EXISTS`
subqueries, enabling more complex filtering logic, such as finding an
organization that has a specific domain.

- **Repository and Interface Refactoring:**
- The method for loading related entities (e.g., domains for an
organization) has been changed from a boolean flag (`Domains(true)`) to
a more explicit, chainable method (`LoadDomains()`). This returns a new
repository instance configured to load the sub-resource, promoting
immutability.
- The custom `OrgIdentifierCondition` has been removed in favor of using
the standard `database.Condition` interface, simplifying the API.

- **Code Cleanup and Test Updates:**
  - Unnecessary struct embeddings and metadata have been removed.
- All integration and repository tests have been updated to reflect the
new method signatures, passing the database pool or transaction object
explicitly.
- New tests have been added to cover the new `ExistsDomain`
functionality and other enhancements.

These changes make the data access layer more robust, predictable, and
easier to work with, especially in the context of database transactions.
2025-09-24 10:12:31 +00:00
Livio Spring
16906d2c2c fix(export): add sorting when searching users to prevent error (#10777)
# Which Problems Are Solved

When exporting users, an error `QUERY-AG4gs` was returned. This was due
to #10750, where the orderBy column was added to the query to prevent
the exact same error. In case there was no sorting column specified,
such as the export, the query would fail.

# How the Problems Are Solved

- Added a default sorting on `id` as we already have for the other APIs.

# Additional Changes

None

# Additional Context

- reported through support
- relates to #10750, #10415
- backport to v4.x
2025-09-23 11:39:05 +02:00
masum-msphere
295584648d feat(oidc): Added new claim in userinfo response to return all requested audience roles (#9861)
# Which Problems Are Solved

The /userinfo endpoint only returns roles for the current project, even
if the access token includes multiple project aud scopes.

This prevents clients from retrieving all user roles across multiple
projects, making multi-project access control ineffective.

# How the Problems Are Solved

Modified the /userinfo handler logic to resolve roles across all valid
project audience scopes provided in the token, not just the current
project.
Ensured that if **urn:zitadel:iam:org:projects:roles is in the scopes**,
roles from all declared project audiences are collected and included in
the response in **urn:zitadel:iam:org:projects:roles claim**.

# Additional Changes

# Additional Context

This change enables service-to-service authorization workflows and SPA
role resolution across multiple project contexts with a single token.
- Closes #9831

---------

Co-authored-by: Masum Patel <patelmasum98@gmail.com>
Co-authored-by: Tim Möhlmann <tim+github@zitadel.com>
2025-09-22 09:55:21 +00:00
Iraq
e5575d6042 chore(removing dead code): removing LabelPolicyAssetsRemovedEventType event is never made (#10742)
# Which Problems Are Solved

The event `LabelPolicyAssetsRemovedEventType` is never actually made
anywhere, this is dead code, this PR will remove it
2025-09-22 06:19:34 +00:00
Stefan Benz
b6ff4ff16c fix: generated project grant id (#10747)
# Which Problems Are Solved

Project Grant ID would have needed to be unique to be handled properly
on the projections, but was defined as the organization ID the project
was granted to, so could be non-unique.

# How the Problems Are Solved

Generate the Project Grant ID even in the v2 APIs, which also includes
fixes in the integration tests.
Additionally to that, the logic for some functionality had to be
extended as the Project Grant ID is not provided anymore in the API, so
had to be queried before creating events for Project Grants.

# Additional Changes

Included fix for authorizations, when an authorization was intended to
be created for a project, without providing any organization
information, which also showed some faulty integration tests.

# Additional Context

Partially closes #10745

---------

Co-authored-by: Livio Spring <livio.a@gmail.com>
Co-authored-by: Silvan <27845747+adlerhurst@users.noreply.github.com>
2025-09-21 14:04:25 +00:00
Livio Spring
2c0ee0008f fix(api): sorting on list users endpoints (#10750)
# Which Problems Are Solved

#10415 added the possibility to filter users based on metadata. To
prevent duplicate results an sql `DISTINCT` was added. This resulted in
issues if the list was sorted on string columns like `username` or
`displayname`, since they are sorted using `lower`. Using `DISTINCT`
requires the `order by` column to be part of the `SELECT` statement.

# How the Problems Are Solved

Added the order by column to the statement.

# Additional Changes

None

# Additional Context

- relates to #10415 
- backport to v4.x

---------

Co-authored-by: Silvan <27845747+adlerhurst@users.noreply.github.com>
2025-09-18 10:17:23 +00:00
Livio Spring
57e8033b6e fix: use hash to compare user metadata value (#10749)
# Which Problems Are Solved

Depending on the metadata values (already existing), the newly created
index (#10415) cannot be created or error in the future.

# How the Problems Are Solved

- Create the index using `sha256` and change the query to use sha256 as
well when comparing bytes values such as user_metadata.
- Added a setup step to cleanup potentially created index on
`projections.user_metadata5`

# Additional Changes

None

# Additional Context

- relates to #10415 
- requires backport to v4.x
2025-09-18 09:50:56 +00:00
Silvan
22ef817d5c fix(eventstore): Make Eventstore Compatible with Relational Table Package (#10687)
Improves compatibility of eventstore and related database components
with the new relational table package.

## Which problems are solved

1. **Incompatible Database Interfaces**: The existing eventstore was
tightly coupled to the database package, which is incompatible with the
new, more abstract relational table package in v3. This prevented the
new command-side logic from pushing events to the legacy eventstore.
2. **Missing Health Checks**: The database interfaces in the new package
lacked a Ping method, making it impossible to perform health checks on
database connections.
3. **Event Publishing Logic**: The command handling logic in domain
needed a way to collect and push events to the legacy eventstore after a
command was successfully executed.

## How the problems are solved

1. **`LegacyEventstore` Interface**:
* A new `LegacyEventstore` interface is introduced in the new
`database/eventstore` . This interface exposes a `PushWithNewClient`
method that accepts the new `database.QueryExecutor` interface,
decoupling the v3 domain from the legacy implementation.
* The `internal/eventstore.Eventstore` now implements this interface. A
wrapper, PushWithClient, is added to convert the old database client
types (`*sql.DB`, `*sql.Tx`) into the new `QueryExecutor` types before
calling `PushWithNewClient`.
2. **Database Interface Updates**:
* The `database.Pool` and `database.Client` interfaces in
`storage/eventstore` have been updated to include a Ping method,
allowing for consistent health checks across different database
dialects.
* The `postgres` and `sql` dialect implementations have been updated to
support this new method.
3. **Command and Invoker Refactoring**:
* The `Commander` interface in domain now includes an `Events()
[]legacy_es.Command` method. This allows commands to declare which
events they will generate.
* The `eventCollector` in the invoker logic has been redesigned. It now
ensures a database transaction is started before executing a command.
After successful execution, it calls the `Events()` method on the
command to collect the generated events and appends them to a list.
* The `eventStoreInvoker` then pushes all collected events to the legacy
eventstore using the new `LegacyEventstore` interface, ensuring that
events are only pushed if the entire command (and any sub-commands)
executes successfully within the transaction.
4. **Testing**:
* New unit tests have been added for the invoker to verify that events
are correctly collected from single commands, batched commands, and
nested commands.

These changes create a clean bridge between the new v3 command-side
logic and the existing v1 eventstore, allowing for incremental adoption
of the new architecture while maintaining full functionality.

## Additional Information

closes https://github.com/zitadel/zitadel/issues/10442
2025-09-16 18:58:49 +02:00
Livio Spring
5329d50509 fix: correct user self management on metadata and delete (#10666)
# Which Problems Are Solved

This PR fixes the self-management of users for metadata and own removal
and improves the corresponding permission checks.
While looking into the problems, I also noticed that there's a bug in
the metadata mapping when using `api.metadata.push` in actions v1 and
that re-adding a previously existing key after its removal was not
possible.

# How the Problems Are Solved

- Added a parameter `allowSelfManagement` to checkPermissionOnUser to
not require a permission if a user is changing its own data.
- Updated use of `NewPermissionCheckUserWrite` including prevention of
self-management for metadata.
- Pass permission check to the command side (for metadata functions) to
allow it implicitly for login v1 and actions v1.
- Use of json.Marshal for the metadata mapping (as with
`AppendMetadata`)
- Check the metadata state when comparing the value.

# Additional Changes

- added a variadic `roles` parameter to the `CreateOrgMembership`
integration test helper function to allow defining specific roles.

# Additional Context

- noted internally while testing v4.1.x
- requires backport to v4.x
- closes https://github.com/zitadel/zitadel/issues/10470
- relates to https://github.com/zitadel/zitadel/pull/10426
2025-09-16 12:26:21 +00:00
Stefan Benz
edb227f066 fix: user grant query with user organization instead of organization … (#10732)
…of project grant

# Which Problems Are Solved

On Management API the fields for `GrantedOrgId`, `GrantedOrgName` and
`GrantedOrgDomain` were only filled if it was a usergrant for a granted
project.

# How the Problems Are Solved

Correctly query the Organization of the User again to the Organization
the Project is granted to.
Then fill in the information about the Organization of the User in the
fields `GrantedOrgId`, `GrantedOrgName` and `GrantedOrgDomain`.

# Additional Changes

Additionally query the information about the Organization the Project is
granted to, to have it available for the Authorization v2beta API.

# Additional Context

Closes #10723

---------

Co-authored-by: Tim Möhlmann <tim+github@zitadel.com>
2025-09-16 10:04:53 +00:00
Stefan Benz
b0642a5898 chore: correct org integration tests (#10708)
# Which Problems Are Solved

Eventual consistency issues.

# How the Problems Are Solved

Correctly handle timeouts and change queries to domains instead of using
the organization name.

# Additional Changes

None

# Additional Context

None

Co-authored-by: Livio Spring <livio.a@gmail.com>
2025-09-16 07:26:19 +00:00
Livio Spring
bc471b4f78 fix(service ping): log body size of reports (#10686)
# Which Problems Are Solved

The current service ping reports can run into body size limit errors and
there's no way of knowing how big the current size is.

# How the Problems Are Solved

Log the current size to have at least some insights and possibly change
bulk size.

# Additional Changes

None

# Additional Context

- noticed internally
- backport to v4.x
2025-09-16 07:04:17 +00:00
Livio Spring
ee92560f32 fix(projections): handle reduce error by updating failed events (#10726)
# Which Problems Are Solved

I noticed that a failure in the projections handlers `reduce` function
(e.g. creating the statement or checking preconditions for the
statement) would not update the `failed_events2` table.
This was due to a wrong error handling, where as long as the
`maxFailureCount` was not reached, the error was returned after updating
the `failed_events2` table, which causes the transaction to be rolled
back and thus losing the update.

# How the Problems Are Solved

Wrap the error into an `executionError`, so the transaction is not
rolled back.

# Additional Changes

none

# Additional Context

- noticed internally
- requires backport to v3.x and v4.x
2025-09-15 18:32:28 +02:00
Tim Möhlmann
f6f37d3a31 fix(cache): use key versioning (#10657)
# Which Problems Are Solved

Cached object may have a different schema between Zitadel versions.

# How the Problems Are Solved

Use the curent build version in DB based cache connectors PostgreSQL and
Redis.

# Additional Changes

- Cleanup the ZitadelVersion field from the authz Instance
- Solve potential race condition on global variables in build package.

# Additional Context

- Closes https://github.com/zitadel/zitadel/issues/10648
- Obsoletes https://github.com/zitadel/zitadel/pull/10646
- Needs to be back-ported to v4 over
https://github.com/zitadel/zitadel/pull/10645
2025-09-15 09:51:54 +00:00
Silvan
25ab6b2397 fix(projection): prevent skipped events written within the same microsecond (#10710)
This PR fixes a bug where projections could skip events if they were
written within the same microsecond, which can occur during high load on
different transactions.

## Problem

The event query ordering was not fully deterministic. Events created at
the exact same time (same `position`) and in the same transaction
(`in_tx_order`) were not guaranteed to be returned in the same order on
subsequent queries. This could lead to some events being skipped by the
projection logic.

## Solution

To solve this, the `ORDER BY` clause for event queries has been extended
to include `instance_id`, `aggregate_type`, and `aggregate_id`. This
ensures a stable and deterministic ordering for all events, even if they
share the same timestamp.

## Additionally changes:

* Replaced a manual slice search with the more idiomatic
`slices.Contains` to skip already projected instances.
* Changed the handling of already locked projections to log a debug
message and skip execution instead of returning an error.
* Ensures the database transaction is explicitly committed.
2025-09-12 14:26:03 +03:00
Livio Spring
25d921b20c fix: remove unnecessary details from import errors (#10703)
# Which Problems Are Solved

During the implementation of #10687, it was noticed that the import
endpoint might provide unnecessary error details.

# How the Problems Are Solved

Remove the underlying (parent) error from the error message.

# Additional Changes

none

# Additional Context

relates to #10687

Co-authored-by: Silvan <27845747+adlerhurst@users.noreply.github.com>
2025-09-12 07:50:57 +02:00
Livio Spring
e158f9447e fix(oidc): ignore invalid id_token_hints (#10682)
# Which Problems Are Solved

Invalid id_tokens used as `id_token_hint` on the authorization endpoints
currently return an error, resp. get display on the endpoint itself.

# How the Problems Are Solved

Ignore invalid id_token_hint errors and just log them.

# Additional Changes

None

# Additional Context

- closes https://github.com/zitadel/zitadel/issues/10673
- backport to v4.x
2025-09-10 06:25:25 +00:00
Stefan Benz
492f1826ee chore: move gofakeit integration testing calls (#10684)
# Which Problems Are Solved

Flakiness and conflicts in value from gofakeit.

# How the Problems Are Solved

Move Gofakeit calls to the integration package, to guarantee proper
usage and values for integration testing.

# Additional Changes

None

# Additional Context

None
2025-09-10 09:00:31 +03:00
Max Peintner
b9b9baf67f fix: Registration Form Legal Checkbox Logic (#10597)
Closes #10498

The registration form's legal checkboxes had incorrect validation logic
that prevented users from completing registration when only one legal
document (ToS or Privacy Policy) was configured, or when no legal
documents were required.

additionally removes a duplicate description for "or use Identity
Provider"

# Which Problems Are Solved

Having only partial legal documents was blocking users to register. The
logic now conditionally renders checkboxes and checks if all provided
documents are accepted.

# How the Problems Are Solved

- Fixed checkbox validation: Now properly validates based on which legal
documents are actually available
- acceptance logic: Only requires acceptance of checkboxes that are
shown
- No legal docs support: Users can proceed when no legal documents are
configured
- Proper state management: Fixed checkbox state tracking and mixed-up
test IDs

---------

Co-authored-by: Stefan Benz <46600784+stebenz@users.noreply.github.com>
2025-09-09 07:37:32 +00:00
Livio Spring
a0c3ccecf7 fix: use a single translator for middleware (#10633)
# Which Problems Are Solved

Comparing the v3 and v4 deployments we noticed an increase in memory
usage. A first analysis revealed that it might be related to the
(multiple) initialization of the `i18n.Translator`, partially related

# How the Problems Are Solved

Initialize the tranlator once (apart from the translator interceptor,
which uses context / request specific information) and pass it to all
necessary middleware.

# Additional Changes

Removed unnecessary error return parameter from the translator
initialization.

# Additional Context

- noticed internally
- backport to v4.x
2025-09-09 06:34:59 +00:00
Livio Spring
79809d0199 fix(actions v2): send event_payload on event executions again (#10651)
# Which Problems Are Solved

It was noticed that on actions v2 when subscribing to events, the
webhook would always receive an empty `event_payload`:
```
{
    "aggregateID": "336494809936035843",
    "aggregateType": "user",
    "resourceOwner": "336392597046099971",
    "instanceID": "336392597046034435",
    "version": "v2",
    "sequence": 1,
    "event_type": "user.human.added",
    "created_at": "2025-09-05T08:55:36.156333Z",
    "userID": "336392597046755331",
    "event_payload":
    {}
}
```

The problem was due to using `json.Marshal` on the `Event` interface,
where the underlying `BaseEvent` prevents the data to be marshalled:

131f70db34/internal/eventstore/event_base.go (L38)

# How the Problems Are Solved

The `Event`s `Unmarshal` function is used with a `json.RawMessage`.

# Additional Changes

none

# Additional Context

- closes https://github.com/zitadel/zitadel/issues/10650
- requires backport to v4.x
2025-09-09 05:28:29 +00:00
Livio Spring
2dbe21fb30 feat(service ping): add additional resource counts (#10621)
# Which Problems Are Solved

Using the service ping, we want to have some additional insights to how
zitadel is configured. The current resource count report contains
already some amount of configured policies, such as the login_policy.
But we do not know if for example MFA is enforced.

# How the Problems Are Solved

- Added the following counts to the report:
  - service users per organization
  - MFA enforcements (though login policy)
  - Notification policies with password change option enabled
  - SCIM provisioned users (using user metadata)
- Since all of the above are conditional based on at least a column
inside a projection, a new `migration.CountTriggerConditional` has been
added, where a condition (column values) and an option to track updates
on that column should be considered for the count.
- For this to be possible, the following changes had to be made to the
existing sql resources:
- the `resource_name` has been added to unique constraint on the
`projection.resource_counts` table
- triggers have been added / changed to individually track `INSERT`,
`UPDATE`(s) and `DELETE` and be able to handle conditions
- an optional argument has been added to the
`projections.count_resource()` function to allow providing the
information to `UP` or `DOWN` count the resource on an update.

# Additional Changes

None

# Additional Context

- partially solves #10244 (reporting audit log retention limit will be
handled in #10245 directly)
- backport to v4.x
2025-09-08 16:30:03 +00:00
Stefan Benz
ef058c8de6 fix: org remove taken into account during username changes (#10617)
# Which Problems Are Solved

For instance settings related to username changes, already removed
organizations were still processed.

# How the Problems Are Solved

Do no process already removed organizations.

# Additional Changes

None

# Additional Context

None

---------

Co-authored-by: Marco A. <marco@zitadel.com>
2025-09-08 13:58:14 +00:00
Stefan Benz
75774eb64c chore: fix org v2beta integration tests (#10655)
# Which Problems Are Solved

Flakiness in integration tests for organization v2beta service.

# How the Problems Are Solved

Fix eventual consistent handling of integration tests.

# Additional Changes

None

# Additional Context

None

---------

Co-authored-by: Marco A. <marco@zitadel.com>
2025-09-08 11:43:16 +00:00
Stefan Benz
8909b9a2a6 feat: http provider signing key addition (#10641)
# Which Problems Are Solved

HTTP Request to HTTP providers for Email or SMS are not signed.

# How the Problems Are Solved

Add a Signing Key to the HTTP Provider resources, which is then used to
generate a header to sign the payload.

# Additional Changes

Additional tests for query side of the SMTP provider.

# Additional Context

Closes #10067

---------

Co-authored-by: Marco A. <marco@zitadel.com>
2025-09-08 11:00:04 +00:00
Gayathri Vijayan
51e12e224d feat(actionsv2): Propagate request headers in actions v2 (#10632)
# Which Problems Are Solved

This PR adds functionality to propagate request headers in actions v2. 

# How the Problems Are Solved
The new functionality is added to the`ExecutionHandler` interceptors,
where the incoming request headers (from a list of allowed headers to be
forwarded) are set in the payload of the request before calling the
target.

# Additional Changes
This PR also contains minor fixes to the Actions V2 example docs. 

# Additional Context
- Closes #9941

---------

Co-authored-by: Marco A. <marco@zitadel.com>
2025-09-08 08:50:52 +00:00
Tim Möhlmann
a06ae2c835 perf(cache): use redis unlink for key deletion (#10658)
# Which Problems Are Solved

The usage of the Redis `DEL` command showed blocking and slowdowns
during load-tests.

# How the Problems Are Solved

Use [`UNLINK`](https://redis.io/docs/latest/commands/UNLINK/) instead.

# Additional Changes

- none

# Additional Context

- closes https://github.com/zitadel/zitadel/issues/8930
2025-09-08 08:21:32 +00:00
Silvan
869282ca49 fix(repo): correct mapping for domains (#10653)
This pull request fixes an issue where the repository would fail to scan
organization or instance structs if the `domains` column was `NULL`.

## Which problems are solved

If the `domains` column of `orgs` or `instances` was `NULL`, the
repository failed scanning into the structs. This happened because the
scanning mechanism did not correctly handle `NULL` JSONB columns.

## How the problems are solved

A new generic type `JSONArray[T]` is introduced, which implements the
`sql.Scanner` interface. This type can correctly scan JSON arrays from
the database, including handling `NULL` values gracefully.

The repositories for instances and organizations have been updated to
use this new type for the domains field. The SQL queries have also been
improved to use `FILTER` with `jsonb_agg` for better readability and
performance when aggregating domains.

## Additional changes
* An unnecessary cleanup step in the organization domain tests for
already removed domains has been removed.
* The `pgxscan` library has been replaced with `sqlscan` for scanning
`database/sql`.Rows.
* Minor cleanups in integration tests.
2025-09-08 09:35:31 +02:00
Silvan
61cab8878e feat(backend): state persisted objects (#9870)
This PR initiates the rework of Zitadel's backend to state-persisted
objects. This change is a step towards a more scalable and maintainable
architecture.

## Changes

* **New `/backend/v3` package**: A new package structure has been
introduced to house the reworked backend logic. This includes:
* `domain`: Contains the core business logic, commands, and repository
interfaces.
* `storage`: Implements the repository interfaces for database
interactions with new transactional tables.
  * `telemetry`: Provides logging and tracing capabilities.
* **Transactional Tables**: New database tables have been defined for
`instances`, `instance_domains`, `organizations`, and `org_domains`.
* **Projections**: New projections have been created to populate the new
relational tables from the existing event store, ensuring data
consistency during the migration.
* **Repositories**: New repositories provide an abstraction layer for
accessing and manipulating the data in the new tables.
* **Setup**: A new setup step for `TransactionalTables` has been added
to manage the database migrations for the new tables.

This PR lays the foundation for future work to fully transition to
state-persisted objects for these components, which will improve
performance and simplify data access patterns.

This PR initiates the rework of ZITADEL's backend to state-persisted
objects. This is a foundational step towards a new architecture that
will improve performance and maintainability.

The following objects are migrated from event-sourced aggregates to
state-persisted objects:

* Instances
  * incl. Domains
* Orgs
  * incl. Domains

The structure of the new backend implementation follows the software
architecture defined in this [wiki
page](https://github.com/zitadel/zitadel/wiki/Software-Architecturel).

This PR includes:

* The initial implementation of the new transactional repositories for
the objects listed above.
* Projections to populate the new relational tables from the existing
event store.
* Adjustments to the build and test process to accommodate the new
backend structure.

This is a work in progress and further changes will be made to complete
the migration.

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Iraq Jaber <iraq+github@zitadel.com>
Co-authored-by: Iraq <66622793+kkrime@users.noreply.github.com>
Co-authored-by: Tim Möhlmann <tim+github@zitadel.com>
2025-09-05 09:54:34 +01:00
Silvan
0575f67e94 fix(projections): overhaul the event projection system (#10560)
This PR overhauls our event projection system to make it more robust and
prevent skipped events under high load. The core change replaces our
custom, transaction-based locking with standard PostgreSQL advisory
locks. We also introduce a worker pool to manage concurrency and prevent
database connection exhaustion.

### Key Changes

* **Advisory Locks for Projections:** Replaces exclusive row locks and
inspection of `pg_stat_activity` with PostgreSQL advisory locks for
managing projection state. This is a more reliable and standard approach
to distributed locking.
* **Simplified Await Logic:** Removes the complex logic for awaiting
open transactions, simplifying it to a more straightforward time-based
filtering of events.
* **Projection Worker Pool:** Implements a worker pool to limit
concurrent projection triggers, preventing connection exhaustion and
improving stability under load. A new `MaxParallelTriggers`
configuration option is introduced.

### Problem Solved

Under high throughput, a race condition could cause projections to miss
events from the eventstore. This led to inconsistent data in projection
tables (e.g., a user grant might be missing). This PR fixes the
underlying locking and concurrency issues to ensure all events are
processed reliably.

### How it Works

1. **Event Writing:** When writing events, a *shared* advisory lock is
taken. This signals that a write is in progress.
2.  **Event Handling (Projections):**
* A projection worker attempts to acquire an *exclusive* advisory lock
for that specific projection. If the lock is already held, it means
another worker is on the job, so the current one backs off.
* Once the lock is acquired, the worker briefly acquires and releases
the same *shared* lock used by event writers. This acts as a barrier,
ensuring it waits for any in-flight writes to complete.
* Finally, it processes all events that occurred before its transaction
began.

### Additional Information

* ZITADEL no longer modifies the `application_name` PostgreSQL variable
during event writes.
*   The lock on the `current_states` table is now `FOR NO KEY UPDATE`.
*   Fixes https://github.com/zitadel/zitadel/issues/8509

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Tim Möhlmann <tim+github@zitadel.com>
2025-09-03 15:29:00 +00:00
Stefan Benz
bdefd9147f fix: permission check for actions v1 post creation user grants (#10638)
# Which Problems Are Solved

Unnecessary default permission check in creating an authorization fails
even if the functionality was called internally.

# How the Problems Are Solved

Move permission check to the proper implementation, so that necessary
permission checks are provided by the responsible API.

# Additional Changes

None

# Additional Context

Closes #10624

Co-authored-by: Livio Spring <livio.a@gmail.com>
2025-09-03 14:39:18 +00:00
Livio Spring
a1ad87387d fix: cleanup information in logs (#10634)
# Which Problems Are Solved

I noticed some outdated / misleading logs when starting zitadel:
- The `init-projections` were no longer in beta for a long time.
- The LRU auth request cache is disabled by default, which results in
the following message, which has caused confusion by customers:
```level=info msg="auth request cache disabled" error="must provide a positive size"```

# How the Problems Are Solved

- Removed the beta info
- Disable cache initialization if possible

# Additional Changes

None

# Additional Context

- noticed internally
- backport to v4.x
2025-09-03 09:18:54 +00:00
Marco A.
75a67be669 feat: Feature flag for relational tables (#10599)
# Which Problems Are Solved

This PR introduces a new feature flag `EnableRelationalTables` that will
be used in following implementations to decide whether Zitadel should
use the relational model or the event sourcing one.

# TODO

  - [x] Implement flag at system level
- [x] Display the flag on console:
https://github.com/zitadel/zitadel/pull/10615

# How the Problems Are Solved

  - Implement loading the flag from config
- Add persistence of the flag through gRPC endpoint
(SetInstanceFeatures)
- Implement reading of the flag through gRPC endpoint
(GetInstanceFeatures)

# Additional Changes

Some minor refactoring to remove un-needed generics annotations

# Additional Context

- Closes #10574

---------

Co-authored-by: Silvan <27845747+adlerhurst@users.noreply.github.com>
2025-09-02 09:48:46 +00:00
Marco A.
8df402fb4f feat: List users by metadata (#10415)
# Which Problems Are Solved

Some users have reported the need of retrieving users given a metadata
key, metadata value or both. This change introduces metadata search
filter on the `ListUsers()` endpoint to allow Zitadel users to search
for user records by metadata.

The changes affect only v2 APIs.

# How the Problems Are Solved

- Add new search filter to `ListUserRequest`: `MetaKey` and `MetaValue`
  - Add SQL indices on metadata key and metadata value
  - Update query to left join `user_metadata` table

# Additional Context

  - Closes #9053 
  - Depends on https://github.com/zitadel/zitadel/pull/10567

---------

Co-authored-by: Silvan <27845747+adlerhurst@users.noreply.github.com>
Co-authored-by: Stefan Benz <46600784+stebenz@users.noreply.github.com>
2025-09-01 16:12:36 +00:00
Zach Hirschtritt
fcdc598320 fix: correct river otel metrics units (#10425)
# Which Problems Are Solved

The
[otelriver](https://github.com/riverqueue/rivercontrib/tree/master/otelriver)
package uses default otel histogram buckets that are designed for
millisecond measurements. OTEL docs also suggest standardizing on using
seconds as the measurement unit. However, the default buckets from
opentelemetry-go are more or less useless when used with seconds as the
smallest measurement is 5 seconds and the largest is nearly 3 hours.
Example:
```
river_work_duration_histogram_seconds_bucket{attempt="1",kind="notification_request",otel_scope_name="github.com/riverqueue/rivercontrib/otelriver",otel_scope_version="",priority="1",queue="notification",status="ok",tag="[]",le="0"} 0
river_work_duration_histogram_seconds_bucket{attempt="1",kind="notification_request",otel_scope_name="github.com/riverqueue/rivercontrib/otelriver",otel_scope_version="",priority="1",queue="notification",status="ok",tag="[]",le="5"} 1144
river_work_duration_histogram_seconds_bucket{attempt="1",kind="notification_request",otel_scope_name="github.com/riverqueue/rivercontrib/otelriver",otel_scope_version="",priority="1",queue="notification",status="ok",tag="[]",le="10"} 1144
<...more buckets here...>
river_work_duration_histogram_seconds_bucket{attempt="1",kind="notification_request",otel_scope_name="github.com/riverqueue/rivercontrib/otelriver",otel_scope_version="",priority="1",queue="notification",status="ok",tag="[]",le="7500"} 1144
river_work_duration_histogram_seconds_bucket{attempt="1",kind="notification_request",otel_scope_name="github.com/riverqueue/rivercontrib/otelriver",otel_scope_version="",priority="1",queue="notification",status="ok",tag="[]",le="10000"} 1144
river_work_duration_histogram_seconds_bucket{attempt="1",kind="notification_request",otel_scope_name="github.com/riverqueue/rivercontrib/otelriver",otel_scope_version="",priority="1",queue="notification",status="ok",tag="[]",le="+Inf"} 1144
```

# How the Problems Are Solved

Change the default unit to "ms" from "s" as supported by the middleware
API:
https://riverqueue.com/docs/open-telemetry#list-of-middleware-options

# Additional Changes

None

# Additional Context

None

Co-authored-by: Tim Möhlmann <tim+github@zitadel.com>
2025-09-01 10:31:18 +00:00
Livio Spring
8574d6fbab chore(integration test): prevent eventual consistency issue in TestServer_Limits_AuditLogRetention (#10608)
# Which Problems Are Solved

The TestServer_Limits_AuditLogRetention is too reliant on time
constraints when checking that a limit is correctly applied. IN case it
takes to long to do all the preparation, there won't be any events to
read and the test will fail.

# How the Problems Are Solved

Don't require any events to be returned.

# Additional Changes

None

# Additional Context

- Noted a lot of pipeline to fail on this step.
- requires backport to at least v4.x
2025-09-01 12:22:09 +03:00
Tim Möhlmann
a9ebc06c77 perf(actionsv2): execution target router (#10564)
# Which Problems Are Solved

The event execution system currently uses a projection handler that
subscribes to and processes all events for all instances. This creates a
high static cost because the system over-fetches event data, handling
many events that are not needed by most instances. This inefficiency is
also reflected in high "rows returned" metrics in the database.

# How the Problems Are Solved

Eliminate the use of a project handler. Instead, events for which
"execution targets" are defined, are directly pushed to the queue by the
eventstore. A Router is populated in the Instance object in the authz
middleware.

- By joining the execution targets to the instance, no additional
queries are needed anymore.
- As part of the instance object, execution targets are now cached as
well.
- Events are queued within the same transaction, giving transactional
guarantees on delivery.
- Uses the "insert many fast` variant of River. Multiple jobs are queued
in a single round-trip to the database.
- Fix compatibility with PostgreSQL 15

# Additional Changes

- The signing key was stored as plain-text in the river job payload in
the DB. This violated our [Secrets
Storage](https://zitadel.com/docs/concepts/architecture/secrets#secrets-storage)
principle. This change removed the field and only uses the encrypted
version of the signing key.
- Fixed the target ordering from descending to ascending.
- Some minor linter warnings on the use of `io.WriteString()`.

# Additional Context

- Introduced in https://github.com/zitadel/zitadel/pull/9249
- Closes https://github.com/zitadel/zitadel/issues/10553
- Closes https://github.com/zitadel/zitadel/issues/9832
- Closes https://github.com/zitadel/zitadel/issues/10372
- Closes https://github.com/zitadel/zitadel/issues/10492

---------

Co-authored-by: Stefan Benz <46600784+stebenz@users.noreply.github.com>
2025-09-01 07:21:10 +02:00
Stefan Benz
5721b63bcb chore: use integration package for name generation (#10591)
# Which Problems Are Solved

Integration test failed sometimes with `organization already
exists`-errors.

# How the Problems Are Solved

Use a consistent function to generate name used for organization
creation.

# Additional Changes

Correct a eventual consistent test for username around organization
domain changes with eventual consistent loop.

# Additional Context

None

---------

Co-authored-by: Livio Spring <livio.a@gmail.com>
2025-08-29 14:56:16 +02:00
Adam Kida
832e78f9bc feat(typescript): add i18n for input labels in Login V2 (#10233)
# Which Problems Are Solved

- Most inputs have hardcoded label

# How the Problems Are Solved

- add usage of i18n library for every label
- add labels to i18n translation files

# Additional Changes

- fixed key used in `device-code-form.tsx` by submit button
- `v2-default.json` was update and contains all values from login app
not only newly added key for labels.

# Additional Context

N.A

---------

Co-authored-by: David Skewis <david@zitadel.com>
Co-authored-by: Max Peintner <max@caos.ch>
2025-08-29 05:12:12 +00:00
JimmyKmi
6d0b7ed2aa chore(i18n): Completion Chinese translation (#10109)
# Which Problems Are Solved

- Inconsistencies in the terminology used for "身份认证提供商" (identity
provider) and "身份认证提供者" (identity supplier) in the Chinese translation
files could lead to confusion among users.
- Missing translations for terminology related to identity providers
could hinder user experience and understanding.

# How the Problems Are Solved

- Unified the terms "身份认证提供商" and "身份认证提供者" to consistently use
"身份认证提供者" across all Chinese translation files.
- Added necessary translations to ensure that all relevant terms related
to identity providers are accurately represented in the Chinese
localization.

# Additional Changes

- Improved overall readability and clarity in the Chinese translations
by ensuring consistent terminology for identity-related phrases
throughout the application.
- Complete the missing translations.

# Additional Context

If I have missed any translations, please point them out, and I would be
happy to complete them.

---------

Co-authored-by: Florian Forster <florian@zitadel.com>
2025-08-28 14:56:26 +00:00
Stefan Benz
8e60cce20d fix: correctly handle user grants on project grant to same organization (#10568)
# Which Problems Are Solved

Authorizations (aka user grants) could not be managed correctly if they
were created on a project grant, which itself was based on a project
granted to the own organization. The error persisted if the
corresponding (potentially unintended) project grant was removed again.

# How the Problems Are Solved

Fixed checks for managing user grants: Roles from projects and project
grants get handled individually to ensure cases like project grants on
the own organization.

# Additional Changes

Additional tests for the 3 failing scenarios.

# Additional Context

Closes #10556

---------

Co-authored-by: Livio Spring <livio.a@gmail.com>
2025-08-28 12:09:52 +00:00
Marco A.
b604615cab chore: move converter methods users v2 to separate converter package + add tests (#10567)
# Which Problems Are Solved

As requested by @adlerhurst in
https://github.com/zitadel/zitadel/pull/10415#discussion_r2298087711 , I
am moving the refactoring of v2 user converter methods to a separate PR

# How the Problems Are Solved

Cherry-pick 648c234caf

# Additional Context

Parent of https://github.com/zitadel/zitadel/pull/10415
2025-08-27 13:08:13 +02:00
Gayathri Vijayan
255d42da65 feat(saml): add SignatureMethod config for SAML IDP (#10520)
# Which Problems Are Solved
When a SAML IDP is created, the signing algorithm defaults to
`RSA-SHA1`.
This PR adds the functionality to configure the signing algorithm while
creating or updating a SAML IDP. When nothing is specified, `RSA-SHA1`
is the default.

Available options:
* RSA_SHA1
* RSA_SHA256
* RSA_SHA512


# How the Problems Are Solved

By introducing a new optional config to specify the Signing Algorithm. 

# Additional Changes
N/A

# Additional Context
- Closes #9842 

An existing bug in the UpdateSAMLProvider API will be fixed as a
followup in a different
[PR](https://github.com/zitadel/zitadel/pull/10557).

---------

Co-authored-by: Stefan Benz <46600784+stebenz@users.noreply.github.com>
2025-08-27 09:07:13 +00:00
Stefan Benz
2a78fdfe1f fix: define base uri for login v2 feature as string to make it config… (#10533)
…urable

# Which Problems Are Solved

BaseURI defined in environment variables or configuration files was
ignored for Login v2 feature flag.

# How the Problems Are Solved

Define BaseURI as string so that the environment variables and
configuration files can be parsed into it.

# Additional Changes

None

# Additional Context

Closes #10405
2025-08-26 12:18:13 +00:00
Stefan Benz
0a14c01412 fix: configure default url templates (#10416)
# Which Problems Are Solved

Emails are still send only with URLs to login v1.

# How the Problems Are Solved

Add configuration for URLs as URL templates, so that links can point at
Login v2.

# Additional Changes

None

# Additional Context

Closes #10236

---------

Co-authored-by: Marco A. <marco@zitadel.com>
2025-08-26 10:14:41 +00:00
Iraq
2718d345b8 chore(docker-integration-postgres): adding volume to internal/integration/config/docker-compose.yaml (#10079)
# Which Problems Are Solved

This change makes it easier to delete the integration database

# How the Problems Are Solved

Gives the integration database a volume you can address via name

`docker volume rm config_zitadel_integration_db`
2025-08-26 07:55:30 +00:00
Livio Spring
9621d357c0 fix(service ping): improve systemID search query to use index (#10566)
# Which Problems Are Solved

We noticed that the startup for v4 was way slower than v3. A query
without an instanceID filter could be traced back to the systemID query
of the service ping.

# How the Problems Are Solved

A an empty instanceID to the query to ensure it used an appropriate
index.

# Additional Changes

None

# Additional Context

- Closes https://github.com/zitadel/zitadel/issues/10390
- backport to v4.x
2025-08-25 15:07:00 +00:00
Iraq
24a7d3ceb1 fix(project_roles): fixed bad permission check in command layer for project roles add/update/delete (#10531)
# Which Problems Are Solved

Project Admins would get permission errors when trying to add project
roles

# How the Problems Are Solved

Fixed wrong parameters were being passed into the permission check


- Closes https://github.com/zitadel/zitadel/issues/10505
2025-08-22 06:08:53 +00:00