Commit Graph

13 Commits

Author SHA1 Message Date
Silvan
0575f67e94 fix(projections): overhaul the event projection system (#10560)
This PR overhauls our event projection system to make it more robust and
prevent skipped events under high load. The core change replaces our
custom, transaction-based locking with standard PostgreSQL advisory
locks. We also introduce a worker pool to manage concurrency and prevent
database connection exhaustion.

### Key Changes

* **Advisory Locks for Projections:** Replaces exclusive row locks and
inspection of `pg_stat_activity` with PostgreSQL advisory locks for
managing projection state. This is a more reliable and standard approach
to distributed locking.
* **Simplified Await Logic:** Removes the complex logic for awaiting
open transactions, simplifying it to a more straightforward time-based
filtering of events.
* **Projection Worker Pool:** Implements a worker pool to limit
concurrent projection triggers, preventing connection exhaustion and
improving stability under load. A new `MaxParallelTriggers`
configuration option is introduced.

### Problem Solved

Under high throughput, a race condition could cause projections to miss
events from the eventstore. This led to inconsistent data in projection
tables (e.g., a user grant might be missing). This PR fixes the
underlying locking and concurrency issues to ensure all events are
processed reliably.

### How it Works

1. **Event Writing:** When writing events, a *shared* advisory lock is
taken. This signals that a write is in progress.
2.  **Event Handling (Projections):**
* A projection worker attempts to acquire an *exclusive* advisory lock
for that specific projection. If the lock is already held, it means
another worker is on the job, so the current one backs off.
* Once the lock is acquired, the worker briefly acquires and releases
the same *shared* lock used by event writers. This acts as a barrier,
ensuring it waits for any in-flight writes to complete.
* Finally, it processes all events that occurred before its transaction
began.

### Additional Information

* ZITADEL no longer modifies the `application_name` PostgreSQL variable
during event writes.
*   The lock on the `current_states` table is now `FOR NO KEY UPDATE`.
*   Fixes https://github.com/zitadel/zitadel/issues/8509

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Tim Möhlmann <tim+github@zitadel.com>
2025-09-03 15:29:00 +00:00
Marco A.
8df402fb4f feat: List users by metadata (#10415)
# Which Problems Are Solved

Some users have reported the need of retrieving users given a metadata
key, metadata value or both. This change introduces metadata search
filter on the `ListUsers()` endpoint to allow Zitadel users to search
for user records by metadata.

The changes affect only v2 APIs.

# How the Problems Are Solved

- Add new search filter to `ListUserRequest`: `MetaKey` and `MetaValue`
  - Add SQL indices on metadata key and metadata value
  - Update query to left join `user_metadata` table

# Additional Context

  - Closes #9053 
  - Depends on https://github.com/zitadel/zitadel/pull/10567

---------

Co-authored-by: Silvan <27845747+adlerhurst@users.noreply.github.com>
Co-authored-by: Stefan Benz <46600784+stebenz@users.noreply.github.com>
2025-09-01 16:12:36 +00:00
Silvan
f13529b31f fix(load-tests): accept any 2xx status as success (#10450)
Adjust status checks across various functions to accept any 2xx HTTP
response instead of only 200, improving the robustness of the API
response validation.

fixes #10436
2025-08-11 11:36:40 +03:00
Silvan
4df138286b perf(query): reduce user query duration (#10037)
# Which Problems Are Solved

The resource usage to query user(s) on the database was high and
therefore could have performance impact.

# How the Problems Are Solved

Database queries involving the users and loginnames table were improved
and an index was added for user by email query.

# Additional Changes

- spellchecks
- updated apis on load tests

# additional info

needs cherry pick to v3
2025-06-06 08:48:29 +00:00
Silvan
1ce68a562b docs(benchmarks): v2.70.0 (#9416)
# Which Problems Are Solved

No benchmarks for v2.70.0 were provided so far.

# How the Problems Are Solved

Benchmarks added

# Additional changes

- it's now possible to plot multiple charts, one chart per `metric_name`
2025-02-26 14:03:20 +00:00
Silvan
b10428fb56 test(session): load tests for session api (#9212)
# Which Problems Are Solved

We currently are not able to benchmark the performance of the session
api

# How the Problems Are Solved

Load tests were added to
- use sessions in oidc tokens analog
https://zitadel.com/docs/guides/integrate/login-ui/oidc-standard

# Additional Context

- Closes https://github.com/zitadel/zitadel/issues/7847
2025-01-29 12:08:20 +00:00
Tim Möhlmann
65e24b67da chore(load-test): disable userinfo after JWT profile (#8927)
# Which Problems Are Solved

Load-test requires single endpoint to be used for each test type.

# How the Problems Are Solved

Remove userinfo call from machine tests.

# Additional Changes

- Add load-test/.env to gitignore.

# Additional Context

- Related to #4424
2024-11-19 09:53:07 +01:00
Tim Möhlmann
aeb379e7de fix(eventstore): revert precise decimal (#8527) (#8679) 2024-09-24 18:43:29 +02:00
Silvan
15c9f71bee test(load): add machine jwt profile test for a single user (#8593)
# How the Problems Are Solved

Adds a new load test to use the token endpoint with a single user by
multiple threads.
2024-09-11 09:23:24 +00:00
Silvan
b522588d98 fix(eventstore): precise decimal (#8527)
# Which Problems Are Solved

Float64 which was used for the event.Position field is [not precise in
go and gets rounded](https://github.com/golang/go/issues/47300). This
can lead to unprecies position tracking of events and therefore
projections especially on cockcoachdb as the position used there is a
big number.

example of a unprecies position:
exact: 1725257931223002628
float64: 1725257931223002624.000000

# How the Problems Are Solved

The float64 was replaced by
[github.com/jackc/pgx-shopspring-decimal](https://github.com/jackc/pgx-shopspring-decimal).

# Additional Changes

Correct behaviour of makefile for load tests.
Rename `latestSequence`-queries to `latestPosition`
2024-09-06 12:19:19 +03:00
Silvan
1ce9a4322e test(load): machine jwt profile grant (#8482)
# Which Problems Are Solved

Currently there was no load test present for machine jwt profile grant.
This test is now added

# How the Problems Are Solved

K6 test implemented.

# Additional Context

- part of https://github.com/zitadel/zitadel/issues/8352
2024-08-27 13:06:03 +00:00
Silvan
82d950019f test: add load test for session creation (#8088)
# Which Problems Are Solved

Extends load tests by testing session creation.

# How the Problems Are Solved

The test creates a session including a check for user id.

# Additional Context

- part of https://github.com/zitadel/zitadel/issues/7639
2024-07-09 15:16:50 +00:00
Silvan
d337668599 chore: init load tests (#7635)
* init load tests

* add machine pat

* setup app

* add introspect

* use xk6-modules repo

* logging

* add teardown

* add manipulate user

* add manipulate user

* remove logs

* convert tests to ts

* add readme

* zitadel

* review comments
2024-04-18 12:21:07 +03:00