This PR overhauls our event projection system to make it more robust and
prevent skipped events under high load. The core change replaces our
custom, transaction-based locking with standard PostgreSQL advisory
locks. We also introduce a worker pool to manage concurrency and prevent
database connection exhaustion.
### Key Changes
* **Advisory Locks for Projections:** Replaces exclusive row locks and
inspection of `pg_stat_activity` with PostgreSQL advisory locks for
managing projection state. This is a more reliable and standard approach
to distributed locking.
* **Simplified Await Logic:** Removes the complex logic for awaiting
open transactions, simplifying it to a more straightforward time-based
filtering of events.
* **Projection Worker Pool:** Implements a worker pool to limit
concurrent projection triggers, preventing connection exhaustion and
improving stability under load. A new `MaxParallelTriggers`
configuration option is introduced.
### Problem Solved
Under high throughput, a race condition could cause projections to miss
events from the eventstore. This led to inconsistent data in projection
tables (e.g., a user grant might be missing). This PR fixes the
underlying locking and concurrency issues to ensure all events are
processed reliably.
### How it Works
1. **Event Writing:** When writing events, a *shared* advisory lock is
taken. This signals that a write is in progress.
2. **Event Handling (Projections):**
* A projection worker attempts to acquire an *exclusive* advisory lock
for that specific projection. If the lock is already held, it means
another worker is on the job, so the current one backs off.
* Once the lock is acquired, the worker briefly acquires and releases
the same *shared* lock used by event writers. This acts as a barrier,
ensuring it waits for any in-flight writes to complete.
* Finally, it processes all events that occurred before its transaction
began.
### Additional Information
* ZITADEL no longer modifies the `application_name` PostgreSQL variable
during event writes.
* The lock on the `current_states` table is now `FOR NO KEY UPDATE`.
* Fixes https://github.com/zitadel/zitadel/issues/8509
---------
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Tim Möhlmann <tim+github@zitadel.com>
# Which Problems Are Solved
Some users have reported the need of retrieving users given a metadata
key, metadata value or both. This change introduces metadata search
filter on the `ListUsers()` endpoint to allow Zitadel users to search
for user records by metadata.
The changes affect only v2 APIs.
# How the Problems Are Solved
- Add new search filter to `ListUserRequest`: `MetaKey` and `MetaValue`
- Add SQL indices on metadata key and metadata value
- Update query to left join `user_metadata` table
# Additional Context
- Closes#9053
- Depends on https://github.com/zitadel/zitadel/pull/10567
---------
Co-authored-by: Silvan <27845747+adlerhurst@users.noreply.github.com>
Co-authored-by: Stefan Benz <46600784+stebenz@users.noreply.github.com>
Adjust status checks across various functions to accept any 2xx HTTP
response instead of only 200, improving the robustness of the API
response validation.
fixes#10436
# Which Problems Are Solved
The resource usage to query user(s) on the database was high and
therefore could have performance impact.
# How the Problems Are Solved
Database queries involving the users and loginnames table were improved
and an index was added for user by email query.
# Additional Changes
- spellchecks
- updated apis on load tests
# additional info
needs cherry pick to v3
# Which Problems Are Solved
No benchmarks for v2.70.0 were provided so far.
# How the Problems Are Solved
Benchmarks added
# Additional changes
- it's now possible to plot multiple charts, one chart per `metric_name`
# Which Problems Are Solved
Load-test requires single endpoint to be used for each test type.
# How the Problems Are Solved
Remove userinfo call from machine tests.
# Additional Changes
- Add load-test/.env to gitignore.
# Additional Context
- Related to #4424
# Which Problems Are Solved
Float64 which was used for the event.Position field is [not precise in
go and gets rounded](https://github.com/golang/go/issues/47300). This
can lead to unprecies position tracking of events and therefore
projections especially on cockcoachdb as the position used there is a
big number.
example of a unprecies position:
exact: 1725257931223002628
float64: 1725257931223002624.000000
# How the Problems Are Solved
The float64 was replaced by
[github.com/jackc/pgx-shopspring-decimal](https://github.com/jackc/pgx-shopspring-decimal).
# Additional Changes
Correct behaviour of makefile for load tests.
Rename `latestSequence`-queries to `latestPosition`
# Which Problems Are Solved
Currently there was no load test present for machine jwt profile grant.
This test is now added
# How the Problems Are Solved
K6 test implemented.
# Additional Context
- part of https://github.com/zitadel/zitadel/issues/8352
# Which Problems Are Solved
Extends load tests by testing session creation.
# How the Problems Are Solved
The test creates a session including a check for user id.
# Additional Context
- part of https://github.com/zitadel/zitadel/issues/7639