mirror of
https://github.com/zitadel/zitadel.git
synced 2025-11-02 03:38:46 +00:00
This PR overhauls our event projection system to make it more robust and
prevent skipped events under high load. The core change replaces our
custom, transaction-based locking with standard PostgreSQL advisory
locks. We also introduce a worker pool to manage concurrency and prevent
database connection exhaustion.
### Key Changes
* **Advisory Locks for Projections:** Replaces exclusive row locks and
inspection of `pg_stat_activity` with PostgreSQL advisory locks for
managing projection state. This is a more reliable and standard approach
to distributed locking.
* **Simplified Await Logic:** Removes the complex logic for awaiting
open transactions, simplifying it to a more straightforward time-based
filtering of events.
* **Projection Worker Pool:** Implements a worker pool to limit
concurrent projection triggers, preventing connection exhaustion and
improving stability under load. A new `MaxParallelTriggers`
configuration option is introduced.
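
The worker-pool idea can be illustrated with a small, self-contained sketch. This is not the code from the PR: `triggerPool`, `newTriggerPool`, `run`, and `peakConcurrency` are hypothetical names, and the only detail taken from the text above is that a cap (`MaxParallelTriggers`) limits how many triggers run concurrently. The sketch uses a buffered channel as a counting semaphore and assumes a cap of at least 1.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// triggerPool caps how many projection triggers run at once.
// Hypothetical type: it only illustrates the MaxParallelTriggers idea.
type triggerPool struct {
	slots chan struct{} // buffered channel used as a counting semaphore
}

func newTriggerPool(maxParallelTriggers int) *triggerPool {
	return &triggerPool{slots: make(chan struct{}, maxParallelTriggers)}
}

// run blocks until a slot is free, then executes the trigger.
func (p *triggerPool) run(trigger func()) {
	p.slots <- struct{}{}        // blocks while all slots are taken
	defer func() { <-p.slots }() // free the slot when the trigger returns
	trigger()
}

// peakConcurrency pushes the given number of triggers through a pool and
// returns the highest number of triggers observed running at the same time.
func peakConcurrency(maxParallelTriggers, triggers int) int64 {
	pool := newTriggerPool(maxParallelTriggers)
	var running, peak int64
	var wg sync.WaitGroup
	for i := 0; i < triggers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			pool.run(func() {
				cur := atomic.AddInt64(&running, 1)
				// Record a new peak if this trigger raised it.
				for {
					old := atomic.LoadInt64(&peak)
					if cur <= old || atomic.CompareAndSwapInt64(&peak, old, cur) {
						break
					}
				}
				time.Sleep(time.Millisecond) // simulated projection work
				atomic.AddInt64(&running, -1)
			})
		}()
	}
	wg.Wait()
	return atomic.LoadInt64(&peak)
}

func main() {
	fmt.Println("peak parallel triggers:", peakConcurrency(3, 20))
}
```

With a cap of 3, the reported peak never exceeds 3 regardless of how many triggers are queued, which is the property that protects the database connection pool.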
### Problem Solved
Under high throughput, a race condition could cause projections to miss
events from the eventstore. This led to inconsistent data in projection
tables (e.g., a user grant might be missing). This PR fixes the
underlying locking and concurrency issues to ensure all events are
processed reliably.
### How it Works
1. **Event Writing:** When writing events, a *shared* advisory lock is
taken. This signals that a write is in progress.
2. **Event Handling (Projections):**
   * A projection worker attempts to acquire an *exclusive* advisory lock
     for that specific projection. If the lock is already held, it means
     another worker is on the job, so the current one backs off.
   * Once the lock is acquired, the worker briefly acquires and releases
     the same *shared* lock used by event writers. This acts as a barrier,
     ensuring it waits for any in-flight writes to complete.
   * Finally, it processes all events that occurred before its transaction
     began.
### Additional Information
* ZITADEL no longer modifies the `application_name` PostgreSQL variable
during event writes.
* The lock on the `current_states` table is now `FOR NO KEY UPDATE`.
* Fixes https://github.com/zitadel/zitadel/issues/8509
---------
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Tim Möhlmann <tim+github@zitadel.com>
(cherry picked from commit 0575f67e94)
Load Tests
This package contains code for load testing specific endpoints of ZITADEL using k6.
Prerequisite
Structure
The use cases under test are defined in `src/use_cases`. The implementations of ZITADEL resources and calls are located under `src`.
Execution
Env vars
* `VUS`: Number of parallel processes (virtual users) that execute the test (default is `20`)
* `DURATION`: Defines how long the tests are executed (default is `200s`)
* `ZITADEL_HOST`: URL of ZITADEL (default is `http://localhost:8080`)
* `ADMIN_LOGIN_NAME`: Login name of a human user with the `IAM_OWNER` role
* `ADMIN_PASSWORD`: Password of the human user
To set up the tests, we use the credentials of the console and log in as an admin. The user must be able to create organizations and all resources inside organizations.
* `ADMIN_LOGIN_NAME`: `zitadel-admin@zitadel.localhost`
* `ADMIN_PASSWORD`: `Password1!`
Test
Before you run the tests you need an initialized user. The tests don't implement the change password screen during login.
* `make human_password_login`
  * setup: creates human users
  * test: uses the previously created humans to sign in using the login UI
* `make machine_pat_login`
  * setup: creates machines and a PAT for each machine
  * test: calls the user info endpoint with the given PATs
* `make machine_client_credentials_login`
  * setup: creates machines and a client credential secret for each machine
  * test: calls the token endpoint with the `client_credentials` grant type
* `make user_info`
  * setup: creates human users and signs them in
  * test: calls the user info endpoint using the given humans
* `make manipulate_user`
  * test: creates a human, updates its profile, locks the user and then deletes it
* `make introspect`
  * setup: creates projects, one API per project, one key per API and generates the JWT from the given keys
  * test: calls the introspection endpoint using the given JWTs
* `make add_session`
  * setup: creates human users
  * test: creates new sessions with user ID check
* `make oidc_session`
  * setup: creates a machine user to create the auth request and session
  * test: creates an auth request, a session and links the session to the auth request. Implementation of this flow.
* `make otp_session`
  * setup: creates 1 human user for each VU and adds email OTP to it
  * test: creates a session based on the login name of the user, sets the email OTP challenge to the session and afterwards checks the OTP code
* `make password_session`
  * setup: creates 1 human user for each VU and adds email OTP to it
  * test: creates a session based on the login name of the user and checks for the password in a second step
* `make machine_jwt_profile_grant`
  * setup: generates private/public key, creates machine users, adds a key
  * test: creates a token and calls user info
* `make machine_jwt_profile_grant_single_user`
  * setup: generates private/public key, creates machine user, adds a key
  * test: creates a token and calls user info in parallel for the same user
* `make users_by_metadata_key`
  * setup: creates a human user for half of the VUs and a machine for the other half, and adds 3 metadata entries to each user
  * test: calls the list users endpoint and filters by a metadata key
* `make users_by_metadata_value`
  * setup: creates a human user for half of the VUs and a machine for the other half, and adds 3 metadata entries to each user
  * test: calls the list users endpoint and filters by a metadata value
* `make verify_all_user_grants_exists`
  * setup: creates 50 projects, 1 machine per VU
  * test: creates a machine and grants all projects to the machine
  * teardown: the organization is not removed so that the projection data can be verified. You can find additional information at the bottom of this file.