Silvan 19d1ab9c94 fix(projections): overhaul the event projection system (#10560)
This PR overhauls our event projection system to make it more robust and
prevent skipped events under high load. The core change replaces our
custom, transaction-based locking with standard PostgreSQL advisory
locks. We also introduce a worker pool to manage concurrency and prevent
database connection exhaustion.

### Key Changes

* **Advisory Locks for Projections:** Replaces exclusive row locks and
inspection of `pg_stat_activity` with PostgreSQL advisory locks for
managing projection state. This is a more reliable and standard approach
to distributed locking.
* **Simplified Await Logic:** Removes the complex logic for awaiting
open transactions, simplifying it to a more straightforward time-based
filtering of events.
* **Projection Worker Pool:** Implements a worker pool to limit
concurrent projection triggers, preventing connection exhaustion and
improving stability under load. A new `MaxParallelTriggers`
configuration option is introduced.
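
To illustrate the idea behind `MaxParallelTriggers`, here is a minimal Go sketch of limiting concurrent projection triggers with a counting semaphore. The names and structure are illustrative assumptions, not ZITADEL's actual implementation.

```go
// Minimal sketch: cap concurrent projection triggers with a counting semaphore.
// All names here are illustrative, not ZITADEL's actual code.
package main

import (
	"context"
	"fmt"
	"time"
)

const maxParallelTriggers = 4 // conceptually what the MaxParallelTriggers option controls

func main() {
	sem := make(chan struct{}, maxParallelTriggers)
	projections := []string{"users", "orgs", "user_grants", "sessions", "projects"}

	for _, name := range projections {
		name := name
		sem <- struct{}{} // blocks while maxParallelTriggers workers are busy
		go func() {
			defer func() { <-sem }() // free the slot when the trigger finishes
			triggerProjection(context.Background(), name)
		}()
	}

	// Drain the semaphore to wait for the remaining workers.
	for i := 0; i < maxParallelTriggers; i++ {
		sem <- struct{}{}
	}
}

// triggerProjection stands in for acquiring the projection's advisory lock
// and processing its pending events (see "How it Works" below).
func triggerProjection(ctx context.Context, name string) {
	fmt.Println("triggering projection", name)
	time.Sleep(50 * time.Millisecond)
}
```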

### Problem Solved

Under high throughput, a race condition could cause projections to miss
events from the eventstore. This led to inconsistent data in projection
tables (e.g., a user grant might be missing). This PR fixes the
underlying locking and concurrency issues to ensure all events are
processed reliably.

### How it Works

1. **Event Writing:** When writing events, a *shared* advisory lock is
taken. This signals that a write is in progress.
2.  **Event Handling (Projections):**
* A projection worker attempts to acquire an *exclusive* advisory lock
for that specific projection. If the lock is already held, it means
another worker is on the job, so the current one backs off.
* Once the exclusive lock is acquired, the worker briefly acquires and
releases the advisory lock that event writers hold in shared mode. This
acts as a barrier, ensuring it waits for any in-flight writes to complete.
* Finally, it processes all events that occurred before its transaction
began.
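
The following Go sketch shows one way this handshake can be realized with PostgreSQL advisory locks (`pg_advisory_xact_lock_shared`, `pg_try_advisory_xact_lock`, `pg_advisory_lock`/`pg_advisory_unlock`). Lock keys, queries, and function names are illustrative assumptions, not the actual ZITADEL code.

```go
// Illustrative sketch of the advisory-lock handshake; keys, queries and
// function names are assumptions, not ZITADEL's real implementation.
package projection

import (
	"context"
	"database/sql"
)

const (
	eventWriteLockKey int64 = 1   // writers hold this key in shared mode while pushing
	userProjectionKey int64 = 100 // one exclusive key per projection
)

// pushEvents takes the shared write lock for the duration of its transaction.
// Transaction-level advisory locks are released automatically on commit.
func pushEvents(ctx context.Context, db *sql.DB) error {
	tx, err := db.BeginTx(ctx, nil)
	if err != nil {
		return err
	}
	defer tx.Rollback()

	if _, err := tx.ExecContext(ctx, "SELECT pg_advisory_xact_lock_shared($1)", eventWriteLockKey); err != nil {
		return err
	}
	// Placeholder for the actual INSERT into the eventstore.
	if _, err := tx.ExecContext(ctx, "SELECT 1 /* insert events here */"); err != nil {
		return err
	}
	return tx.Commit()
}

// triggerProjection tries to become the single worker for one projection,
// waits for in-flight writes, then processes events older than its transaction.
func triggerProjection(ctx context.Context, db *sql.DB) error {
	tx, err := db.BeginTx(ctx, nil)
	if err != nil {
		return err
	}
	defer tx.Rollback()

	// Exclusive per-projection lock: if another worker holds it, back off.
	var acquired bool
	if err := tx.QueryRowContext(ctx, "SELECT pg_try_advisory_xact_lock($1)", userProjectionKey).Scan(&acquired); err != nil {
		return err
	}
	if !acquired {
		return nil // another worker is already handling this projection
	}

	// Barrier: an exclusive request on the writers' key blocks until all shared
	// holders (in-flight writes) have committed; release it right away.
	if _, err := tx.ExecContext(ctx, "SELECT pg_advisory_lock($1)", eventWriteLockKey); err != nil {
		return err
	}
	if _, err := tx.ExecContext(ctx, "SELECT pg_advisory_unlock($1)", eventWriteLockKey); err != nil {
		return err
	}

	// Placeholder for reading all events created before this transaction began
	// and updating the projection table.
	if _, err := tx.ExecContext(ctx, "SELECT 1 /* process pending events here */"); err != nil {
		return err
	}
	return tx.Commit()
}
```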

### Additional Information

* ZITADEL no longer modifies the `application_name` PostgreSQL variable
during event writes.
* The lock on the `current_states` table is now `FOR NO KEY UPDATE`.
* Fixes https://github.com/zitadel/zitadel/issues/8509

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Tim Möhlmann <tim+github@zitadel.com>
(cherry picked from commit 0575f67e94)
2025-09-15 09:41:49 +02:00

Load Tests

This package contains code for load testing specific endpoints of ZITADEL using k6.

Prerequisite

Structure

The use cases under test are defined in src/use_cases. The implementations of ZITADEL resources and calls are located under src.

Execution

Env vars

  • VUS: Number of parallel virtual users (VUs) executing the test (default is 20)
  • DURATION: How long the tests run (default is 200s)
  • ZITADEL_HOST: URL of ZITADEL (default is http://localhost:8080)
  • ADMIN_LOGIN_NAME: Login name of a human user with the IAM_OWNER role
  • ADMIN_PASSWORD: Password of the human user

To set up the tests, we use the console credentials and log in as an admin. The user must be able to create organizations and all resources inside those organizations.

  • ADMIN_LOGIN_NAME: zitadel-admin@zitadel.localhost
  • ADMIN_PASSWORD: Password1!
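
For example, the env vars can be set inline when invoking one of the make targets listed below (the values here are illustrative):

    VUS=50 DURATION=600s ADMIN_LOGIN_NAME=zitadel-admin@zitadel.localhost ADMIN_PASSWORD=Password1! make machine_pat_login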

Test

Before you run the tests, you need an initialized user, because the tests don't implement the change-password screen during login.

  • make human_password_login
    setup: creates human users
    test: uses the previously created humans to sign in using the login ui
  • make machine_pat_login
    setup: creates machines and a pat for each machine
    test: calls user info endpoint with the given pats
  • make machine_client_credentials_login
    setup: creates machines and a client credential secret for each machine
    test: calls token endpoint with the client_credentials grant type.
  • make user_info
    setup: creates human users and signs them in
    test: calls user info endpoint using the given humans
  • make manipulate_user
    test: creates a human, updates its profile, locks the user and then deletes it
  • make introspect
    setup: creates projects, one API per project and one key per API, and generates the JWTs from the given keys
    test: calls introspection endpoint using the given JWTs
  • make add_session
    setup: creates human users
    test: creates new sessions with user id check
  • make oidc_session
    setup: creates a machine user to create the auth request and session.
    test: creates an auth request, a session and links the session to the auth request. Implementation of this flow.
  • make otp_session
    setup: creates 1 human user for each VU and adds email OTP to it
    test: creates a session based on the login name of the user, sets the email OTP challenge to the session and afterwards checks the OTP code
  • make password_session
    setup: creates 1 human user for each VU and adds email OTP to it
    test: creates a session based on the login name of the user and checks for the password on a second step
  • make machine_jwt_profile_grant
    setup: generates private/public key, creates machine users, adds a key
    test: creates a token and calls user info
  • make machine_jwt_profile_grant_single_user
    setup: generates private/public key, creates machine user, adds a key
    test: creates a token and calls user info in parallel for the same user
  • make users_by_metadata_key
    setup: creates a human user for half of the VUs and a machine user for the other half, and adds 3 metadata entries to each user
    test: calls the list users endpoint and filters by a metadata key
  • make users_by_metadata_value
    setup: creates a human user for half of the VUs and a machine user for the other half, and adds 3 metadata entries to each user
    test: calls the list users endpoint and filters by a metadata value
  • make verify_all_user_grants_exists
    setup: creates 50 projects and 1 machine per VU
    test: creates a machine and grants all projects to the machine
    teardown: the organization is not removed, so that the correctness of the projection data can be verified. You can find additional information at the bottom of this file.