Commit Graph

9 Commits

Author SHA1 Message Date
Silvan
19d1ab9c94 fix(projections): overhaul the event projection system (#10560)
This PR overhauls our event projection system to make it more robust and
prevent skipped events under high load. The core change replaces our
custom, transaction-based locking with standard PostgreSQL advisory
locks. We also introduce a worker pool to manage concurrency and prevent
database connection exhaustion.

### Key Changes

* **Advisory Locks for Projections:** Replaces exclusive row locks and
inspection of `pg_stat_activity` with PostgreSQL advisory locks for
managing projection state. This is a more reliable and standard approach
to distributed locking.
* **Simplified Await Logic:** Removes the complex logic for awaiting
open transactions, simplifying it to a more straightforward time-based
filtering of events.
* **Projection Worker Pool:** Implements a worker pool to limit
concurrent projection triggers, preventing connection exhaustion and
improving stability under load. A new `MaxParallelTriggers`
configuration option is introduced.

### Problem Solved

Under high throughput, a race condition could cause projections to miss
events from the eventstore. This led to inconsistent data in projection
tables (e.g., a user grant might be missing). This PR fixes the
underlying locking and concurrency issues to ensure all events are
processed reliably.

### How it Works

1. **Event Writing:** When writing events, a *shared* advisory lock is
taken. This signals that a write is in progress.
2.  **Event Handling (Projections):**
* A projection worker attempts to acquire an *exclusive* advisory lock
for that specific projection. If the lock is already held, it means
another worker is on the job, so the current one backs off.
* Once the lock is acquired, the worker briefly acquires and releases
the same *shared* lock used by event writers. This acts as a barrier,
ensuring it waits for any in-flight writes to complete.
* Finally, it processes all events that occurred before its transaction
began.

### Additional Information

* ZITADEL no longer modifies the `application_name` PostgreSQL variable
during event writes.
*   The lock on the `current_states` table is now `FOR NO KEY UPDATE`.
*   Fixes https://github.com/zitadel/zitadel/issues/8509

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Tim Möhlmann <tim+github@zitadel.com>
(cherry picked from commit 0575f67e94)
2025-09-15 09:41:49 +02:00
Silvan
7c5480f94e fix(eventstore): use decimal, correct mirror (#9916)
# Eventstore fixes

- `event.Position` used float64 before which can lead to [precision
loss](https://github.com/golang/go/issues/47300). The type got replaced
by [a type without precision
loss](https://github.com/jackc/pgx-shopspring-decimal)
- the handler reported the wrong error if the current state was updated
and therefore took longer to retry failed events.

# Mirror fixes

- max age of auth requests can be configured to speed up copying data
from `auth.auth_requests` table. Auth requests last updated before the
set age will be ignored. Default is 1 month
- notification projections are skipped because notifications should be
sent by the source system. The projections are set to the latest
position
- ensure that mirror can be executed multiple times
2025-05-27 17:13:17 +02:00
Tim Möhlmann
6bd706be98 fix(eventstore): revert precise decimal (#8527) (#8679)
(cherry picked from commit aeb379e7de)
2024-09-25 06:31:46 +02:00
Silvan
b522588d98 fix(eventstore): precise decimal (#8527)
# Which Problems Are Solved

Float64 which was used for the event.Position field is [not precise in
go and gets rounded](https://github.com/golang/go/issues/47300). This
can lead to unprecies position tracking of events and therefore
projections especially on cockcoachdb as the position used there is a
big number.

example of a unprecies position:
exact: 1725257931223002628
float64: 1725257931223002624.000000

# How the Problems Are Solved

The float64 was replaced by
[github.com/jackc/pgx-shopspring-decimal](https://github.com/jackc/pgx-shopspring-decimal).

# Additional Changes

Correct behaviour of makefile for load tests.
Rename `latestSequence`-queries to `latestPosition`
2024-09-06 12:19:19 +03:00
Livio Spring
e000fdd792 fix: handle context correctly in processEvents (#7320) 2024-01-31 11:25:28 +01:00
Silvan
5ce542b959 fix(handler): allow uint32 offset for migration scenarios (#7103) 2023-12-21 10:40:51 +00:00
Tim Möhlmann
f680dd934d refactor: rename package errors to zerrors (#7039)
* chore: rename package errors to zerrors

* rename package errors to gerrors

* fix error related linting issues

* fix zitadel error assertion

* fix gosimple linting issues

* fix deprecated linting issues

* resolve gci linting issues

* fix import structure

---------

Co-authored-by: Elio Bischof <elio@zitadel.com>
2023-12-08 15:30:55 +01:00
Silvan
e3d1ca4d58 fix(eventstore): improve pagination of handler filter (#6968)
* fix(setup): add filter_offset to `projections.current_states`

* fix(eventstore): allow offset in query

* fix(handler): offset for already processed events
2023-12-01 12:25:41 +00:00
Silvan
b5564572bc feat(eventstore): increase parallel write capabilities (#5940)
This implementation increases parallel write capabilities of the eventstore.
Please have a look at the technical advisories: [05](https://zitadel.com/docs/support/advisory/a10005) and  [06](https://zitadel.com/docs/support/advisory/a10006).
The implementation of eventstore.push is rewritten and stored events are migrated to a new table `eventstore.events2`.
If you are using cockroach: make sure that the database user of ZITADEL has `VIEWACTIVITY` grant. This is used to query events.
2023-10-19 12:19:10 +02:00