This PR overhauls our event projection system to make it more robust and
prevent skipped events under high load. The core change replaces our
custom, transaction-based locking with standard PostgreSQL advisory
locks. We also introduce a worker pool to manage concurrency and prevent
database connection exhaustion.
### Key Changes
* **Advisory Locks for Projections:** Replaces exclusive row locks and
inspection of `pg_stat_activity` with PostgreSQL advisory locks for
managing projection state. This is a more reliable and standard approach
to distributed locking.
* **Simplified Await Logic:** Removes the complex logic for awaiting
open transactions, simplifying it to a more straightforward time-based
filtering of events.
* **Projection Worker Pool:** Implements a worker pool to limit
concurrent projection triggers, preventing connection exhaustion and
improving stability under load. A new `MaxParallelTriggers`
configuration option is introduced.
### Problem Solved
Under high throughput, a race condition could cause projections to miss
events from the eventstore. This led to inconsistent data in projection
tables (e.g., a user grant might be missing). This PR fixes the
underlying locking and concurrency issues to ensure all events are
processed reliably.
### How it Works
1. **Event Writing:** When writing events, a *shared* advisory lock is
taken. This signals that a write is in progress.
2. **Event Handling (Projections):**
* A projection worker attempts to acquire an *exclusive* advisory lock
for that specific projection. If the lock is already held, it means
another worker is on the job, so the current one backs off.
* Once the lock is acquired, the worker briefly acquires and releases
the same *shared* lock used by event writers. This acts as a barrier,
ensuring it waits for any in-flight writes to complete.
* Finally, it processes all events that occurred before its transaction
began.
### Additional Information
* ZITADEL no longer modifies the `application_name` PostgreSQL variable
during event writes.
* The lock on the `current_states` table is now `FOR NO KEY UPDATE`.
* Fixes https://github.com/zitadel/zitadel/issues/8509
---------
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Tim Möhlmann <tim+github@zitadel.com>
(cherry picked from commit 0575f67e94)
# Eventstore fixes
- `event.Position` used float64 before which can lead to [precision
loss](https://github.com/golang/go/issues/47300). The type got replaced
by [a type without precision
loss](https://github.com/jackc/pgx-shopspring-decimal)
- the handler reported the wrong error if the current state was updated
and therefore took longer to retry failed events.
# Mirror fixes
- max age of auth requests can be configured to speed up copying data
from `auth.auth_requests` table. Auth requests last updated before the
set age will be ignored. Default is 1 month
- notification projections are skipped because notifications should be
sent by the source system. The projections are set to the latest
position
- ensure that mirror can be executed multiple times
# Which Problems Are Solved
Float64 which was used for the event.Position field is [not precise in
go and gets rounded](https://github.com/golang/go/issues/47300). This
can lead to unprecies position tracking of events and therefore
projections especially on cockcoachdb as the position used there is a
big number.
example of a unprecies position:
exact: 1725257931223002628
float64: 1725257931223002624.000000
# How the Problems Are Solved
The float64 was replaced by
[github.com/jackc/pgx-shopspring-decimal](https://github.com/jackc/pgx-shopspring-decimal).
# Additional Changes
Correct behaviour of makefile for load tests.
Rename `latestSequence`-queries to `latestPosition`
* chore: use pgx v5
* chore: update go version
* remove direct pq dependency
* remove unnecessary type
* scan test
* map scanner
* converter
* uint8 number array
* duration
* most unit tests work
* unit tests work
* chore: coverage
* go 1.21
* linting
* int64 gopfertammi
* retry go 1.22
* retry go 1.22
* revert to go v1.21.5
* update go toolchain to 1.21.8
* go 1.21.8
* remove test flag
* go 1.21.5
* linting
* update toolchain
* use correct array
* use correct array
* add byte array
* correct value
* correct error message
* go 1.21 compatible
This implementation increases parallel write capabilities of the eventstore.
Please have a look at the technical advisories: [05](https://zitadel.com/docs/support/advisory/a10005) and [06](https://zitadel.com/docs/support/advisory/a10006).
The implementation of eventstore.push is rewritten and stored events are migrated to a new table `eventstore.events2`.
If you are using cockroach: make sure that the database user of ZITADEL has `VIEWACTIVITY` grant. This is used to query events.