# Which Problems Are Solved
Zitadel currently uses three database pools: one for queries, one for
pushing events and one for scheduled projection updates. This defeats
the purpose of a connection pool, which already manages multiple
connections by itself.
During load tests we found that this structure of connection pools
consumes a lot of database resources. Resource usage dropped after we
reduced the number of database pools to one, because existing
connections are reused more efficiently.
# How the Problems Are Solved
Removed the logic that handled multiple connection pools; a single pool
is now used for everything.
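A minimal sketch of what the single-pool wiring could look like with
pgx (illustrative only, including the `MaxConns` value; not the actual
Zitadel code):
```go
package main

import (
	"context"

	"github.com/jackc/pgx/v5/pgxpool"
)

// newSharedPool creates the one pool that serves queries, event pushes and
// scheduled projection updates alike, so idle connections from one workload
// can be reused by the others.
func newSharedPool(ctx context.Context, dsn string) (*pgxpool.Pool, error) {
	cfg, err := pgxpool.ParseConfig(dsn)
	if err != nil {
		return nil, err
	}
	// A single connection budget instead of three separate ones.
	cfg.MaxConns = 20
	return pgxpool.NewWithConfig(ctx, cfg)
}
```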
# Additional Changes
none
# Additional Context
part of https://github.com/zitadel/zitadel/issues/8352
# Which Problems Are Solved
In eventstore queries with aggregate ID exclusion filters, filters on
the events' creation date were not passed to the sub-query. This
resulted in a high number of rows returned by the sub-query and a high
overall query cost.
# How the Problems Are Solved
When `CreatedAfter` and `CreatedBefore` are used on the global search
query, those filters are copied to the sub-query, as was already done
for the `position` column filter.
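Illustrated as a query sketch (column list and parameter numbers are
made up; not the generated SQL verbatim):
```go
package eventstore

// The created_at bounds ($2/$3) now appear in the sub-query as well, so it
// scans the same narrow index range as the outer query.
const excludeAggregateIDs = `
SELECT aggregate_id, event_type, "position", payload
FROM eventstore.events2
WHERE instance_id = $1
  AND created_at > $2 AND created_at < $3
  AND aggregate_id NOT IN (
    SELECT aggregate_id
    FROM eventstore.events2
    WHERE instance_id = $1
      AND event_type = ANY($4)
      AND created_at > $2 AND created_at < $3 -- filters copied from the parent query
  )`
```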
# Additional Changes
- none
# Additional Context
- Introduced in https://github.com/zitadel/zitadel/pull/8940
Co-authored-by: Livio Spring <livio.a@gmail.com>
# Which Problems Are Solved
If many events are written to the same aggregate ID, it can happen that
Zitadel [starts to retry the push
transaction](48ffc902cc/internal/eventstore/eventstore.go (L101))
because [the locking
behaviour](48ffc902cc/internal/eventstore/v3/sequence.go (L25))
during push computes the wrong sequence: newly committed events are not
visible to the transaction, yet they affect the current sequence.
In cases with high command traffic on a single aggregate ID, this can
severely impact the general performance of Zitadel, because many
connections of the `eventstore pusher` database pool block each other.
# How the Problems Are Solved
To improve performance, this locking mechanism was removed and the
business logic of push was moved into SQL functions, which reduce
network traffic and can be analyzed by the database before the actual
push. For clients of the eventstore framework, nothing changes.
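As a toy illustration of the direction (a hypothetical function, not the
shipped one, which contains more logic such as uniqueness checks and
multiple events per push): a single SQL function can compute the next
sequence and insert the event in one statement, so the client needs one
round trip and no explicit lock:
```go
package eventstore

// Hypothetical sketch only: sequence computation happens server-side,
// inside the same statement as the INSERT.
const createPushFunction = `
CREATE OR REPLACE FUNCTION eventstore.push_event(
  _instance_id TEXT, _aggregate_type TEXT, _aggregate_id TEXT,
  _event_type TEXT, _payload JSONB
) RETURNS BIGINT
LANGUAGE SQL AS $$
  INSERT INTO eventstore.events2
    (instance_id, aggregate_type, aggregate_id, event_type, "sequence", payload)
  SELECT _instance_id, _aggregate_type, _aggregate_id, _event_type,
         COALESCE(MAX("sequence"), 0) + 1, _payload
  FROM eventstore.events2
  WHERE instance_id = _instance_id AND aggregate_id = _aggregate_id
  RETURNING "sequence";
$$;`
```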
# Additional Changes
- after a connection is established, the newly added database types are
prefetched
- `eventstore.BaseEvent` now returns the correct revision of the event
# Additional Context
- part of https://github.com/zitadel/zitadel/issues/8931
---------
Co-authored-by: Tim Möhlmann <tim+github@zitadel.com>
Co-authored-by: Livio Spring <livio.a@gmail.com>
Co-authored-by: Max Peintner <max@caos.ch>
Co-authored-by: Elio Bischof <elio@zitadel.com>
Co-authored-by: Stefan Benz <46600784+stebenz@users.noreply.github.com>
Co-authored-by: Miguel Cabrerizo <30386061+doncicuto@users.noreply.github.com>
Co-authored-by: Joakim Lodén <Loddan@users.noreply.github.com>
Co-authored-by: Yxnt <Yxnt@users.noreply.github.com>
Co-authored-by: Stefan Benz <stefan@caos.ch>
Co-authored-by: Harsha Reddy <harsha.reddy@klaviyo.com>
Co-authored-by: Zach H <zhirschtritt@gmail.com>
# Which Problems Are Solved
For truly event-based notification handlers, we need to be able to
filter out events of aggregates which have already been handled. For
example, when an event like `notify.success` or `notify.failed` was
created on an aggregate, we no longer require events from that aggregate
ID.
# How the Problems Are Solved
Extend the query builder to use a `NOT IN` clause which excludes
aggregate IDs when they have certain events for a certain aggregate
type. For optimization and proper index usage, certain filters are
inherited from the parent query, such as:
- Instance ID
- Instance IDs
- Position offset
This is a prettified query as used by the unit tests:
```sql
SELECT created_at, event_type, "sequence", "position", payload, creator, "owner", instance_id, aggregate_type, aggregate_id, revision
FROM eventstore.events2
WHERE instance_id = $1
AND aggregate_type = $2
AND event_type = $3
AND "position" > $4
AND aggregate_id NOT IN (
SELECT aggregate_id
FROM eventstore.events2
WHERE aggregate_type = $5
AND event_type = ANY($6)
AND instance_id = $7
AND "position" > $8
)
ORDER BY "position" DESC, in_tx_order DESC
LIMIT $9
```
I ran this query against the `oidc_session` aggregate, looking for
added events, excluding aggregates where a token was revoked, with a
recent position. It fully used index scans:
<details>
<summary>Query plan</summary>

```json
[
{
"Plan": {
"Node Type": "Index Scan",
"Parallel Aware": false,
"Async Capable": false,
"Scan Direction": "Forward",
"Index Name": "es_projection",
"Relation Name": "events2",
"Alias": "events2",
"Actual Rows": 2,
"Actual Loops": 1,
"Index Cond": "((instance_id = '286399006995644420'::text) AND (aggregate_type = 'oidc_session'::text) AND (event_type = 'oidc_session.added'::text) AND (\"position\" > 1731582100.784168))",
"Rows Removed by Index Recheck": 0,
"Filter": "(NOT (hashed SubPlan 1))",
"Rows Removed by Filter": 1,
"Plans": [
{
"Node Type": "Index Scan",
"Parent Relationship": "SubPlan",
"Subplan Name": "SubPlan 1",
"Parallel Aware": false,
"Async Capable": false,
"Scan Direction": "Forward",
"Index Name": "es_projection",
"Relation Name": "events2",
"Alias": "events2_1",
"Actual Rows": 1,
"Actual Loops": 1,
"Index Cond": "((instance_id = '286399006995644420'::text) AND (aggregate_type = 'oidc_session'::text) AND (event_type = 'oidc_session.access_token.revoked'::text) AND (\"position\" > 1731582100.784168))",
"Rows Removed by Index Recheck": 0
}
]
},
"Triggers": [
]
}
]
```
</details>
# Additional Changes
- None
# Additional Context
- Related to https://github.com/zitadel/zitadel/issues/8931
---------
Co-authored-by: adlerhurst <silvan.reusser@gmail.com>
# Which Problems Are Solved
We need a reliable way to lock events that are being processed as part
of a job queue, for example in the notification handlers.
# How the Problems Are Solved
Allow setting `FOR UPDATE [ NOWAIT | SKIP LOCKED ]` on the eventstore
query builder when using an open transaction; a sketch follows the list
below.
- `NOWAIT` returns an error if the lock cannot be obtained.
- `SKIP LOCKED` only returns rows which are not already locked.
- The default is to wait for the lock to be released.
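A sketch of how a worker could claim events with `SKIP LOCKED` inside an
open transaction (hypothetical query and event type; not the
query-builder API itself):
```go
package queue

import (
	"context"

	"github.com/jackc/pgx/v5"
)

// claimEvents skips rows locked by another worker's open transaction instead
// of blocking on them; claimed rows stay locked until tx commits or rolls back.
func claimEvents(ctx context.Context, tx pgx.Tx) error {
	rows, err := tx.Query(ctx, `
		SELECT instance_id, aggregate_id, "sequence"
		FROM eventstore.events2
		WHERE event_type = $1
		ORDER BY "position"
		LIMIT 10
		FOR UPDATE SKIP LOCKED`, "notification.requested")
	if err != nil {
		return err
	}
	defer rows.Close()
	for rows.Next() {
		var instanceID, aggregateID string
		var sequence uint64
		if err := rows.Scan(&instanceID, &aggregateID, &sequence); err != nil {
			return err
		}
		// process the claimed event here
	}
	return rows.Err()
}
```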
# Additional Changes
- none
# Additional Context
- [Locking
docs](https://www.postgresql.org/docs/17/sql-select.html#SQL-FOR-UPDATE-SHARE)
- Related to https://github.com/zitadel/zitadel/issues/8931
# Which Problems Are Solved
Noisy neighbours can introduce projection latency because the
projections only query events older than the start timestamp of the
oldest push transaction.
# How the Problems Are Solved
During push, the application name of the connection is set to
`zitadel_es_pusher_<instance_id>` instead of `zitadel_es_pusher`, the
name projections use to determine the oldest open push transaction when
querying events. Projections therefore no longer wait for push
transactions of other instances.
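Roughly, per push transaction (assumed wiring; `set_config(..., true)`
scopes the name to the transaction):
```go
package es

import (
	"context"

	"github.com/jackc/pgx/v5"
	"github.com/jackc/pgx/v5/pgxpool"
)

// beginPush is a sketch of the assumed wiring: the push transaction is tagged
// with an instance-scoped application_name, so projections that look up the
// oldest open push transaction can ignore pushes of other instances.
func beginPush(ctx context.Context, pool *pgxpool.Pool, instanceID string) (pgx.Tx, error) {
	tx, err := pool.Begin(ctx)
	if err != nil {
		return nil, err
	}
	// is_local=true: the setting is reverted at the end of the transaction.
	_, err = tx.Exec(ctx, "SELECT set_config('application_name', $1, true)",
		"zitadel_es_pusher_"+instanceID)
	if err != nil {
		_ = tx.Rollback(ctx)
		return nil, err
	}
	return tx, nil
}
```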
# Which Problems Are Solved
Milestones used existing events from a number of aggregates, the OIDC
session being one of them. We noticed in load tests that reducing the
`oidc_session.added` event into the milestone projection is costly
because of payload-based conditionals. A milestone is reached once, but
even then we remained subscribed to the OIDC events, which required
`projections.current_states` to be updated continuously.
# How the Problems Are Solved
The milestone creation is refactored to use dedicated events instead.
The command side decides when a milestone is reached and creates the
reached event once for each milestone when required.
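Schematically, on the command side (hypothetical names and types):
```go
package command

import "context"

// milestoneState is a stand-in for the milestone write model.
type milestoneState struct {
	reached bool
}

// reachMilestone pushes the dedicated reached event at most once per
// milestone; afterwards no subscription to other aggregates is needed.
func reachMilestone(ctx context.Context, state *milestoneState,
	push func(ctx context.Context, eventType string) error) error {
	if state.reached {
		return nil // a milestone is reached once; never emit the event again
	}
	if err := push(ctx, "milestone.reached"); err != nil {
		return err
	}
	state.reached = true
	return nil
}
```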
# Additional Changes
To prevent reached milestones from being created twice, a migration
script is provided. When the old `projections.milestones` table exists,
the state is read from there and `v2` milestone aggregate events are
created with the original reached and pushed dates.
# Additional Context
- Closes https://github.com/zitadel/zitadel/issues/8800
# Which Problems Are Solved
`float64`, which was used for the `event.Position` field, is [not
precise in Go and gets
rounded](https://github.com/golang/go/issues/47300). This can lead to
imprecise position tracking of events, and therefore of projections,
especially on CockroachDB, as the position used there is a big number.
Example of an imprecise position:
exact: `1725257931223002628`
float64: `1725257931223002624.000000`
# How the Problems Are Solved
The float64 was replaced by
[github.com/jackc/pgx-shopspring-decimal](https://github.com/jackc/pgx-shopspring-decimal).
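The codec is registered on every new connection, roughly like this
(function names and queries are illustrative; the `Register` call is the
library's documented API):
```go
package database

import (
	"context"

	pgxdecimal "github.com/jackc/pgx-shopspring-decimal"
	"github.com/jackc/pgx/v5"
	"github.com/jackc/pgx/v5/pgxpool"
	"github.com/shopspring/decimal"
)

// withDecimal registers the shopspring/decimal codec on each new connection,
// so numeric columns such as "position" scan losslessly into decimal.Decimal.
func withDecimal(cfg *pgxpool.Config) {
	cfg.AfterConnect = func(ctx context.Context, conn *pgx.Conn) error {
		pgxdecimal.Register(conn.TypeMap())
		return nil
	}
}

// latestPosition shows the lossless scan (illustrative query).
func latestPosition(ctx context.Context, pool *pgxpool.Pool) (decimal.Decimal, error) {
	var position decimal.Decimal
	err := pool.QueryRow(ctx,
		`SELECT COALESCE(max("position"), 0) FROM eventstore.events2`).Scan(&position)
	return position, err
}
```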
# Additional Changes
Corrected the behaviour of the Makefile for load tests.
Renamed the `latestSequence` queries to `latestPosition`.
# Which Problems Are Solved
This fix adds tracing spans to all V1 API import-related functions, in
order to troubleshoot import-related performance issues reported to us.
# How the Problems Are Solved
Add a tracing span to `api/grpc/admin/import.go` and all related
functions that are called in the `command` package.
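The shape of the instrumentation, shown with plain OpenTelemetry for
illustration (Zitadel wraps tracing in its own helper package; the
function names are stand-ins):
```go
package admin

import (
	"context"

	"go.opentelemetry.io/otel"
)

// importOrg stands in for any import-related function: it opens a span and
// hands the derived context to nested calls, so slow steps show up in traces.
func importOrg(ctx context.Context) error {
	ctx, span := otel.Tracer("import").Start(ctx, "importOrg")
	defer span.End()

	return importUsers(ctx) // nested calls create child spans the same way
}

func importUsers(ctx context.Context) error {
	_, span := otel.Tracer("import").Start(ctx, "importUsers")
	defer span.End()
	return nil
}
```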
# Additional Changes
- none
# Additional Context
- Reported by internal communication
# Which Problems Are Solved
Querying events by an aggregate ID can produce high load on the
database if the aggregate contains many events (count > 1,000,000).
# How the Problems Are Solved
Instead of using the `position` and `in_tx_order` columns, we use the
`sequence` column, which guarantees correct ordering within a single
aggregate and uses more optimised indexes.
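Illustrated as a query sketch (column list shortened):
```go
package eventstore

// Within one aggregate the "sequence" column is strictly increasing, so
// ordering by it replaces the "position"/in_tx_order sort and can be served
// by the per-aggregate index.
const eventsByAggregate = `
SELECT event_type, "sequence", payload
FROM eventstore.events2
WHERE instance_id = $1
  AND aggregate_type = $2
  AND aggregate_id = $3
ORDER BY "sequence"`
```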
# Additional Context
Closes https://github.com/zitadel/DevOps/issues/50
Co-authored-by: Livio Spring <livio.a@gmail.com>
* chore: use pgx v5
* chore: update go version
* remove direct pq dependency
* remove unnecessary type
* scan test
* map scanner
* converter
* uint8 number array
* duration
* most unit tests work
* unit tests work
* chore: coverage
* go 1.21
* linting
* int64, dammit
* retry go 1.22
* retry go 1.22
* revert to go v1.21.5
* update go toolchain to 1.21.8
* go 1.21.8
* remove test flag
* go 1.21.5
* linting
* update toolchain
* use correct array
* use correct array
* add byte array
* correct value
* correct error message
* go 1.21 compatible
* fix(db): add additional connection pool for projection spooling
* use correct connection pool for projections
---------
Co-authored-by: Livio Spring <livio.a@gmail.com>
* fix(eventstore): prevent allocation of filtered events
Directly reduce each event obtained from a sql.Rows scan,
so that we do not have to allocate all events in a slice
(a sketch follows this commit list).
* reinstate the mutex as RWMutex
* scan data directly
* add todos
* fix(writemodels): add reduce of parent
* test: remove comment
* update comments
---------
Co-authored-by: adlerhurst <silvan.reusser@gmail.com>
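A sketch of the callback-scan idea with hypothetical types: each row is
reduced as soon as it is scanned, so the events are never buffered in a
slice.
```go
package eventstore

import (
	"context"

	"github.com/jackc/pgx/v5"
)

// Event is a minimal stand-in for the eventstore event type.
type Event struct {
	Type     string
	Sequence uint64
	Payload  []byte
}

// reduceRows hands every scanned row to the reduce callback immediately,
// instead of collecting all events into a slice first.
func reduceRows(ctx context.Context, tx pgx.Tx, query string,
	reduce func(*Event) error, args ...any) error {
	rows, err := tx.Query(ctx, query, args...)
	if err != nil {
		return err
	}
	defer rows.Close()
	for rows.Next() {
		var e Event
		if err := rows.Scan(&e.Type, &e.Sequence, &e.Payload); err != nil {
			return err
		}
		if err := reduce(&e); err != nil {
			return err
		}
	}
	return rows.Err()
}
```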
This implementation increases the parallel write capabilities of the eventstore.
Please have a look at the technical advisories: [05](https://zitadel.com/docs/support/advisory/a10005) and [06](https://zitadel.com/docs/support/advisory/a10006).
The implementation of eventstore.push is rewritten and stored events are migrated to a new table `eventstore.events2`.
If you are using CockroachDB, make sure that the database user of ZITADEL has the `VIEWACTIVITY` grant; it is used to query events.
feat(storage): read only transactions for queries (#6415)
* fix: tests
* tinkering like a pro
* fix(database): scan as callback
* fix tests
* fix merge failures
* remove as of system time
* refactor: remove unused test
* refactor: remove unused lines
* feat(eventstore): order by `creation_date` and `sequence`
* fix(logstore): use correct event type
---------
Co-authored-by: Livio Spring <livio.a@gmail.com>
## Note
This release requires a setup step to fully improve performance.
Be sure to start ZITADEL with an appropriate command (`zitadel start-from-init` / `zitadel start-from-setup`)
## Changes
- fix: only run projection scheduler on active instances
- fix: set default for concurrent instances of projections to 1 (for scheduling)
- fix: create more indexes on eventstore.events table
- fix: get current sequence for token check (improve reread performance)
* begin init checks for projections
* first projection checks
* debug notification providers with query fixes
* more projections and first index
* more projections
* more projections
* finish projections
* fix tests (remove db name)
* create tables in setup
* fix logging / error handling
* add tenant to views
* rename tenant to instance_id
* add instance_id to all projections
* add instance_id to all queries
* correct instance_id on projections
* add instance_id to failed_events
* use separate context for instance
* implement features projection
* implement features projection
* remove unique constraint from setup when migration failed
* add error to failed setup event
* add instance_id to primary keys
* fix IAM projection
* remove old migrations folder
* fix keysFromYAML test