# Which Problems Are Solved
For truly event-based notification handler, we need to be able to filter
out events of aggregates which are already handled. For example when an
event like `notify.success` or `notify.failed` was created on an
aggregate, we no longer require events from that aggregate ID.
# How the Problems Are Solved
Extend the query builder to use a `NOT IN` clause which excludes
aggregate IDs when they have certain events for a certain aggregate
type. For optimization and proper index usages, certain filters are
inherited from the parent query, such as:
- Instance ID
- Instance IDs
- Position offset
This is a prettified query as used by the unit tests:
```sql
SELECT created_at, event_type, "sequence", "position", payload, creator, "owner", instance_id, aggregate_type, aggregate_id, revision
FROM eventstore.events2
WHERE instance_id = $1
AND aggregate_type = $2
AND event_type = $3
AND "position" > $4
AND aggregate_id NOT IN (
SELECT aggregate_id
FROM eventstore.events2
WHERE aggregate_type = $5
AND event_type = ANY($6)
AND instance_id = $7
AND "position" > $8
)
ORDER BY "position" DESC, in_tx_order DESC
LIMIT $9
```
I used this query to run it against the `oidc_session` aggregate looking
for added events, excluding aggregates where a token was revoked,
against a recent position. It fully used index scans:
<details>
```json
[
{
"Plan": {
"Node Type": "Index Scan",
"Parallel Aware": false,
"Async Capable": false,
"Scan Direction": "Forward",
"Index Name": "es_projection",
"Relation Name": "events2",
"Alias": "events2",
"Actual Rows": 2,
"Actual Loops": 1,
"Index Cond": "((instance_id = '286399006995644420'::text) AND (aggregate_type = 'oidc_session'::text) AND (event_type = 'oidc_session.added'::text) AND (\"position\" > 1731582100.784168))",
"Rows Removed by Index Recheck": 0,
"Filter": "(NOT (hashed SubPlan 1))",
"Rows Removed by Filter": 1,
"Plans": [
{
"Node Type": "Index Scan",
"Parent Relationship": "SubPlan",
"Subplan Name": "SubPlan 1",
"Parallel Aware": false,
"Async Capable": false,
"Scan Direction": "Forward",
"Index Name": "es_projection",
"Relation Name": "events2",
"Alias": "events2_1",
"Actual Rows": 1,
"Actual Loops": 1,
"Index Cond": "((instance_id = '286399006995644420'::text) AND (aggregate_type = 'oidc_session'::text) AND (event_type = 'oidc_session.access_token.revoked'::text) AND (\"position\" > 1731582100.784168))",
"Rows Removed by Index Recheck": 0
}
]
},
"Triggers": [
]
}
]
```
</details>
# Additional Changes
- None
# Additional Context
- Related to https://github.com/zitadel/zitadel/issues/8931
---------
Co-authored-by: adlerhurst <silvan.reusser@gmail.com>
# Which Problems Are Solved
We need a reliable way to lock events that are being processed as part
of a job queue. For example in the notification handlers.
# How the Problems Are Solved
Allow setting `FOR UPDATE [ NOWAIT | SKIP LOCKED ]` to the eventstore
query builder using an open transaction.
- NOWAIT returns an errors if the lock cannot be obtained
- SKIP LOCKED only returns row which are not locked.
- Default is to wait for the lock to be released.
# Additional Changes
- none
# Additional Context
- [Locking
docs](https://www.postgresql.org/docs/17/sql-select.html#SQL-FOR-UPDATE-SHARE)
- Related to https://github.com/zitadel/zitadel/issues/8931
# Which Problems Are Solved
Optimize the query that checks for terminated sessions in the access
token verifier. The verifier is used in auth middleware, userinfo and
introspection.
# How the Problems Are Solved
The previous implementation built a query for certain events and then
appended a single `PositionAfter` clause. This caused the postgreSQL
planner to use indexes only for the instance ID, aggregate IDs,
aggregate types and event types. Followed by an expensive sequential
scan for the position. This resulting in internal over-fetching of rows
before the final filter was applied.
![Screenshot_20241007_105803](https://github.com/user-attachments/assets/f2d91976-be87-428b-b604-a211399b821c)
Furthermore, the query was searching for events which are not always
applicable. For example, there was always a session ID search and if
there was a user ID, we would also search for a browser fingerprint in
event payload (expensive). Even if those argument string would be empty.
This PR changes:
1. Nest the position query, so that a full `instance_id, aggregate_id,
aggregate_type, event_type, "position"` index can be matched.
2. Redefine the `es_wm` index to include the `position` column.
3. Only search for events for the IDs that actually have a value. Do not
search (noop) if none of session ID, user ID or fingerpint ID are set.
New query plan:
![Screenshot_20241007_110648](https://github.com/user-attachments/assets/c3234c33-1b76-4b33-a4a9-796f69f3d775)
# Additional Changes
- cleanup how we load multi-statement migrations and make that a bit
more reusable.
# Additional Context
- Related to https://github.com/zitadel/zitadel/issues/7639
# Which Problems Are Solved
Float64 which was used for the event.Position field is [not precise in
go and gets rounded](https://github.com/golang/go/issues/47300). This
can lead to unprecies position tracking of events and therefore
projections especially on cockcoachdb as the position used there is a
big number.
example of a unprecies position:
exact: 1725257931223002628
float64: 1725257931223002624.000000
# How the Problems Are Solved
The float64 was replaced by
[github.com/jackc/pgx-shopspring-decimal](https://github.com/jackc/pgx-shopspring-decimal).
# Additional Changes
Correct behaviour of makefile for load tests.
Rename `latestSequence`-queries to `latestPosition`
* chore: use pgx v5
* chore: update go version
* remove direct pq dependency
* remove unnecessary type
* scan test
* map scanner
* converter
* uint8 number array
* duration
* most unit tests work
* unit tests work
* chore: coverage
* go 1.21
* linting
* int64 gopfertammi
* retry go 1.22
* retry go 1.22
* revert to go v1.21.5
* update go toolchain to 1.21.8
* go 1.21.8
* remove test flag
* go 1.21.5
* linting
* update toolchain
* use correct array
* use correct array
* add byte array
* correct value
* correct error message
* go 1.21 compatible
Even though this is a feature it's released as fix so that we can back port to earlier revisions.
As reported by multiple users startup of ZITADEL after leaded to downtime and worst case rollbacks to the previously deployed version.
The problem starts rising when there are too many events to process after the start of ZITADEL. The root cause are changes on projections (database tables) which must be recomputed. This PR solves this problem by adding a new step to the setup phase which prefills the projections. The step can be enabled by adding the `--init-projections`-flag to `setup`, `start-from-init` and `start-from-setup`. Setting this flag results in potentially longer duration of the setup phase but reduces the risk of the problems mentioned in the paragraph above.
This implementation increases parallel write capabilities of the eventstore.
Please have a look at the technical advisories: [05](https://zitadel.com/docs/support/advisory/a10005) and [06](https://zitadel.com/docs/support/advisory/a10006).
The implementation of eventstore.push is rewritten and stored events are migrated to a new table `eventstore.events2`.
If you are using cockroach: make sure that the database user of ZITADEL has `VIEWACTIVITY` grant. This is used to query events.
* fix(user): add search query for login name
* fix(user): change login name query to IN from EXISTS
* fix(loginname): include InQuery into ListQuery with SubSelect as possible datasource
* fix(user): apply suggestions from code review
Co-authored-by: Livio Spring <livio.a@gmail.com>
* fix: correct unit test for search query
Co-authored-by: Fabi <38692350+hifabienne@users.noreply.github.com>
Co-authored-by: Livio Spring <livio.a@gmail.com>
## Note
This release requires a setup step to fully improve performance.
Be sure to start ZITADEL with an appropriate command (zitadel start-from-init / start-from-setup)
## Changes
- fix: only run projection scheduler on active instances
- fix: set default for concurrent instances of projections to 1 (for scheduling)
- fix: create more indexes on eventstore.events table
- fix: get current sequence for token check (improve reread performance)
* begin init checks for projections
* first projection checks
* debug notification providers with query fixes
* more projections and first index
* more projections
* more projections
* finish projections
* fix tests (remove db name)
* create tables in setup
* fix logging / error handling
* add tenant to views
* rename tenant to instance_id
* add instance_id to all projections
* add instance_id to all queries
* correct instance_id on projections
* add instance_id to failed_events
* use separate context for instance
* implement features projection
* implement features projection
* remove unique constraint from setup when migration failed
* add error to failed setup event
* add instance_id to primary keys
* fix IAM projection
* remove old migrations folder
* fix keysFromYAML test