11 Commits

Author SHA1 Message Date
Silvan
e36f402e09
fix(perf): simplify eventstore queries by removing or in projection handlers (#9530)
# Which Problems Are Solved

[A recent performance
enhancement]((https://github.com/zitadel/zitadel/pull/9497)) aimed at
optimizing event store queries, specifically those involving multiple
aggregate type filters, has successfully improved index utilization.
While the query planner now correctly selects relevant indexes, it
employs [bitmap index
scans](https://www.postgresql.org/docs/current/indexes-bitmap-scans.html)
to retrieve data.

This approach, while beneficial in many scenarios, introduces a
potential I/O bottleneck. The bitmap index scan first identifies the
required database blocks and then utilizes a bitmap to access the
corresponding rows from the table's heap. This subsequent "bitmap heap
scan" can result in significant I/O overhead, particularly when queries
return a substantial number of rows across numerous data pages.

## Impact:

Under heavy load or with queries filtering for a wide range of events
across multiple aggregate types, this increased I/O activity may lead
to:

- Increased query latency.
- Elevated disk utilization.
- Potential performance degradation of the event store and dependent
systems.

# How the Problems Are Solved

To address this I/O bottleneck and further optimize query performance,
the projection handler has been modified. Instead of employing multiple
OR clauses for each aggregate type, the aggregate and event type filters
are now combined using IN ARRAY filters.

Technical Details:

This change allows the PostgreSQL query planner to leverage [index-only
scans](https://www.postgresql.org/docs/current/indexes-index-only-scans.html).
By utilizing IN ARRAY filters, the database can efficiently retrieve the
necessary data directly from the index, eliminating the need to access
the table's heap. This results in:

* Reduced I/O: Index-only scans significantly minimize disk I/O
operations, as the database avoids reading data pages from the main
table.
* Improved Query Performance: By reducing I/O, query execution times are
substantially improved, leading to lower latency.

# Additional Changes

- rollback of https://github.com/zitadel/zitadel/pull/9497

# Additional Information

## Query Plan of previous query

```sql
SELECT 
    created_at, event_type, "sequence", "position", payload, creator, "owner", instance_id, aggregate_type, aggregate_id, revision 
FROM 
    eventstore.events2 
WHERE 
    instance_id = '<INSTANCE_ID>'
    AND (
        (
            instance_id = '<INSTANCE_ID>'
            AND "position" > <POSITION>
            AND aggregate_type = 'project'
            AND event_type = ANY(ARRAY[
                                'project.application.added'
                                ,'project.application.changed'
                                ,'project.application.deactivated'
                                ,'project.application.reactivated'
                                ,'project.application.removed'
                                ,'project.removed'
                                ,'project.application.config.api.added'
                                ,'project.application.config.api.changed'
                                ,'project.application.config.api.secret.changed'
                ,'project.application.config.api.secret.updated'
                                ,'project.application.config.oidc.added'
                                ,'project.application.config.oidc.changed'
                                ,'project.application.config.oidc.secret.changed'
                ,'project.application.config.oidc.secret.updated'
                                ,'project.application.config.saml.added'
                                ,'project.application.config.saml.changed'
                        ])
        ) OR (
            instance_id = '<INSTANCE_ID>'
            AND "position" > <POSITION>
            AND aggregate_type = 'org'
            AND event_type = 'org.removed'
        ) OR (
            instance_id = '<INSTANCE_ID>'
            AND "position" > <POSITION>
            AND aggregate_type = 'instance'
            AND event_type = 'instance.removed'
        )
    ) 
    AND "position" > 1741600905.3495
    AND "position" < (
        SELECT 
            COALESCE(EXTRACT(EPOCH FROM min(xact_start)), EXTRACT(EPOCH FROM now())) 
        FROM 
            pg_stat_activity
        WHERE 
            datname = current_database() 
            AND application_name = ANY(ARRAY['zitadel_es_pusher_', 'zitadel_es_pusher', 'zitadel_es_pusher_<INSTANCE_ID>']) 
            AND state <> 'idle'
    ) 
ORDER BY "position", in_tx_order LIMIT 200 OFFSET 1;
```

```
Limit  (cost=120.08..120.09 rows=7 width=361) (actual time=2.167..2.172 rows=0 loops=1)
   Output: events2.created_at, events2.event_type, events2.sequence, events2."position", events2.payload, events2.creator, events2.owner, events2.instance_id, events2.aggregate_type, events2.aggregate_id, events2.revision, events2.in_tx_order
   InitPlan 1
     ->  Aggregate  (cost=2.74..2.76 rows=1 width=32) (actual time=1.813..1.815 rows=1 loops=1)
           Output: COALESCE(EXTRACT(epoch FROM min(s.xact_start)), EXTRACT(epoch FROM now()))
           ->  Nested Loop  (cost=0.00..2.74 rows=1 width=8) (actual time=1.803..1.805 rows=0 loops=1)
                 Output: s.xact_start
                 Join Filter: (d.oid = s.datid)
                 ->  Seq Scan on pg_catalog.pg_database d  (cost=0.00..1.07 rows=1 width=4) (actual time=0.016..0.021 rows=1 loops=1)
                       Output: d.oid, d.datname, d.datdba, d.encoding, d.datlocprovider, d.datistemplate, d.datallowconn, d.dathasloginevt, d.datconnlimit, d.datfrozenxid, d.datminmxid, d.dattablespace, d.datcollate, d.datctype, d.datlocale, d.daticurules, d.datcollversion, d.datacl
                       Filter: (d.datname = current_database())
                       Rows Removed by Filter: 4
                 ->  Function Scan on pg_catalog.pg_stat_get_activity s  (cost=0.00..1.63 rows=3 width=16) (actual time=1.781..1.781 rows=0 loops=1)
                       Output: s.datid, s.pid, s.usesysid, s.application_name, s.state, s.query, s.wait_event_type, s.wait_event, s.xact_start, s.query_start, s.backend_start, s.state_change, s.client_addr, s.client_hostname, s.client_port, s.backend_xid, s.backend_xmin, s.backend_type, s.ssl, s.sslversion, s.sslcipher, s.sslbits, s.ssl_client_dn, s.ssl_client_serial, s.ssl_issuer_dn, s.gss_auth, s.gss_princ, s.gss_enc, s.gss_delegation, s.leader_pid, s.query_id
                       Function Call: pg_stat_get_activity(NULL::integer)
                       Filter: ((s.state <> 'idle'::text) AND (s.application_name = ANY ('{zitadel_es_pusher_,zitadel_es_pusher,zitadel_es_pusher_<INSTANCE_ID>}'::text[])))
                       Rows Removed by Filter: 49
   ->  Sort  (cost=117.31..117.33 rows=8 width=361) (actual time=2.167..2.168 rows=0 loops=1)
         Output: events2.created_at, events2.event_type, events2.sequence, events2."position", events2.payload, events2.creator, events2.owner, events2.instance_id, events2.aggregate_type, events2.aggregate_id, events2.revision, events2.in_tx_order
         Sort Key: events2."position", events2.in_tx_order
         Sort Method: quicksort  Memory: 25kB
         ->  Bitmap Heap Scan on eventstore.events2  (cost=84.92..117.19 rows=8 width=361) (actual time=2.088..2.089 rows=0 loops=1)
               Output: events2.created_at, events2.event_type, events2.sequence, events2."position", events2.payload, events2.creator, events2.owner, events2.instance_id, events2.aggregate_type, events2.aggregate_id, events2.revision, events2.in_tx_order
               Recheck Cond: (((events2.instance_id = '<INSTANCE_ID>'::text) AND (events2.aggregate_type = 'project'::text) AND (events2.event_type = ANY ('{project.application.added,project.application.changed,project.application.deactivated,project.application.reactivated,project.application.removed,project.removed,project.application.config.api.added,project.application.config.api.changed,project.application.config.api.secret.changed,project.application.config.api.secret.updated,project.application.config.oidc.added,project.application.config.oidc.changed,project.application.config.oidc.secret.changed,project.application.config.oidc.secret.updated,project.application.config.saml.added,project.application.config.saml.changed}'::text[])) AND (events2."position" > <POSITION>) AND (events2."position" > 1741600905.3495) AND (events2."position" < (InitPlan 1).col1)) OR ((events2.instance_id = '<INSTANCE_ID>'::text) AND (events2.aggregate_type = 'org'::text) AND (events2.event_type = 'org.removed'::text) AND (events2."position" > <POSITION>) AND (events2."position" > 1741600905.3495) AND (events2."position" < (InitPlan 1).col1)) OR ((events2.instance_id = '<INSTANCE_ID>'::text) AND (events2.aggregate_type = 'instance'::text) AND (events2.event_type = 'instance.removed'::text) AND (events2."position" > <POSITION>) AND (events2."position" > 1741600905.3495) AND (events2."position" < (InitPlan 1).col1)))
               ->  BitmapOr  (cost=84.88..84.88 rows=8 width=0) (actual time=2.080..2.081 rows=0 loops=1)
                     ->  Bitmap Index Scan on es_projection  (cost=0.00..75.44 rows=8 width=0) (actual time=2.016..2.017 rows=0 loops=1)
                           Index Cond: ((events2.instance_id = '<INSTANCE_ID>'::text) AND (events2.aggregate_type = 'project'::text) AND (events2.event_type = ANY ('{project.application.added,project.application.changed,project.application.deactivated,project.application.reactivated,project.application.removed,project.removed,project.application.config.api.added,project.application.config.api.changed,project.application.config.api.secret.changed,project.application.config.api.secret.updated,project.application.config.oidc.added,project.application.config.oidc.changed,project.application.config.oidc.secret.changed,project.application.config.oidc.secret.updated,project.application.config.saml.added,project.application.config.saml.changed}'::text[])) AND (events2."position" > <POSITION>) AND (events2."position" > 1741600905.3495) AND (events2."position" < (InitPlan 1).col1))
                     ->  Bitmap Index Scan on es_projection  (cost=0.00..4.71 rows=1 width=0) (actual time=0.016..0.016 rows=0 loops=1)
                           Index Cond: ((events2.instance_id = '<INSTANCE_ID>'::text) AND (events2.aggregate_type = 'org'::text) AND (events2.event_type = 'org.removed'::text) AND (events2."position" > <POSITION>) AND (events2."position" > 1741600905.3495) AND (events2."position" < (InitPlan 1).col1))
                     ->  Bitmap Index Scan on es_projection  (cost=0.00..4.71 rows=1 width=0) (actual time=0.045..0.045 rows=0 loops=1)
                           Index Cond: ((events2.instance_id = '<INSTANCE_ID>'::text) AND (events2.aggregate_type = 'instance'::text) AND (events2.event_type = 'instance.removed'::text) AND (events2."position" > <POSITION>) AND (events2."position" > 1741600905.3495) AND (events2."position" < (InitPlan 1).col1))
 Query Identifier: 3194938266011254479
 Planning Time: 1.295 ms
 Execution Time: 2.832 ms
```

## Query Plan of new query

```sql
SELECT 
    created_at, event_type, "sequence", "position", payload, creator, "owner", instance_id, aggregate_type, aggregate_id, revision 
FROM 
    eventstore.events2 
WHERE 
    instance_id = '<INSTANCE_ID>'
    AND "position" > <POSITION>
    AND aggregate_type = ANY(ARRAY['project', 'instance', 'org'])
    AND event_type = ANY(ARRAY[
        'project.application.added'
        ,'project.application.changed'
        ,'project.application.deactivated'
        ,'project.application.reactivated'
        ,'project.application.removed'
        ,'project.removed'
        ,'project.application.config.api.added'
        ,'project.application.config.api.changed'
        ,'project.application.config.api.secret.changed'
        ,'project.application.config.api.secret.updated'
        ,'project.application.config.oidc.added'
        ,'project.application.config.oidc.changed'
        ,'project.application.config.oidc.secret.changed'
        ,'project.application.config.oidc.secret.updated'
        ,'project.application.config.saml.added'
        ,'project.application.config.saml.changed'
        ,'org.removed'
        ,'instance.removed'
    ])
    AND "position" < (
        SELECT 
            COALESCE(EXTRACT(EPOCH FROM min(xact_start)), EXTRACT(EPOCH FROM now())) 
        FROM 
            pg_stat_activity
        WHERE 
            datname = current_database() 
            AND application_name = ANY(ARRAY['zitadel_es_pusher_', 'zitadel_es_pusher', 'zitadel_es_pusher_<INSTANCE_ID>']) 
            AND state <> 'idle'
    ) 
ORDER BY "position", in_tx_order LIMIT 200 OFFSET 1;
```

```
Limit  (cost=293.34..293.36 rows=8 width=361) (actual time=4.686..4.689 rows=0 loops=1)
   Output: events2.created_at, events2.event_type, events2.sequence, events2."position", events2.payload, events2.creator, events2.owner, events2.instance_id, events2.aggregate_type, events2.aggregate_id, events2.revision, events2.in_tx_order
   InitPlan 1
     ->  Aggregate  (cost=2.74..2.76 rows=1 width=32) (actual time=1.717..1.719 rows=1 loops=1)
           Output: COALESCE(EXTRACT(epoch FROM min(s.xact_start)), EXTRACT(epoch FROM now()))
           ->  Nested Loop  (cost=0.00..2.74 rows=1 width=8) (actual time=1.658..1.659 rows=0 loops=1)
                 Output: s.xact_start
                 Join Filter: (d.oid = s.datid)
                 ->  Seq Scan on pg_catalog.pg_database d  (cost=0.00..1.07 rows=1 width=4) (actual time=0.026..0.028 rows=1 loops=1)
                       Output: d.oid, d.datname, d.datdba, d.encoding, d.datlocprovider, d.datistemplate, d.datallowconn, d.dathasloginevt, d.datconnlimit, d.datfrozenxid, d.datminmxid, d.dattablespace, d.datcollate, d.datctype, d.datlocale, d.daticurules, d.datcollversion, d.datacl
                       Filter: (d.datname = current_database())
                       Rows Removed by Filter: 4
                 ->  Function Scan on pg_catalog.pg_stat_get_activity s  (cost=0.00..1.63 rows=3 width=16) (actual time=1.628..1.628 rows=0 loops=1)
                       Output: s.datid, s.pid, s.usesysid, s.application_name, s.state, s.query, s.wait_event_type, s.wait_event, s.xact_start, s.query_start, s.backend_start, s.state_change, s.client_addr, s.client_hostname, s.client_port, s.backend_xid, s.backend_xmin, s.backend_type, s.ssl, s.sslversion, s.sslcipher, s.sslbits, s.ssl_client_dn, s.ssl_client_serial, s.ssl_issuer_dn, s.gss_auth, s.gss_princ, s.gss_enc, s.gss_delegation, s.leader_pid, s.query_id
                       Function Call: pg_stat_get_activity(NULL::integer)
                       Filter: ((s.state <> 'idle'::text) AND (s.application_name = ANY ('{zitadel_es_pusher_,zitadel_es_pusher,zitadel_es_pusher_<INSTANCE_ID>}'::text[])))
                       Rows Removed by Filter: 42
   ->  Sort  (cost=290.58..290.60 rows=9 width=361) (actual time=4.685..4.685 rows=0 loops=1)
         Output: events2.created_at, events2.event_type, events2.sequence, events2."position", events2.payload, events2.creator, events2.owner, events2.instance_id, events2.aggregate_type, events2.aggregate_id, events2.revision, events2.in_tx_order
         Sort Key: events2."position", events2.in_tx_order
         Sort Method: quicksort  Memory: 25kB
         ->  Index Scan using es_projection on eventstore.events2  (cost=0.70..290.43 rows=9 width=361) (actual time=4.616..4.617 rows=0 loops=1)
               Output: events2.created_at, events2.event_type, events2.sequence, events2."position", events2.payload, events2.creator, events2.owner, events2.instance_id, events2.aggregate_type, events2.aggregate_id, events2.revision, events2.in_tx_order
               Index Cond: ((events2.instance_id = '<INSTANCE_ID>'::text) AND (events2.aggregate_type = ANY ('{project,instance,org}'::text[])) AND (events2.event_type = ANY ('{project.application.added,project.application.changed,project.application.deactivated,project.application.reactivated,project.application.removed,project.removed,project.application.config.api.added,project.application.config.api.changed,project.application.config.api.secret.changed,project.application.config.api.secret.updated,project.application.config.oidc.added,project.application.config.oidc.changed,project.application.config.oidc.secret.changed,project.application.config.oidc.secret.updated,project.application.config.saml.added,project.application.config.saml.changed,org.removed,instance.removed}'::text[])) AND (events2."position" > <POSITION>) AND (events2."position" < (InitPlan 1).col1))
 Query Identifier: -8254550537132386499
 Planning Time: 2.864 ms
 Execution Time: 5.414 ms
 ```
2025-03-13 16:50:23 +01:00
Tim Möhlmann
aeb379e7de
fix(eventstore): revert precise decimal (#8527) (#8679) 2024-09-24 18:43:29 +02:00
Silvan
b522588d98
fix(eventstore): precise decimal (#8527)
# Which Problems Are Solved

Float64 which was used for the event.Position field is [not precise in
go and gets rounded](https://github.com/golang/go/issues/47300). This
can lead to unprecies position tracking of events and therefore
projections especially on cockcoachdb as the position used there is a
big number.

example of a unprecies position:
exact: 1725257931223002628
float64: 1725257931223002624.000000

# How the Problems Are Solved

The float64 was replaced by
[github.com/jackc/pgx-shopspring-decimal](https://github.com/jackc/pgx-shopspring-decimal).

# Additional Changes

Correct behaviour of makefile for load tests.
Rename `latestSequence`-queries to `latestPosition`
2024-09-06 12:19:19 +03:00
Tim Möhlmann
64a3bb3149
feat(v3alpha): web key resource (#8262)
# Which Problems Are Solved

Implement a new API service that allows management of OIDC signing web
keys.
This allows users to manage rotation of the instance level keys. which
are currently managed based on expiry.

The API accepts the generation of the following key types and
parameters:

- RSA keys with 2048, 3072 or 4096 bit in size and:
  - Signing with SHA-256 (RS256)
  - Signing with SHA-384 (RS384)
  - Signing with SHA-512 (RS512)
- ECDSA keys with
  - P256 curve
  - P384 curve
  - P512 curve
- ED25519 keys

# How the Problems Are Solved

Keys are serialized for storage using the JSON web key format from the
`jose` library. This is the format that will be used by OIDC for
signing, verification and publication.

Each instance can have a number of key pairs. All existing public keys
are meant to be used for token verification and publication the keys
endpoint. Keys can be activated and the active private key is meant to
sign new tokens. There is always exactly 1 active signing key:

1. When the first key for an instance is generated, it is automatically
activated.
2. Activation of the next key automatically deactivates the previously
active key.
3. Keys cannot be manually deactivated from the API
4. Active keys cannot be deleted

# Additional Changes

- Query methods that later will be used by the OIDC package are already
implemented. Preparation for #8031
- Fix indentation in french translation for instance event
- Move user_schema translations to consistent positions in all
translation files

# Additional Context

- Closes #8030
- Part of #7809

---------

Co-authored-by: Elio Bischof <elio@zitadel.com>
2024-08-14 14:18:14 +00:00
Tim Möhlmann
fe9bb49caa
chore(deps): update all go deps (#7773)
This change updates all go modules, including oidc, a major version of go-jose and the go 1.22 release.
2024-04-15 09:17:36 +00:00
Tim Möhlmann
f680dd934d
refactor: rename package errors to zerrors (#7039)
* chore: rename package errors to zerrors

* rename package errors to gerrors

* fix error related linting issues

* fix zitadel error assertion

* fix gosimple linting issues

* fix deprecated linting issues

* resolve gci linting issues

* fix import structure

---------

Co-authored-by: Elio Bischof <elio@zitadel.com>
2023-12-08 15:30:55 +01:00
adlerhurst
bd23a7a56f merge main into next 2023-10-19 12:34:00 +02:00
Silvan
b5564572bc
feat(eventstore): increase parallel write capabilities (#5940)
This implementation increases parallel write capabilities of the eventstore.
Please have a look at the technical advisories: [05](https://zitadel.com/docs/support/advisory/a10005) and  [06](https://zitadel.com/docs/support/advisory/a10006).
The implementation of eventstore.push is rewritten and stored events are migrated to a new table `eventstore.events2`.
If you are using cockroach: make sure that the database user of ZITADEL has `VIEWACTIVITY` grant. This is used to query events.
2023-10-19 12:19:10 +02:00
Elio Bischof
c12c2f09a4
fix: switch log level of failed locks to debug (#5746) 2023-04-25 19:20:59 +02:00
Stefan Benz
c2a5b785fb
feat: instance remove (#4345)
* feat(instance): add remove instance event with projections cleanup

* fix(instance): corrected used id to clean up projections

* fix merge

* fix: correct unit test projection names

* fix: current sequence of lists and query for ensuring keypair based projections

Co-authored-by: Livio Spring <livio.a@gmail.com>
Co-authored-by: Fabi <38692350+hifabienne@users.noreply.github.com>
2022-10-20 14:36:52 +02:00
Stefan Benz
7a5f7f82cf
feat(saml): implementation of saml for ZITADEL v2 (#3618) 2022-09-12 18:18:08 +02:00