fix(projection): increase transaction duration (#8632)

# Which Problems Are Solved

Reduce the chance of projection dead-locks. Increasing or disabling the
projection transaction duration solved dead-locks in all reported cases.

# How the Problems Are Solved

Increase the default transaction duration to 1 minute.
Due to the high value, it is functionally similar to disabling the limit;
however, it still provides a safety net for transactions that do freeze,
perhaps due to connection issues with the database.


# Additional Changes

- The integration test uses the default.
- Add a technical advisory.

# Additional Context

- Related to https://github.com/zitadel/zitadel/issues/8517

---------

Co-authored-by: Silvan <silvan.reusser@gmail.com>
Tim Möhlmann 2024-09-17 13:08:13 +03:00 committed by GitHub
parent 5fdad7b8f4
commit 77aa02a521
3 changed files with 61 additions and 5 deletions


@@ -231,7 +231,7 @@ Projections:
   # The maximum duration a transaction remains open
   # before it spots left folding additional events
   # and updates the table.
-  TransactionDuration: 500ms # ZITADEL_PROJECTIONS_TRANSACTIONDURATION
+  TransactionDuration: 1m # ZITADEL_PROJECTIONS_TRANSACTIONDURATION
   # Time interval between scheduled projections
   RequeueEvery: 60s # ZITADEL_PROJECTIONS_REQUEUEEVERY
   # Time between retried database statements resulting from projected events
@@ -246,10 +246,7 @@ Projections:
   HandleActiveInstances: 0s # ZITADEL_PROJECTIONS_HANDLEACTIVEINSTANCES
   # In the Customizations section, all settings from above can be overwritten for each specific projection
   Customizations:
-    Projects:
-      TransactionDuration: 2s
     custom_texts:
-      TransactionDuration: 2s
       BulkLimit: 400
     project_grant_fields:
       TransactionDuration: 0s


@@ -0,0 +1,60 @@
---
title: Technical Advisory 10012
---
## Date and Version
Version: 2.63.0
Date: 2024-09-26
## Description
In version 2.63.0 we've increased the transaction duration for projections.
ZITADEL has an event-driven architecture. After events are pushed to the eventstore,
they are reduced into projections in bulk. Projections allow for efficient lookup of data through normalized SQL tables.
We've investigated multiple reports of outdated projections,
for example, created users missing from get requests, or missing data after a ZITADEL upgrade[^1].
We concluded that the transaction in which we perform a bulk of queries can time out.
The old setting defined a transaction duration of 500ms for a bulk of 200 events.
A single event may create multiple statements in a single projection,
so a timeout may occur even if the actual bulk size is less than 200.
Each timeout adds back-pressure on a busy system, which leads to more timeouts and effectively dead-locks the projection.
Increasing or disabling the projection transaction duration solved dead-locks in all reported cases.
We've decided to increase the transaction duration to 1 minute.
Due to the high value, it is functionally similar to disabling the limit;
however, it still provides a safety net for transactions that do freeze,
perhaps due to connection issues with the database.
[^1]: Changes written to the eventstore are the main source of truth. When a projection is out of date, some requests may serve incomplete or no data. However, the data itself is not lost.
## Statement
A summary of bug reports can be found in the following issue: [Missing data due to outdated projections](https://github.com/zitadel/zitadel/issues/8517).
This change was submitted in the following PR:
[fix(projection): increase transaction duration](https://github.com/zitadel/zitadel/pull/8632), which will be released in version [2.63.0](https://github.com/zitadel/zitadel/releases/tag/v2.63.0).
## Mitigation
If you have a custom configuration for projections, this update will not apply to your system, or not to some of its projections. If you encounter projection dead-locks, consider increasing the timeout to the new default value.
Note that entries under `Customizations` overwrite the global settings for a single projection.
```yaml
Projections:
  TransactionDuration: 1m # ZITADEL_PROJECTIONS_TRANSACTIONDURATION
  BulkLimit: 200 # ZITADEL_PROJECTIONS_BULKLIMIT
  Customizations:
    custom_texts:
      BulkLimit: 400
    project_grant_fields:
      TransactionDuration: 0s
      BulkLimit: 2000
```
## Impact
Once this update has been released and deployed, transactions are allowed to run longer. No other functional impact is expected.


@@ -33,7 +33,6 @@ LogStore:
 Projections:
   HandleActiveInstances: 30m
   RequeueEvery: 5s
-  TransactionDuration: 1m
   Customizations:
     NotificationsQuotas:
       RequeueEvery: 1s