Tim Möhlmann 77aa02a521
fix(projection): increase transaction duration (#8632)
# Which Problems Are Solved

Reduce the chance for projection dead-locks. Increasing or disabling the
projection transaction duration solved dead-locks in all reported cases.

# How the Problems Are Solved

Increase the default transaction duration to 1 minute.
Due to the high value it is functionally similar to disabling,
however it still provides a safety net for transaction that do freeze,
perhaps due to connection issues with the database.


# Additional Changes

- Integration test uses default.
- Technical advisory

# Additional Context

- Related to https://github.com/zitadel/zitadel/issues/8517

---------

Co-authored-by: Silvan <silvan.reusser@gmail.com>
2024-09-17 10:08:13 +00:00

2.6 KiB

title
Technical Advisory 10012

Date and Version

Version: 2.63.0

Date: 2024-09-26

Description

In version 2.63.0 we've increased the transaction duration for projections.

ZITADEL has an event driven architecture. After events are pushed to the eventstore, they are reduced into projections in bulk batches. Projections allow for efficient lookup of data through normalized SQL tables.

We've investigated multiple reports of outdated projections. For example created users missing in get requests, or missing data after a ZITADEL upgrade1. The conclusion is that the transaction in which we perform a bulk of queries can timeout. The old setting defined a transaction duration of 500ms for a bulk of 200 events. A single event may create multiple statements in a single projection. A timeout may occur even if the actual bulk size is less than 200, which then results in more back-pressure on a busy system, leading to more timeouts and effectively dead-locking a projection.

Increasing or disabling the projection transaction duration solved dead-locks in all reported cases. We've decided to increase the transaction duration to 1 minute. Due to the high value it is functionally similar to disabling, however it still provides a safety net for transaction that do freeze, perhaps due to connection issues with the database.

Statement

A summary of bug reports can be found in the following issue: Missing data due to outdated projections. This change was submitted in the following PR: fix(projection): increase transaction duration, which will be released in Version 2.63.0

Mitigation

If you have a custom configuration for projections, this update will not apply to your system or some projections. When encountering projection dead-lock consider increasing the timeout to the new default value.

Note that entries under Customizations overwrite the global settings for a single projection.

Projections:
  TransactionDuration: 1m # ZITADEL_PROJECTIONS_TRANSACTIONDURATION
  BulkLimit: 200 # ZITADEL_PROJECTIONS_BULKLIMIT
  Customizations:
    custom_texts:
      BulkLimit: 400
    project_grant_fields:
      TransactionDuration: 0s
      BulkLimit: 2000

Impact

Once this update has been released and deployed, transactions are allowed to run longer. No other functional impact is expected.


  1. Changes written to the eventstore are the main source of truth. When a projection is out of date, some request may serve incomplete or no data. The data itself is however not lost. ↩︎