# Which Problems Are Solved
When a long-running Zitadel Setup is in progress, Kubernetes might decide to
move the pod to a new node automatically. Currently, this leaves any in-flight
migration in a broken state that an operator has to resolve by manually running
the `cleanup` command - assuming they notice the error at all.
The only commands that typically run for a very long time are projection
pre-fill operations, which, depending on the size of the event table for that
projection, can take many hours - plenty of time for Kubernetes to make
unexpected decisions, especially in a busy cluster.
# How the Problems Are Solved
This change listens for `os.Interrupt` and `syscall.SIGTERM`, cancels the
current Setup context, and runs the `Cleanup` command. The logs then
look something like this:
```shell
...
INFO[0000] verify migration caller="/Users/zach/src/zitadel/internal/migration/migration.go:43" name=repeatable_delete_stale_org_fields
INFO[0000] starting migration caller="/Users/zach/src/zitadel/internal/migration/migration.go:66" name=repeatable_delete_stale_org_fields
INFO[0000] execute delete query caller="/Users/zach/src/zitadel/cmd/setup/39.go:37" instance_id=281297936179003398 migration=repeatable_delete_stale_org_fields progress=1/1
INFO[0000] verify migration caller="/Users/zach/src/zitadel/internal/migration/migration.go:43" name=repeatable_fill_fields_for_instance_domains
INFO[0000] starting migration caller="/Users/zach/src/zitadel/internal/migration/migration.go:66" name=repeatable_fill_fields_for_instance_domains
----- SIGTERM signal issued -----
INFO[0000] received interrupt signal, shutting down: interrupt caller="/Users/zach/src/zitadel/cmd/setup/setup.go:121"
INFO[0000] query failed caller="/Users/zach/src/zitadel/internal/eventstore/repository/sql/query.go:135" error="timeout: context already done: context canceled"
DEBU[0000] filter eventstore failed caller="/Users/zach/src/zitadel/internal/eventstore/handler/v2/field_handler.go:155" error="ID=SQL-KyeAx Message=unable to filter events Parent=(timeout: context already done: context canceled)" projection=instance_domain_fields
DEBU[0000] unable to rollback tx caller="/Users/zach/src/zitadel/internal/eventstore/handler/v2/field_handler.go:110" error="sql: transaction has already been committed or rolled back" projection=instance_domain_fields
INFO[0000] process events failed caller="/Users/zach/src/zitadel/internal/eventstore/handler/v2/field_handler.go:72" error="ID=SQL-KyeAx Message=unable to filter events Parent=(timeout: context already done: context canceled)" projection=instance_domain_fields
DEBU[0000] trigger iteration caller="/Users/zach/src/zitadel/internal/eventstore/handler/v2/field_handler.go:73" iteration=0 projection=instance_domain_fields
ERRO[0000] migration failed caller="/Users/zach/src/zitadel/internal/migration/migration.go:68" error="ID=SQL-KyeAx Message=unable to filter events Parent=(timeout: context already done: context canceled)" name=repeatable_fill_fields_for_instance_domains
ERRO[0000] migration finish failed caller="/Users/zach/src/zitadel/internal/migration/migration.go:71" error="context canceled" name=repeatable_fill_fields_for_instance_domains
----- Cleanup before exiting -----
INFO[0000] cleanup started caller="/Users/zach/src/zitadel/cmd/setup/cleanup.go:30"
INFO[0000] cleanup migration caller="/Users/zach/src/zitadel/cmd/setup/cleanup.go:47" name=repeatable_fill_fields_for_instance_domains
```
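For illustration, the shutdown wiring roughly follows the pattern below. This is a minimal sketch, not the actual Zitadel code; `runSetup` and `runCleanup` are placeholders for the real setup and cleanup commands.

```go
package main

import (
	"context"
	"log"
	"os"
	"os/signal"
	"syscall"
)

// runSetup and runCleanup stand in for the real setup and cleanup commands.
func runSetup(ctx context.Context) error {
	<-ctx.Done() // pretend the long-running migration was interrupted here
	return ctx.Err()
}

func runCleanup(ctx context.Context) {
	log.Println("cleanup started")
}

func main() {
	// Cancel the setup context as soon as os.Interrupt or SIGTERM arrives.
	ctx, stop := signal.NotifyContext(context.Background(), os.Interrupt, syscall.SIGTERM)
	defer stop()

	var setupErr error
	defer func() {
		// Run cleanup with a fresh context so it is not itself cancelled by
		// the signal that interrupted setup.
		if setupErr != nil {
			runCleanup(context.Background())
		}
	}()

	if setupErr = runSetup(ctx); setupErr != nil {
		log.Printf("received interrupt or migration failure, cleaning up: %v", setupErr)
	}
}
```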
# Additional Changes
* `mustExecuteMigration` -> `executeMigration`: the **must** variant previously
logged a `Fatal` error, which calls `os.Exit`, so no cleanup was possible.
Instead, this PR returns the error and assigns it to a shared error variable in
the Setup closure that the deferred cleanup can check (see the sketch after this list).
* `initProjections` now returns an error instead of exiting
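
As a rough illustration of the first point, the change looks something like this; the names and signatures here are hypothetical, not Zitadel's actual ones:

```go
package setup

import (
	"context"
	"log"
)

// Migration is a placeholder for Zitadel's internal migration interface.
type Migration interface {
	Execute(ctx context.Context) error
}

// Before: a Fatal log calls os.Exit, so deferred cleanup never runs.
func mustExecuteMigration(ctx context.Context, step Migration) {
	if err := step.Execute(ctx); err != nil {
		log.Fatalf("migration failed: %v", err)
	}
}

// After: the error is returned and assigned to a shared variable in the Setup
// closure, which a deferred cleanup can inspect before the process exits.
func executeMigration(ctx context.Context, step Migration) error {
	return step.Execute(ctx)
}
```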
# Additional Context
This behavior might be unwelcome, or at least unexpected, in some cases.
Putting it behind a feature flag or config setting is likely a good
follow-up.
---------
Co-authored-by: Silvan <27845747+adlerhurst@users.noreply.github.com>
# Which Problems Are Solved
Zitadel currently uses three database pools: one for queries, one for pushing
events, and one for scheduled projection updates. This defeats the purpose
of a connection pool, which already handles multiple connections.
During load tests we found that this structure of connection
pools consumes a lot of database resources. Resource usage dropped
after we reduced the number of database pools to one, because existing
connections are reused more efficiently.
# How the Problems Are Solved
Removed the logic for handling multiple connection pools; a single pool is now used for queries, event pushes, and projection updates.
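Conceptually, the wiring now looks something like the sketch below. This is a hedged example using `pgxpool`; the constructor names are placeholders for the components that previously each owned a pool, not Zitadel's real API.

```go
package main

import (
	"context"
	"log"

	"github.com/jackc/pgx/v5/pgxpool"
)

// Placeholder constructors for the components that formerly owned their own pool.
func newQueries(p *pgxpool.Pool) any           { return p }
func newEventPusher(p *pgxpool.Pool) any       { return p }
func newProjectionSpooler(p *pgxpool.Pool) any { return p }

func main() {
	ctx := context.Background()

	// A single pool serves queries, event pushes and projection updates;
	// the pool itself multiplexes the underlying connections.
	pool, err := pgxpool.New(ctx, "postgres://zitadel@localhost:5432/zitadel")
	if err != nil {
		log.Fatalf("connect: %v", err)
	}
	defer pool.Close()

	_ = newQueries(pool)
	_ = newEventPusher(pool)
	_ = newProjectionSpooler(pool)
}
```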
# Additional Changes
none
# Additional Context
part of https://github.com/zitadel/zitadel/issues/8352
Even though this is a feature, it is released as a fix so that we can backport it to earlier revisions.
As reported by multiple users, starting ZITADEL after an upgrade led to downtime and, in the worst case, rollbacks to the previously deployed version.
The problem arises when there are too many events to process after ZITADEL starts. The root cause is changes to projections (database tables), which must be recomputed. This PR solves the problem by adding a new step to the setup phase which pre-fills the projections. The step can be enabled by adding the `--init-projections` flag to `setup`, `start-from-init` and `start-from-setup`. Setting this flag can make the setup phase take longer, but it reduces the risk of the problems described above.
* fix(setup): unmarshal of failed step
* fix(cleanup): cleanup all stuck states
* use lastRun for repeatable steps
* typo
---------
Co-authored-by: Livio Spring <livio.a@gmail.com>
* fix(db): add additional connection pool for projection spooling
* use correct connection pool for projections
---------
Co-authored-by: Livio Spring <livio.a@gmail.com>
This implementation increases parallel write capabilities of the eventstore.
Please have a look at the technical advisories: [05](https://zitadel.com/docs/support/advisory/a10005) and [06](https://zitadel.com/docs/support/advisory/a10006).
The implementation of `eventstore.push` is rewritten, and stored events are migrated to a new table, `eventstore.events2`.
If you are using CockroachDB, make sure that ZITADEL's database user has the `VIEWACTIVITY` grant; it is used to query events.