# Which Problems Are Solved
1. The projection handler reported no error if an error occurred but
updating the current state succeeded. During setup, this can lead to
skipped projections as soon as a projection fails but the failure is not
correctly reported to the caller.
2. Mirror projections were skipped as soon as an error occurred, leaving
some projections unprojected.
3. Mirror checked the position incorrectly in some cases.
# How the Problems Are Solved
1. The error returned by the `Trigger` method is only set to the error
of updating the current states if such an error actually occurred.
2. Triggering projections checks the returned error and retries if the
error has code `23505` (see the sketch after this list).
3. Corrected to use the `Equal` method.
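A minimal sketch of the retry check described in point 2, assuming the pgx/pgconn driver; this is an illustration, not ZITADEL's actual code:
```go
package projection

import (
	"errors"

	"github.com/jackc/pgx/v5/pgconn"
)

// isUniqueViolation reports whether err is a PostgreSQL unique_violation
// (SQLSTATE 23505); in that case the trigger can simply be retried.
func isUniqueViolation(err error) bool {
	var pgErr *pgconn.PgError
	return errors.As(err, &pgErr) && pgErr.Code == "23505"
}
```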
# Additional Changes
Unified logging for mirror projections.
# Which Problems Are Solved
With the change of #9561, the `mirror` command panics as there's no
metrics provider configured.
# How the Problems Are Solved
Correctly initialize the provider (no-op by default) for the mirror
command.
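A minimal sketch of registering a no-op meter provider by default, using the OpenTelemetry Go API; this illustrates the idea and is not ZITADEL's actual wiring:
```go
package main

import (
	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/metric/noop"
)

// initMetrics installs a no-op meter provider so that code paths using the
// global provider (e.g. projection handlers) don't panic when the mirror
// command runs without a real metrics backend configured.
func initMetrics() {
	otel.SetMeterProvider(noop.NewMeterProvider())
}
```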
# Additional Changes
None
# Additional Context
relates to #9561 -> needs backports to 2.66.x - 2.71.x and 3.0.0-rc
Co-authored-by: Stefan Benz <46600784+stebenz@users.noreply.github.com>
# Which Problems Are Solved
ZITADEL allows the use of JSON Web Token (JWT) Profile OAuth 2.0 for Authorization Grants in machine-to-machine (M2M) authentication. Multiple keys can be managed for a single machine account (service user), each with an individual expiry.
A vulnerability existed where expired keys could be used to retrieve tokens. Specifically, ZITADEL failed to properly check the expiration date of the JWT key when used for Authorization Grants. This allowed an attacker with an expired key to obtain valid access tokens.
This vulnerability does not affect the use of JWT Profile for OAuth 2.0 Client Authentication on the Token and Introspection endpoints, which correctly reject expired keys.
# How the Problems Are Solved
Added proper validation of the expiry of the stored public key.
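A minimal sketch of such an expiry validation; the function and parameter names are hypothetical, not ZITADEL's actual code:
```go
package authn

import (
	"errors"
	"time"
)

// verifyKeyNotExpired rejects a stored machine key whose expiry has passed,
// so an expired key can no longer be used for the JWT Profile grant.
func verifyKeyNotExpired(expiry, now time.Time) error {
	if !expiry.IsZero() && expiry.Before(now) {
		return errors.New("machine key is expired")
	}
	return nil
}
```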
# Additional Changes
None
# Additional Context
None
(cherry picked from commit 315503beab)
# Which Problems Are Solved
The username entered by the user was resp. replaced by the stored user's username. This provided a possibility to enumerate usernames as unknown usernames were not normalized.
# How the Problems Are Solved
- Store and display the username as entered by the user.
- Removed the part where the loginname was always set to the user's loginname when retrieving the `nextSteps`
# Additional Changes
None
# Additional Context
None
(cherry picked from commit 14de8ecac2)
# Which Problems Are Solved
With the currently provided telemetry it's difficult to predict when a
projection handler is under increased load until it's too late and
causes downstream issues. Importantly, projection updating is in the
critical path for many login flows and increased latency there can
result in system downtime for users.
# How the Problems Are Solved
This PR adds three new prometheus-style metrics:
1. **projection_events_processed** (_labels: projection, success_) -
This metric gives us a counter of the number of events processed per
projection update run and whether they were processed without error. A
high number of events being processed can let us know how busy a
particular projection handler is.
2. **projection_handle_timer** _(labels: projection)_ - This is the time
it takes to process a projection update given a batch of events - time
to take the current_states lock, query for new events, reduce,
update the projection, and update current_states.
3. **projection_state_latency** _(labels: projection)_ - This is the
time since the last event processed in the current_states table for a
given projection. It tells us how old the last processed event is, i.e.
how far behind the handler is running for this projection. Higher
latencies could mean high load or stalled projection handling. (A sketch
of how these instruments are registered follows this list.)
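A minimal sketch of how such instruments could be registered and recorded with the OpenTelemetry metric API (from which the Prometheus-style names above are exported); the wiring and helper are illustrative, not ZITADEL's actual code:
```go
package handler

import (
	"context"
	"time"

	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/attribute"
	"go.opentelemetry.io/otel/metric"
)

// recordRun records one projection update run: how many events were
// processed, whether the run succeeded, and how long the run took.
func recordRun(ctx context.Context, projection string, processed int64, err error, took time.Duration) {
	meter := otel.Meter("projection")

	eventsProcessed, _ := meter.Int64Counter("projection_events_processed")
	handleTimer, _ := meter.Float64Histogram("projection_handle_timer", metric.WithUnit("s"))

	eventsProcessed.Add(ctx, processed, metric.WithAttributes(
		attribute.String("projection", projection),
		attribute.Bool("success", err == nil),
	))
	handleTimer.Record(ctx, took.Seconds(), metric.WithAttributes(
		attribute.String("projection", projection),
	))
}
```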
# Additional Changes
I also had to initialize the global otel metrics provider (`metrics.M`)
in the `setup` step in addition to `start`, since projection handlers
are initialized at setup. The initialization checks whether a metrics
provider is already set (in the case of `start-from-setup` or
`start-from-init`) to prevent overwriting it, which would cause the otel
metrics provider to stop working.
# Additional Context
## Example Dashboards


---------
Co-authored-by: Silvan <27845747+adlerhurst@users.noreply.github.com>
Co-authored-by: Livio Spring <livio.a@gmail.com>
(cherry picked from commit c1535b7b49)
# Which Problems Are Solved
Build and test workflows currently run on specific GitHub-hosted
runners. This is not needed for most workflows and just costs more.
# How the Problems Are Solved
Moved all the steps apart from integration-tests to public runners.
# Additional Changes
None
# Additional Context
None
(cherry picked from commit 3424204291)
# Which Problems Are Solved
The service name is hardcoded in the metrics code. Making the service
name configurable helps when running multiple instances of Zitadel.
The defaults remain unchanged; the service name still defaults to
ZITADEL.
# How the Problems Are Solved
Add a config option to override the name in defaults.yaml and pass it
down to the corresponding metrics or tracing module (google or otel)
# Additional Changes
NA
# Additional Context
NA
(cherry picked from commit dc64e35128)
# Which Problems Are Solved
E2E tests in pipelines started to fail randomly. While debugging, I
noticed that we use the `latest` tag of cockroach's docker image. They
tagged 25.1 as latest yesterday.
# How the Problems Are Solved
Since we are dropping support for CRDB with version 3, and there are
multiple issues with various versions anyway, I pinned the docker image
tag to `latest-v24.3`.
# Additional Changes
None
# Additional Context
relates to https://github.com/zitadel/zitadel/actions/runs/13917603587
and https://github.com/zitadel/zitadel/actions/runs/13904928050
(cherry picked from commit f1f500d0e7)
# Which Problems Are Solved
Zitadel should not record 404 response counts of unknown paths (check
`/debug/metrics`).
This can lead to high cardinality on the metrics endpoint and in traces.
```
GOOD http_server_return_code_counter_total{method="GET",otel_scope_name="",otel_scope_version="",return_code="200",uri="/.well-known/openid-configuration"} 2
GOOD http_server_return_code_counter_total{method="GET",otel_scope_name="",otel_scope_version="",return_code="200",uri="/oauth/v2/keys"} 2
BAD http_server_return_code_counter_total{method="GET",otel_scope_name="",otel_scope_version="",return_code="404",uri="/junk"} 2000
```
After
```
GOOD http_server_return_code_counter_total{method="GET",otel_scope_name="",otel_scope_version="",return_code="200",uri="/.well-known/openid-configuration"} 2
GOOD http_server_return_code_counter_total{method="GET",otel_scope_name="",otel_scope_version="",return_code="200",uri="/oauth/v2/keys"} 2
```
# How the Problems Are Solved
This PR makes sure that any unknown path is recorded as `UNKNOWN_PATH`
instead of the actual path.
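A minimal sketch of that idea, assuming a gorilla/mux-style router where unmatched requests have no current route; the helper name is illustrative, not ZITADEL's actual code:
```go
package middleware

import (
	"net/http"

	"github.com/gorilla/mux"
)

// metricsPath returns the label value to record for a request: the route's
// path template when the router matched a route, and a constant otherwise,
// so arbitrary junk paths cannot blow up metric cardinality.
func metricsPath(r *http.Request) string {
	route := mux.CurrentRoute(r)
	if route == nil {
		return "UNKNOWN_PATH"
	}
	if tmpl, err := route.GetPathTemplate(); err == nil {
		return tmpl
	}
	return "UNKNOWN_PATH"
}
```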
# Additional Changes
N/A
# Additional Context
On our production instance, when a penetration test was run, it caused
our metric count to blow up to many thousands due to Zitadel recording
404 response counts.
A nice-to-have next step: also remove the 404 timer recordings, which
serve no purpose.
---------
Co-authored-by: Livio Spring <livio.a@gmail.com>
Co-authored-by: Silvan <27845747+adlerhurst@users.noreply.github.com>
Co-authored-by: Livio Spring <livio@zitadel.com>
(cherry picked from commit 599850e7e8)
# Which Problems Are Solved
When using a custom / new login UI and an OIDC application with
registered BackChannelLogoutUI, no logout requests were sent to the URI
when the user signed out.
Additionally, as described in #9427, an error was logged:
`level=error msg="event of type *session.TerminateEvent doesn't
implement OriginEvent"
caller="/home/runner/work/zitadel/zitadel/internal/notification/handlers/origin.go:24"`
# How the Problems Are Solved
- Properly pass `TriggerOrigin` information to the session.TerminateEvent
creation and implement the `OriginEvent` interface (see the sketch after
this list).
- Implemented `RegisterLogout` in `CreateOIDCSessionFromAuthRequest` and
`CreateOIDCSessionFromDeviceAuth`, both used when interacting with the
OIDC v2 API.
- Both functions now receive the `BackChannelLogoutURI` of the client
from the OIDC layer.
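A minimal sketch of what implementing such an interface looks like; the exact field names and interface shape in ZITADEL may differ, so treat them as assumptions:
```go
package session

// OriginEvent is implemented by events that know the origin (scheme://host)
// of the request that triggered them.
type OriginEvent interface {
	TriggerOrigin() string
}

// TerminateEvent carries the trigger origin so that notification handlers
// (e.g. back-channel logout) can build absolute URLs from it.
type TerminateEvent struct {
	TriggeredAtOrigin string `json:"triggerOrigin,omitempty"`
}

// TriggerOrigin implements OriginEvent.
func (e *TerminateEvent) TriggerOrigin() string {
	return e.TriggeredAtOrigin
}
```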
# Additional Changes
None
# Additional Context
- closes #9427
(cherry picked from commit ed697bbd69)
# Which Problems Are Solved
When requesting a JWT (`urn:ietf:params:oauth:token-type:jwt`) to be
returned in a Token Exchange request, ZITADEL would panic if the `actor`
was not granted the necessary permission.
# How the Problems Are Solved
Properly check the error and return it.
# Additional Changes
None
# Additional Context
- closes #9436
(cherry picked from commit e6ce1af003)
# Which Problems Are Solved
ZITADEL's Admin API, intended for managing ZITADEL instances, contains 12 HTTP endpoints that are unexpectedly accessible to authenticated ZITADEL users who are not ZITADEL managers. The most critical vulnerable endpoints relate to LDAP configuration:
- /idps/ldap
- /idps/ldap/{id}
By accessing these endpoints, unauthorized users could:
- Modify ZITADEL's instance LDAP settings, redirecting all LDAP login attempts to a malicious server, effectively taking over user accounts.
- Expose the original LDAP server's password, potentially compromising all user accounts.
The following endpoints are also affected by IDOR vulnerabilities, potentially allowing unauthorized modification of instance settings such as languages, labels, and templates:
- /idps/templates/_search
- /idps/templates/{id}
- /policies/label/_activate
- /policies/label/logo
- /policies/label/logo_dark
- /policies/label/icon
- /policies/label/icon_dark
- /policies/label/font
- /text/message/passwordless_registration/{language}
- /text/login/{language}
Please check out https://github.com/zitadel/zitadel/security/advisories/GHSA-f3gh-529w-v32x for more information.
# How the Problems Are Solved
- Required permissions have been fixed (only instance level is allowed)
# Additional Changes
None
# Additional Context
- resolves https://github.com/zitadel/zitadel/security/advisories/GHSA-f3gh-529w-v32x
(cherry picked from commit d9d8339813)
# Which Problems Are Solved
As reported in #9311, even when providing a `x-zitadel-login-client`
header, the auth request would be created as hosted login UI / V1
request.
This is due to a change introduced with #9071, where the login UI
version can be specified using the app configuration.
The configuration set to V1 was not considering if the header was sent.
# How the Problems Are Solved
- Check for the presence of `x-zitadel-login-client` before the
configuration. Use the latter only if no header is set (see the sketch
below).
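A minimal sketch of that precedence check; the helper and parameter names are hypothetical, not ZITADEL's actual code:
```go
package login

import "net/http"

// useLoginV2 decides which login UI handles the auth request: the presence
// of the x-zitadel-login-client header always wins, and the app's configured
// login version is only used when no header is set.
func useLoginV2(r *http.Request, configuredV2 bool) bool {
	if r.Header.Get("x-zitadel-login-client") != "" {
		return true
	}
	return configuredV2
}
```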
# Additional Changes
None
# Additional Context
- closes #9311
- needs backports to 2.67.x, 2.68.x and 2.69.x
(cherry picked from commit e7a73eb6b1)
# Which Problems Are Solved
Systems running with PostgreSQL before Zitadel v2.39 are likely to have
a wrong type for the `in_tx_order` column in the `eventstore.event2`
table. The migration at the time used the `event_sequence` as default
value without typecast, which results in a `bigint` type for that
column. However, when creating the table from scratch, we explicitly
specify the type to be `integer`.
Starting from Zitadel v2.67 we use a PL/pgSQL function to push events.
The function requires the types from `eventstore.events2` to be the same
as the `select` destinations used in the function. In the function,
`in_tx_order` is also expected to be of type `integer`.
CockroachDB systems are not affected because `bigint` is an alias for
the `int` type. In other words, CockroachDB uses `int8` when specifying
type `int`, so the types already match.
# How the Problems Are Solved
Retrieve the actual column type currently in use. A template is used to
assign the type to the `ordinality` column returned as `in_tx_order`.
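A minimal sketch of how the actual column type could be looked up before rendering the migration template; the `information_schema` query is standard SQL, but the surrounding Go code is illustrative, not ZITADEL's migration code:
```go
package setup

import (
	"context"
	"database/sql"
)

// inTxOrderType returns the column type currently used for
// eventstore.events2.in_tx_order ("bigint" on old PostgreSQL setups,
// "integer" on fresh ones), so the migration can match it.
func inTxOrderType(ctx context.Context, db *sql.DB) (string, error) {
	var dataType string
	err := db.QueryRowContext(ctx, `
		SELECT data_type
		FROM information_schema.columns
		WHERE table_schema = 'eventstore'
		  AND table_name = 'events2'
		  AND column_name = 'in_tx_order'`,
	).Scan(&dataType)
	return dataType, err
}
```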
# Additional Changes
- Detailed logging on migration failure
# Additional Context
- Closes #9180
---------
Co-authored-by: Silvan <27845747+adlerhurst@users.noreply.github.com>
(cherry picked from commit bcc6a689fa)
# Which Problems Are Solved
There were multiple issues in the OpenTelemetry (OTEL) implementation
and usage for tracing and metrics, which lead to high cardinality and
potential memory leaks:
- wrongly initialized tracing interceptors
- high cardinality in traces:
  - HTTP/1.1 endpoints containing host names
  - HTTP/1.1 endpoints containing object IDs like userID (e.g.
    `/management/v1/users/2352839823/`)
  - high amount of traces from internal processes (spooler)
- high cardinality in the metrics endpoint:
  - GRPC entries containing host names
  - notification metrics containing instanceIDs and error messages
# How the Problems Are Solved
- Properly initialize the interceptors once and update them to use the
grpc stats handler (unary interceptors were deprecated).
- Remove host names from HTTP/1.1 span names and use path as default.
- Set / overwrite the uri for spans on the grpc-gateway with the uri
pattern (`/management/v1/users/{user_id}`). This is used for spans in
traces and metric entries.
- Created a new sampler which will only sample spans in the following
cases (see the sketch after this list):
  - remote was already sampled
  - remote was not sampled, the root span is of kind `Server` and
    sampling is based on the fraction set in the runtime configuration
  - This will prevent having a lot of spans from the spooler background
    jobs if they were not started by a client call querying an object
    (e.g. UserByID).
- Filter out host names and alike from OTEL generated metrics (using a
`view`).
- Removed instance and error messages from notification metrics.
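A minimal sketch of such a sampler, assuming the OpenTelemetry Go SDK (`go.opentelemetry.io/otel/sdk/trace`); this illustrates the described sampling rules and is not ZITADEL's actual implementation:
```go
package sampling

import (
	sdktrace "go.opentelemetry.io/otel/sdk/trace"
	"go.opentelemetry.io/otel/trace"
)

// serverRootSampler samples a span if its remote parent was sampled, or if
// it is a root span of kind Server chosen by the configured fraction
// (e.g. sdktrace.TraceIDRatioBased(0.1)). Everything else is dropped.
type serverRootSampler struct {
	fraction sdktrace.Sampler
}

func (s serverRootSampler) ShouldSample(p sdktrace.SamplingParameters) sdktrace.SamplingResult {
	psc := trace.SpanContextFromContext(p.ParentContext)
	// Remote parent already sampled: keep sampling the whole trace.
	if psc.IsRemote() && psc.IsSampled() {
		return sdktrace.SamplingResult{Decision: sdktrace.RecordAndSample, Tracestate: psc.TraceState()}
	}
	// No sampled parent: only consider root Server spans, by fraction.
	// Background jobs (spooler) never pass this check and produce no spans.
	if !psc.IsValid() && p.Kind == trace.SpanKindServer {
		return s.fraction.ShouldSample(p)
	}
	return sdktrace.SamplingResult{Decision: sdktrace.Drop, Tracestate: psc.TraceState()}
}

func (s serverRootSampler) Description() string { return "serverRootSampler" }
```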
# Additional Changes
Fixed the middleware handling for serving Console. Telemetry and
instance selection are only used for the environment.json, but not on
statically served files.
# Additional Context
- closes#8096
- relates to #9074
- backports to at least 2.66.x, 2.67.x and 2.68.x
(cherry picked from commit 990e1982c7)
# Which Problems Are Solved
#9185 changed the behavior so that if a notification channel was not
present, notification workers would no longer retry to send the
notification, and would also cancel in case Twilio returned a 4xx error.
However, this did not affect the "legacy" mode.
# How the Problems Are Solved
- Handle `CancelError` in legacy notifier as not failed (event).
# Additional Changes
None
# Additional Context
- relates to #9185
- requires back port to 2.66.x and 2.67.x
(cherry picked from commit 3fc68e5d60)
# Which Problems Are Solved
https://github.com/zitadel/zitadel/pull/9186 introduced the new `push`
sql function for cockroachdb. The function used the wrong database
function to generate the position of the event and would therefore
insert events at a position before events created with an old Zitadel
version.
# How the Problems Are Solved
Instead of `EXTRACT(EPOCH FROM NOW())`, `cluster_logical_timestamp()` is
used to calculate the position of an event.
# Additional Context
- Introduced in https://github.com/zitadel/zitadel/pull/9186
- Affected versions:
https://github.com/zitadel/zitadel/releases/tag/v2.67.3
# Which Problems Are Solved
If a notification channel was not present, notification workers would
retry up to the maximum number of attempts. This leads to unnecessary
load.
Additionally, a client noticed bad actors trying to abuse SMS MFA.
# How the Problems Are Solved
- Directly cancel the notification (and stop retries) on:
  - a missing channel
  - any `4xx` error from Twilio Verify
# Additional Changes
None
# Additional Context
reported by customer
(cherry picked from commit 60857c8d3e)
# Which Problems Are Solved
The performance of the initial push function can be increased further.
# How the Problems Are Solved
The `eventstore.push` and `eventstore.commands_to_events` functions were
rewritten.
# Additional Changes
none
# Additional Context
same optimizations as for postgres:
https://github.com/zitadel/zitadel/pull/9092
(cherry picked from commit 690147b30e)
# Which Problems Are Solved
Organization name change results in domain events even if the domain
itself doesn't change.
# How the Problems Are Solved
Check if the domain itself really changes, and if not, don't create the
events.
# Additional Changes
Unit test for this specific case.
# Additional Context
None
(cherry picked from commit 69372e5209)
# Which Problems Are Solved
Events like "password check succeeded" store some information about the
caller including their IP.
The `X-Forwarded-For` header was not correctly logged; instead, the
RemoteAddress was.
# How the Problems Are Solved
- Correctly get the `X-Forwarded-For` in canonical form.
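A minimal sketch of reading the client IP from the canonical `X-Forwarded-For` header, falling back to the RemoteAddress; this is an illustration, not ZITADEL's actual code:
```go
package httpinfo

import (
	"net/http"
	"strings"
)

// clientIP prefers the left-most entry of X-Forwarded-For (the original
// client) and falls back to the connection's RemoteAddr if the header is
// absent. Header.Get canonicalizes the header key internally.
func clientIP(r *http.Request) string {
	if xff := r.Header.Get("X-Forwarded-For"); xff != "" {
		return strings.TrimSpace(strings.Split(xff, ",")[0])
	}
	return r.RemoteAddr
}
```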
# Additional Changes
None
# Additional Context
closes [#9106](https://github.com/zitadel/zitadel/issues/9106)
(cherry picked from commit c966446f80)
# Which Problems Are Solved
It was possible to set a different algorithm for the legacy signer. This
is not supported, however, and breaks the token endpoint.
# How the Problems Are Solved
Remove the OIDC.SigningKeyAlgorithm config option and hard-code RS256
for the legacy signer.
# Additional Changes
- none
# Additional Context
Only RS256 is supported by the legacy signer. The comment on the config
option already advised not to use it and to use the webkeys resource
instead.
- closes #9121
(cherry picked from commit db8d794794)
# Which Problems Are Solved
Typo in RU localization on login page.
# How the Problems Are Solved
Fixed the typo by replacing it with the correct text.
# Additional Changes
n/a
# Additional Context
n/a
Co-authored-by: Tim Möhlmann <tim+github@zitadel.com>
(cherry picked from commit 42cc6dce79)
# Which Problems Are Solved
If a not-allowed IDP is selected, or an IDP that is no longer allowed
was selected before at login, the login will still try to use it as a
fallback.
The same goes for linked IDPs, which are not necessarily active anymore
or may be disallowed through policies.
# How the Problems Are Solved
Check all possible or configured IDPs if they can be used.
# Additional Changes
None
# Additional Context
Addition to #6466
---------
Co-authored-by: Livio Spring <livio.a@gmail.com>
(cherry picked from commit 8d8f38fb4c)
# Which Problems Are Solved
We were seeing high query costs in the lateral join executed in the
commands_to_events procedural function in the database. The high cost
resulted in incremental CPU usage as a load test continued and fewer
req/sec handled, starting at 836 and ending at 130 req/sec.
# How the Problems Are Solved
1. Set `PARALLEL SAFE`. I noticed that this option defaults to `UNSAFE`.
But it's actually safe if the function doesn't `INSERT`
2. Set the returned `ROWS 10` parameter.
3. The function is rewritten in PL/pgSQL so that we eliminate expensive
joins.
4. Introduced an intermediate state that does `SELECT DISTINCT` for the
aggregate so that we don't have to do an expensive lateral join.
# Additional Changes
Use a `COALESCE` to get the owner from the last event, instead of a
`CASE` switch.
# Additional Context
- Function was introduced in
https://github.com/zitadel/zitadel/pull/8816
- Closes https://github.com/zitadel/zitadel/issues/8352
---------
Co-authored-by: Silvan <27845747+adlerhurst@users.noreply.github.com>
# Which Problems Are Solved
On Zitadel cloud we found that changing the order of columns in the
`eventstore.events2_current_sequence` index improved CPU usage for the
`SELECT ... FOR UPDATE` query the pusher executes.
# How the Problems Are Solved
`eventstore.events2_current_sequence`-index got replaced
# Additional Context
closes https://github.com/zitadel/zitadel/issues/9082
Get instance by domain cannot provide an instance id because it is not
known at that time. This causes a full table scan on the fields table,
because the current indexes always include the `instance_id` column.
Added a specific index for this query.
If a system has many fields and there is no cache hit for the given
domain, this query can heavily influence database CPU usage; the newly
added index resolves this problem.
(cherry picked from commit f320d18b1a)
# Which Problems Are Solved
When `LastUseAge` was configured properly, the Redis LUA script uses
manual cleanup for `MaxAge`-based expiry. The expiry obtained from Redis
appears to be a string and was compared to an int, resulting in a script
error.
# How the Problems Are Solved
Convert expiry to number.
# Additional Changes
- none
# Additional Context
- Introduced in #8822
- LastUseAge was fixed in #9097
- closes https://github.com/zitadel/zitadel/issues/9140
(cherry picked from commit 56427cca50)
# Which Problems Are Solved
IdPs using form callback were not always correctly handled with the
newly introduced cache mechanism
(https://github.com/zitadel/zitadel/pull/9097).
# How the Problems Are Solved
Get the data from cache before parsing it.
# Additional Changes
None
# Additional Context
Relates to https://github.com/zitadel/zitadel/pull/9097
# Which Problems Are Solved
Some IdP callbacks use HTTP form POST to return their data on callbacks.
For handling CSRF in the login after such calls, a 302 Found to the
corresponding non form callback (in ZITADEL) is sent. Depending on the
size of the initial form body, this could lead to ZITADEL terminating
the connection, resulting in the user not getting a response or an
intermediate proxy to return them an HTTP 502.
# How the Problems Are Solved
- the form body is parsed and stored into the ZITADEL cache (using the
configured database by default)
- the redirect (302 Found) is performed with the request id
- the callback retrieves the data from the cache instead of the query
parameters (falling back to the latter to handle open, uncached
requests); see the sketch after this list
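A minimal sketch of the described flow, assuming a generic cache interface; the handler, field, and path names are hypothetical, not ZITADEL's actual API:
```go
package login

import (
	"context"
	"net/http"
	"net/url"
)

// cache is a stand-in for ZITADEL's configured cache (database-backed by
// default); the interface shape here is an assumption for illustration.
type cache interface {
	Set(ctx context.Context, key, value string)
}

// handleFormCallback stores the (potentially large) form body in the cache
// and redirects to the regular callback with only the request id, keeping
// the 302 response small regardless of the original body size.
func handleFormCallback(w http.ResponseWriter, r *http.Request, c cache) {
	if err := r.ParseForm(); err != nil {
		http.Error(w, "invalid form", http.StatusBadRequest)
		return
	}
	requestID := r.Form.Get("RelayState") // hypothetical field carrying the request id
	c.Set(r.Context(), requestID, r.Form.Encode())
	// hypothetical non-form callback path
	http.Redirect(w, r, "/login/externalidp/callback?id="+url.QueryEscape(requestID), http.StatusFound)
}
```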
# Additional Changes
- fixed a typo in the default (cache) configuration: `LastUsage` ->
`LastUseAge`
# Additional Context
- reported by a customer
- needs to be backported to current cloud version (2.66.x)
---------
Co-authored-by: Silvan <27845747+adlerhurst@users.noreply.github.com>
(cherry picked from commit fa5e590aab)
# Which Problems Are Solved
SAML IdPs exposing an `EntitiesDescriptor` using an `xsd:duration` time
format for the `cacheDuration` property (e.g. `PT5H`) failed parsing.
# How the Problems Are Solved
Handle the unmarshalling for `EntitiesDescriptor` specifically.
[crewjam/saml](bbccb7933d/metadata.go (L88-L103))
already did this for `EntityDescriptor` the same way.
# Additional Changes
None
# Additional Context
- reported by a customer
- needs to be backported to current cloud version (2.66.x)
(cherry picked from commit bcf416d4cf)
# Which Problems Are Solved
Commands for installing compose stacks with reverse proxies don't work.
# How the Problems Are Solved
- The `docker compose up` commands are fixed by specifying all necessary
services to spin up. This is obviously not (or not with all docker
compose versions) resolved by the dependencies declarations.
- The initial postgres admin username is postgres.
- Fix postgres health check to succeed before the init job created the
DB.
- A hint tells the user to install the grpcurl binary.
# Additional Changes
- Passing `--wait` to `docker compose up` doesn't require us to sleep
for exactly three seconds.
- It looks to me like the order of the depends_on declaration for
zitadel matters, but I don't understand why. I changed it so that it's
for sure correct.
- Silenced some command outputs
- Removed the version property from all compose files to avoid the
following warning
```
WARN[0000] /tmp/caddy-example/docker-compose-base.yaml: the attribute `version` is obsolete, it will be ignored, please remove it to avoid potential confusion
```
# Additional Context
- Closes https://github.com/zitadel/zitadel/issues/9115
This is the easiest way to test the updated docs:
```bash
# Use this PR branches files:
export ZITADEL_CONFIG_FILES=https://raw.githubusercontent.com/zitadel/zitadel/refs/heads/fix-reverse-proxy-guides/docs/docs/self-hosting/manage/reverseproxy
```
The rest of the commands as described in
https://docs-git-fix-reverse-proxy-guides-zitadel.vercel.app/docs/self-hosting/manage/reverseproxy/caddy

# Which Problems Are Solved
The v2 API currently has no endpoint to get all second factors of a
user.
# How the Problems Are Solved
Our v1 API has `ListHumanAuthFactors`, which was added to the v2 API
under the User resource.
# Additional Changes
# Additional Context
Closes #8833
---------
Co-authored-by: Stefan Benz <46600784+stebenz@users.noreply.github.com>
# Which Problems Are Solved
Added search by phone to user Service V2.
```
curl --request POST \
  --url https://<zitadel_domain>/v2/users \
  --header 'Accept: application/json' \
  --header 'Authorization: Bearer <Token>' \
  --header 'Content-Type: application/json' \
  --data '{
    "query": {
      "offset": "0",
      "limit": 100,
      "asc": true
    },
    "sortingColumn": "USER_FIELD_NAME_UNSPECIFIED",
    "queries": [
      {
        "phoneQuery": {
          "number": "+12011223313",
          "method": "TEXT_QUERY_METHOD_EQUALS"
        }
      }
    ]
  }'
```
Why?
Searching for a user by phone was missing from User Service V2 and V2
beta.
# How the Problems Are Solved
* Added to the SearchQuery proto
* Added code to filter users by phone
# Additional Changes
N/A
# Additional Context
Search by phone is present in V3 User Service
---------
Co-authored-by: Stefan Benz <46600784+stebenz@users.noreply.github.com>
# Which Problems Are Solved
There is currently no endpoint to send an email code for verification of
the email if you don't change the email itself.
# How the Problems Are Solved
Endpoint `HasEmailCode` to get the information whether an email code
exists; used by the new login.
Endpoint `SendEmailCode` to send a code if none exists yet. It replaces
`ResendEmailCode`, which required an existing code before it could be
resent.
# Additional Changes
None
# Additional Context
Closes #9096
---------
Co-authored-by: Silvan <27845747+adlerhurst@users.noreply.github.com>