# Which Problems Are Solved
There were multiple issues in the OpenTelemetry (OTEL) implementation
and usage for tracing and metrics, which led to high cardinality and
potential memory leaks:
- wrongly initialized tracing interceptors
- high cardinality in traces:
  - HTTP/1.1 endpoints containing host names
  - HTTP/1.1 endpoints containing object IDs such as the user ID (e.g.
    `/management/v1/users/2352839823/`)
- high number of traces from internal processes (spooler)
- high cardinality in the metrics endpoint:
  - gRPC entries containing host names
  - notification metrics containing instance IDs and error messages
# How the Problems Are Solved
- Properly initialize the interceptors once and update them to use the
  gRPC stats handler (unary interceptors were deprecated); a minimal
  example follows after this list.
- Remove host names from HTTP/1.1 span names and use the path as default.
- Set / overwrite the URI for spans on the grpc-gateway with the URI
  pattern (`/management/v1/users/{user_id}`). This is used for spans in
  traces and for metric entries.
- Created a new sampler (sketched below) which will only sample spans in
  the following cases:
  - the remote span was already sampled
  - the remote span was not sampled, the root span is of kind `Server`,
    and the fraction set in the runtime configuration applies
- This prevents a large number of spans from the spooler background jobs
  if they were not started by a client call querying an object (e.g.
  UserByID).
- Filter out host names and the like from OTEL-generated metrics (using a
  `view`; see the example after this list).
- Removed instance IDs and error messages from notification metrics.
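
The sketches below illustrate the points above. They are minimal examples based on the public OpenTelemetry Go APIs, not the actual ZITADEL code; all helper names introduced here (`newGRPCServer`, `NewServerSampler`, `newMeterProvider`) are illustrative.

Registering the tracing instrumentation through the gRPC stats handler instead of the deprecated interceptors could look like this:

```go
package server

import (
	"go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc"
	"google.golang.org/grpc"
)

// newGRPCServer registers the OTEL stats handler once while constructing the
// server, replacing the deprecated otelgrpc unary/stream interceptors.
func newGRPCServer() *grpc.Server {
	return grpc.NewServer(grpc.StatsHandler(otelgrpc.NewServerHandler()))
}
```

The described sampling rules map to a custom `sdktrace.Sampler`; this is a simplified sketch of that behavior, not the exact implementation:

```go
package tracing

import (
	sdktrace "go.opentelemetry.io/otel/sdk/trace"
	"go.opentelemetry.io/otel/trace"
)

type serverSampler struct {
	fraction sdktrace.Sampler
}

// NewServerSampler keeps spans whose parent was already sampled and
// ratio-samples root spans of kind Server; everything else (e.g. spooler
// background jobs without a sampled client call) is dropped.
func NewServerSampler(fraction float64) sdktrace.Sampler {
	return serverSampler{fraction: sdktrace.TraceIDRatioBased(fraction)}
}

func (s serverSampler) ShouldSample(p sdktrace.SamplingParameters) sdktrace.SamplingResult {
	psc := trace.SpanContextFromContext(p.ParentContext)
	// parent span (remote or local) was already sampled: keep sampling
	if psc.IsSampled() {
		return sdktrace.SamplingResult{Decision: sdktrace.RecordAndSample, Tracestate: psc.TraceState()}
	}
	// no sampled parent: only ratio-sample root spans of kind Server
	if !psc.IsValid() && p.Kind == trace.SpanKindServer {
		return s.fraction.ShouldSample(p)
	}
	// background work (e.g. spooler jobs) ends up here and is not sampled
	return sdktrace.SamplingResult{Decision: sdktrace.Drop, Tracestate: psc.TraceState()}
}

func (s serverSampler) Description() string {
	return "ServerKindFractionSampler"
}
```

Filtering host names out of the OTEL-generated metrics with a `view` could be done roughly like this (the attribute keys shown are examples only):

```go
package metrics

import (
	"go.opentelemetry.io/otel/attribute"
	sdkmetric "go.opentelemetry.io/otel/sdk/metric"
)

// newMeterProvider registers a view that drops host-name-like attributes
// from all instruments before they are exported.
func newMeterProvider() *sdkmetric.MeterProvider {
	view := sdkmetric.NewView(
		sdkmetric.Instrument{Name: "*"},
		sdkmetric.Stream{AttributeFilter: attribute.NewDenyKeysFilter(
			"net.host.name", "net.sock.peer.addr", // illustrative keys
		)},
	)
	return sdkmetric.NewMeterProvider(sdkmetric.WithView(view))
}
```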
# Additional Changes
Fixed the middleware handling for serving Console. Telemetry and
instance selection are only used for `environment.json`, but not for
statically served files.
# Additional Context
- closes #8096
- relates to #9074
- backports to at least 2.66.x, 2.67.x and 2.68.x
# Which Problems Are Solved
ZITADEL currently selects the instance context based on an HTTP header
(see https://github.com/zitadel/zitadel/issues/8279#issue-2399959845)
and checks it against the list of instance domains. Let's call it the
instance or API domain.
For any context-based URL (e.g. OAuth, OIDC, SAML endpoints, links in
emails, ...) the requested domain (instance domain) will be used. Let's
call it the public domain.
In proxied setups, this means every exposed domain (public domain) has
to be managed as an instance domain.
This can either be done using the "ExternalDomain" in the runtime config
or via the system API, which requires a validation through CustomerPortal
on zitadel.cloud.
# How the Problems Are Solved
- Two new headers / header lists are added:
  - `InstanceHostHeaders`: an ordered list (first sent wins), which will
    be used to match the instance.
    (For backward compatibility, the `HTTP1HostHeader`, `HTTP2HostHeader`
    and `forwarded`, `x-forwarded-for`, `x-forwarded-host` headers are
    checked afterwards as well.)
  - `PublicHostHeaders`: an ordered list (first sent wins), which will be
    used as the public host / domain. This will be checked against a list
    of trusted domains on the instance.
- The middleware intercepts all requests to the API and passes a
  `DomainCtx` object with the hosts and protocol into the context
  (previously only a computed `origin` was passed); a sketch follows
  after this list.
- The HTTP / gRPC servers no longer try to match the headers to instances
  themselves, but use the passed `http.DomainContext` in their
  interceptors.
- The `RequestedHost` and `RequestedDomain` from authz.Instance are
removed in favor of the `http.DomainContext`
- When authenticating to or signing out from Console UI, the current
`http.DomainContext(ctx).Origin` (already checked by instance
interceptor for validity) is used to compute and dynamically add a
`redirect_uri` and `post_logout_redirect_uri`.
- The gateway passes on all configured host headers (previously only
  `x-zitadel-*` were passed).
- The Admin API allows managing trusted domains.
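
A minimal sketch of such a middleware on plain `net/http`; `DomainCtx`, `WithDomainContext`, and the fallback behavior are illustrative assumptions (ZITADEL's actual `http.DomainContext` differs), and the trusted-domain validation is omitted:

```go
package middleware

import (
	"context"
	"net/http"
)

// DomainCtx carries the resolved hosts and protocol; the field names here
// are illustrative, not ZITADEL's actual http.DomainContext type.
type DomainCtx struct {
	InstanceHost string // used to match the instance
	PublicHost   string // checked against the instance's trusted domains
	Protocol     string
}

type domainCtxKey struct{}

// WithDomainContext resolves the instance and public host from the
// configured, ordered header lists ("first sent wins") and stores the
// result in the request context for the downstream interceptors.
func WithDomainContext(next http.Handler, instanceHeaders, publicHeaders []string) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		d := &DomainCtx{
			InstanceHost: firstHeader(r, instanceHeaders, r.Host),
			PublicHost:   firstHeader(r, publicHeaders, r.Host),
			Protocol:     "https",
		}
		if r.TLS == nil {
			d.Protocol = "http"
		}
		next.ServeHTTP(w, r.WithContext(context.WithValue(r.Context(), domainCtxKey{}, d)))
	})
}

// firstHeader returns the value of the first header in names that is set
// on the request, falling back to the request host.
func firstHeader(r *http.Request, names []string, fallback string) string {
	for _, name := range names {
		if value := r.Header.Get(name); value != "" {
			return value
		}
	}
	return fallback
}
```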
# Additional Changes
None
# Additional Context
- part of #8279
- open topics:
  - "single-instance" mode
  - Console UI
# Which Problems Are Solved
The metric `http_server_return_code_counter` doesn't record calls to the
gRPC gateway.
# How the Problems Are Solved
The `DefaultMetricsHandler` used for the gRPC gateway doesn't record
`http_server_return_code_counter`.
Instead of the `DefaultMetricsHandler`, a custom metrics handler which
includes `http_server_return_code_counter` is created for the gRPC
gateway (a sketch follows below).
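
A rough sketch of such a return-code counting handler, assuming the OpenTelemetry Go metrics API; the middleware shape, `returnCodeHandler`, and the attribute names are illustrative and not the actual ZITADEL metrics handler:

```go
package metrics

import (
	"net/http"

	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/attribute"
	"go.opentelemetry.io/otel/metric"
)

// returnCodeHandler wraps the gateway handler and counts every response by
// its HTTP status code, comparable to http_server_return_code_counter.
func returnCodeHandler(next http.Handler) http.Handler {
	// error handling omitted in this sketch
	counter, _ := otel.Meter("zitadel").Int64Counter("http_server_return_code_counter")
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		rec := &statusRecorder{ResponseWriter: w, status: http.StatusOK}
		next.ServeHTTP(rec, r)
		counter.Add(r.Context(), 1, metric.WithAttributes(
			attribute.Int("return_code", rec.status),
			attribute.String("method", r.Method),
		))
	})
}

// statusRecorder captures the status code written by the wrapped handler.
type statusRecorder struct {
	http.ResponseWriter
	status int
}

func (r *statusRecorder) WriteHeader(code int) {
	r.status = code
	r.ResponseWriter.WriteHeader(code)
}
```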
# Additional Changes
The DefaultMetricsHandler function is removed, as it is no longer used.
# Additional Context
Reported by a customer
---------
Co-authored-by: Silvan <silvan.reusser@gmail.com>
# Which Problems Are Solved
We identified some parts in the code that could panic with a nil pointer
when accessed without an auth request.
Additionally, if a gRPC method was called with an unmapped HTTP method,
e.g. POST instead of GET, a 501 instead of a 405 was returned.
# How the Problems Are Solved
- Additional checks for an existing authRequest
- Custom HTTP status code mapper for the gateway (see the sketch below)
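
A sketch of the gateway status code mapping, following the routing error handler pattern from the grpc-gateway v2 documentation; `routingErrorHandler` is an illustrative name and the actual ZITADEL mapper may differ:

```go
package gateway

import (
	"context"
	"net/http"

	"github.com/grpc-ecosystem/grpc-gateway/v2/runtime"
	"google.golang.org/grpc/codes"
	"google.golang.org/grpc/status"
)

// routingErrorHandler returns 405 for unmapped HTTP methods instead of the
// 501 produced by grpc-gateway's DefaultRoutingErrorHandler.
func routingErrorHandler(ctx context.Context, mux *runtime.ServeMux, marshaler runtime.Marshaler, w http.ResponseWriter, r *http.Request, httpStatus int) {
	if httpStatus != http.StatusMethodNotAllowed {
		runtime.DefaultRoutingErrorHandler(ctx, mux, marshaler, w, r, httpStatus)
		return
	}
	// HTTPStatusError overrides the status code the default error handler writes.
	err := &runtime.HTTPStatusError{
		HTTPStatus: httpStatus,
		Err:        status.Error(codes.Unimplemented, http.StatusText(httpStatus)),
	}
	runtime.DefaultHTTPErrorHandler(ctx, mux, marshaler, w, r, err)
}

// registered for example via:
// runtime.NewServeMux(runtime.WithRoutingErrorHandler(routingErrorHandler))
```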
# Additional Changes
None.
# Additional Context
- noted internally in OPS
* fix: add http status to activity log
* create prerelease
* create RC
* pass info from gateway to grpc server
* fix: update releaserc to create RC version
* cleanup
---------
Co-authored-by: Livio Spring <livio.a@gmail.com>
* fix: 404 for robots.txt and meta robots tags
* fix: add unit tests for robots txt and tag
* fix: add meta tag robots none for login pages
* fix: weird format issue in header.go
* fix: add x-robots-tag=none to grpcwebserver
* fix linting
---------
Co-authored-by: Silvan <silvan.reusser@gmail.com>
Co-authored-by: Livio Spring <livio.a@gmail.com>
* feat(actions): begin api
* feat(actions): begin api
* api and projections
* fix: handle multiple statements for a single event in projections
* export func type
* fix test
* update to new reduce interface
* flows in login
* feat: jwt idp
* feat: command side
* feat: add tests
* actions and flows
* fill idp views with jwt idps and return apis
* add jwtEndpoint to jwt idp
* begin jwt request handling
* add feature
* merge
* merge
* handle jwt idp
* cleanup
* bug fixes
* autoregister
* get token from specific header name
* fix: proto
* fixes
* i18n
* begin tests
* fix and log http proxy
* remove docker cache
* fixes
* usergrants in actions api
* tests and cleanup
* cleanup
* fix add user grant
* set login context
* i18n
Co-authored-by: fabi <fabienne.gerschwiler@gmail.com>
* refactor: switch from opencensus to opentelemetry
* tempo works as designed nooooot
* fix: log traceids
* with grafana agent
* fix: http tracing
* fix: cleanup files
* chore: remove todo
* fix: bad test
* fix: ignore methods in grpc interceptors
* fix: remove test log
* clean up
* typo
* fix(config): configure tracing endpoint
* fix(span): add error id to span