# Which Problems Are Solved
There were multiple issues in the OpenTelemetry (OTEL) implementation
and usage for tracing and metrics, which lead to high cardinality and
potential memory leaks:
- wrongly initiated tracing interceptors
- high cardinality in traces:
- HTTP/1.1 endpoints containing host names
- HTTP/1.1 endpoints containing object IDs like userID (e.g.
`/management/v1/users/2352839823/`)
- high amount of traces from internal processes (spooler)
- high cardinality in metrics endpoint:
- GRPC entries containing host names
- notification metrics containing instanceIDs and error messages
# How the Problems Are Solved
- Properly initialize the interceptors once and update them to use the
grpc stats handler (unary interceptors were deprecated).
- Remove host names from HTTP/1.1 span names and use path as default.
- Set / overwrite the uri for spans on the grpc-gateway with the uri
pattern (`/management/v1/users/{user_id}`). This is used for spans in
traces and metric entries.
- Created a new sampler which will only sample spans in the following
cases:
- remote was already sampled
- remote was not sampled, root span is of kind `Server` and based on
fraction set in the runtime configuration
- This will prevent having a lot of spans from the spooler back ground
jobs if they were not started by a client call querying an object (e.g.
UserByID).
- Filter out host names and alike from OTEL generated metrics (using a
`view`).
- Removed instance and error messages from notification metrics.
# Additional Changes
Fixed the middleware handling for serving Console. Telemetry and
instance selection are only used for the environment.json, but not on
statically served files.
# Additional Context
- closes#8096
- relates to #9074
- back ports to at least 2.66.x, 2.67.x and 2.68.x
(cherry picked from commit 990e1982c712ba2082f3fc6fc4861f3abf85b0cd)
# Which Problems Are Solved
Use a single server instance for API integration tests. This optimizes
the time taken for the integration test pipeline,
because it allows running tests on multiple packages in parallel. Also,
it saves time by not start and stopping a zitadel server for every
package.
# How the Problems Are Solved
- Build a binary with `go build -race -cover ....`
- Integration tests only construct clients. The server remains running
in the background.
- The integration package and tested packages now fully utilize the API.
No more direct database access trough `query` and `command` packages.
- Use Makefile recipes to setup, start and stop the server in the
background.
- The binary has the race detector enabled
- Init and setup jobs are configured to halt immediately on race
condition
- Because the server runs in the background, races are only logged. When
the server is stopped and race logs exist, the Makefile recipe will
throw an error and print the logs.
- Makefile recipes include logic to print logs and convert coverage
reports after the server is stopped.
- Some tests need a downstream HTTP server to make requests, like quota
and milestones. A new `integration/sink` package creates an HTTP server
and uses websockets to forward HTTP request back to the test packages.
The package API uses Go channels for abstraction and easy usage.
# Additional Changes
- Integration test files already used the `//go:build integration`
directive. In order to properly split integration from unit tests,
integration test files need to be in a `integration_test` subdirectory
of their package.
- `UseIsolatedInstance` used to overwrite the `Tester.Client` for each
instance. Now a `Instance` object is returned with a gRPC client that is
connected to the isolated instance's hostname.
- The `Tester` type is now `Instance`. The object is created for the
first instance, used by default in any test. Isolated instances are also
`Instance` objects and therefore benefit from the same methods and
values. The first instance and any other us capable of creating an
isolated instance over the system API.
- All test packages run in an Isolated instance by calling
`NewInstance()`
- Individual tests that use an isolated instance use `t.Parallel()`
# Additional Context
- Closes#6684
- https://go.dev/doc/articles/race_detector
- https://go.dev/doc/build-cover
---------
Co-authored-by: Stefan Benz <46600784+stebenz@users.noreply.github.com>
# Which Problems Are Solved
The current v3alpha actions APIs don't exactly adhere to the [new
resources API
design](https://zitadel.com/docs/apis/v3#standard-resources).
# How the Problems Are Solved
- **Improved ID access**: The aggregate ID is added to the resource
details object, so accessing resource IDs and constructing proto
messages for resources is easier
- **Explicit Instances**: Optionally, the instance can be explicitly
given in each request
- **Pagination**: A default search limit and a max search limit are
added to the defaults.yaml. They apply to the new v3 APIs (currently
only actions). The search query defaults are changed to ascending by
creation date, because this makes the pagination results the most
deterministic. The creation date is also added to the object details.
The bug with updated creation dates is fixed for executions and targets.
- **Removed Sequences**: Removed Sequence from object details and
ProcessedSequence from search details
# Additional Changes
Object details IDs are checked in unit test only if an empty ID is
expected. Centralizing the details check also makes this internal object
more flexible for future evolutions.
# Additional Context
- Closes#8169
- Depends on https://github.com/zitadel/zitadel/pull/8225
---------
Co-authored-by: Silvan <silvan.reusser@gmail.com>
Co-authored-by: Stefan Benz <46600784+stebenz@users.noreply.github.com>
# Which Problems Are Solved
ZITADEL currently selects the instance context based on a HTTP header
(see https://github.com/zitadel/zitadel/issues/8279#issue-2399959845 and
checks it against the list of instance domains. Let's call it instance
or API domain.
For any context based URL (e.g. OAuth, OIDC, SAML endpoints, links in
emails, ...) the requested domain (instance domain) will be used. Let's
call it the public domain.
In cases of proxied setups, all exposed domains (public domains) require
the domain to be managed as instance domain.
This can either be done using the "ExternalDomain" in the runtime config
or via system API, which requires a validation through CustomerPortal on
zitadel.cloud.
# How the Problems Are Solved
- Two new headers / header list are added:
- `InstanceHostHeaders`: an ordered list (first sent wins), which will
be used to match the instance.
(For backward compatibility: the `HTTP1HostHeader`, `HTTP2HostHeader`
and `forwarded`, `x-forwarded-for`, `x-forwarded-host` are checked
afterwards as well)
- `PublicHostHeaders`: an ordered list (first sent wins), which will be
used as public host / domain. This will be checked against a list of
trusted domains on the instance.
- The middleware intercepts all requests to the API and passes a
`DomainCtx` object with the hosts and protocol into the context
(previously only a computed `origin` was passed)
- HTTP / GRPC server do not longer try to match the headers to instances
themself, but use the passed `http.DomainContext` in their interceptors.
- The `RequestedHost` and `RequestedDomain` from authz.Instance are
removed in favor of the `http.DomainContext`
- When authenticating to or signing out from Console UI, the current
`http.DomainContext(ctx).Origin` (already checked by instance
interceptor for validity) is used to compute and dynamically add a
`redirect_uri` and `post_logout_redirect_uri`.
- Gateway passes all configured host headers (previously only did
`x-zitadel-*`)
- Admin API allows to manage trusted domain
# Additional Changes
None
# Additional Context
- part of #8279
- open topics:
- "single-instance" mode
- Console UI
* feat: improve instance not found error
* unit tests
* check if is templatable
* lint
* assert
* compile tests
* remove error templates
* link to instance not found page
* fmt
* cleanup
* lint
* define roles and permissions
* support system user memberships
* don't limit system users
* cleanup permissions
* restrict memberships to aggregates
* default to SYSTEM_OWNER
* update unit tests
* test: system user token test (#6778)
* update unit tests
* refactor: make authz testable
* move session constants
* cleanup
* comment
* comment
* decode member type string to enum (#6780)
* decode member type string to enum
* handle all membership types
* decode enums where necessary
* decode member type in steps config
* update system api docs
* add technical advisory
* tweak docs a bit
* comment in comment
* lint
* extract token from Bearer header prefix
* review changes
* fix tests
* fix: add fix for activityhandler
* add isSystemUser
* remove IsSystemUser from activity info
* fix: add fix for activityhandler
---------
Co-authored-by: Stefan Benz <stefan@caos.ch>
* feat: add activity logs on user actions with authentication, resourceAPI and sessionAPI
* feat: add activity logs on user actions with authentication, resourceAPI and sessionAPI
* feat: add activity logs on user actions with authentication, resourceAPI and sessionAPI
* feat: add activity logs on user actions with authentication, resourceAPI and sessionAPI
* feat: add activity logs on user actions with authentication, resourceAPI and sessionAPI
* fix: add unit tests to info package for context changes
* fix: add activity_interceptor.go suggestion
Co-authored-by: Tim Möhlmann <tim+github@zitadel.com>
* fix: refactoring and fixes through PR review
* fix: add auth service to lists of resourceAPIs
---------
Co-authored-by: Tim Möhlmann <tim+github@zitadel.com>
Co-authored-by: Fabi <fabienne@zitadel.com>
* chore(proto): update versions
* change protoc plugin
* some cleanups
* define api for setting emails in new api
* implement user.SetEmail
* move SetEmail buisiness logic into command
* resuse newCryptoCode
* command: add ChangeEmail unit tests
Not complete, was not able to mock the generator.
* Revert "resuse newCryptoCode"
This reverts commit c89e90ae35ae924a3f706a0a7394f933910c2e65.
* undo change to crypto code generators
* command: use a generator so we can test properly
* command: reorganise ChangeEmail
improve test coverage
* implement VerifyEmail
including unit tests
* add URL template tests
* proto: change context to object
* remove old auth option
* remove old auth option
* fix linting errors
run gci on modified files
* add permission checks and fix some errors
* comments
* comments
---------
Co-authored-by: Livio Spring <livio.a@gmail.com>
Co-authored-by: Tim Möhlmann <tim+github@zitadel.com>
* refactor: switch from opencensus to opentelemetry
* tempo works as designed nooooot
* fix: log traceids
* with grafana agent
* fix: http tracing
* fix: cleanup files
* chore: remove todo
* fix: bad test
* fix: ignore methods in grpc interceptors
* fix: remove test log
* clean up
* typo
* fix(config): configure tracing endpoint
* fix(span): add error id to span