# Which Problems Are Solved
With the currently provided telemetry it's difficult to predict when a
projection handler is under increased load until it's too late and
causes downstream issues. Importantly, projection updating is in the
critical path for many login flows, and increased latency there can
result in system downtime for users.
# How the Problems Are Solved
This PR adds three new Prometheus-style metrics (see the sketch after the list):
1. **projection_events_processed** _(labels: projection, success)_ - a
counter of the number of events processed per projection update run and
whether they were processed without error. A high number of processed
events tells us how busy a particular projection handler is.
2. **projection_handle_timer** _(labels: projection)_ - the time it
takes to process a projection update for a given batch of events: the
time to take the current_states lock, query for new events, reduce,
update the projection, and update current_states.
3. **projection_state_latency** _(labels: projection)_ - the time since
the last event processed in the current_states table for a given
projection. It tells us how old the last processed event is, i.e. how
far behind the projection is running. Higher latencies can mean high
load or stalled projection handling.
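A minimal sketch of how these three instruments could be recorded with the OpenTelemetry Go SDK (which backs the Prometheus endpoint); the helper name and its parameters are illustrative and do not mirror the actual `metrics` package API:

```go
package projection

import (
	"context"
	"time"

	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/attribute"
	"go.opentelemetry.io/otel/metric"
)

// recordProjectionMetrics is a hypothetical helper showing how the three
// instruments could be recorded. In real code the instruments would be
// created once and reused, not on every call.
func recordProjectionMetrics(ctx context.Context, projection string, processed int64, success bool, handleDuration, lastEventAge time.Duration) error {
	meter := otel.Meter("projection")

	eventsProcessed, err := meter.Int64Counter("projection_events_processed")
	if err != nil {
		return err
	}
	handleTimer, err := meter.Float64Histogram("projection_handle_timer", metric.WithUnit("s"))
	if err != nil {
		return err
	}
	stateLatency, err := meter.Float64Histogram("projection_state_latency", metric.WithUnit("s"))
	if err != nil {
		return err
	}

	projAttr := attribute.String("projection", projection)

	// Counter: events processed in this run, labeled by projection and success.
	eventsProcessed.Add(ctx, processed, metric.WithAttributes(projAttr, attribute.Bool("success", success)))
	// Histogram: time to take the current_states lock, query, reduce and update.
	handleTimer.Record(ctx, handleDuration.Seconds(), metric.WithAttributes(projAttr))
	// Histogram: age of the last processed event, i.e. how far behind the projection runs.
	stateLatency.Record(ctx, lastEventAge.Seconds(), metric.WithAttributes(projAttr))
	return nil
}
```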
# Additional Changes
I also had to initialize the global otel metrics provider (`metrics.M`)
in the `setup` step in addition to `start`, since projection handlers
are initialized at setup. The initialization checks whether a metrics
provider is already set (in the case of `start-from-setup` or
`start-from-init`) to prevent overwriting it, which would cause the otel
metrics provider to stop working.
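A minimal sketch of that guard, assuming the provider is kept in a package-level variable; the types and names here are simplified placeholders:

```go
package metrics

import "sync"

// Metrics stands in for the real provider interface in this sketch.
type Metrics interface{}

// M is the globally shared metrics provider.
var (
	M  Metrics
	mu sync.Mutex
)

// SetMetrics only initializes the global provider if none is set yet, so that
// start-from-setup / start-from-init (where setup already initialized it) does
// not overwrite a working provider.
func SetMetrics(provider Metrics) {
	mu.Lock()
	defer mu.Unlock()
	if M != nil {
		return
	}
	M = provider
}
```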
# Additional Context
## Example Dashboards


---------
Co-authored-by: Silvan <27845747+adlerhurst@users.noreply.github.com>
Co-authored-by: Livio Spring <livio.a@gmail.com>
# Which Problems Are Solved
The service name is hardcoded in the metrics code. Making the service
name configurable helps when running multiple instances of Zitadel.
The defaults remain unchanged; the service name still defaults to
ZITADEL.
# How the Problems Are Solved
Add a config option to override the name in defaults.yaml and pass it
down to the corresponding metrics or tracing module (google or otel).
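A sketch of how the configured name could be passed into the OTEL resource shared by the metrics and tracing modules; the struct and field names are illustrative, not the actual config keys:

```go
package telemetry

import (
	"go.opentelemetry.io/otel/sdk/resource"
	semconv "go.opentelemetry.io/otel/semconv/v1.17.0"
)

// Config holds the overridable service name (illustrative field name).
type Config struct {
	ServiceName string
}

// newResource builds the OTEL resource with the configured service name,
// falling back to the unchanged default ZITADEL.
func newResource(cfg Config) (*resource.Resource, error) {
	name := cfg.ServiceName
	if name == "" {
		name = "ZITADEL"
	}
	return resource.Merge(
		resource.Default(),
		resource.NewWithAttributes(
			semconv.SchemaURL,
			semconv.ServiceNameKey.String(name),
		),
	)
}
```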
# Additional Changes
NA
# Additional Context
NA
# Which Problems Are Solved
There were multiple issues in the OpenTelemetry (OTEL) implementation
and usage for tracing and metrics, which led to high cardinality and
potential memory leaks:
- wrongly initiated tracing interceptors
- high cardinality in traces:
  - HTTP/1.1 endpoints containing host names
  - HTTP/1.1 endpoints containing object IDs like userID (e.g.
`/management/v1/users/2352839823/`)
- high amount of traces from internal processes (spooler)
- high cardinality in the metrics endpoint:
  - GRPC entries containing host names
  - notification metrics containing instanceIDs and error messages
# How the Problems Are Solved
- Properly initialize the interceptors once and update them to use the
grpc stats handler (unary interceptors were deprecated).
- Remove host names from HTTP/1.1 span names and use path as default.
- Set / overwrite the uri for spans on the grpc-gateway with the uri
pattern (`/management/v1/users/{user_id}`). This is used for spans in
traces and metric entries.
- Created a new sampler which will only sample spans in the following
cases (a sketch follows after this list):
  - the remote was already sampled
  - the remote was not sampled, the root span is of kind `Server`, and
the fraction set in the runtime configuration allows it
  - This prevents having a lot of spans from the spooler background
jobs if they were not started by a client call querying an object (e.g.
UserByID).
- Filter out host names and alike from OTEL generated metrics (using a
`view`).
- Removed instance and error messages from notification metrics.
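A sketch of the sampling rules described above, using the OpenTelemetry Go SDK's `Sampler` interface; this approximates the behavior rather than reproducing the actual implementation:

```go
package tracing

import (
	sdktrace "go.opentelemetry.io/otel/sdk/trace"
	"go.opentelemetry.io/otel/trace"
)

// serverRootSampler follows a sampled remote parent, samples root spans of
// kind Server by the configured fraction, and drops everything else (e.g.
// spooler background work without a sampled caller).
type serverRootSampler struct {
	fraction sdktrace.Sampler
}

func newServerRootSampler(fraction float64) sdktrace.Sampler {
	return serverRootSampler{fraction: sdktrace.TraceIDRatioBased(fraction)}
}

func (s serverRootSampler) ShouldSample(p sdktrace.SamplingParameters) sdktrace.SamplingResult {
	psc := trace.SpanContextFromContext(p.ParentContext)
	// The remote caller already sampled: keep sampling to complete the trace.
	if psc.IsRemote() && psc.IsSampled() {
		return sdktrace.SamplingResult{Decision: sdktrace.RecordAndSample, Tracestate: psc.TraceState()}
	}
	// No local parent and the span is of kind Server: decide by fraction.
	if p.Kind == trace.SpanKindServer && (!psc.IsValid() || psc.IsRemote()) {
		return s.fraction.ShouldSample(p)
	}
	// Anything else, e.g. spans started by background jobs, is dropped.
	return sdktrace.SamplingResult{Decision: sdktrace.Drop, Tracestate: psc.TraceState()}
}

func (s serverRootSampler) Description() string {
	return "serverRootSampler"
}
```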
# Additional Changes
Fixed the middleware handling for serving Console. Telemetry and
instance selection are only used for the environment.json, but not for
statically served files.
# Additional Context
- closes #8096
- relates to #9074
- backports to at least 2.66.x, 2.67.x and 2.68.x
# Which Problems Are Solved
To gain more insight into the performance, CPU and memory usage of
ZITADEL, we want to enable profiling.
# How the Problems Are Solved
- Allow profiling by configuration.
- Provide Google Cloud Profiler as the first implementation (see the sketch below).
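A minimal sketch of wiring this up with the Google Cloud Profiler client library (`cloud.google.com/go/profiler`); the config struct and its field names are placeholders, not the actual configuration keys:

```go
package profiling

import "cloud.google.com/go/profiler"

// Config is an illustrative stand-in for the profiling configuration.
type Config struct {
	Enabled        bool
	ProjectID      string
	ServiceVersion string
}

// start enables Google Cloud Profiler when profiling is configured; when
// disabled, it is a no-op.
func start(cfg Config) error {
	if !cfg.Enabled {
		return nil
	}
	return profiler.Start(profiler.Config{
		Service:        "zitadel",
		ServiceVersion: cfg.ServiceVersion,
		ProjectID:      cfg.ProjectID,
	})
}
```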
# Additional Changes
None.
# Additional Context
There were possible memory leaks reported:
https://discord.com/channels/927474939156643850/1273210227918897152
Co-authored-by: Silvan <silvan.reusser@gmail.com>
# Which Problems Are Solved
The v2beta services are stable but not GA.
# How the Problems Are Solved
The v2beta services are copied to v2. The corresponding v1 and v2beta
services are deprecated.
# Additional Context
Closes #7236
---------
Co-authored-by: Elio Bischof <elio@zitadel.com>
chore(fmt): run gci on complete project
Fix global import formatting in Go code by running the `gci` command. This allows us to just run the command directly on each PR, instead of fixing the import order manually for the linter.
Co-authored-by: Elio Bischof <elio@zitadel.com>