7 Commits

Author SHA1 Message Date
Zach Hirschtritt
61ddecee31
fix: add prometheus metrics on projection handlers (#9561)
# Which Problems Are Solved

With current provided telemetry it's difficult to predict when a
projection handler is under increased load until it's too late and
causes downstream issues. Importantly, projection updating is in the
critical path for many login flows and increased latency there can
result in system downtime for users.

# How the Problems Are Solved

This PR adds three new prometheus-style metrics:
1. **projection_events_processed** (_labels: projection, success_) -
This metric gives us a counter of the number of events processed per
projection update run and whether they we're processed without error. A
high number of events being processed can let us know how busy a
particular projection handler is.

2. **projection_handle_timer** _(labels: projection)_ - This is the time
it takes to process a projection update given a batch of events - time
to take the current_states lock, query for new events, reduce,
update_the projection, and update current_states.

3. **projection_state_latency** _(labels: projection)_ - This is the
time from the last event processed in the current_states table for a
given projection. It tells us how old was the last event you processed?
Or, how far behind are you running for this projection? Higher latencies
could mean high load or stalled projection handling.

# Additional Changes

I also had to initialize the global otel metrics provider (`metrics.M`)
in the `setup` step additionally to `start` since projection handlers
are initialized at setup. The initialization checks if a metrics
provider is already set (in case of `start-from-setup` or
`start-from-init` to prevent overwriting, which causes the otel metrics
provider to stop working.

# Additional Context

## Example Dashboards

![image](https://github.com/user-attachments/assets/94ba5c2b-9c62-44cd-83ee-4db4a8859073)

![image](https://github.com/user-attachments/assets/60a1b406-a8c6-48dc-a925-575359f97e1e)

---------

Co-authored-by: Silvan <27845747+adlerhurst@users.noreply.github.com>
Co-authored-by: Livio Spring <livio.a@gmail.com>
(cherry picked from commit c1535b7b4916abd415a29e034d266c4ea1dc3645)
2025-03-28 07:58:47 +01:00
Tim Möhlmann
4f7254f121 fix(deps): upgrade oidc and otel (#6468) 2023-09-01 15:52:53 +03:00
Silvan
698f46fe6a
chore: update dependencies (#5401)
* chore(backend): update dependencies

* chore(pipeline): update golangci-lint
2023-04-06 06:29:55 +00:00
Livio Spring
9b6dad18cb
feat: provide metrics endpoint (#3902)
* feat: provide metrics endpoint

* config

* enable otel metrics by default

Co-authored-by: Florian Forster <florian@caos.ch>
2022-07-18 10:42:32 +02:00
Silvan
30153cff39
chore(gomod): update otel to 1.0.0 (#2414) 2021-09-23 12:50:17 +02:00
Silvan
8609ced24b
fix(build): update go version to 1.16 and dependencies (#2136)
* chore(deps): bump k8s.io/apiextensions-apiserver from 0.19.2 to 0.21.3

Bumps [k8s.io/apiextensions-apiserver](https://github.com/kubernetes/apiextensions-apiserver) from 0.19.2 to 0.21.3.
- [Release notes](https://github.com/kubernetes/apiextensions-apiserver/releases)
- [Commits](https://github.com/kubernetes/apiextensions-apiserver/compare/v0.19.2...v0.21.3)

---
updated-dependencies:
- dependency-name: k8s.io/apiextensions-apiserver
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

* chore(deps): bump google.golang.org/api from 0.34.0 to 0.52.0

Bumps [google.golang.org/api](https://github.com/googleapis/google-api-go-client) from 0.34.0 to 0.52.0.
- [Release notes](https://github.com/googleapis/google-api-go-client/releases)
- [Changelog](https://github.com/googleapis/google-api-go-client/blob/master/CHANGES.md)
- [Commits](https://github.com/googleapis/google-api-go-client/compare/v0.34.0...v0.52.0)

---
updated-dependencies:
- dependency-name: google.golang.org/api
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

* start update dependencies

* update mods and otlp

* fix(build): update to go 1.16

* old version for k8s mods

* update k8s versions

* update orbos

* with batcher

* add batch span processor

* try with older otel version 0.20

* remove syncer

* otel rc2

* fix config

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Stefan Benz <stefan@caos.ch>
2021-08-10 07:27:27 +02:00
Fabi
6b3f5b984c
feat: metrics (#1024)
* refactor: switch from opencensus to opentelemetry

* tempo works as designed nooooot

* fix: log traceids

* with grafana agent

* fix: http tracing

* fix: cleanup files

* chore: remove todo

* fix: bad test

* fix: ignore methods in grpc interceptors

* fix: remove test log

* clean up

* typo

* fix(config): configure tracing endpoint

* fix(span): add error id to span

* feat: metrics package

* feat: metrics package

* fix: counter

* fix: metric

* try metrics

* fix: coutner metrics

* fix: active sessin counter

* fix: active sessin counter

* fix: change current Sequence table

* fix: change current Sequence table

* fix: current sequences

* fix: spooler div metrics

* fix: console view

* fix: merge master

* fix: Last spool run on search result instead of eventtimestamp

* fix: go mod

* Update console/src/assets/i18n/de.json

Co-authored-by: Livio Amstutz <livio.a@gmail.com>

* fix: pr review

* fix: map

* update oidc pkg

* fix: handlers

* fix: value observer

* fix: remove fmt

* fix: handlers

* fix: tests

* fix: handler minimum cycle duration 1s

* fix(spooler): handler channel buffer

* fix interceptors

Co-authored-by: adlerhurst <silvan.reusser@gmail.com>
Co-authored-by: Livio Amstutz <livio.a@gmail.com>
2020-12-02 08:50:59 +01:00