fix: drop default otel scope info from metrics (#10306)

# Which Problems Are Solved

Currently, the prometheus endpoint metrics contain otel specific labels
that increase the overall metric size to the point that the exemplar
implementation in the underlying prom exporter library throws an error,
see https://github.com/zitadel/zitadel/issues/10047. The MaxRuneSize for
metric refs in exemplars is 128 and many of metrics cross this because
of `otel_scope_name`.

# How the Problems Are Solved

This change drops those otel specific labels on the prometheus exporter:
`otel_scope_name` and `otel_scope_version`

Current metrics example: 
```
http_server_duration_milliseconds_bucket{http_method="GET",http_status_code="200",otel_scope_name="go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp",otel_scope_version="0.53.0",le="0"} 0
http_server_duration_milliseconds_bucket{http_method="GET",http_status_code="200",otel_scope_name="go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp",otel_scope_version="0.53.0",le="5"} 100
http_server_duration_milliseconds_bucket{http_method="GET",http_status_code="200",otel_scope_name="go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp",otel_scope_version="0.53.0",le="10"} 100
...
grpc_server_grpc_status_code_total{grpc_method="/zitadel.admin.v1.AdminService/ListIAMMemberRoles",otel_scope_name="",otel_scope_version="",return_code="200"} 3
grpc_server_grpc_status_code_total{grpc_method="/zitadel.admin.v1.AdminService/ListIAMMembers",otel_scope_name="",otel_scope_version="",return_code="200"} 3
grpc_server_grpc_status_code_total{grpc_method="/zitadel.admin.v1.AdminService/ListMilestones",otel_scope_name="",otel_scope_version="",return_code="200"} 1
```

New example: 
```
http_server_duration_milliseconds_bucket{http_method="GET",http_status_code="200",le="10"} 8
http_server_duration_milliseconds_bucket{http_method="GET",http_status_code="200",le="25"} 8
http_server_duration_milliseconds_bucket{http_method="GET",http_status_code="200",le="50"} 9
http_server_duration_milliseconds_bucket{http_method="GET",http_status_code="200",le="75"} 9
...
grpc_server_grpc_status_code_total{grpc_method="/zitadel.admin.v1.AdminService/GetSupportedLanguages",return_code="200"} 1
grpc_server_grpc_status_code_total{grpc_method="/zitadel.admin.v1.AdminService/ListMilestones",return_code="200"} 1
grpc_server_grpc_status_code_total{grpc_method="/zitadel.auth.v1.AuthService/GetMyLabelPolicy",return_code="200"} 3
```

# Additional Changes

None

# Additional Context

From my understanding, this change is fully spec compliant with
Prometheus and Otel:
*
https://opentelemetry.io/docs/specs/otel/compatibility/prometheus_and_openmetrics/#instrumentation-scope

However, these tags were originally added as optional labels to
disambiguate metrics. But I'm not sure we need to care about that right
now? My gut feeling is that exemplar support (the ability for traces to
reference metrics) would be a preferable tradeoff to this label
standard.

Co-authored-by: Silvan <27845747+adlerhurst@users.noreply.github.com>
This commit is contained in:
Zach Hirschtritt
2025-08-14 08:44:37 -04:00
committed by adlerhurst
parent 2ede875d41
commit fc8e15ad76

View File

@@ -32,7 +32,7 @@ func NewMetrics(meterName string) (metrics.Metrics, error) {
if err != nil {
return nil, err
}
exporter, err := prometheus.New()
exporter, err := prometheus.New(prometheus.WithoutScopeInfo())
if err != nil {
return &Metrics{}, err
}