---
title: ZITADEL Production Setup
sidebar_label: Production Setup
---
As soon as you have successfully deployed ZITADEL as a proof of concept using one of our deployment guides, you are ready to configure ZITADEL for production usage.
## TL;DR
We created a Production Checklist as a technical step-by-step guide.
## High Availability
We recommend running ZITADEL highly available using an orchestrator that schedules ZITADEL on multiple servers, like Kubernetes.
To keep startup times fast when scaling ZITADEL, you should also consider running separate jobs with `zitadel init` and `zitadel setup`, so your workload containers only have to execute `zitadel start`.
Read more about separating the init and setup phases on the Updating and Scaling page.
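In Kubernetes, this could look like the following sketch. The Job, image tag, Secret name, and the `--masterkeyFromEnv` invocation are illustrative assumptions; adjust them to your deployment and ZITADEL version.

```yaml
# Hypothetical one-off Job running `zitadel setup` before the
# long-running Deployment starts pods with `zitadel start` only.
# A similar Job (or an initContainer) can run `zitadel init` first.
apiVersion: batch/v1
kind: Job
metadata:
  name: zitadel-setup
spec:
  template:
    spec:
      restartPolicy: OnFailure
      containers:
        - name: setup
          image: ghcr.io/zitadel/zitadel:latest # assumes the image entrypoint is the zitadel binary
          args: ["setup", "--masterkeyFromEnv"]
          env:
            - name: ZITADEL_MASTERKEY
              valueFrom:
                secretKeyRef:
                  name: zitadel-masterkey # placeholder Secret
                  key: masterkey
```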
## Configuration
Read about the available options for configuring ZITADEL on the configure page. Prefer passing .yaml files to the ZITADEL binary instead of environment variables. Restricting access to these files to avoid leaking sensitive information is easier than restricting access to environment variables. Also, not all configuration options are available as environment variables.
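For example, a minimal configuration file, passed to the binary via `zitadel start --config ./zitadel-config.yaml`. This is a sketch; all values are placeholders you need to adjust.

```yaml
# zitadel-config.yaml - placeholder values
Log:
  Level: info
ExternalDomain: auth.example.com # placeholder domain
ExternalPort: 443
ExternalSecure: true
```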
## Networking
- To make ZITADEL available at the domain of your choice, you need to configure the ExternalDomain property.
- To enable and restrict access to HTTPS, head over to the description of your TLS options.
- If you want to front ZITADEL with a reverse proxy, web application firewall or content delivery network, make sure to support HTTP/2 (see the sketch after this list).
- You can also refer to some example reverse proxy configurations.
- The ZITADEL Console web GUI uses many gRPC-Web stubs. This results in a fairly big JavaScript bundle. You might want to compress it using Gzip or Brotli.
- Serving and caching the assets using a content delivery network could improve network latencies and shield your ZITADEL runtime.
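As a sketch of the reverse-proxy points above, here is a Traefik dynamic configuration that forwards to ZITADEL over HTTP/2 cleartext (h2c) and compresses responses. The hostname and upstream address are placeholders; other proxies need equivalent HTTP/2 support.

```yaml
# Hypothetical Traefik dynamic configuration (file provider)
http:
  routers:
    zitadel:
      rule: "Host(`auth.example.com`)" # placeholder domain
      service: zitadel
      middlewares:
        - compress
      tls: {}
  middlewares:
    compress:
      # compresses responses such as the Console's JavaScript bundle
      compress: {}
  services:
    zitadel:
      loadBalancer:
        servers:
          # h2c keeps gRPC(-Web) working between proxy and ZITADEL
          - url: "h2c://zitadel:8080" # placeholder upstream
```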
## Monitoring
By default, metrics are exposed at /debug/metrics in OpenTelemetry (otel) format.
Also, you can enable tracing in the ZITADEL configuration.
```yaml
Tracing:
  # Choose one of "otel", "google", "log" and "none"
  Type: google
  Fraction: 1
  MetricPrefix: zitadel
```
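To scrape those metrics, a minimal Prometheus job could look like this sketch; the target address is a placeholder for wherever your ZITADEL runtime is reachable.

```yaml
# Hypothetical Prometheus scrape configuration
scrape_configs:
  - job_name: zitadel
    metrics_path: /debug/metrics
    static_configs:
      - targets: ["zitadel:8080"] # placeholder host:port
```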
## Logging
ZITADEL follows the principles that guide cloud-native and twelve-factor applications. Logs are a stream of time-ordered events collected from all running processes.
ZITADEL processes write the following events to the standard output:
- Runtime Logs: Define the log level and record format in the Log configuration section (see the sketch after this list).
- Access Logs: Enable logging all HTTP and gRPC responses from the ZITADEL binary in the LogStore section.
- Actions Execution Logs: Actions can emit custom logs at different levels. For example, a log record can be emitted each time a user is created or authenticated. If you don't want to have these logs in STDOUT, you can disable this in the LogStore section.
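A combined sketch of the Log and LogStore sections. The key names follow ZITADEL's defaults.yaml, but verify them against your ZITADEL version.

```yaml
Log:
  Level: info       # runtime log level
  Formatter:
    Format: text    # or "json"
LogStore:
  Access:
    Stdout:
      Enabled: true # HTTP and gRPC access logs
  Execution:
    Stdout:
      Enabled: true # logs emitted by Actions
```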
Log file management should not be each business app's responsibility. Instead, your execution environment should provide tooling for managing logs in a generic way. This includes tasks like rotating files, routing, collecting, archiving and cleaning up. For example, systemd has journald, and Kubernetes has fluentd and fluentbit.
## Telemetry
If you want data about reached usage milestones pushed to external systems, enable telemetry in the ZITADEL configuration.
The following table describes the milestones that are sent to the endpoints:

| Trigger | Description |
|---|---|
| A virtual instance is created. | This data point is also sent when the first instance is automatically created during the ZITADEL binary's setup phase in a self-hosting scenario. |
| An authentication succeeded for the first time on an instance. | This is the first authentication with the instance's automatically created admin user during the instance setup, which can be a human or a machine. |
| A project is created for the first time in a virtual instance. | The ZITADEL project that is automatically created during the instance setup is omitted. |
| An application is created for the first time in a virtual instance. | The applications in the ZITADEL project that are automatically created during the instance setup are omitted. |
| An authentication succeeded for the first time in a virtual instance's application. | This is the first authentication using a ZITADEL application that was not created during the instance setup phase. |
| A virtual instance is deleted. | This data point is sent when a virtual instance is deleted via ZITADEL's system API. |
ZITADEL pushes the metrics by projecting certain events. Therefore, you configure delivery guarantees not in the Telemetry section of the ZITADEL configuration, but in the Projections.Customizations.Telemetry section.
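A sketch combining both sections. The Telemetry keys (Enabled, Endpoints, Headers) follow ZITADEL's defaults.yaml; the endpoint, credential, and projection values are illustrative placeholders.

```yaml
Telemetry:
  Enabled: true
  Endpoints:
    - https://telemetry.example.com/zitadel # placeholder endpoint
  Headers:
    Authorization: "Bearer <token>"         # placeholder credential
Projections:
  Customizations:
    Telemetry:
      # tune delivery guarantees for milestone pushes here (illustrative values)
      HandleActiveInstances: 1080h
      RequeueEvery: 300s
      MaxFailureCount: 10
```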
## Database
### Prefer CockroachDB
ZITADEL supports CockroachDB and PostgreSQL. We highly recommend using CockroachDB, as horizontal scaling is much easier than with PostgreSQL. Also, if you are concerned about multi-regional data locality, CockroachDB is the way to go.
The indexes for the database are optimized using load tests from ZITADEL Cloud, which runs with CockroachDB. If you identify problems with your PostgreSQL during load tests that indicate that the indexes are not optimized, please create an issue in our GitHub repository.
### Configure ZITADEL
Depending on your environment, you may want to tweak some settings about how ZITADEL interacts with the database in the database section of your ZITADEL configuration. Read more about your database configuration options.
```yaml
Database:
  cockroach:
    Host: localhost
    Port: 26257
    Database: zitadel
    # highlight-start
    MaxOpenConns: 20
    MaxConnLifetime: 30m
    MaxConnIdleTime: 30m
    # highlight-end
    Options: ""
```
You might also want to configure how projections are computed. These are the default values:
```yaml
# The Projections section defines the behaviour for the scheduled and synchronous events projections.
Projections:
  # Time interval between scheduled projections
  RequeueEvery: 60s
  # Time between retried database statements resulting from projected events
  RetryFailedAfter: 1s
  # Retried execution number of database statements resulting from projected events
  MaxFailureCount: 5
  # Number of concurrent projection routines. Values of 0 and below are overwritten to 1
  ConcurrentInstances: 1
  # Limit of returned events per query
  BulkLimit: 200
  # Only instances are projected for which at least one projection-relevant event exists within the timeframe
  # from the HandleActiveInstances duration in the past until the projection's current time
  # Defaults to twice the RequeueEvery duration
  HandleActiveInstances: 120s
  # In the Customizations section, all settings from above can be overwritten for each specific projection
  Customizations:
    Projects:
      BulkLimit: 2000
    # The Notifications projection is used for sending emails and SMS to users
    Notifications:
      # As notification projections don't result in database statements, retries don't have an effect
      MaxFailureCount: 0
    # The NotificationsQuotas projection is used for calling quota webhooks
    NotificationsQuotas:
      # Delivery guarantee requirements are probably higher for quota webhooks
      # Defaults to 45 days
      HandleActiveInstances: 1080h
      # As quota notification projections don't result in database statements, retries don't have an effect
      MaxFailureCount: 0
      # Quota notifications are not that time-critical. Setting RequeueEvery to five minutes doesn't load the database too much.
      RequeueEvery: 300s
```
### Manage your data
When designing your backup strategy, it is worth knowing that ZITADEL is event sourced. That means ZITADEL itself is able to recompute its whole state from the records in the table eventstore.events. The timestamp of your last record in the events table defines up to which point in time ZITADEL can restore its state.
The ZITADEL binary itself is stateless, so there is no need for a special backup job.
Generally, for maintaining your database management system in production, please refer to the corresponding docs for CockroachDB or for PostgreSQL.
### Data initialization
- You can configure instance defaults in the DefaultInstance section. If you plan to eventually create multiple virtual instances, these defaults take effect. These configurations also apply to the first instance that ZITADEL automatically creates for you. The following properties are of special interest for your production setup.
```yaml
DefaultInstance:
  OIDCSettings:
    AccessTokenLifetime: 12h
    IdTokenLifetime: 12h
    RefreshTokenIdleExpiration: 720h # 30d
    RefreshTokenExpiration: 2160h # 90d
  # this configuration sets the default email configuration
  SMTPConfiguration:
    # configuration of the host
    SMTP:
      # for example smtp.mailtrap.io:2525
      Host:
      User:
      Password:
    TLS:
    # if the host of the sender is different from ExternalDomain,
    # set DefaultInstance.DomainPolicy.SMTPSenderAddressMatchesInstanceDomain to false
    From:
    FromName:
    ReplyToAddress:
```
- If you don't want to use the DefaultInstance configuration for the first instance that ZITADEL automatically creates for you during the setup phase, you can provide a FirstInstance YAML section using the --steps argument (see the sketch after this list).
- Learn how to configure ZITADEL via the Console user interface.
- Probably, you also want to apply your custom branding, hook into certain events, customize texts or add metadata to your users.
- If you want to automatically create ZITADEL resources, you can use the ZITADEL Terraform Provider.
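A minimal sketch of such a steps file. The FirstInstance keys follow ZITADEL's steps.yaml; the organization name, user name, and password are placeholders, and the exact invocation depends on your ZITADEL version.

```yaml
# steps.yaml - passed via the --steps argument (illustrative)
FirstInstance:
  Org:
    Name: MyOrganization            # placeholder
    Human:
      UserName: root                # placeholder admin user
      Password: "Secure-Password1!" # placeholder, change immediately
```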
## Limits and Quotas
If you host ZITADEL as a service, you might want to limit usage and/or execute tasks on certain usage units and levels.
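For example, audit trail retention can be limited per instance. The sketch below assumes a DefaultInstance.Limits.AuditLogRetention key, which may differ between ZITADEL versions; verify it against your defaults.yaml.

```yaml
DefaultInstance:
  Limits:
    # events older than this are hidden from the audit trail (assumed key)
    AuditLogRetention: 720h # 30 days
```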
## Minimum system requirements
### General resource usage
ZITADEL consumes around 512MB RAM and can run with less than 1 CPU core. The database consumes around 2 CPU cores under normal conditions and 6GB RAM, including some caching.
:::info Password hashing
Be aware of CPU spikes when hashing passwords. We recommend having 4 CPU cores available for this purpose.
:::
### Production HA cluster
It is recommended to build a minimal high-availability setup with 3 nodes with 4 CPU and 16GB memory each. Excluding non-essential services, such as log collection, metrics, etc., the resources could be reduced to around 4 CPU and 8GB memory each.
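Translated into Kubernetes terms, a per-container resource sketch derived from the figures above; the exact values are assumptions you should validate with your own load tests.

```yaml
# Hypothetical resource settings for a ZITADEL container
resources:
  requests:
    cpu: "1"      # runs with less than 1 core under normal load
    memory: 512Mi # typical ZITADEL RAM consumption
  limits:
    cpu: "4"      # headroom for password-hashing CPU spikes
    memory: 1Gi
```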