Commit Graph

515 Commits

Author SHA1 Message Date
adlerhurst
8f5bdff131 refactor: use 1 db pool instead of 3 2024-12-22 11:40:55 +01:00
Stefan Benz
c3b97a91a2 feat: add saml request to link to sessions (#9001)
# Which Problems Are Solved

It is currently not possible to use SAML with the Session API.

# How the Problems Are Solved

Add SAML service, to get and resolve SAML requests.
Add SAML session and SAML request aggregate, which can be linked to the
Session to get back a SAMLResponse from the API directly.

# Additional Changes

Update of dependency zitadel/saml to provide all functionality for
handling of SAML requests and responses.

# Additional Context

Closes #6053

---------

Co-authored-by: Livio Spring <livio.a@gmail.com>
2024-12-19 11:11:40 +00:00
Livio Spring
50d2b26a28 feat: specify login UI version on instance and apps (#9071)
# Which Problems Are Solved

To be able to migrate or test the new login UI, admins might want to
(temporarily) switch individual apps.
At a later point admin might want to make sure all applications use the
new login UI.

# How the Problems Are Solved

- Added a feature flag `` on instance level to require all apps to use
the new login and provide an optional base url.
- if the flag is enabled, all (OIDC) applications will automatically use
the v2 login.
  - if disabled, applications can decide based on their configuration
- Added an option on OIDC apps to use the new login UI and an optional
base url.
- Removed the requirement to use `x-zitadel-login-client` to be
redirected to the login V2 and retrieve created authrequest and link
them to SSO sessions.
- Added a new "IAM_LOGIN_CLIENT" role to allow management of users,
sessions, grants and more without `x-zitadel-login-client`.

# Additional Changes

None

# Additional Context

closes https://github.com/zitadel/zitadel/issues/8702
2024-12-19 10:37:46 +01:00
Tim Möhlmann
da706a8b30 fix(setup): make step 39 repeatable (#9085)
# Which Problems Are Solved

When downgrading zitadel and upgrading it again, it might be that orgs
deleted in this period still have stale entries in the fields table.

# How the Problems Are Solved

- Make the cleanup repeatable
- Scope the query by instance so that an index is used.
2024-12-18 16:48:22 +01:00
Silvan
b89e8a6037 fix(setup): make step 41 repeatable (#9084)
# Which Problems Are Solved

setup step 41 cannot handle downgrades at the moment. This step writes
the instance domain to the fields table. If there are new instances
created during the downgraded version is running there would be domain
missing in the fields afterwards.

# How the Problems Are Solved

Make step 41 repeatable for each version
2024-12-18 15:28:29 +00:00
Tim Möhlmann
6f6e2234eb fix(migrations): clean stale org fields using events (#9051)
# Which Problems Are Solved

Migration step 39 is supposed to cleanup stale organization entries in
the eventstore.fields table. In order to do this it used the projection
to check which orgs still exist.

During initial setup of ZITADEL the first instance with the organization
is created. Howevet, the projections are filled after all migrations are
done. With the organization projection empty, the fields of the first
org would be deleted.

This was discovered during development of a new field type. The
accosiated events did not yet have any projection based filled assigned.
It seems fields with a pre-fill projection are somehow restored.
Therefore a restoration migration isn't required IMO.

# How the Problems Are Solved

Query the event store for `org.removed` events instead. This has the
drawback of using a sequential scan on the eventstore, making the
migration more expensive.

# Additional Changes

- none

# Additional Context

- Introduced in https://github.com/zitadel/zitadel/pull/8946
2024-12-12 18:37:18 +02:00
Silvan
77cd430b3a refactor(handler): cache active instances (#9008)
# Which Problems Are Solved

Scheduled handlers use `eventstore.InstanceIDs` to get the all active
instances within a given timeframe. This function scrapes through all
events written within that time frame which can cause heavy load on the
database.

# How the Problems Are Solved

A new query cache `activeInstances` is introduced which caches the ids
of all instances queried by id or host within the configured timeframe.

# Additional Changes

- Changed `default.yaml`
  - Removed `HandleActiveInstances` from custom handler configs
- Added `MaxActiveInstances` to define the maximal amount of cached
instance ids
- fixed start-from-init and start-from-setup to start auth and admin
projections twice
- fixed org cache invalidation to use correct index

# Additional Context

- part of #8999
2024-12-06 11:32:53 +00:00
Livio Spring
7a3ae8f499 fix(notifications): bring back legacy notification handling (#9015)
# Which Problems Are Solved

There are some problems related to the use of CockroachDB with the new
notification handling (#8931).
See #9002 for details.

# How the Problems Are Solved

- Brought back the previous notification handler as legacy mode.
- Added a configuration to choose between legacy mode and new parallel
workers.
  - Enabled legacy mode by default to prevent issues.

# Additional Changes

None

# Additional Context

- closes https://github.com/zitadel/zitadel/issues/9002
- relates to #8931
2024-12-06 10:56:19 +01:00
Livio Spring
7f0378636b fix(notifications): improve error handling (#8994)
# Which Problems Are Solved

While running the latest RC / main, we noticed some errors including
context timeouts and rollback issues.

# How the Problems Are Solved

- The transaction context is passed and used for any event being written
and for handling savepoints to be able to handle context timeouts.
- The user projection is not triggered anymore. This will reduce
unnecessary load and potential timeouts if lot of workers are running.
In case a user would not be projected yet, the request event will log an
error and then be skipped / retried on the next run.
- Additionally, the context is checked if being closed after each event
process.
- `latestRetries` now correctly only returns the latest retry events to
be processed
- Default values for notifications have been changed to run workers less
often, more retry delay, but less transaction duration.

# Additional Changes

None

# Additional Context

relates to #8931

---------

Co-authored-by: Tim Möhlmann <tim+github@zitadel.com>
2024-12-04 20:17:49 +00:00
Silvan
6614aacf78 feat(fields): add instance domain (#9000)
# Which Problems Are Solved

Instance domains are only computed on read side. This can cause missing
domains if calls are executed shortly after a instance domain (or
instance) was added.

# How the Problems Are Solved

The instance domain is added to the fields table which is filled on
command side.

# Additional Changes

- added setup step to compute instance domains
- instance by host uses fields table instead of instance_domains table

# Additional Context

- part of https://github.com/zitadel/zitadel/issues/8999
2024-12-04 18:10:10 +00:00
Silvan
dab5d9e756 refactor(eventstore): move push logic to sql (#8816)
# Which Problems Are Solved

If many events are written to the same aggregate id it can happen that
zitadel [starts to retry the push
transaction](48ffc902cc/internal/eventstore/eventstore.go (L101))
because [the locking
behaviour](48ffc902cc/internal/eventstore/v3/sequence.go (L25))
during push does compute the wrong sequence because newly committed
events are not visible to the transaction. These events impact the
current sequence.

In cases with high command traffic on a single aggregate id this can
have severe impact on general performance of zitadel. Because many
connections of the `eventstore pusher` database pool are blocked by each
other.

# How the Problems Are Solved

To improve the performance this locking mechanism was removed and the
business logic of push is moved to sql functions which reduce network
traffic and can be analyzed by the database before the actual push. For
clients of the eventstore framework nothing changed.

# Additional Changes

- after a connection is established prefetches the newly added database
types
- `eventstore.BaseEvent` now returns the correct revision of the event

# Additional Context

- part of https://github.com/zitadel/zitadel/issues/8931

---------

Co-authored-by: Tim Möhlmann <tim+github@zitadel.com>
Co-authored-by: Livio Spring <livio.a@gmail.com>
Co-authored-by: Max Peintner <max@caos.ch>
Co-authored-by: Elio Bischof <elio@zitadel.com>
Co-authored-by: Stefan Benz <46600784+stebenz@users.noreply.github.com>
Co-authored-by: Miguel Cabrerizo <30386061+doncicuto@users.noreply.github.com>
Co-authored-by: Joakim Lodén <Loddan@users.noreply.github.com>
Co-authored-by: Yxnt <Yxnt@users.noreply.github.com>
Co-authored-by: Stefan Benz <stefan@caos.ch>
Co-authored-by: Harsha Reddy <harsha.reddy@klaviyo.com>
Co-authored-by: Zach H <zhirschtritt@gmail.com>
2024-12-04 13:51:40 +00:00
Stefan Benz
7caa43ab23 feat: action v2 signing (#8779)
# Which Problems Are Solved

The action v2 messages were didn't contain anything providing security
for the sent content.

# How the Problems Are Solved

Each Target now has a SigningKey, which can also be newly generated
through the API and returned at creation and through the Get-Endpoints.
There is now a HTTP header "Zitadel-Signature", which is generated with
the SigningKey and Payload, and also contains a timestamp to check with
a tolerance if the message took to long to sent.

# Additional Changes

The functionality to create and check the signature is provided in the
pkg/actions package, and can be reused in the SDK.

# Additional Context

Closes #7924

---------

Co-authored-by: Livio Spring <livio.a@gmail.com>
2024-11-28 10:06:52 +00:00
Livio Spring
8537805ea5 feat(notification): use event worker pool (#8962)
# Which Problems Are Solved

The current handling of notification follows the same pattern as all
other projections:
Created events are handled sequentially (based on "position") by a
handler. During the process, a lot of information is aggregated (user,
texts, templates, ...).
This leads to back pressure on the projection since the handling of
events might take longer than the time before a new event (to be
handled) is created.

# How the Problems Are Solved

- The current user notification handler creates separate notification
events based on the user / session events.
- These events contain all the present and required information
including the userID.
- These notification events get processed by notification workers, which
gather the necessary information (recipient address, texts, templates)
to send out these notifications.
- If a notification fails, a retry event is created based on the current
notification request including the current state of the user (this
prevents race conditions, where a user is changed in the meantime and
the notification already gets the new state).
- The retry event will be handled after a backoff delay. This delay
increases with every attempt.
- If the configured amount of attempts is reached or the message expired
(based on config), a cancel event is created, letting the workers know,
the notification must no longer be handled.
- In case of successful send, a sent event is created for the
notification aggregate and the existing "sent" events for the user /
session object is stored.
- The following is added to the defaults.yaml to allow configuration of
the notification workers:
```yaml

Notifications:
  # The amount of workers processing the notification request events.
  # If set to 0, no notification request events will be handled. This can be useful when running in
  # multi binary / pod setup and allowing only certain executables to process the events.
  Workers: 1 # ZITADEL_NOTIFIACATIONS_WORKERS
  # The amount of events a single worker will process in a run.
  BulkLimit: 10 # ZITADEL_NOTIFIACATIONS_BULKLIMIT
  # Time interval between scheduled notifications for request events
  RequeueEvery: 2s # ZITADEL_NOTIFIACATIONS_REQUEUEEVERY
  # The amount of workers processing the notification retry events.
  # If set to 0, no notification retry events will be handled. This can be useful when running in
  # multi binary / pod setup and allowing only certain executables to process the events.
  RetryWorkers: 1 # ZITADEL_NOTIFIACATIONS_RETRYWORKERS
  # Time interval between scheduled notifications for retry events
  RetryRequeueEvery: 2s # ZITADEL_NOTIFIACATIONS_RETRYREQUEUEEVERY
  # Only instances are projected, for which at least a projection-relevant event exists within the timeframe
  # from HandleActiveInstances duration in the past until the projection's current time
  # If set to 0 (default), every instance is always considered active
  HandleActiveInstances: 0s # ZITADEL_NOTIFIACATIONS_HANDLEACTIVEINSTANCES
  # The maximum duration a transaction remains open
  # before it spots left folding additional events
  # and updates the table.
  TransactionDuration: 1m # ZITADEL_NOTIFIACATIONS_TRANSACTIONDURATION
  # Automatically cancel the notification after the amount of failed attempts
  MaxAttempts: 3 # ZITADEL_NOTIFIACATIONS_MAXATTEMPTS
  # Automatically cancel the notification if it cannot be handled within a specific time
  MaxTtl: 5m  # ZITADEL_NOTIFIACATIONS_MAXTTL
  # Failed attempts are retried after a confogired delay (with exponential backoff).
  # Set a minimum and maximum delay and a factor for the backoff
  MinRetryDelay: 1s  # ZITADEL_NOTIFIACATIONS_MINRETRYDELAY
  MaxRetryDelay: 20s # ZITADEL_NOTIFIACATIONS_MAXRETRYDELAY
  # Any factor below 1 will be set to 1
  RetryDelayFactor: 1.5 # ZITADEL_NOTIFIACATIONS_RETRYDELAYFACTOR
```


# Additional Changes

None

# Additional Context

- closes #8931
2024-11-27 15:01:17 +00:00
Tim Möhlmann
ccef67cefa fix(eventstore): cleanup org fields on remove (#8946)
# Which Problems Are Solved

When an org is removed, the corresponding fields are not deleted. This
creates issues, such as recreating a new org with the same verified
domain.

# How the Problems Are Solved

Remove the search fields by the org aggregate, instead of just setting
the removed state.

# Additional Changes

- Cleanup migration script that removed current stale fields.

# Additional Context

- Closes https://github.com/zitadel/zitadel/issues/8943
- Related to https://github.com/zitadel/zitadel/pull/8790

---------

Co-authored-by: Silvan <silvan.reusser@gmail.com>
2024-11-26 15:26:41 +00:00
Dominic Bachmann
48ffc902cc fix: typo in defaults.yaml where ExternalPort should be ExternalDomain (#8923)
# Which Problems Are Solved

Fixed a typo in cmd/defaults.yaml where ExternalPort should be
ExternalDomain

Co-authored-by: Fabi <fabienne@zitadel.com>
2024-11-22 10:25:25 +01:00
Tim Möhlmann
c165ed07f4 feat(cache): organization (#8903)
# Which Problems Are Solved

Organizations are ofter searched for by ID or primary domain. This
results in many redundant queries, resulting in a performance impact.

# How the Problems Are Solved

Cache Organizaion objects by ID and primary domain.

# Additional Changes

- Adjust integration test config to use all types of cache.
- Adjust integration test lifetimes so the pruner has something to do
while the tests run.

# Additional Context

- Closes #8865
- After #8902
2024-11-21 08:05:03 +02:00
Tim Möhlmann
b77901cb4b fix(cache): unset client and user names in defaults (#8901)
# Which Problems Are Solved

By having default entries in the `Username` and `ClientName` fields, it
was not possible to unset there parameters. Unsetting them is required
for GCP connections

# How the Problems Are Solved

Set the fields to empty strings.

# Additional Changes

- none

# Additional Context

- none
2024-11-13 21:18:47 +00:00
Tim Möhlmann
3b7b0c69e6 feat(cache): redis circuit breaker (#8890)
# Which Problems Are Solved

If a redis cache has connection issues or any other type of permament
error,
it tanks the responsiveness of ZITADEL.
We currently do not support things like Redis cluster or sentinel. So
adding a simple redis cache improves performance but introduces a single
point of failure.

# How the Problems Are Solved

Implement a [circuit
breaker](https://learn.microsoft.com/en-us/previous-versions/msp-n-p/dn589784(v=pandp.10)?redirectedfrom=MSDN)
as
[`redis.Limiter`](https://pkg.go.dev/github.com/redis/go-redis/v9#Limiter)
by wrapping sony's [gobreaker](https://github.com/sony/gobreaker)
package. This package is picked as it seems well maintained and we
already use their `sonyflake` package

# Additional Changes

- The unit tests constructed an unused `redis.Client` and didn't cleanup
the connector. This is now fixed.

# Additional Context

Closes #8864
2024-11-13 19:11:48 +01:00
chris-1o
a09c772b03 fix(mirror): Fix instance_id check for tables without (#8852)
# Which Problems Are Solved

Fixes 'column "instance_id" does not exist' errors from #8558.

# How the Problems Are Solved

The instanceClause / WHERE clause in the query for the respective tables
is excluded.

I have successfully created a mirror with this change.
2024-11-12 16:03:41 +00:00
Jonathon Taylor
04a166f2d2 fix(translations): typo in VerifyEmail body (#8872)
# Which Problems Are Solved

Fixes small typo in email body during user creation & verification. The
change also includes the removal of some unnecessary white space in the
same yaml file.

# How the Problems Are Solved

Replaces din't with didn't. 

![image](https://github.com/user-attachments/assets/48abf38b-4deb-42b7-a85b-91009e19f27f)

Co-authored-by: jtaylor@dingo.com <jtaylor@dingo.com>
Co-authored-by: Silvan <silvan.reusser@gmail.com>
2024-11-11 12:03:15 +00:00
Livio Spring
fb6579e456 fix(milestones): use previous spelling for milestone types (#8886)
# Which Problems Are Solved

https://github.com/zitadel/zitadel/pull/8788 accidentally changed the
spelling of milestone types from PascalCase to snake_case. This breaks
systems where `milestone.pushed` events already exist.

# How the Problems Are Solved

- Use PascalCase again
- Prefix event types with v2. (Previous pushed event type was anyway
ignored).
- Create `milstones3` projection

# Additional Changes

None

# Additional Context

relates to #8788
2024-11-11 11:28:27 +00:00
Tim Möhlmann
250f2344c8 feat(cache): redis cache (#8822)
# Which Problems Are Solved

Add a cache implementation using Redis single mode. This does not add
support for Redis Cluster or sentinel.

# How the Problems Are Solved

Added the `internal/cache/redis` package. All operations occur
atomically, including setting of secondary indexes, using LUA scripts
where needed.

The [`miniredis`](https://github.com/alicebob/miniredis) package is used
to run unit tests.

# Additional Changes

- Move connector code to `internal/cache/connector/...` and remove
duplicate code from `query` and `command` packages.
- Fix a missed invalidation on the restrictions projection

# Additional Context

Closes #8130
2024-11-04 10:44:51 +00:00
Livio Spring
041af26917 feat(OIDC): add back channel logout (#8837)
# Which Problems Are Solved

Currently ZITADEL supports RP-initiated logout for clients. Back-channel
logout ensures that user sessions are terminated across all connected
applications, even if the user closes their browser or loses
connectivity providing a more secure alternative for certain use cases.

# How the Problems Are Solved

If the feature is activated and the client used for the authentication
has a back_channel_logout_uri configured, a
`session_logout.back_channel` will be registered. Once a user terminates
their session, a (notification) handler will send a SET (form POST) to
the registered uri containing a logout_token (with the user's ID and
session ID).

- A new feature "back_channel_logout" is added on system and instance
level
- A `back_channel_logout_uri` can be managed on OIDC applications
- Added a `session_logout` aggregate to register and inform about sent
`back_channel` notifications
- Added a `SecurityEventToken` channel and `Form`message type in the
notification handlers
- Added `TriggeredAtOrigin` fields to `HumanSignedOut` and
`TerminateSession` events for notification handling
- Exported various functions and types in the `oidc` package to be able
to reuse for token signing in the back_channel notifier.
- To prevent that current existing session termination events will be
handled, a setup step is added to set the `current_states` for the
`projections.notifications_back_channel_logout` to the current position

- [x] requires https://github.com/zitadel/oidc/pull/671

# Additional Changes

- Updated all OTEL dependencies to v1.29.0, since OIDC already updated
some of them to that version.
- Single Session Termination feature is correctly checked (fixed feature
mapping)

# Additional Context

- closes https://github.com/zitadel/zitadel/issues/8467
- TODO:
  - Documentation
  - UI to be done: https://github.com/zitadel/zitadel/issues/8469

---------

Co-authored-by: Hidde Wieringa <hidde@hiddewieringa.nl>
2024-10-31 15:57:17 +01:00
Tim Möhlmann
32bad3feb3 perf(milestones): refactor (#8788)
Some checks are pending
ZITADEL CI/CD / core (push) Waiting to run
ZITADEL CI/CD / console (push) Waiting to run
ZITADEL CI/CD / version (push) Waiting to run
ZITADEL CI/CD / compile (push) Blocked by required conditions
ZITADEL CI/CD / core-unit-test (push) Blocked by required conditions
ZITADEL CI/CD / core-integration-test (push) Blocked by required conditions
ZITADEL CI/CD / lint (push) Blocked by required conditions
ZITADEL CI/CD / container (push) Blocked by required conditions
ZITADEL CI/CD / e2e (push) Blocked by required conditions
ZITADEL CI/CD / release (push) Blocked by required conditions
Code Scanning / CodeQL-Build (go) (push) Waiting to run
Code Scanning / CodeQL-Build (javascript) (push) Waiting to run
# Which Problems Are Solved

Milestones used existing events from a number of aggregates. OIDC
session is one of them. We noticed in load-tests that the reduction of
the oidc_session.added event into the milestone projection is a costly
business with payload based conditionals. A milestone is reached once,
but even then we remain subscribed to the OIDC events. This requires the
projections.current_states to be updated continuously.


# How the Problems Are Solved

The milestone creation is refactored to use dedicated events instead.
The command side decides when a milestone is reached and creates the
reached event once for each milestone when required.

# Additional Changes

In order to prevent reached milestones being created twice, a migration
script is provided. When the old `projections.milestones` table exist,
the state is read from there and `v2` milestone aggregate events are
created, with the original reached and pushed dates.

# Additional Context

- Closes https://github.com/zitadel/zitadel/issues/8800
2024-10-28 08:29:34 +00:00
Livio Spring
79fb4cc1cc fix: correctly check denied domains and ips for actions (#8810)
# Which Problems Are Solved

System administrators can block hosts and IPs for HTTP calls in actions.
Using DNS, blocked IPs could be bypassed.

# How the Problems Are Solved

- Hosts are resolved (DNS lookup) to check whether their corresponding
IP is blocked.

# Additional Changes

- Added complete lookup ip address range and "unspecified" address to
the default `DenyList`
2024-10-22 16:16:44 +02:00
Shubham Singh Sugara
7eb54e4c7b fix: Update Defaults.yaml (#8731)
# Which Problems Are Solved
The primary issue addressed in this PR is that the defaults.yaml file
contains escaped characters (like `&lt;` for < and `&gt;` for >) in
message texts, which prevents valid HTML rendering in certain parts of
the Zitadel platform.
These escaped characters are used in user-facing content (e.g., email
templates or notifications), resulting in improperly displayed text,
where the HTML elements like line breaks or bold text don't render
correctly.

# How the Problems Are Solved
The solution involves replacing the escaped characters with their
corresponding HTML tags in the defaults.yaml file, ensuring that the
HTML renders correctly in the emails or user interfaces where these
messages are displayed.
This update ensures that:
- The HTML in these message templates is rendered properly, improving
the user experience.
- The content looks professional and adheres to web standards for
displaying HTML content.
    
# Additional Changes
N/A

# Additional Context
N/A

- Closes #8531

Co-authored-by: Max Peintner <max@caos.ch>
2024-10-14 07:42:08 +00:00
Tim Möhlmann
a84b259e8c perf(oidc): nest position clause for session terminated query (#8738)
# Which Problems Are Solved

Optimize the query that checks for terminated sessions in the access
token verifier. The verifier is used in auth middleware, userinfo and
introspection.


# How the Problems Are Solved

The previous implementation built a query for certain events and then
appended a single `PositionAfter` clause. This caused the postgreSQL
planner to use indexes only for the instance ID, aggregate IDs,
aggregate types and event types. Followed by an expensive sequential
scan for the position. This resulting in internal over-fetching of rows
before the final filter was applied.


![Screenshot_20241007_105803](https://github.com/user-attachments/assets/f2d91976-be87-428b-b604-a211399b821c)

Furthermore, the query was searching for events which are not always
applicable. For example, there was always a session ID search and if
there was a user ID, we would also search for a browser fingerprint in
event payload (expensive). Even if those argument string would be empty.

This PR changes:

1. Nest the position query, so that a full `instance_id, aggregate_id,
aggregate_type, event_type, "position"` index can be matched.
2. Redefine the `es_wm` index to include the `position` column.
3. Only search for events for the IDs that actually have a value. Do not
search (noop) if none of session ID, user ID or fingerpint ID are set.

New query plan:


![Screenshot_20241007_110648](https://github.com/user-attachments/assets/c3234c33-1b76-4b33-a4a9-796f69f3d775)


# Additional Changes

- cleanup how we load multi-statement migrations and make that a bit
more reusable.

# Additional Context

- Related to https://github.com/zitadel/zitadel/issues/7639
2024-10-07 12:49:55 +00:00
Tim Möhlmann
25dc7bfe72 perf(cache): pgx pool connector (#8703)
# Which Problems Are Solved

Cache implementation using a PGX connection pool.

# How the Problems Are Solved

Defines a new schema `cache` in the zitadel database.
A table for string keys and a table for objects is defined.
For postgreSQL, tables are unlogged and partitioned by cache name for
performance.

Cockroach does not have unlogged tables and partitioning is an
enterprise feature that uses alternative syntax combined with sharding.
Regular tables are used here.

# Additional Changes

- `postgres.Config` can return a pxg pool. See following discussion

# Additional Context

- Part of https://github.com/zitadel/zitadel/issues/8648
- Closes https://github.com/zitadel/zitadel/issues/8647

---------

Co-authored-by: Silvan <silvan.reusser@gmail.com>
2024-10-04 13:15:41 +00:00
Livio Spring
14e2aba1bc feat: Add Twilio Verification Service (#8678)
# Which Problems Are Solved
Twilio supports a robust, multi-channel verification service that
notably supports multi-region SMS sender numbers required for our use
case. Currently, Zitadel does much of the work of the Twilio Verify (eg.
localization, code generation, messaging) but doesn't support the pool
of sender numbers that Twilio Verify does.

# How the Problems Are Solved
To support this API, we need to be able to store the Twilio Service ID
and send that in a verification request where appropriate: phone number
verification and SMS 2FA code paths.

This PR does the following: 
- Adds the ability to use Twilio Verify of standard messaging through
Twilio
- Adds support for international numbers and more reliable verification
messages sent from multiple numbers
- Adds a new Twilio configuration option to support Twilio Verify in the
admin console
- Sends verification SMS messages through Twilio Verify
- Implements Twilio Verification Checks for codes generated through the
same

# Additional Changes

# Additional Context
- base was implemented by @zhirschtritt in
https://github.com/zitadel/zitadel/pull/8268 ❤️
- closes https://github.com/zitadel/zitadel/issues/8581

---------

Co-authored-by: Zachary Hirschtritt <zachary.hirschtritt@klaviyo.com>
Co-authored-by: Joey Biscoglia <joey.biscoglia@klaviyo.com>
2024-09-26 09:14:33 +02:00
Tim Möhlmann
4eaa3163b6 feat(storage): generic cache interface (#8628)
# Which Problems Are Solved

We identified the need of caching.
Currently we have a number of places where we use different ways of
caching, like go maps or LRU.
We might also want shared chaches in the future, like Redis-based or in
special SQL tables.

# How the Problems Are Solved

Define a generic Cache interface which allows different implementations.

- A noop implementation is provided and enabled as.
- An implementation using go maps is provided
  - disabled in defaults.yaml
  - enabled in integration tests
- Authz middleware instance objects are cached using the interface.

# Additional Changes

- Enabled integration test command raceflag
- Fix a race condition in the limits integration test client
- Fix a number of flaky integration tests. (Because zitadel is super
fast now!) 🎸 🚀

# Additional Context

Related to https://github.com/zitadel/zitadel/issues/8648
2024-09-25 21:40:21 +02:00
Stefan Benz
62cdec222e feat: user v3 contact email and phone (#8644)
# Which Problems Are Solved

Endpoints to maintain email and phone contact on user v3 are not
implemented.

# How the Problems Are Solved

Add 3 endpoints with SetContactEmail, VerifyContactEmail and
ResendContactEmailCode.
Add 3 endpoints with SetContactPhone, VerifyContactPhone and
ResendContactPhoneCode.
Refactor the logic how contact is managed in the user creation and
update.

# Additional Changes

None

# Additional Context

- part of https://github.com/zitadel/zitadel/issues/6433

---------

Co-authored-by: Livio Spring <livio.a@gmail.com>
2024-09-25 13:31:31 +00:00
Tim Möhlmann
aeb379e7de fix(eventstore): revert precise decimal (#8527) (#8679) 2024-09-24 18:43:29 +02:00
Tim Möhlmann
77aa02a521 fix(projection): increase transaction duration (#8632)
# Which Problems Are Solved

Reduce the chance for projection dead-locks. Increasing or disabling the
projection transaction duration solved dead-locks in all reported cases.

# How the Problems Are Solved

Increase the default transaction duration to 1 minute.
Due to the high value it is functionally similar to disabling,
however it still provides a safety net for transaction that do freeze,
perhaps due to connection issues with the database.


# Additional Changes

- Integration test uses default.
- Technical advisory

# Additional Context

- Related to https://github.com/zitadel/zitadel/issues/8517

---------

Co-authored-by: Silvan <silvan.reusser@gmail.com>
2024-09-17 10:08:13 +00:00
Livio Spring
a07b2f4677 feat: invite user link (#8578)
# Which Problems Are Solved

As an administrator I want to be able to invite users to my application
with the API V2, some user data I will already prefil, the user should
add the authentication method themself (password, passkey, sso).

# How the Problems Are Solved

- A user can now be created with a email explicitly set to false.
- If a user has no verified email and no authentication method, an
`InviteCode` can be created through the User V2 API.
  - the code can be returned or sent through email
- additionally `URLTemplate` and an `ApplicatioName` can provided for
the email
- The code can be resent and verified through the User V2 API
- The V1 login allows users to verify and resend the code and set a
password (analog user initialization)
- The message text for the user invitation can be customized

# Additional Changes

- `verifyUserPasskeyCode` directly uses `crypto.VerifyCode` (instead of
`verifyEncryptedCode`)
- `verifyEncryptedCode` is removed (unnecessarily queried for the code
generator)

# Additional Context

- closes #8310
- TODO: login V2 will have to implement invite flow:
https://github.com/zitadel/typescript/issues/166
2024-09-11 10:53:55 +00:00
Tim Möhlmann
3aba942162 feat: add debug events API (#8533)
# Which Problems Are Solved

Add a debug API which allows pushing a set of events to be reduced in a
dedicated projection.
The events can carry a sleep duration which simulates a slow query
during projection handling.

# How the Problems Are Solved

- `CreateDebugEvents` allows pushing multiple events which simulate the
lifecycle of a resource. Each event has a `projectionSleep` field, which
issues a `pg_sleep()` statement query in the projection handler :
  - Add
  - Change
  - Remove
- `ListDebugEventsStates` list the current state of the projection,
optionally with a Trigger
- `GetDebugEventsStateByID` get the current state of the aggregate ID in
the projection, optionally with a Trigger


# Additional Changes

- none

# Additional Context

-  Allows reproduction of https://github.com/zitadel/zitadel/issues/8517
2024-09-11 08:24:00 +00:00
Tim Möhlmann
d2e0ac07f1 chore(tests): use a coverage server binary (#8407)
# Which Problems Are Solved

Use a single server instance for API integration tests. This optimizes
the time taken for the integration test pipeline,
because it allows running tests on multiple packages in parallel. Also,
it saves time by not start and stopping a zitadel server for every
package.

# How the Problems Are Solved

- Build a binary with `go build -race -cover ....`
- Integration tests only construct clients. The server remains running
in the background.
- The integration package and tested packages now fully utilize the API.
No more direct database access trough `query` and `command` packages.
- Use Makefile recipes to setup, start and stop the server in the
background.
- The binary has the race detector enabled
- Init and setup jobs are configured to halt immediately on race
condition
- Because the server runs in the background, races are only logged. When
the server is stopped and race logs exist, the Makefile recipe will
throw an error and print the logs.
- Makefile recipes include logic to print logs and convert coverage
reports after the server is stopped.
- Some tests need a downstream HTTP server to make requests, like quota
and milestones. A new `integration/sink` package creates an HTTP server
and uses websockets to forward HTTP request back to the test packages.
The package API uses Go channels for abstraction and easy usage.

# Additional Changes

- Integration test files already used the `//go:build integration`
directive. In order to properly split integration from unit tests,
integration test files need to be in a `integration_test` subdirectory
of their package.
- `UseIsolatedInstance` used to overwrite the `Tester.Client` for each
instance. Now a `Instance` object is returned with a gRPC client that is
connected to the isolated instance's hostname.
- The `Tester` type is now `Instance`. The object is created for the
first instance, used by default in any test. Isolated instances are also
`Instance` objects and therefore benefit from the same methods and
values. The first instance and any other us capable of creating an
isolated instance over the system API.
- All test packages run in an Isolated instance by calling
`NewInstance()`
- Individual tests that use an isolated instance use `t.Parallel()`

# Additional Context

- Closes #6684
- https://go.dev/doc/articles/race_detector
- https://go.dev/doc/build-cover

---------

Co-authored-by: Stefan Benz <46600784+stebenz@users.noreply.github.com>
2024-09-06 14:47:57 +02:00
Silvan
b522588d98 fix(eventstore): precise decimal (#8527)
# Which Problems Are Solved

Float64 which was used for the event.Position field is [not precise in
go and gets rounded](https://github.com/golang/go/issues/47300). This
can lead to unprecies position tracking of events and therefore
projections especially on cockcoachdb as the position used there is a
big number.

example of a unprecies position:
exact: 1725257931223002628
float64: 1725257931223002624.000000

# How the Problems Are Solved

The float64 was replaced by
[github.com/jackc/pgx-shopspring-decimal](https://github.com/jackc/pgx-shopspring-decimal).

# Additional Changes

Correct behaviour of makefile for load tests.
Rename `latestSequence`-queries to `latestPosition`
2024-09-06 12:19:19 +03:00
Livio Spring
9ec9ad4314 feat(oidc): sid claim for id_tokens issued through login V1 (#8525)
# Which Problems Are Solved

id_tokens issued for auth requests created through the login UI
currently do not provide a sid claim.
This is due to the fact that (SSO) sessions for the login UI do not have
one and are only computed by the userAgent(ID), the user(ID) and the
authentication checks of the latter.

This prevents client to track sessions and terminate specific session on
the end_session_endpoint.

# How the Problems Are Solved

- An `id` column is added to the `auth.user_sessions` table.
- The `id` (prefixed with `V1_`) is set whenever a session is added or
updated to active (from terminated)
- The id is passed to the `oidc session` (as v2 sessionIDs), to expose
it as `sid` claim

# Additional Changes

- refactored `getUpdateCols` to handle different column value types and
add arguments for query

# Additional Context

- closes #8499 
- relates to #8501
2024-09-03 13:19:00 +00:00
Stefan Benz
41ae35f2ef feat: add schema user create and remove (#8494)
# Which Problems Are Solved

Added functionality that user with a userschema can be created and
removed.

# How the Problems Are Solved

Added logic and moved APIs so that everything is API v3 conform.

# Additional Changes

- move of user and userschema API to resources folder
- changed testing and parameters
- some renaming

# Additional Context

closes #7308

---------

Co-authored-by: Elio Bischof <elio@zitadel.com>
2024-08-28 19:46:45 +00:00
Tim Möhlmann
fd0c15dd4f feat(oidc): use web keys for token signing and verification (#8449)
# Which Problems Are Solved

Use web keys, managed by the `resources/v3alpha/web_keys` API, for OIDC
token signing and verification,
as well as serving the public web keys on the jwks / keys endpoint.
Response header on the keys endpoint now allows caching of the response.
This is now "safe" to do since keys can be created ahead of time and
caches have sufficient time to pickup the change before keys get
enabled.

# How the Problems Are Solved

- The web key format is used in the `getSignerOnce` function in the
`api/oidc` package.
- The public key cache is changed to get and store web keys.
- The jwks / keys endpoint returns the combined set of valid "legacy"
public keys and all available web keys.
- Cache-Control max-age default to 5 minutes and is configured in
`defaults.yaml`.

When the web keys feature is enabled, fallback mechanisms are in place
to obtain and convert "legacy" `query.PublicKey` as web keys when
needed. This allows transitioning to the feature without invalidating
existing tokens. A small performance overhead may be noticed on the keys
endpoint, because 2 queries need to be run sequentially. This will
disappear once the feature is stable and the legacy code gets cleaned
up.

# Additional Changes

- Extend legacy key lifetimes so that tests can be run on an existing
database with more than 6 hours apart.
- Discovery endpoint returns all supported algorithms when the Web Key
feature is enabled.

# Additional Context

- Closes https://github.com/zitadel/zitadel/issues/8031
- Part of https://github.com/zitadel/zitadel/issues/7809
- After https://github.com/zitadel/oidc/pull/637
- After https://github.com/zitadel/oidc/pull/638
2024-08-23 14:43:46 +02:00
Livio Spring
c8e2a3bd49 feat: enable application performance profiling (#8442)
# Which Problems Are Solved

To have more insight on the performance, CPU and memory usage of
ZITADEL, we want to enable profiling.

# How the Problems Are Solved

- Allow profiling by configuration.
- Provide Google Cloud Profiler as first implementation

# Additional Changes

None.

# Additional Context

There were possible memory leaks reported:
https://discord.com/channels/927474939156643850/1273210227918897152

Co-authored-by: Silvan <silvan.reusser@gmail.com>
2024-08-16 13:26:53 +00:00
Stefan Benz
3e3d46ac0d feat: idp v2 api GetIDPByID (#8425)
# Which Problems Are Solved

GetIDPByID as endpoint in the API v2 so that it can be available for the
new login.

# How the Problems Are Solved

Create GetIDPByID endpoint with IDP v2 API, throught the GetProviderByID
implementation from admin and management API.

# Additional Changes

- Remove the OwnerType attribute from the response, as the information
is available through the resourceOwner.
- correct refs to messages in proto which are used for doc generation
- renaming of elements for API v3

# Additional Context

Closes #8337

---------

Co-authored-by: Livio Spring <livio.a@gmail.com>
2024-08-14 18:18:29 +00:00
Tim Möhlmann
64a3bb3149 feat(v3alpha): web key resource (#8262)
# Which Problems Are Solved

Implement a new API service that allows management of OIDC signing web
keys.
This allows users to manage rotation of the instance level keys. which
are currently managed based on expiry.

The API accepts the generation of the following key types and
parameters:

- RSA keys with 2048, 3072 or 4096 bit in size and:
  - Signing with SHA-256 (RS256)
  - Signing with SHA-384 (RS384)
  - Signing with SHA-512 (RS512)
- ECDSA keys with
  - P256 curve
  - P384 curve
  - P512 curve
- ED25519 keys

# How the Problems Are Solved

Keys are serialized for storage using the JSON web key format from the
`jose` library. This is the format that will be used by OIDC for
signing, verification and publication.

Each instance can have a number of key pairs. All existing public keys
are meant to be used for token verification and publication the keys
endpoint. Keys can be activated and the active private key is meant to
sign new tokens. There is always exactly 1 active signing key:

1. When the first key for an instance is generated, it is automatically
activated.
2. Activation of the next key automatically deactivates the previously
active key.
3. Keys cannot be manually deactivated from the API
4. Active keys cannot be deleted

# Additional Changes

- Query methods that later will be used by the OIDC package are already
implemented. Preparation for #8031
- Fix indentation in french translation for instance event
- Move user_schema translations to consistent positions in all
translation files

# Additional Context

- Closes #8030
- Part of #7809

---------

Co-authored-by: Elio Bischof <elio@zitadel.com>
2024-08-14 14:18:14 +00:00
Elio Bischof
042c438813 feat(v3alpha): read actions (#8357)
# Which Problems Are Solved

The current v3alpha actions APIs don't exactly adhere to the [new
resources API
design](https://zitadel.com/docs/apis/v3#standard-resources).

# How the Problems Are Solved

- **Improved ID access**: The aggregate ID is added to the resource
details object, so accessing resource IDs and constructing proto
messages for resources is easier
- **Explicit Instances**: Optionally, the instance can be explicitly
given in each request
- **Pagination**: A default search limit and a max search limit are
added to the defaults.yaml. They apply to the new v3 APIs (currently
only actions). The search query defaults are changed to ascending by
creation date, because this makes the pagination results the most
deterministic. The creation date is also added to the object details.
The bug with updated creation dates is fixed for executions and targets.
- **Removed Sequences**: Removed Sequence from object details and
ProcessedSequence from search details

# Additional Changes

Object details IDs are checked in unit test only if an empty ID is
expected. Centralizing the details check also makes this internal object
more flexible for future evolutions.

# Additional Context

- Closes #8169 
- Depends on https://github.com/zitadel/zitadel/pull/8225

---------

Co-authored-by: Silvan <silvan.reusser@gmail.com>
Co-authored-by: Stefan Benz <46600784+stebenz@users.noreply.github.com>
2024-08-12 22:32:01 +02:00
Silvan
cd3ffbd3eb fix(mirror): use correct statements on push (#8414)
# Which Problems Are Solved

The mirror command used the wrong position to filter for events if
different database technologies for source and destination were used.

# How the Problems Are Solved

The statements which diverge are stored on the client so that different
technologies can use different statements.

# Additional Context

- https://discord.com/channels/927474939156643850/1256396896243552347
2024-08-12 10:33:45 +00:00
Livio Spring
3f25e36fbd fix: provide device auth config (#8419)
# Which Problems Are Solved

There was no default configuration for `DeviceAuth`, which makes it
impossible to override by environment variables.
Additionally, a custom `CharAmount` value would overwrite also the
`DashInterval`.

# How the Problems Are Solved

- added to defaults.yaml
- fixed customization

# Additional Changes

None.

# Additional Context

- noticed during a customer request
2024-08-12 12:55:07 +03:00
Livio Spring
3d071fc505 feat: trusted (instance) domains (#8369)
# Which Problems Are Solved

ZITADEL currently selects the instance context based on a HTTP header
(see https://github.com/zitadel/zitadel/issues/8279#issue-2399959845 and
checks it against the list of instance domains. Let's call it instance
or API domain.
For any context based URL (e.g. OAuth, OIDC, SAML endpoints, links in
emails, ...) the requested domain (instance domain) will be used. Let's
call it the public domain.
In cases of proxied setups, all exposed domains (public domains) require
the domain to be managed as instance domain.
This can either be done using the "ExternalDomain" in the runtime config
or via system API, which requires a validation through CustomerPortal on
zitadel.cloud.

# How the Problems Are Solved

- Two new headers / header list are added:
- `InstanceHostHeaders`: an ordered list (first sent wins), which will
be used to match the instance.
(For backward compatibility: the `HTTP1HostHeader`, `HTTP2HostHeader`
and `forwarded`, `x-forwarded-for`, `x-forwarded-host` are checked
afterwards as well)
- `PublicHostHeaders`: an ordered list (first sent wins), which will be
used as public host / domain. This will be checked against a list of
trusted domains on the instance.
- The middleware intercepts all requests to the API and passes a
`DomainCtx` object with the hosts and protocol into the context
(previously only a computed `origin` was passed)
- HTTP / GRPC server do not longer try to match the headers to instances
themself, but use the passed `http.DomainContext` in their interceptors.
- The `RequestedHost` and `RequestedDomain` from authz.Instance are
removed in favor of the `http.DomainContext`
- When authenticating to or signing out from Console UI, the current
`http.DomainContext(ctx).Origin` (already checked by instance
interceptor for validity) is used to compute and dynamically add a
`redirect_uri` and `post_logout_redirect_uri`.
- Gateway passes all configured host headers (previously only did
`x-zitadel-*`)
- Admin API allows to manage trusted domain

# Additional Changes

None

# Additional Context

- part of #8279 
- open topics: 
  - "single-instance" mode
  - Console UI
2024-07-31 18:00:38 +03:00
Elio Bischof
cc3ec1e2a7 feat(v3alpha): write actions (#8225)
# Which Problems Are Solved

The current v3alpha actions APIs don't exactly adhere to the [new
resources API
design](https://zitadel.com/docs/apis/v3#standard-resources).

# How the Problems Are Solved

- **Breaking**: The current v3alpha actions APIs are removed. This is
breaking.
- **Resource Namespace**: New v3alpha actions APIs for targets and
executions are added under the namespace /resources.
- **Feature Flag**: New v3alpha actions APIs still have to be activated
using the actions feature flag
- **Reduced Executions Overhead**: Executions are managed similar to
settings according to the new API design: an empty list of targets
basically makes an execution a Noop. So a single method, SetExecution is
enough to cover all use cases. Noop executions are not returned in
future search requests.
- **Compatibility**: The executions created with previous v3alpha APIs
are still available to be managed with the new executions API.

# Additional Changes

- Removed integration tests which test executions but rely on readable
targets. They are added again with #8169

# Additional Context

Closes #8168
2024-07-31 14:42:12 +02:00
Stefan Benz
7d2d85f57c feat: api v2beta to api v2 (#8283)
# Which Problems Are Solved

The v2beta services are stable but not GA.

# How the Problems Are Solved

The v2beta services are copied to v2. The corresponding v1 and v2beta
services are deprecated.

# Additional Context

Closes #7236

---------

Co-authored-by: Elio Bischof <elio@zitadel.com>
2024-07-26 22:39:55 +02:00
Elio Bischof
693e27b906 fix: remove default TOS and privacy links (#8122)
# Which Problems Are Solved

The default terms of service and privacy policy links are applied to all
new ZITADEL instances, also for self hosters. However, the links
contents don't apply to self-hosters.

# How the Problems Are Solved

The links are removed from the DefaultInstance section in the
*defaults.yaml* file.
By default, the links are not shown anymore in the hosted login pages.
They can still be configured using the privacy policy.

# Additional Context

- Found because of a support request
2024-07-25 08:39:10 +02:00