Alexei-Barnes 09b021b257
feat: Configurable Unique Machine Identification (#3626)
* feat: Configurable Unique Machine Identification

This change fixes Segfault on AWS App Runner with v2 #3625

The change introduces two new dependencies:

* github.com/drone/envsubst for supporting AWS ECS, which has its metadata endpoint described by an environment variable
* github.com/jarcoal/jpath so that only relevant data from a metadata response is used to identify the machine.

The change ads new configuration (see `defaults.yaml`):

* `Machine.Identification` enables configuration of how machines are uniquely identified - I'm not sure about the top level category `Machine`, as I don't have anything else to add to it. Happy to hear suggestions for better naming or structure here.
* `Machine.Identifiation.PrivateId` turns on or off the existing private IP based identification. Default is on.
* `Machine.Identification.Hostname` turns on or off using the OS hostname to identify the machine. Great for most cloud environments, where this tends to be set to something that identifies the machine uniquely. Enabled by default.
* `Machine.Identification.Webhook` configures identification based on the response to an HTTP GET request.  Request headers can be configured, a JSONPath can be set for processing the response (no JSON parsing is done if this is not set), and the URL is allowed to contain environment variables in the format `"${var}"`.

The new flow for getting a unique machine id is:

1. PrivateIP (if enabled)
2. Hostname (if enabled)
3. Webhook (if enabled, to configured URL)
4. Give up and error out.

It's important that init configures machine identity first. Otherwise we could try to get an ID before configuring it. To prevent this from causing difficult to debug issues, where for example the default configuration was used, I've ensured that
the application will generate an error if the module hasn't been configured and you try to get an ID.

Misc changes:

* Spelling and gramatical corrections to `init.go::New()` long description.
* Spelling corrections to `verify_zitadel.go::newZitadel()`.
* Updated `production.md` and `development.md` based on the new build process. I think the run instructions are also out of date, but I'll leave that for someone else.
* `id.SonyFlakeGenerator` is now a function, which sets `id.sonyFlakeGenerator`, this allows us to defer initialization until configuration has been read.

* Update internal/id/config.go

Co-authored-by: Alexei-Barnes <82444470+Alexei-Barnes@users.noreply.github.com>

* Fix authored by @livio-a for tests

Co-authored-by: Livio Amstutz <livio.a@gmail.com>
2022-05-24 16:57:57 +02:00

89 lines
2.6 KiB
Go

package crdb
import (
"context"
"database/sql"
"fmt"
"time"
"github.com/zitadel/logging"
"github.com/zitadel/zitadel/internal/errors"
"github.com/zitadel/zitadel/internal/id"
)
const (
lockStmtFormat = "INSERT INTO %[1]s" +
" (locker_id, locked_until, projection_name, instance_id) VALUES ($1, now()+$2::INTERVAL, $3, $4)" +
" ON CONFLICT (projection_name, instance_id)" +
" DO UPDATE SET locker_id = $1, locked_until = now()+$2::INTERVAL" +
" WHERE %[1]s.projection_name = $3 AND %[1]s.instance_id = $4 AND (%[1]s.locker_id = $1 OR %[1]s.locked_until < now())"
)
type Locker interface {
Lock(ctx context.Context, lockDuration time.Duration, instanceID string) <-chan error
Unlock(instanceID string) error
}
type locker struct {
client *sql.DB
lockStmt string
workerName string
projectionName string
}
func NewLocker(client *sql.DB, lockTable, projectionName string) Locker {
workerName, err := id.SonyFlakeGenerator().Next()
logging.OnError(err).Panic("unable to generate lockID")
return &locker{
client: client,
lockStmt: fmt.Sprintf(lockStmtFormat, lockTable),
workerName: workerName,
projectionName: projectionName,
}
}
func (h *locker) Lock(ctx context.Context, lockDuration time.Duration, instanceID string) <-chan error {
errs := make(chan error)
go h.handleLock(ctx, errs, lockDuration, instanceID)
return errs
}
func (h *locker) handleLock(ctx context.Context, errs chan error, lockDuration time.Duration, instanceID string) {
renewLock := time.NewTimer(0)
for {
select {
case <-renewLock.C:
errs <- h.renewLock(ctx, lockDuration, instanceID)
//refresh the lock 500ms before it times out. 500ms should be enough for one transaction
renewLock.Reset(lockDuration - (500 * time.Millisecond))
case <-ctx.Done():
close(errs)
renewLock.Stop()
return
}
}
}
func (h *locker) renewLock(ctx context.Context, lockDuration time.Duration, instanceID string) error {
//the unit of crdb interval is seconds (https://www.cockroachlabs.com/docs/stable/interval.html).
res, err := h.client.ExecContext(ctx, h.lockStmt, h.workerName, lockDuration.Seconds(), h.projectionName, instanceID)
if err != nil {
return errors.ThrowInternal(err, "CRDB-uaDoR", "unable to execute lock")
}
if rows, _ := res.RowsAffected(); rows == 0 {
return errors.ThrowAlreadyExists(nil, "CRDB-mmi4J", "projection already locked")
}
return nil
}
func (h *locker) Unlock(instanceID string) error {
_, err := h.client.Exec(h.lockStmt, h.workerName, float64(0), h.projectionName, instanceID)
if err != nil {
return errors.ThrowUnknown(err, "CRDB-JjfwO", "unlock failed")
}
return nil
}