mirror of
https://github.com/juanfont/headscale.git
synced 2025-07-29 21:23:44 +00:00
396 lines
16 KiB
Markdown
396 lines
16 KiB
Markdown
![]() |
# CLAUDE.md
|
||
|
|
||
|
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
|
||
|
|
||
|
## Overview
|
||
|
|
||
|
Headscale is an open-source implementation of the Tailscale control server written in Go. It provides self-hosted coordination for Tailscale networks (tailnets), managing node registration, IP allocation, policy enforcement, and DERP routing.
|
||
|
|
||
|
## Development Commands
|
||
|
|
||
|
### Quick Setup
|
||
|
```bash
|
||
|
# Recommended: Use Nix for dependency management
|
||
|
nix develop
|
||
|
|
||
|
# Full development workflow
|
||
|
make dev # runs fmt + lint + test + build
|
||
|
```
|
||
|
|
||
|
### Essential Commands
|
||
|
```bash
|
||
|
# Build headscale binary
|
||
|
make build
|
||
|
|
||
|
# Run tests
|
||
|
make test
|
||
|
go test ./... # All unit tests
|
||
|
go test -race ./... # With race detection
|
||
|
|
||
|
# Run specific integration test
|
||
|
go run ./cmd/hi run "TestName" --postgres
|
||
|
|
||
|
# Code formatting and linting
|
||
|
make fmt # Format all code (Go, docs, proto)
|
||
|
make lint # Lint all code (Go, proto)
|
||
|
make fmt-go # Format Go code only
|
||
|
make lint-go # Lint Go code only
|
||
|
|
||
|
# Protocol buffer generation (after modifying proto/)
|
||
|
make generate
|
||
|
|
||
|
# Clean build artifacts
|
||
|
make clean
|
||
|
```
|
||
|
|
||
|
### Integration Testing
|
||
|
```bash
|
||
|
# Use the hi (Headscale Integration) test runner
|
||
|
go run ./cmd/hi doctor # Check system requirements
|
||
|
go run ./cmd/hi run "TestPattern" # Run specific test
|
||
|
go run ./cmd/hi run "TestPattern" --postgres # With PostgreSQL backend
|
||
|
|
||
|
# Test artifacts are saved to control_logs/ with logs and debug data
|
||
|
```
|
||
|
|
||
|
## Project Structure & Architecture
|
||
|
|
||
|
### Top-Level Organization
|
||
|
|
||
|
```
|
||
|
headscale/
|
||
|
├── cmd/ # Command-line applications
|
||
|
│ ├── headscale/ # Main headscale server binary
|
||
|
│ └── hi/ # Headscale Integration test runner
|
||
|
├── hscontrol/ # Core control plane logic
|
||
|
├── integration/ # End-to-end Docker-based tests
|
||
|
├── proto/ # Protocol buffer definitions
|
||
|
├── gen/ # Generated code (protobuf)
|
||
|
├── docs/ # Documentation
|
||
|
└── packaging/ # Distribution packaging
|
||
|
```
|
||
|
|
||
|
### Core Packages (`hscontrol/`)
|
||
|
|
||
|
**Main Server (`hscontrol/`)**
|
||
|
- `app.go`: Application setup, dependency injection, server lifecycle
|
||
|
- `handlers.go`: HTTP/gRPC API endpoints for management operations
|
||
|
- `grpcv1.go`: gRPC service implementation for headscale API
|
||
|
- `poll.go`: **Critical** - Handles Tailscale MapRequest/MapResponse protocol
|
||
|
- `noise.go`: Noise protocol implementation for secure client communication
|
||
|
- `auth.go`: Authentication flows (web, OIDC, command-line)
|
||
|
- `oidc.go`: OpenID Connect integration for user authentication
|
||
|
|
||
|
**State Management (`hscontrol/state/`)**
|
||
|
- `state.go`: Central coordinator for all subsystems (database, policy, IP allocation, DERP)
|
||
|
- `node_store.go`: **Performance-critical** - In-memory cache with copy-on-write semantics
|
||
|
- Thread-safe operations with deadlock detection
|
||
|
- Coordinates between database persistence and real-time operations
|
||
|
|
||
|
**Database Layer (`hscontrol/db/`)**
|
||
|
- `db.go`: Database abstraction, GORM setup, migration management
|
||
|
- `node.go`: Node lifecycle, registration, expiration, IP assignment
|
||
|
- `users.go`: User management, namespace isolation
|
||
|
- `api_key.go`: API authentication tokens
|
||
|
- `preauth_keys.go`: Pre-authentication keys for automated node registration
|
||
|
- `ip.go`: IP address allocation and management
|
||
|
- `policy.go`: Policy storage and retrieval
|
||
|
- Schema migrations in `schema.sql` with extensive test data coverage
|
||
|
|
||
|
**Policy Engine (`hscontrol/policy/`)**
|
||
|
- `policy.go`: Core ACL evaluation logic, HuJSON parsing
|
||
|
- `v2/`: Next-generation policy system with improved filtering
|
||
|
- `matcher/`: ACL rule matching and evaluation engine
|
||
|
- Determines peer visibility, route approval, and network access rules
|
||
|
- Supports both file-based and database-stored policies
|
||
|
|
||
|
**Network Management (`hscontrol/`)**
|
||
|
- `derp/`: DERP (Designated Encrypted Relay for Packets) server implementation
|
||
|
- NAT traversal when direct connections fail
|
||
|
- Fallback relay for firewall-restricted environments
|
||
|
- `mapper/`: Converts internal Headscale state to Tailscale's wire protocol format
|
||
|
- `tail.go`: Tailscale-specific data structure generation
|
||
|
- `routes/`: Subnet route management and primary route selection
|
||
|
- `dns/`: DNS record management and MagicDNS implementation
|
||
|
|
||
|
**Utilities & Support (`hscontrol/`)**
|
||
|
- `types/`: Core data structures, configuration, validation
|
||
|
- `util/`: Helper functions for networking, DNS, key management
|
||
|
- `templates/`: Client configuration templates (Apple, Windows, etc.)
|
||
|
- `notifier/`: Event notification system for real-time updates
|
||
|
- `metrics.go`: Prometheus metrics collection
|
||
|
- `capver/`: Tailscale capability version management
|
||
|
|
||
|
### Key Subsystem Interactions
|
||
|
|
||
|
**Node Registration Flow**
|
||
|
1. **Client Connection**: `noise.go` handles secure protocol handshake
|
||
|
2. **Authentication**: `auth.go` validates credentials (web/OIDC/preauth)
|
||
|
3. **State Creation**: `state.go` coordinates IP allocation via `db/ip.go`
|
||
|
4. **Storage**: `db/node.go` persists node, `NodeStore` caches in memory
|
||
|
5. **Network Setup**: `mapper/` generates initial Tailscale network map
|
||
|
|
||
|
**Ongoing Operations**
|
||
|
1. **Poll Requests**: `poll.go` receives periodic client updates
|
||
|
2. **State Updates**: `NodeStore` maintains real-time node information
|
||
|
3. **Policy Application**: `policy/` evaluates ACL rules for peer relationships
|
||
|
4. **Map Distribution**: `mapper/` sends network topology to all affected clients
|
||
|
|
||
|
**Route Management**
|
||
|
1. **Advertisement**: Clients announce routes via `poll.go` Hostinfo updates
|
||
|
2. **Storage**: `db/` persists routes, `NodeStore` caches for performance
|
||
|
3. **Approval**: `policy/` auto-approves routes based on ACL rules
|
||
|
4. **Distribution**: `routes/` selects primary routes, `mapper/` distributes to peers
|
||
|
|
||
|
### Command-Line Tools (`cmd/`)
|
||
|
|
||
|
**Main Server (`cmd/headscale/`)**
|
||
|
- `headscale.go`: CLI parsing, configuration loading, server startup
|
||
|
- Supports daemon mode, CLI operations (user/node management), database operations
|
||
|
|
||
|
**Integration Test Runner (`cmd/hi/`)**
|
||
|
- `main.go`: Test execution framework with Docker orchestration
|
||
|
- `run.go`: Individual test execution with artifact collection
|
||
|
- `doctor.go`: System requirements validation
|
||
|
- `docker.go`: Container lifecycle management
|
||
|
- Essential for validating changes against real Tailscale clients
|
||
|
|
||
|
### Generated & External Code
|
||
|
|
||
|
**Protocol Buffers (`proto/` → `gen/`)**
|
||
|
- Defines gRPC API for headscale management operations
|
||
|
- Client libraries can generate from these definitions
|
||
|
- Run `make generate` after modifying `.proto` files
|
||
|
|
||
|
**Integration Testing (`integration/`)**
|
||
|
- `scenario.go`: Docker test environment setup
|
||
|
- `tailscale.go`: Tailscale client container management
|
||
|
- Individual test files for specific functionality areas
|
||
|
- Real end-to-end validation with network isolation
|
||
|
|
||
|
### Critical Performance Paths
|
||
|
|
||
|
**High-Frequency Operations**
|
||
|
1. **MapRequest Processing** (`poll.go`): Every 15-60 seconds per client
|
||
|
2. **NodeStore Reads** (`node_store.go`): Every operation requiring node data
|
||
|
3. **Policy Evaluation** (`policy/`): On every peer relationship calculation
|
||
|
4. **Route Lookups** (`routes/`): During network map generation
|
||
|
|
||
|
**Database Write Patterns**
|
||
|
- **Frequent**: Node heartbeats, endpoint updates, route changes
|
||
|
- **Moderate**: User operations, policy updates, API key management
|
||
|
- **Rare**: Schema migrations, bulk operations
|
||
|
|
||
|
### Configuration & Deployment
|
||
|
|
||
|
**Configuration** (`hscontrol/types/config.go`)**
|
||
|
- Database connection settings (SQLite/PostgreSQL)
|
||
|
- Network configuration (IP ranges, DNS settings)
|
||
|
- Policy mode (file vs database)
|
||
|
- DERP relay configuration
|
||
|
- OIDC provider settings
|
||
|
|
||
|
**Key Dependencies**
|
||
|
- **GORM**: Database ORM with migration support
|
||
|
- **Tailscale Libraries**: Core networking and protocol code
|
||
|
- **Zerolog**: Structured logging throughout the application
|
||
|
- **Buf**: Protocol buffer toolchain for code generation
|
||
|
|
||
|
### Development Workflow Integration
|
||
|
|
||
|
The architecture supports incremental development:
|
||
|
- **Unit Tests**: Focus on individual packages (`*_test.go` files)
|
||
|
- **Integration Tests**: Validate cross-component interactions
|
||
|
- **Database Tests**: Extensive migration and data integrity validation
|
||
|
- **Policy Tests**: ACL rule evaluation and edge cases
|
||
|
- **Performance Tests**: NodeStore and high-frequency operation validation
|
||
|
|
||
|
## Integration Test System
|
||
|
|
||
|
### Overview
|
||
|
Integration tests use Docker containers running real Tailscale clients against a Headscale server. Tests validate end-to-end functionality including routing, ACLs, node lifecycle, and network coordination.
|
||
|
|
||
|
### Running Integration Tests
|
||
|
|
||
|
**System Requirements**
|
||
|
```bash
|
||
|
# Check if your system is ready
|
||
|
go run ./cmd/hi doctor
|
||
|
```
|
||
|
This verifies Docker, Go, required images, and disk space.
|
||
|
|
||
|
**Test Execution Patterns**
|
||
|
```bash
|
||
|
# Run a single test (recommended for development)
|
||
|
go run ./cmd/hi run "TestSubnetRouterMultiNetwork"
|
||
|
|
||
|
# Run with PostgreSQL backend (for database-heavy tests)
|
||
|
go run ./cmd/hi run "TestExpireNode" --postgres
|
||
|
|
||
|
# Run multiple tests with pattern matching
|
||
|
go run ./cmd/hi run "TestSubnet*"
|
||
|
|
||
|
# Run all integration tests (CI/full validation)
|
||
|
go test ./integration -timeout 30m
|
||
|
```
|
||
|
|
||
|
**Test Categories & Timing**
|
||
|
- **Fast tests** (< 2 min): Basic functionality, CLI operations
|
||
|
- **Medium tests** (2-5 min): Route management, ACL validation
|
||
|
- **Slow tests** (5+ min): Node expiration, HA failover
|
||
|
- **Long-running tests** (10+ min): `TestNodeOnlineStatus` (12 min duration)
|
||
|
|
||
|
### Test Infrastructure
|
||
|
|
||
|
**Docker Setup**
|
||
|
- Headscale server container with configurable database backend
|
||
|
- Multiple Tailscale client containers with different versions
|
||
|
- Isolated networks per test scenario
|
||
|
- Automatic cleanup after test completion
|
||
|
|
||
|
**Test Artifacts**
|
||
|
All test runs save artifacts to `control_logs/TIMESTAMP-ID/`:
|
||
|
```
|
||
|
control_logs/20250713-213106-iajsux/
|
||
|
├── hs-testname-abc123.stderr.log # Headscale server logs
|
||
|
├── hs-testname-abc123.stdout.log
|
||
|
├── hs-testname-abc123.db # Database snapshot
|
||
|
├── hs-testname-abc123_metrics.txt # Prometheus metrics
|
||
|
├── hs-testname-abc123-mapresponses/ # Protocol debug data
|
||
|
├── ts-client-xyz789.stderr.log # Tailscale client logs
|
||
|
├── ts-client-xyz789.stdout.log
|
||
|
└── ts-client-xyz789_status.json # Client status dump
|
||
|
```
|
||
|
|
||
|
### Test Development Guidelines
|
||
|
|
||
|
**Timing Considerations**
|
||
|
Integration tests involve real network operations and Docker container lifecycle:
|
||
|
|
||
|
```go
|
||
|
// ❌ Wrong: Immediate assertions after async operations
|
||
|
client.Execute([]string{"tailscale", "set", "--advertise-routes=10.0.0.0/24"})
|
||
|
nodes, _ := headscale.ListNodes()
|
||
|
require.Len(t, nodes[0].GetAvailableRoutes(), 1) // May fail due to timing
|
||
|
|
||
|
// ✅ Correct: Wait for async operations to complete
|
||
|
client.Execute([]string{"tailscale", "set", "--advertise-routes=10.0.0.0/24"})
|
||
|
require.EventuallyWithT(t, func(c *assert.CollectT) {
|
||
|
nodes, err := headscale.ListNodes()
|
||
|
assert.NoError(c, err)
|
||
|
assert.Len(c, nodes[0].GetAvailableRoutes(), 1)
|
||
|
}, 10*time.Second, 100*time.Millisecond, "route should be advertised")
|
||
|
```
|
||
|
|
||
|
**Common Test Patterns**
|
||
|
- **Route Advertisement**: Use `EventuallyWithT` for route propagation
|
||
|
- **Node State Changes**: Wait for NodeStore synchronization
|
||
|
- **ACL Policy Changes**: Allow time for policy recalculation
|
||
|
- **Network Connectivity**: Use ping tests with retries
|
||
|
|
||
|
**Test Data Management**
|
||
|
```go
|
||
|
// Node identification: Don't assume array ordering
|
||
|
expectedRoutes := map[string]string{"1": "10.33.0.0/16"}
|
||
|
for _, node := range nodes {
|
||
|
nodeIDStr := fmt.Sprintf("%d", node.GetId())
|
||
|
if route, shouldHaveRoute := expectedRoutes[nodeIDStr]; shouldHaveRoute {
|
||
|
// Test the node that should have the route
|
||
|
}
|
||
|
}
|
||
|
```
|
||
|
|
||
|
### Troubleshooting Integration Tests
|
||
|
|
||
|
**Common Failure Patterns**
|
||
|
1. **Timing Issues**: Test assertions run before async operations complete
|
||
|
- **Solution**: Use `EventuallyWithT` with appropriate timeouts
|
||
|
- **Timeout Guidelines**: 3-5s for route operations, 10s for complex scenarios
|
||
|
|
||
|
2. **Infrastructure Problems**: Disk space, Docker issues, network conflicts
|
||
|
- **Check**: `go run ./cmd/hi doctor` for system health
|
||
|
- **Clean**: Remove old test containers and networks
|
||
|
|
||
|
3. **NodeStore Synchronization**: Tests expecting immediate data availability
|
||
|
- **Key Points**: Route advertisements must propagate through poll requests
|
||
|
- **Fix**: Wait for NodeStore updates after Hostinfo changes
|
||
|
|
||
|
4. **Database Backend Differences**: SQLite vs PostgreSQL behavior differences
|
||
|
- **Use**: `--postgres` flag for database-intensive tests
|
||
|
- **Note**: Some timing characteristics differ between backends
|
||
|
|
||
|
**Debugging Failed Tests**
|
||
|
1. **Check test artifacts** in `control_logs/` for detailed logs
|
||
|
2. **Examine MapResponse JSON** files for protocol-level debugging
|
||
|
3. **Review Headscale stderr logs** for server-side error messages
|
||
|
4. **Check Tailscale client status** for network-level issues
|
||
|
|
||
|
**Resource Management**
|
||
|
- Tests require significant disk space (each run ~100MB of logs)
|
||
|
- Docker containers are cleaned up automatically on success
|
||
|
- Failed tests may leave containers running - clean manually if needed
|
||
|
- Use `docker system prune` periodically to reclaim space
|
||
|
|
||
|
### Best Practices for Test Modifications
|
||
|
|
||
|
1. **Always test locally** before committing integration test changes
|
||
|
2. **Use appropriate timeouts** - too short causes flaky tests, too long slows CI
|
||
|
3. **Clean up properly** - ensure tests don't leave persistent state
|
||
|
4. **Handle both success and failure paths** in test scenarios
|
||
|
5. **Document timing requirements** for complex test scenarios
|
||
|
|
||
|
## NodeStore Implementation Details
|
||
|
|
||
|
**Key Insight from Recent Work**: The NodeStore is a critical performance optimization that caches node data in memory while ensuring consistency with the database. When working with route advertisements or node state changes:
|
||
|
|
||
|
1. **Timing Considerations**: Route advertisements need time to propagate from clients to server. Use `require.EventuallyWithT()` patterns in tests instead of immediate assertions.
|
||
|
|
||
|
2. **Synchronization Points**: NodeStore updates happen at specific points like `poll.go:420` after Hostinfo changes. Ensure these are maintained when modifying the polling logic.
|
||
|
|
||
|
3. **Peer Visibility**: The NodeStore's `peersFunc` determines which nodes are visible to each other. Policy-based filtering is separate from monitoring visibility - expired nodes should remain visible for debugging but marked as expired.
|
||
|
|
||
|
## Testing Guidelines
|
||
|
|
||
|
### Integration Test Patterns
|
||
|
```go
|
||
|
// Use EventuallyWithT for async operations
|
||
|
require.EventuallyWithT(t, func(c *assert.CollectT) {
|
||
|
nodes, err := headscale.ListNodes()
|
||
|
assert.NoError(c, err)
|
||
|
// Check expected state
|
||
|
}, 10*time.Second, 100*time.Millisecond, "description")
|
||
|
|
||
|
// Node route checking by actual node properties, not array position
|
||
|
var routeNode *v1.Node
|
||
|
for _, node := range nodes {
|
||
|
if nodeIDStr := fmt.Sprintf("%d", node.GetId()); expectedRoutes[nodeIDStr] != "" {
|
||
|
routeNode = node
|
||
|
break
|
||
|
}
|
||
|
}
|
||
|
```
|
||
|
|
||
|
### Running Problematic Tests
|
||
|
- Some tests require significant time (e.g., `TestNodeOnlineStatus` runs for 12 minutes)
|
||
|
- Infrastructure issues like disk space can cause test failures unrelated to code changes
|
||
|
- Use `--postgres` flag when testing database-heavy scenarios
|
||
|
|
||
|
## Important Notes
|
||
|
|
||
|
- **Dependencies**: Use `nix develop` for consistent toolchain (Go, buf, protobuf tools, linting)
|
||
|
- **Protocol Buffers**: Changes to `proto/` require `make generate` and should be committed separately
|
||
|
- **Code Style**: Enforced via golangci-lint with golines (width 88) and gofumpt formatting
|
||
|
- **Database**: Supports both SQLite (development) and PostgreSQL (production/testing)
|
||
|
- **Integration Tests**: Require Docker and can consume significant disk space
|
||
|
- **Performance**: NodeStore optimizations are critical for scale - be careful with changes to state management
|
||
|
|
||
|
## Debugging Integration Tests
|
||
|
|
||
|
Test artifacts are preserved in `control_logs/TIMESTAMP-ID/` including:
|
||
|
- Headscale server logs (stderr/stdout)
|
||
|
- Tailscale client logs and status
|
||
|
- Database dumps and network captures
|
||
|
- MapResponse JSON files for protocol debugging
|
||
|
|
||
|
When tests fail, check these artifacts first before assuming code issues.
|