# Controller Configuration & Tuning Flags

> All command-line flags exposed by the controller binary: QPS/burst, worker concurrency, warm-pool batch size, leader election, pprof, tracing, and cluster domain.

- Repository: kubernetes-sigs/agent-sandbox
- GitHub: https://github.com/kubernetes-sigs/agent-sandbox
- Human wiki: https://grok-wiki.com/public/wiki/kubernetes-sigs-agent-sandbox-c3f2597a654a
- Complete Markdown: https://grok-wiki.com/public/wiki/kubernetes-sigs-agent-sandbox-c3f2597a654a/llms-full.txt

## Source Files

- `cmd/agent-sandbox-controller/main.go`
- `docs/configuration.md`
- `helm/templates/_controller-args.tpl`
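- `helm/values.yaml`
- `controllers/sandbox_controller.go`
- `extensions/controllers/sandboxwarmpool_controller.go`
- `internal/metrics/tracing.go`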

---


The `agent-sandbox-controller` binary is a controller-runtime Manager that hosts the core `Sandbox` reconciler and, optionally, the extension reconcilers for `SandboxClaim`, `SandboxTemplate`, and `SandboxWarmPool`. Its runtime behavior (how aggressively it talks to the kube-apiserver, how many objects it reconciles in parallel, how it elects a leader, and which diagnostic endpoints it exposes) is governed by command-line flags parsed in `main()`. This page lists each flag, its default value, the code path that consumes it, and how the Helm chart surfaces it.

The flags fall into six logical groups: cluster identity, kube-client tuning (QPS/burst), per-controller worker concurrency, warm-pool batch sizing, leader election, and observability (tracing + pprof). All of them are registered with the standard `flag` package on the global `flag.CommandLine`, alongside controller-runtime's zap logger flags, which are bound via `opts.BindFlags(flag.CommandLine)`.

## Flag Registration Surface

All flags are declared as local variables in `main()` and bound with `flag.StringVar`, `flag.BoolVar`, `flag.IntVar`, or `flag.Float64Var` before `flag.Parse()` is called. Validation runs immediately after parsing. Each `--*-concurrent-workers` value, `--kube-api-burst`, and `--sandbox-warm-pool-max-batch-size` must be positive, and a violation is rejected. Two conditions only produce a warning rather than a hard error: a total worker count above 1000, and, when QPS is positive, a total worker count above `--kube-api-burst`.

Sources: [cmd/agent-sandbox-controller/main.go:50-145]()
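
The hard-failure rules can be sketched as a single function. This is a minimal illustration, assuming the parsed values are gathered into a hypothetical `controllerFlags` struct (the real `main()` keeps them as separate locals) and that every violation is reported at once; the error wording is invented.

```go
package main

import (
	"errors"
	"fmt"
)

// controllerFlags is a hypothetical grouping of the parsed flag values.
type controllerFlags struct {
	SandboxWorkers   int
	ClaimWorkers     int
	WarmPoolWorkers  int
	TemplateWorkers  int
	KubeAPIBurst     int
	WarmPoolMaxBatch int
}

// validateFlags rejects every value that must be strictly positive.
func validateFlags(f controllerFlags) error {
	checks := []struct {
		name  string
		value int
	}{
		{"--sandbox-concurrent-workers", f.SandboxWorkers},
		{"--sandbox-claim-concurrent-workers", f.ClaimWorkers},
		{"--sandbox-warm-pool-concurrent-workers", f.WarmPoolWorkers},
		{"--sandbox-template-concurrent-workers", f.TemplateWorkers},
		{"--kube-api-burst", f.KubeAPIBurst},
		{"--sandbox-warm-pool-max-batch-size", f.WarmPoolMaxBatch},
	}
	var errs []error
	for _, c := range checks {
		if c.value <= 0 {
			errs = append(errs, fmt.Errorf("%s must be > 0, got %d", c.name, c.value))
		}
	}
	return errors.Join(errs...) // nil when every check passed
}

func main() {
	defaults := controllerFlags{1, 1, 1, 1, 10, 300}
	fmt.Println(validateFlags(defaults)) // <nil>

	bad := defaults
	bad.KubeAPIBurst = 0
	fmt.Println(validateFlags(bad))
}
```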

### Complete Flag Reference

| Flag | Type | Default | Purpose |
| --- | --- | --- | --- |
| `--version` | bool | `false` | Print version and exit. |
| `--cluster-domain` | string | `cluster.local` | Cluster DNS suffix used when composing the Sandbox `ServiceFQDN`. |
| `--metrics-bind-address` | string | `:8080` | Address for the Prometheus metrics endpoint (and pprof, when enabled). |
| `--health-probe-bind-address` | string | `:8081` | Address for `/healthz` and `/readyz`. |
| `--leader-elect` | bool | `true` | Enable controller-runtime leader election. |
| `--leader-election-namespace` | string | `""` | Namespace for the leader election Lease; auto-detected when empty. |
| `--extensions` | bool | `false` | Register the `SandboxClaim`, `SandboxTemplate`, and `SandboxWarmPool` reconcilers in addition to the core `Sandbox` reconciler. |
| `--enable-tracing` | bool | `false` | Initialize the OpenTelemetry SDK and export spans via OTLP/gRPC. |
| `--enable-pprof` | bool | `false` | Expose only `/debug/pprof/profile` on the metrics server. |
| `--enable-pprof-debug` | bool | `false` | Expose the full pprof/fgprof handler set. Implies `--enable-pprof`. |
| `--pprof-block-profile-rate` | int | `1000000` | `runtime.SetBlockProfileRate` value when pprof-debug is enabled. `<=0` disables; otherwise roughly one blocking event is sampled per this many nanoseconds spent blocked. |
| `--pprof-mutex-profile-fraction` | int | `10` | `runtime.SetMutexProfileFraction` value when pprof-debug is enabled. `<=0` disables; otherwise about 1 in N contention events is reported. |
| `--kube-api-qps` | float64 | `-1.0` | QPS limit applied to the REST config (`-1` disables client-side throttling). |
| `--kube-api-burst` | int | `10` | Burst limit applied to the REST config. Must be `> 0`. |
| `--sandbox-concurrent-workers` | int | `1` | `MaxConcurrentReconciles` for the `Sandbox` controller. |
| `--sandbox-claim-concurrent-workers` | int | `1` | `MaxConcurrentReconciles` for the `SandboxClaim` controller (extensions only). |
| `--sandbox-warm-pool-concurrent-workers` | int | `1` | `MaxConcurrentReconciles` for the `SandboxWarmPool` controller (extensions only). |
| `--sandbox-template-concurrent-workers` | int | `1` | `MaxConcurrentReconciles` for the `SandboxTemplate` controller (extensions only). |
| `--sandbox-warm-pool-max-batch-size` | int | `300` | Maximum sandboxes the warm-pool reconciler creates or deletes in a single reconcile. |

Sources: [cmd/agent-sandbox-controller/main.go:70-97]()

## Flag-to-Manager Wiring

The diagram below maps every CLI flag to the controller-runtime construct or `runtime` setting it ultimately drives. It is grounded in `main.go` and the per-controller `SetupWithManager` calls.

```mermaid
flowchart LR
    subgraph CLI["CLI flags (flag.CommandLine)"]
        F_QPS["--kube-api-qps"]
        F_BURST["--kube-api-burst"]
        F_LE["--leader-elect"]
        F_LEN["--leader-election-namespace"]
        F_METRICS["--metrics-bind-address"]
        F_PROBE["--health-probe-bind-address"]
        F_PPROF["--enable-pprof / --enable-pprof-debug"]
        F_PPRATE["--pprof-block-profile-rate"]
        F_PPMUT["--pprof-mutex-profile-fraction"]
        F_TRACE["--enable-tracing"]
        F_EXT["--extensions"]
        F_CD["--cluster-domain"]
        F_W1["--sandbox-concurrent-workers"]
        F_W2["--sandbox-claim-concurrent-workers"]
        F_W3["--sandbox-warm-pool-concurrent-workers"]
        F_W4["--sandbox-template-concurrent-workers"]
        F_BATCH["--sandbox-warm-pool-max-batch-size"]
    end

    subgraph REST["rest.Config (ctrl.GetConfigOrDie)"]
        REST_QPS["restConfig.QPS"]
        REST_BURST["restConfig.Burst"]
    end

    subgraph MGR["ctrl.Manager (ctrl.NewManager)"]
        MGR_LE["LeaderElection / Namespace / LeaderElectionID"]
        MGR_METRICS["metricsserver.Options{BindAddress, ExtraHandlers}"]
        MGR_PROBE["HealthProbeBindAddress"]
    end

    subgraph RT["go runtime"]
        RT_BLOCK["runtime.SetBlockProfileRate"]
        RT_MUTEX["runtime.SetMutexProfileFraction"]
    end

    subgraph OBS["Observability"]
        OTEL["asmetrics.SetupOTel → otlptracegrpc exporter"]
    end

    subgraph CTRLS["Reconcilers"]
        SBX["SandboxReconciler{ClusterDomain, Tracer}"]
        CLM["SandboxClaimReconciler"]
        WP["SandboxWarmPoolReconciler{MaxBatchSize}"]
        TPL["SandboxTemplateReconciler"]
    end

    F_QPS --> REST_QPS
    F_BURST --> REST_BURST
    F_LE --> MGR_LE
    F_LEN --> MGR_LE
    F_METRICS --> MGR_METRICS
    F_PROBE --> MGR_PROBE
    F_PPROF --> MGR_METRICS
    F_PPRATE --> RT_BLOCK
    F_PPMUT --> RT_MUTEX
    F_TRACE --> OTEL --> SBX
    OTEL --> CLM
    OTEL --> TPL
    F_CD --> SBX
    F_W1 --> SBX
    F_EXT --> CLM
    F_EXT --> WP
    F_EXT --> TPL
    F_W2 --> CLM
    F_W3 --> WP
    F_W4 --> TPL
    F_BATCH --> WP
```

Sources: [cmd/agent-sandbox-controller/main.go:179-277]()

## Kube API Client Tuning

`--kube-api-qps` and `--kube-api-burst` are stamped onto the REST config returned by `ctrl.GetConfigOrDie()` before the Manager is constructed. The QPS value is cast from `float64` to the `float32` type of `rest.Config.QPS`; the default of `-1.0` disables client-side rate limiting entirely, because client-go only builds a token-bucket limiter when QPS is positive.

```go
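// Start from the kubeconfig or in-cluster config, then override the client's
// rate limits with the flag values before ctrl.NewManager is called.
restConfig := ctrl.GetConfigOrDie()
restConfig.QPS = float32(kubeAPIQPS) // rest.Config.QPS is a float32
restConfig.Burst = kubeAPIBurst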
```
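
To see why a negative QPS turns throttling off, here is a simplified model of how client-go's `rest.RESTClientFor` derives a rate limiter from `QPS` and `Burst` when no custom `RateLimiter` is set: zero values fall back to `rest.DefaultQPS` (5) and `rest.DefaultBurst` (10), and a token bucket is built only when the resulting QPS is positive. The `describeLimiter` helper is illustrative, not a client-go API.

```go
package main

import "fmt"

// Mirrors of client-go's rest.DefaultQPS and rest.DefaultBurst.
const (
	defaultQPS   float32 = 5.0
	defaultBurst         = 10
)

// describeLimiter models the limiter choice for a rest.Config without a
// custom RateLimiter.
func describeLimiter(qps float32, burst int) string {
	if qps == 0 {
		qps = defaultQPS
	}
	if burst == 0 {
		burst = defaultBurst
	}
	if qps > 0 {
		return fmt.Sprintf("token bucket: %.1f qps, burst %d", qps, burst)
	}
	return "no client-side rate limiter"
}

func main() {
	fmt.Println(describeLimiter(float32(-1.0), 10)) // no client-side rate limiter
	fmt.Println(describeLimiter(float32(50), 100))  // token bucket: 50.0 qps, burst 100
}
```

Under this model, `--kube-api-qps=0` would not disable throttling; it would fall back to client-go's default of 5 QPS.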

After parsing, `main()` computes `totalWorkers = sandbox + sandboxClaim + sandboxWarmPool + sandboxTemplate`. If QPS is positive and `totalWorkers` exceeds `kubeAPIBurst`, the setup logger warns about likely client-side throttling. A separate warning fires when `totalWorkers > 1000`, regardless of QPS, because that many concurrent reconciles can place excessive load on the apiserver.

Sources: [cmd/agent-sandbox-controller/main.go:130-145](), [cmd/agent-sandbox-controller/main.go:216-218]()
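
The two soft checks can be sketched as follows. The thresholds and the `QPS > 0` condition are taken from the description above; the function name and message text are assumptions, and the real code logs through the setup logger rather than returning strings.

```go
package main

import "fmt"

// throttlingWarnings returns the warnings main() would log for a given
// worker configuration; it never fails hard.
func throttlingWarnings(sandbox, claim, warmPool, template int, qps float64, burst int) []string {
	totalWorkers := sandbox + claim + warmPool + template
	var warnings []string
	if qps > 0 && totalWorkers > burst {
		warnings = append(warnings, fmt.Sprintf(
			"total workers (%d) exceed --kube-api-burst (%d); expect client-side throttling",
			totalWorkers, burst))
	}
	if totalWorkers > 1000 {
		warnings = append(warnings, fmt.Sprintf(
			"total workers (%d) exceed 1000; this may overload the apiserver", totalWorkers))
	}
	return warnings
}

func main() {
	// 31 workers against a burst of 100: no warnings.
	fmt.Println(len(throttlingWarnings(10, 10, 10, 1, 50, 100))) // 0
	// 200 workers against a burst of 100 with throttling enabled: one warning.
	for _, w := range throttlingWarnings(50, 50, 50, 50, 20, 100) {
		fmt.Println("warning:", w)
	}
}
```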

## Worker Concurrency

Each reconciler's `SetupWithManager` accepts a `concurrentWorkers int` and passes it to controller-runtime as `controller.Options{MaxConcurrentReconciles: concurrentWorkers}`. The flag-to-controller mapping is one-to-one:

| Flag | Reconciler | Setup call |
| --- | --- | --- |
| `--sandbox-concurrent-workers` | `controllers.SandboxReconciler` | `controllers/sandbox_controller.go:1130` |
| `--sandbox-claim-concurrent-workers` | `extensions/controllers.SandboxClaimReconciler` | `extensions/controllers/sandboxclaim_controller.go:1270` |
| `--sandbox-warm-pool-concurrent-workers` | `extensions/controllers.SandboxWarmPoolReconciler` | `extensions/controllers/sandboxwarmpool_controller.go:534` |
| `--sandbox-template-concurrent-workers` | `extensions/controllers.SandboxTemplateReconciler` | `extensions/controllers/sandboxtemplate_controller.go:216` |

The three extension reconcilers are only constructed when `--extensions=true`. Setting their worker flags without enabling extensions has no effect because the setup branch is skipped.

Sources: [cmd/agent-sandbox-controller/main.go:241-276](), [controllers/sandbox_controller.go:1130-1149]()
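
The flag-to-reconciler mapping and the `--extensions` gate can be modeled without controller-runtime. In this sketch `reconcilerSetup` and `enabledReconcilers` are hypothetical stand-ins for the `SetupWithManager` calls; in the real code each worker count ends up in `controller.Options{MaxConcurrentReconciles: ...}`.

```go
package main

import "fmt"

// reconcilerSetup is a hypothetical stand-in for one SetupWithManager call.
type reconcilerSetup struct {
	Name      string
	Workers   int
	Extension bool
}

// enabledReconcilers returns the setups that would actually run: the core
// Sandbox reconciler always, the extension reconcilers only with --extensions.
func enabledReconcilers(extensions bool, sandbox, claim, warmPool, template int) []reconcilerSetup {
	all := []reconcilerSetup{
		{"Sandbox", sandbox, false},
		{"SandboxClaim", claim, true},
		{"SandboxWarmPool", warmPool, true},
		{"SandboxTemplate", template, true},
	}
	var out []reconcilerSetup
	for _, r := range all {
		if r.Extension && !extensions {
			continue // the worker flag was parsed but is never consumed
		}
		out = append(out, r)
	}
	return out
}

func main() {
	for _, r := range enabledReconcilers(false, 4, 8, 8, 8) {
		fmt.Printf("%s: MaxConcurrentReconciles=%d\n", r.Name, r.Workers)
	}
}
```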

## Warm-Pool Batch Size

`--sandbox-warm-pool-max-batch-size` is passed into `SandboxWarmPoolReconciler.MaxBatchSize` and bounds how many Sandbox creations or deletions the warm-pool controller fans out in a single reconcile; the reconciler reads it as `int32(r.MaxBatchSize)` when driving its parallel batches. Validation in `main()` rejects values `<= 0`. As a safety net, `SetupWithManager` also resets a `MaxBatchSize <= 0` to the package constant `sandboxCreateDeleteMaxBatchSize = 300`, which matches the flag default.

Sources: [cmd/agent-sandbox-controller/main.go:97](), [cmd/agent-sandbox-controller/main.go:269-273](), [extensions/controllers/sandboxwarmpool_controller.go:48-58](), [extensions/controllers/sandboxwarmpool_controller.go:534-537]()

## Leader Election

`--leader-elect` defaults to `true`, matching the chart default in `helm/values.yaml`. The Manager is constructed with a fixed `LeaderElectionID` of `a3317529.agent-sandbox.x-k8s.io`, so all replicas of the controller that use the same Lease namespace contend for the same Lease. `--leader-election-namespace` controls where that Lease lives. When it is empty, controller-runtime auto-detects the namespace from the in-cluster service account, and the setup logger emits a V(1) message to that effect; outside a cluster there is nothing to detect, so running locally with leader election enabled requires setting the flag explicitly.

```go
mgr, err := ctrl.NewManager(restConfig, ctrl.Options{
    Scheme:                  scheme,
    Metrics:                 metricsOpts,
    HealthProbeBindAddress:  probeAddr,
    LeaderElection:          enableLeaderElection,
    LeaderElectionNamespace: leaderElectionNamespace,
    LeaderElectionID:        "a3317529.agent-sandbox.x-k8s.io",
})
```

Sources: [cmd/agent-sandbox-controller/main.go:147-149](), [cmd/agent-sandbox-controller/main.go:220-227]()

## Observability: Tracing and pprof

### Tracing

When `--enable-tracing` is set, `main()` calls `asmetrics.SetupOTel(initCtx, "agent-sandbox-controller")` with a 10-second initialization timeout. `SetupOTel` creates an OTLP/gRPC exporter (`otlptracegrpc.New(ctx)`), wires it into a batching `sdktrace.TracerProvider`, sets the global propagator to W3C `TraceContext` only, and returns a cleanup closure that calls `tp.Shutdown` on exit. The exporter reads the standard `OTEL_EXPORTER_OTLP_ENDPOINT` and `OTEL_EXPORTER_OTLP_INSECURE` environment variables.

The resulting `Instrumenter` is passed to the `SandboxReconciler`, `SandboxClaimReconciler`, and `SandboxTemplateReconciler` as their `Tracer` field. When tracing is disabled, a no-op instrumenter (`asmetrics.NewNoOp()`) is used instead.

Sources: [cmd/agent-sandbox-controller/main.go:153-168](), [cmd/agent-sandbox-controller/main.go:236-265](), [internal/metrics/tracing.go:124-147]()

### pprof

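The pprof handlers are mounted on the metrics server, **not** on Go's default `http.DefaultServeMux`. Importing `net/http/pprof` registers its handlers on `http.DefaultServeMux` as an `init()` side effect, so any server using the default mux would expose them. To prevent that leak, `main()` replaces the default mux with an empty one: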

```go
http.DefaultServeMux = http.NewServeMux()
```

Two flags govern exposure:

- `--enable-pprof` mounts only `/debug/pprof/profile` (CPU profile).
- `--enable-pprof-debug` implies `--enable-pprof` and additionally mounts the `/debug/pprof/` index, the `cmdline`, `symbol`, `heap`, `goroutine`, `allocs`, `block`, `mutex`, and `trace` endpoints beneath it, and `/debug/fgprof`. It also enables block and mutex profiling in the Go runtime by calling `runtime.SetBlockProfileRate(pprofBlockProfileRate)` and `runtime.SetMutexProfileFraction(pprofMutexProfileFraction)`; negative inputs are clamped to `0` (disabled) with a warning.

The comment on `--enable-pprof-debug` notes it "may expose sensitive information and comes with performance overhead" — leaving it off is the safe default for production.

Sources: [cmd/agent-sandbox-controller/main.go:80-90](), [cmd/agent-sandbox-controller/main.go:170-214]()
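
The two exposure levels can be sketched with the standard library's `net/http/pprof` handlers, returned as a `map[string]http.Handler` (the type of controller-runtime's `metricsserver.Options.ExtraHandlers`). This is an approximation: `/debug/fgprof` is left out because it requires the third-party `github.com/felixge/fgprof` package, the clamp warning is not logged, and `pprofHandlers` and `applyProfileRates` are hypothetical names.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/pprof"
	"runtime"
	"sort"
)

// pprofHandlers returns path -> handler entries for the metrics server.
// --enable-pprof-debug implies --enable-pprof, so either flag mounts the CPU profile.
func pprofHandlers(enablePprof, enablePprofDebug bool) map[string]http.Handler {
	h := map[string]http.Handler{}
	if !enablePprof && !enablePprofDebug {
		return h
	}
	h["/debug/pprof/profile"] = http.HandlerFunc(pprof.Profile)
	if !enablePprofDebug {
		return h
	}
	h["/debug/pprof/"] = http.HandlerFunc(pprof.Index)
	h["/debug/pprof/cmdline"] = http.HandlerFunc(pprof.Cmdline)
	h["/debug/pprof/symbol"] = http.HandlerFunc(pprof.Symbol)
	h["/debug/pprof/trace"] = http.HandlerFunc(pprof.Trace)
	for _, name := range []string{"heap", "goroutine", "allocs", "block", "mutex"} {
		h["/debug/pprof/"+name] = pprof.Handler(name) // named runtime/pprof profiles
	}
	return h
}

// applyProfileRates clamps negative inputs to 0 (profiling off) and hands the
// results to the Go runtime.
func applyProfileRates(blockRate, mutexFraction int) (int, int) {
	if blockRate < 0 {
		blockRate = 0
	}
	if mutexFraction < 0 {
		mutexFraction = 0
	}
	runtime.SetBlockProfileRate(blockRate)
	runtime.SetMutexProfileFraction(mutexFraction)
	return blockRate, mutexFraction
}

func main() {
	// The net/http/pprof import registered handlers on the default mux; drop them.
	http.DefaultServeMux = http.NewServeMux()

	var paths []string
	for p := range pprofHandlers(false, true) {
		paths = append(paths, p)
	}
	sort.Strings(paths)
	fmt.Println(paths)
	fmt.Println(applyProfileRates(1000000, 10))
}
```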

## Cluster Domain

`--cluster-domain` is passed straight into `SandboxReconciler.ClusterDomain` and used to assemble `sandbox.Status.ServiceFQDN`:

```go
sandbox.Status.ServiceFQDN = service.Name + "." + service.Namespace + ".svc." + r.ClusterDomain
```

Override it only when the cluster is configured with a non-default DNS suffix.

Sources: [cmd/agent-sandbox-controller/main.go:71](), [cmd/agent-sandbox-controller/main.go:236-241](), [controllers/sandbox_controller.go:127](), [controllers/sandbox_controller.go:614]()
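
For a concrete instance, the helper below applies the same concatenation; `serviceFQDN` is a hypothetical name used only for illustration.

```go
package main

import "fmt"

// serviceFQDN builds <service>.<namespace>.svc.<cluster-domain>.
func serviceFQDN(name, namespace, clusterDomain string) string {
	return name + "." + namespace + ".svc." + clusterDomain
}

func main() {
	fmt.Println(serviceFQDN("my-sandbox", "team-a", "cluster.local"))
	// my-sandbox.team-a.svc.cluster.local
	fmt.Println(serviceFQDN("my-sandbox", "team-a", "corp.example"))
	// my-sandbox.team-a.svc.corp.example
}
```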

## Helm Chart Mapping

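The chart template `helm/templates/_controller-args.tpl` emits a `--flag=value` argument for each populated key under `.Values.controller`. Empty or false-y values are omitted, so the binary falls back to its own defaults; this is why most fields in `helm/values.yaml` are commented out. If that rule applies to every key, a boolean set to `false` cannot override a binary default of `true`; `--leader-elect` is the only flag where this matters, so confirm how the template renders `leaderElect: false` before relying on it. For each first-class key, the mapping is a plain camelCase-to-kebab-case conversion: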

| `controller.*` value | CLI flag |
| --- | --- |
| `leaderElect` | `--leader-elect` |
| `clusterDomain` | `--cluster-domain` |
| `leaderElectionNamespace` | `--leader-election-namespace` |
| `extensions` | `--extensions` |
| `enableTracing` | `--enable-tracing` |
| `enablePprof` | `--enable-pprof` |
| `enablePprofDebug` | `--enable-pprof-debug` |
| `pprofBlockProfileRate` | `--pprof-block-profile-rate` |
| `pprofMutexProfileFraction` | `--pprof-mutex-profile-fraction` |
| `kubeApiQps` | `--kube-api-qps` |
| `kubeApiBurst` | `--kube-api-burst` |
| `sandboxConcurrentWorkers` | `--sandbox-concurrent-workers` |
| `sandboxClaimConcurrentWorkers` | `--sandbox-claim-concurrent-workers` |
| `sandboxWarmPoolConcurrentWorkers` | `--sandbox-warm-pool-concurrent-workers` |
| `sandboxTemplateConcurrentWorkers` | `--sandbox-template-concurrent-workers` |
| `extraArgs[]` | any flag not listed above (e.g. zap logger flags, `--sandbox-warm-pool-max-batch-size`) |

Note that `--sandbox-warm-pool-max-batch-size` and `--metrics-bind-address`/`--health-probe-bind-address` are not first-class keys in the chart; pass them through `controller.extraArgs`.

Sources: [helm/templates/_controller-args.tpl:1-50](), [helm/values.yaml:29-48]()

## Worked Example: High-Throughput Extensions Deployment

The pattern documented in `docs/configuration.md` for a high-throughput cluster combines extension enablement, raised per-controller worker counts, a larger warm-pool batch, and explicit kube-client QPS/burst sized to the worker total:

```yaml
args:
  - --leader-elect=true
  - --extensions
  - --sandbox-concurrent-workers=10
  - --sandbox-claim-concurrent-workers=10
  - --sandbox-warm-pool-concurrent-workers=10
  - --sandbox-warm-pool-max-batch-size=500
  - --kube-api-qps=50
  - --kube-api-burst=100
```

With these values, the validation in `main()` is satisfied: every worker count and the batch size are positive, and the three explicitly set worker counts sum to `30`. Adding the default `--sandbox-template-concurrent-workers=1` gives a four-controller total of `31`, well under `1000` and within `kubeAPIBurst=100`, so no throttling warning fires even though QPS is positive.

Sources: [docs/configuration.md:37-52](), [cmd/agent-sandbox-controller/main.go:119-145]()

## Summary

The controller's tunable surface is intentionally small and flag-driven. The Manager is constructed from `--leader-elect*`, `--metrics-bind-address`, and `--health-probe-bind-address`; its REST client is shaped by `--kube-api-qps`/`--kube-api-burst`; each reconciler's `MaxConcurrentReconciles` comes from one `--*-concurrent-workers` flag; and the warm-pool fan-out is bounded by `--sandbox-warm-pool-max-batch-size`. On the diagnostic side, `--enable-tracing` is wired into OpenTelemetry, while `--enable-pprof`, `--enable-pprof-debug`, and the two pprof sampling-rate knobs configure the metrics server's handlers and the Go runtime's profiling rates. The Helm chart's `controller.*` keys are a thin, omit-if-empty pass-through to these same flags, with `controller.extraArgs` as the escape hatch for anything the template does not enumerate.
