# Controller Manager Entry Point

> How cmd/agent-sandbox-controller/main.go wires schemes, the controller-runtime Manager, metrics/pprof servers, leader election, and the optional extensions reconciler set.

- Repository: kubernetes-sigs/agent-sandbox
- GitHub: https://github.com/kubernetes-sigs/agent-sandbox
- Human wiki: https://grok-wiki.com/public/wiki/kubernetes-sigs-agent-sandbox-c3f2597a654a
- Complete Markdown: https://grok-wiki.com/public/wiki/kubernetes-sigs-agent-sandbox-c3f2597a654a/llms-full.txt

## Source Files

- `cmd/agent-sandbox-controller/main.go`
- `internal/version/`
- `Dockerfile`
- `helm/templates/deployment.yaml`

---

<details>
<summary>Relevant source files</summary>

The following files were used as context for generating this wiki page:
- [cmd/agent-sandbox-controller/main.go](cmd/agent-sandbox-controller/main.go)
- [internal/version/version.go](internal/version/version.go)
- [Dockerfile](Dockerfile)
- [helm/templates/deployment.yaml](helm/templates/deployment.yaml)
- [helm/templates/_controller-args.tpl](helm/templates/_controller-args.tpl)
- [controllers/sandbox_controller.go](controllers/sandbox_controller.go)
- [extensions/api/v1beta1/groupversion_info.go](extensions/api/v1beta1/groupversion_info.go)
- [extensions/controllers/sandboxclaim_controller.go](extensions/controllers/sandboxclaim_controller.go)
- [extensions/controllers/sandboxtemplate_controller.go](extensions/controllers/sandboxtemplate_controller.go)
- [extensions/controllers/sandboxwarmpool_controller.go](extensions/controllers/sandboxwarmpool_controller.go)
- [internal/metrics/tracing.go](internal/metrics/tracing.go)
- [internal/metrics/sandbox_collector.go](internal/metrics/sandbox_collector.go)
</details>


The single binary that runs the agent-sandbox operator is built from `cmd/agent-sandbox-controller/main.go`. It is a thin program by Kubernetes operator standards: it parses CLI flags, builds the runtime `Scheme`, configures the metrics/healthz/pprof servers, constructs a controller-runtime `Manager`, registers the `Sandbox` reconciler unconditionally, optionally wires three extra reconcilers behind an `--extensions` switch, and finally blocks on `mgr.Start(ctx)` until a termination signal arrives.

This page maps each section of `main.go` to the concrete pieces of the project it wires together (scheme registration, the controllers package, the `internal/metrics` instrumenter, the `extensions/controllers` set, and the `internal/version` build-stamp data), and shows how the Helm deployment and the multi-stage Dockerfile match those flags and probes.

## Binary layout and build provenance

The `main` package lives at `cmd/agent-sandbox-controller/main.go`. There is exactly one entry point and exactly one binary; both the Helm deployment and the Dockerfile reference it as `/agent-sandbox-controller`.

The two-stage Dockerfile compiles the binary statically with `CGO_ENABLED=0`, strips symbols (`-s -w`), and injects three build-time identifiers into the `internal/version` package via `-ldflags -X`:

```dockerfile
RUN CGO_ENABLED=0 GOOS=linux GOARCH=${TARGETARCH} go build \
    -ldflags="-s -w -X sigs.k8s.io/agent-sandbox/internal/version.gitVersion=${GIT_VERSION} \
              -X sigs.k8s.io/agent-sandbox/internal/version.gitSHA=${GIT_SHA} \
              -X sigs.k8s.io/agent-sandbox/internal/version.buildDate=${BUILD_DATE}" \
    -o /agent-sandbox-controller ./cmd/agent-sandbox-controller
```

The runtime image is `gcr.io/distroless/static-debian13:nonroot`, with the binary as the `ENTRYPOINT`. Those `-X` symbols target the package-level variables declared in `internal/version/version.go`:

```go
var (
    gitVersion = "unknown"
    gitSHA     = "unknown"
    buildDate  = "unknown"
    goVersion  = runtime.Version()
    goCompiler = runtime.Compiler
    goOS       = runtime.GOOS
    goArch     = runtime.GOARCH
)
```

`version.Print("agent-sandbox-controller")` renders a small text template that includes program name, git version, git SHA, build date, Go version, compiler, and `GOOS/GOARCH`. `main` calls it under the `--version` flag and exits before any other initialization runs, so the binary can be queried without contacting the API server.
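The build-stamp pattern described above can be sketched with the standard library alone. This is a minimal illustration, not the project's `internal/version` package: the template layout and the `renderVersion` helper are assumptions made for the example, and only the three stamp variable names are taken from the source.

```go
package main

import (
	"bytes"
	"flag"
	"fmt"
	"os"
	"runtime"
	"text/template"
)

// Build stamps. A real build overrides these with
// -ldflags "-X <import path>.gitVersion=...", as the Dockerfile does.
var (
	gitVersion = "unknown"
	gitSHA     = "unknown"
	buildDate  = "unknown"
)

// versionTemplate is a hypothetical layout, not the project's actual template.
const versionTemplate = `{{.Program}}, version {{.Version}} (sha: {{.SHA}})
  build date: {{.BuildDate}}
  go version: {{.GoVersion}} ({{.Compiler}})
  platform:   {{.Platform}}
`

// renderVersion fills the template with build stamps and runtime facts.
func renderVersion(program string) (string, error) {
	tmpl, err := template.New("version").Parse(versionTemplate)
	if err != nil {
		return "", err
	}
	data := map[string]string{
		"Program":   program,
		"Version":   gitVersion,
		"SHA":       gitSHA,
		"BuildDate": buildDate,
		"GoVersion": runtime.Version(),
		"Compiler":  runtime.Compiler,
		"Platform":  runtime.GOOS + "/" + runtime.GOARCH,
	}
	var buf bytes.Buffer
	if err := tmpl.Execute(&buf, data); err != nil {
		return "", err
	}
	return buf.String(), nil
}

func main() {
	showVersion := flag.Bool("version", false, "print version information and exit")
	flag.Parse()
	if *showVersion {
		out, err := renderVersion("agent-sandbox-controller")
		if err != nil {
			fmt.Fprintln(os.Stderr, err)
			os.Exit(1)
		}
		fmt.Print(out)
		return // nothing below runs, so no API server connection is attempted
	}
	fmt.Println("normal startup would continue here")
}
```

Because the stamps are plain package-level string variables, an unstamped build (for example `go run`) still works and simply reports `unknown`.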

Sources: [cmd/agent-sandbox-controller/main.go:50-107](), [internal/version/version.go:26-91](), [Dockerfile:1-37]()

## Flag surface

All runtime configuration is exposed as `flag` values registered on the default command line. Each flag has a single owner; nothing reads environment variables directly in this file. The Helm chart renders the same names into the container `args` block, defined by `helm/templates/_controller-args.tpl` and consumed by `helm/templates/deployment.yaml` via the `agent-sandbox.controllerArgs` template.

| Flag | Default | Owner / Effect |
|------|---------|----------------|
| `--version` | `false` | Prints `version.Print(...)` and exits |
| `--cluster-domain` | `cluster.local` | Passed to `SandboxReconciler.ClusterDomain` for FQDN generation |
| `--metrics-bind-address` | `:8080` | Bind for controller-runtime metrics server (and pprof when enabled) |
| `--health-probe-bind-address` | `:8081` | Bind for `/healthz` and `/readyz` |
| `--leader-elect` | `true` | Toggles controller-runtime leader election |
| `--leader-election-namespace` | `""` | When empty with leader-elect on, falls back to controller-runtime auto-detection |
| `--extensions` | `false` | Registers the extensions scheme and three extra reconcilers |
| `--enable-tracing` | `false` | Initializes OTel via `asmetrics.SetupOTel` |
| `--enable-pprof` | `false` | Exposes only `/debug/pprof/profile` on the metrics server |
| `--enable-pprof-debug` | `false` | Exposes the full pprof index plus `fgprof`; implies `--enable-pprof` |
| `--pprof-block-profile-rate` | `1000000` | `runtime.SetBlockProfileRate` value when pprof debug is on |
| `--pprof-mutex-profile-fraction` | `10` | `runtime.SetMutexProfileFraction` value when pprof debug is on |
| `--kube-api-qps` | `-1.0` | Set on `restConfig.QPS`; a negative value disables client-side rate limiting |
| `--kube-api-burst` | `10` | Set on `restConfig.Burst`; validated `> 0` |
| `--sandbox-concurrent-workers` | `1` | `MaxConcurrentReconciles` for the core Sandbox controller |
| `--sandbox-claim-concurrent-workers` | `1` | `MaxConcurrentReconciles` for `SandboxClaim` (extensions only) |
| `--sandbox-warm-pool-concurrent-workers` | `1` | `MaxConcurrentReconciles` for `SandboxWarmPool` (extensions only) |
| `--sandbox-template-concurrent-workers` | `1` | `MaxConcurrentReconciles` for `SandboxTemplate` (extensions only) |
| `--sandbox-warm-pool-max-batch-size` | `300` | `SandboxWarmPoolReconciler.MaxBatchSize` for parallel create/delete |

After `flag.Parse()`, `main` validates its input early: every concurrency value, `--kube-api-burst`, and the warm-pool batch size must be positive. It also logs a warning when the sum of all worker counts exceeds `1000`, or exceeds `--kube-api-burst` when QPS is set.
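The positivity checks can be sketched as follows. This is an illustration under assumptions: the `config` struct, `validate`, and `parseFlags` are hypothetical names, only three of the documented flags are included, and the sketch uses a private `FlagSet` for testability whereas `main.go` registers its flags on the default command line. The worker-sum warning is left out.

```go
package main

import (
	"errors"
	"flag"
	"fmt"
	"os"
)

// config holds a subset of the documented flags.
type config struct {
	kubeAPIBurst         int
	sandboxWorkers       int
	warmPoolMaxBatchSize int
}

// validate enforces the documented rule that these values must be positive.
// It collects every violation instead of stopping at the first one.
func validate(c config) error {
	var errs []error
	positive := func(name string, v int) {
		if v <= 0 {
			errs = append(errs, fmt.Errorf("--%s must be greater than 0, got %d", name, v))
		}
	}
	positive("kube-api-burst", c.kubeAPIBurst)
	positive("sandbox-concurrent-workers", c.sandboxWorkers)
	positive("sandbox-warm-pool-max-batch-size", c.warmPoolMaxBatchSize)
	return errors.Join(errs...) // nil when errs is empty
}

// parseFlags uses a private FlagSet so it can be called repeatedly in tests.
func parseFlags(args []string) (config, error) {
	var c config
	fs := flag.NewFlagSet("agent-sandbox-controller", flag.ContinueOnError)
	fs.IntVar(&c.kubeAPIBurst, "kube-api-burst", 10, "client burst")
	fs.IntVar(&c.sandboxWorkers, "sandbox-concurrent-workers", 1, "Sandbox workers")
	fs.IntVar(&c.warmPoolMaxBatchSize, "sandbox-warm-pool-max-batch-size", 300, "warm-pool batch size")
	if err := fs.Parse(args); err != nil {
		return c, err
	}
	return c, validate(c)
}

func main() {
	c, err := parseFlags(os.Args[1:])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Printf("validated config: %+v\n", c)
}
```

Validating before any client or manager is built means a bad flag fails the pod immediately with a readable message rather than surfacing later as a misbehaving controller.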

Sources: [cmd/agent-sandbox-controller/main.go:51-149](), [helm/templates/_controller-args.tpl:1-50]()

## Scheme assembly

The `Scheme` consumed by the manager is `controllers.Scheme`, which is built in a package-level `init()` and already contains the core Kubernetes client-go types and the sandbox v1beta1 group:

```go
// controllers/sandbox_controller.go
var Scheme = runtime.NewScheme()

func init() {
    utilruntime.Must(clientgoscheme.AddToScheme(Scheme))
    utilruntime.Must(sandboxv1beta1.AddToScheme(Scheme))
}
```

`main` then conditionally extends it with the extensions group only when `--extensions` is set:

```go
scheme := controllers.Scheme
if extensions {
    utilruntime.Must(extensionsv1beta1.AddToScheme(scheme))
}
```

The extensions group is `extensions.agents.x-k8s.io/v1beta1`, declared in `extensions/api/v1beta1/groupversion_info.go`. Skipping that registration when extensions are off keeps the manager's cache and RBAC surface limited to the core `Sandbox` CRD.

Sources: [cmd/agent-sandbox-controller/main.go:174-177](), [controllers/sandbox_controller.go:112-120](), [extensions/api/v1beta1/groupversion_info.go:25-36]()

## Wiring diagram

```mermaid
flowchart TB
    subgraph CLI["cmd/agent-sandbox-controller/main.go"]
        Flags["flag.Parse()<br/>--leader-elect, --extensions,<br/>--enable-tracing, --enable-pprof,<br/>--kube-api-*, --sandbox-*-workers"]
        Logger["zap.New + ctrl.SetLogger"]
        SigCtx["ctrl.SetupSignalHandler()"]
        OTel["asmetrics.SetupOTel<br/>(10s init timeout)"]
        Mux["http.DefaultServeMux =<br/>http.NewServeMux()"]
        Mgr["ctrl.NewManager(restConfig, Options{...})<br/>LeaderElectionID a3317529.agent-sandbox.x-k8s.io"]
    end

    subgraph Scheme["Scheme (runtime.NewScheme)"]
        Core["clientgoscheme +<br/>sandboxv1beta1<br/>(controllers.init())"]
        Ext["extensionsv1beta1<br/>(only if --extensions)"]
    end

    subgraph Servers["Manager-owned servers"]
        Metrics["metricsserver.Options<br/>BindAddress :8080<br/>+ ExtraHandlers (pprof)"]
        Probe["HealthProbeBindAddress :8081<br/>healthz.Ping /healthz, /readyz"]
    end

    subgraph Recs["Reconcilers"]
        Core1["controllers.SandboxReconciler<br/>--sandbox-concurrent-workers"]
        Claim["extensionscontrollers.SandboxClaimReconciler<br/>+ queue.SimpleSandboxQueue"]
        Tmpl["extensionscontrollers.SandboxTemplateReconciler"]
        Warm["extensionscontrollers.SandboxWarmPoolReconciler<br/>MaxBatchSize"]
    end

    SandColl["asmetrics.RegisterSandboxCollector(mgr.GetClient())"]

    Flags --> Logger --> SigCtx --> OTel --> Mux --> Mgr
    Core --> Mgr
    Ext --> Mgr
    Mgr --> Metrics
    Mgr --> Probe
    Mgr --> SandColl
    Mgr --> Core1
    Mgr -.->|--extensions| Claim
    Mgr -.->|--extensions| Tmpl
    Mgr -.->|--extensions| Warm
    SigCtx -->|ctx| Mgr
```

Sources: [cmd/agent-sandbox-controller/main.go:150-294]()

## Logger, signal context, and tracing initialization

The zap options struct is bound to the command line so `--zap-*` flags work, then `ctrl.SetLogger` installs the resulting logger globally:

```go
opts := zap.Options{Development: false}
opts.BindFlags(flag.CommandLine)
flag.Parse()
...
ctrl.SetLogger(zap.New(zap.UseFlagOptions(&opts)))
```

`ctrl.SetupSignalHandler()` returns the parent `context.Context` used for everything downstream (`SetupOTel`, leader election, reconcilers, `mgr.Start`). When SIGTERM or SIGINT arrives, this context is cancelled and the manager exits cleanly.

Tracing is opt-in. If `--enable-tracing` is set, `main` creates a child context with a 10-second timeout purely for the OTel bootstrap, calls `asmetrics.SetupOTel(initCtx, "agent-sandbox-controller")`, and defers the returned `cleanup`. Otherwise the program uses `asmetrics.NewNoOp()`. Either way, the resulting `instrumenter` is passed as the `Tracer` field to the core `SandboxReconciler` and to the extensions' `SandboxClaim` and `SandboxTemplate` reconcilers.
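The relationship between the long-lived signal context and the short-lived bootstrap context can be sketched with the standard library. The names `rootContext`, `setupTracing`, and `hangingBootstrap` are hypothetical: `signal.NotifyContext` stands in for `ctrl.SetupSignalHandler`, and the `bootstrap` callback stands in for `asmetrics.SetupOTel`, whose real signature is not reproduced here.

```go
package main

import (
	"context"
	"fmt"
	"os"
	"os/signal"
	"syscall"
	"time"
)

// rootContext returns a context cancelled on SIGINT or SIGTERM. It is a
// stdlib stand-in for ctrl.SetupSignalHandler, not the same implementation.
func rootContext() (context.Context, context.CancelFunc) {
	return signal.NotifyContext(context.Background(), syscall.SIGINT, syscall.SIGTERM)
}

// setupTracing runs bootstrap under a child context that expires after
// timeout. The returned cleanup is always safe to defer.
func setupTracing(parent context.Context, timeout time.Duration,
	bootstrap func(context.Context) error) (cleanup func(), err error) {
	initCtx, cancel := context.WithTimeout(parent, timeout)
	defer cancel() // the timeout bounds the bootstrap only, not the process
	if err := bootstrap(initCtx); err != nil {
		return func() {}, fmt.Errorf("tracing bootstrap: %w", err)
	}
	return func() { fmt.Println("flushing and shutting down tracing") }, nil
}

// hangingBootstrap simulates an exporter that never becomes ready.
func hangingBootstrap(ctx context.Context) error {
	<-ctx.Done()
	return ctx.Err()
}

func main() {
	ctx, stop := rootContext()
	defer stop()

	ok := func(context.Context) error { return nil }
	cleanup, err := setupTracing(ctx, 10*time.Second, ok)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	defer cleanup()

	// A manager would block on ctx here; the sketch returns immediately.
	fmt.Println("tracing ready")
}
```

Deriving the bootstrap context from the signal context means a termination signal during startup also aborts the bootstrap, while the timeout keeps an unreachable collector from stalling startup indefinitely.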

Sources: [cmd/agent-sandbox-controller/main.go:98-168](), [internal/metrics/tracing.go:49](), [internal/metrics/tracing.go:124]()

## Metrics server, pprof, and the default ServeMux defense

Before configuring the metrics server, `main` explicitly resets the process-wide HTTP mux:

```go
// Importing net/http/pprof registers handlers on the global DefaultServeMux.
// Reset it to avoid accidentally exposing pprof via any server that uses the default mux.
http.DefaultServeMux = http.NewServeMux()
```

Importing `net/http/pprof` registers its handlers on `http.DefaultServeMux` as a side effect, even though `main.go` uses only the package's individual handler functions. Resetting the mux prevents any future server that falls back to the default mux from silently exposing pprof. All pprof exposure is then explicit and routed through the metrics server's `ExtraHandlers`:

| Flag combination | Endpoints mounted on `metrics-bind-address` |
|------------------|---------------------------------------------|
| Neither flag | `/metrics` only |
| `--enable-pprof` | `/metrics`, `/debug/pprof/profile` |
| `--enable-pprof-debug` | All of the above plus `/debug/pprof/` index, `cmdline`, `symbol`, `heap`, `goroutine`, `allocs`, `block`, `mutex`, `trace`, and `/debug/fgprof` |

When `--enable-pprof-debug` is active, `main` also clamps any negative sampling values to zero and applies `runtime.SetBlockProfileRate` and `runtime.SetMutexProfileFraction`. The setup log explicitly warns that the debug surface may expose sensitive information.
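The flag-to-endpoint mapping and the clamping step can be sketched as below. This is an approximation under assumptions: `pprofHandlers`, `clampNonNegative`, and `applySampling` are hypothetical helper names, and `/debug/fgprof` is omitted because it comes from a third-party package. The returned map has the same `map[string]http.Handler` shape that the metrics server's `ExtraHandlers` option accepts.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/pprof"
	"runtime"
	"sort"
)

// pprofHandlers returns the extra handlers to mount next to /metrics.
// The debug flag implies the basic flag, as in the table above.
func pprofHandlers(enablePprof, enableDebug bool) map[string]http.Handler {
	h := map[string]http.Handler{}
	if !enablePprof && !enableDebug {
		return h
	}
	h["/debug/pprof/profile"] = http.HandlerFunc(pprof.Profile)
	if enableDebug {
		h["/debug/pprof/"] = http.HandlerFunc(pprof.Index)
		h["/debug/pprof/cmdline"] = http.HandlerFunc(pprof.Cmdline)
		h["/debug/pprof/symbol"] = http.HandlerFunc(pprof.Symbol)
		h["/debug/pprof/trace"] = http.HandlerFunc(pprof.Trace)
		for _, name := range []string{"heap", "goroutine", "allocs", "block", "mutex"} {
			h["/debug/pprof/"+name] = pprof.Handler(name)
		}
	}
	return h
}

// clampNonNegative maps negative sampling values to zero.
func clampNonNegative(v int) int {
	if v < 0 {
		return 0
	}
	return v
}

// applySampling sets the block and mutex profile sampling after clamping.
func applySampling(blockRate, mutexFraction int) {
	runtime.SetBlockProfileRate(clampNonNegative(blockRate))
	runtime.SetMutexProfileFraction(clampNonNegative(mutexFraction))
}

func main() {
	// Importing net/http/pprof registered handlers on the default mux; drop them.
	http.DefaultServeMux = http.NewServeMux()

	enablePprof, enableDebug := true, false
	if enableDebug {
		applySampling(1000000, 10)
	}

	handlers := pprofHandlers(enablePprof, enableDebug)
	paths := make([]string, 0, len(handlers))
	for p := range handlers {
		paths = append(paths, p)
	}
	sort.Strings(paths)
	fmt.Println(paths) // prints [/debug/pprof/profile]
}
```

Building the handler set as a plain map keeps the exposure decision in one place, so the endpoints served always match the flags that were passed.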

Sources: [cmd/agent-sandbox-controller/main.go:170-214]()

## Manager construction and leader election

The REST config is taken from in-cluster or kubeconfig discovery and overridden with the QPS/burst flags before constructing the manager:

```go
restConfig := ctrl.GetConfigOrDie()
restConfig.QPS   = float32(kubeAPIQPS)
restConfig.Burst = kubeAPIBurst

mgr, err := ctrl.NewManager(restConfig, ctrl.Options{
    Scheme:                  scheme,
    Metrics:                 metricsOpts,
    HealthProbeBindAddress:  probeAddr,
    LeaderElection:          enableLeaderElection,
    LeaderElectionNamespace: leaderElectionNamespace,
    LeaderElectionID:        "a3317529.agent-sandbox.x-k8s.io",
})
```

The lease name `a3317529.agent-sandbox.x-k8s.io` is the stable identifier under which only one replica becomes the active reconciler. When `--leader-elect=true` and `--leader-election-namespace=""`, the setup log records that controller-runtime will attempt to auto-detect the namespace.
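As a rough illustration of what namespace auto-detection involves, the sketch below resolves the namespace from an explicit value or, failing that, from the service-account namespace file that Kubernetes mounts into pods. It is an assumption about the general approach, not a copy of controller-runtime's code; `resolveLeaderElectionNamespace` is a hypothetical helper.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"strings"
)

// inClusterNamespacePath is where Kubernetes mounts the pod's namespace
// alongside the service-account token.
const inClusterNamespacePath = "/var/run/secrets/kubernetes.io/serviceaccount/namespace"

// resolveLeaderElectionNamespace returns the explicit value when set and
// otherwise falls back to the namespace file at path.
func resolveLeaderElectionNamespace(explicit, path string) (string, error) {
	if explicit != "" {
		return explicit, nil
	}
	data, err := os.ReadFile(path)
	if errors.Is(err, os.ErrNotExist) {
		return "", errors.New("not running in-cluster; set --leader-election-namespace")
	}
	if err != nil {
		return "", fmt.Errorf("reading %s: %w", path, err)
	}
	ns := strings.TrimSpace(string(data))
	if ns == "" {
		return "", fmt.Errorf("%s is empty", path)
	}
	return ns, nil
}

func main() {
	ns, err := resolveLeaderElectionNamespace("", inClusterNamespacePath)
	if err != nil {
		fmt.Println("auto-detection failed:", err)
		return
	}
	fmt.Println("leader election namespace:", ns)
}
```

Under this assumption, running the binary outside a cluster with leader election enabled needs an explicit `--leader-election-namespace`, since there is no mounted file to read.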

Immediately after the manager exists, `asmetrics.RegisterSandboxCollector(mgr.GetClient(), …)` attaches a custom Prometheus collector backed by the manager's cached client; it is registered globally and surfaced on the same `/metrics` endpoint as the controller-runtime metrics.

Sources: [cmd/agent-sandbox-controller/main.go:216-234](), [internal/metrics/sandbox_collector.go:62]()

## Reconciler registration

The core reconciler is always registered:

```go
if err = (&controllers.SandboxReconciler{
    Client:        mgr.GetClient(),
    Scheme:        mgr.GetScheme(),
    Tracer:        instrumenter,
    ClusterDomain: clusterDomain,
}).SetupWithManager(mgr, sandboxConcurrentWorkers); err != nil {
    setupLog.Error(err, "unable to create controller", "controller", "Sandbox")
    os.Exit(1)
}
```

When `--extensions` is set, three more are registered, sharing a single in-memory `SimpleSandboxQueue` that the claim controller uses to track warm sandboxes:

| Reconciler (extensions only) | Concurrency flag | Notable fields |
|------------------------------|------------------|----------------|
| `extensionscontrollers.SandboxClaimReconciler` | `--sandbox-claim-concurrent-workers` | `WarmSandboxQueue: queue.NewSimpleSandboxQueue()`, `Recorder: mgr.GetEventRecorder("sandboxclaim-controller")`, `Tracer: instrumenter` |
| `extensionscontrollers.SandboxTemplateReconciler` | `--sandbox-template-concurrent-workers` | `Recorder: mgr.GetEventRecorder("sandboxtemplate-controller")`, `Tracer: instrumenter` |
| `extensionscontrollers.SandboxWarmPoolReconciler` | `--sandbox-warm-pool-concurrent-workers` | `MaxBatchSize: sandboxWarmPoolMaxBatchSize` |

Each reconciler's own `SetupWithManager(mgr, concurrentWorkers)` is what actually registers watches and applies the worker count; `main.go` is only responsible for instantiation and ordering. Errors at any step cause `os.Exit(1)`.
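The instantiate-and-order responsibility can be sketched without controller-runtime. Everything here is hypothetical scaffolding: `registration`, `workerFlags`, `plan`, and `registerAll` do not exist in the project, the `setup` callback stands in for each reconciler's `SetupWithManager`, and the extension order simply follows the table above.

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// registration pairs a controller name with its worker count. The type is
// hypothetical; main.go calls each SetupWithManager inline.
type registration struct {
	name    string
	workers int
}

// workerFlags carries the four concurrency flag values.
type workerFlags struct{ sandbox, claim, template, warmPool int }

// plan lists what gets registered: Sandbox always, the rest only with extensions.
func plan(extensions bool, w workerFlags) []registration {
	regs := []registration{{"Sandbox", w.sandbox}}
	if extensions {
		regs = append(regs,
			registration{"SandboxClaim", w.claim},
			registration{"SandboxTemplate", w.template},
			registration{"SandboxWarmPool", w.warmPool},
		)
	}
	return regs
}

// registerAll calls setup for each entry and stops at the first failure.
func registerAll(regs []registration, setup func(registration) error) error {
	for _, r := range regs {
		if err := setup(r); err != nil {
			return fmt.Errorf("unable to create controller %s: %w", r.name, err)
		}
	}
	return nil
}

func main() {
	setup := func(r registration) error {
		if r.workers <= 0 {
			return errors.New("worker count must be positive")
		}
		fmt.Printf("registered %s with %d worker(s)\n", r.name, r.workers)
		return nil
	}
	regs := plan(true, workerFlags{sandbox: 1, claim: 2, template: 1, warmPool: 4})
	if err := registerAll(regs, setup); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1) // mirrors main.go: any setup error is fatal
	}
}
```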

Sources: [cmd/agent-sandbox-controller/main.go:236-277](), [extensions/controllers/sandboxclaim_controller.go:1269-1273](), [extensions/controllers/sandboxtemplate_controller.go:215-218](), [extensions/controllers/sandboxwarmpool_controller.go:533-537]()

## Health probes and main loop

After reconciler wiring, `main` attaches two trivial liveness/readiness probes and starts the manager:

```go
if err := mgr.AddHealthzCheck("healthz", healthz.Ping); err != nil { ... }
if err := mgr.AddReadyzCheck("readyz", healthz.Ping); err != nil { ... }

setupLog.Info("starting manager")
if err := mgr.Start(ctx); err != nil { ... }
```

`mgr.Start(ctx)` blocks until the signal context is cancelled, at which point the manager runs internal shutdown and the deferred OTel `cleanup` fires. The probe server listens on `HealthProbeBindAddress` (`:8081` by default), distinct from the metrics server on `:8080`.
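What an always-succeeding check such as `healthz.Ping` amounts to can be sketched with `net/http` alone. The `checker` type, `probeMux`, and `probeStatus` are hypothetical names for this example; the manager's real probe server is provided by controller-runtime.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// checker has the shape of a health check: a nil error means healthy.
type checker func(*http.Request) error

// ping always succeeds, which is all a Ping-style check does.
func ping(*http.Request) error { return nil }

// checkHandler turns a checker into an HTTP handler: 200 on success, 500 on error.
func checkHandler(c checker) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if err := c(r); err != nil {
			http.Error(w, err.Error(), http.StatusInternalServerError)
			return
		}
		fmt.Fprintln(w, "ok")
	})
}

// probeMux serves the two probe paths on a mux separate from the metrics one.
func probeMux(healthz, readyz checker) *http.ServeMux {
	mux := http.NewServeMux()
	mux.Handle("/healthz", checkHandler(healthz))
	mux.Handle("/readyz", checkHandler(readyz))
	return mux
}

// probeStatus issues an in-memory GET and returns the status code.
func probeStatus(h http.Handler, path string) int {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, path, nil))
	return rec.Code
}

func main() {
	mux := probeMux(ping, ping)
	fmt.Println(probeStatus(mux, "/healthz"), probeStatus(mux, "/readyz")) // prints 200 200
}
```

A ping-only check reports that the process is up and serving HTTP; it says nothing about cache sync or API server reachability, which is why the probes are described as trivial.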

The Helm deployment matches these defaults exactly:

```yaml
ports:
- name: metrics
  containerPort: 8080
- name: healthz
  containerPort: 8081
livenessProbe:
  httpGet: { path: /healthz, port: healthz }
readinessProbe:
  httpGet: { path: /readyz,  port: healthz }
```

Renaming or moving either port in `main.go` would require a matching change in `helm/templates/deployment.yaml`, since the probes are wired to the `healthz` named port and Prometheus scrape config typically targets the `metrics` port.

Sources: [cmd/agent-sandbox-controller/main.go:281-294](), [helm/templates/deployment.yaml:30-48]()

## Operational summary

The entry point is intentionally small and procedural. It parses flags, builds a logger and signal context, optionally initializes tracing, decides which CRD groups go into the scheme, and configures the metrics/pprof and probe endpoints. It then builds the manager with a fixed leader-election ID, registers the Prometheus collector, instantiates the reconcilers (always `Sandbox`, plus the three extensions reconcilers when `--extensions` is set), adds the health probes, and blocks on `mgr.Start`. Build-time provenance comes from `internal/version` symbols injected by the Dockerfile's `-ldflags`, and the Helm chart's controller args template is the canonical mapping from the flags documented above to what a deployed pod actually runs.
