# actiond Developer Reference Wiki
> actiond is a local Remote Execution API (REAPI) worker and cache for running Bazel actions in a hermetic Linux sandbox, supporting both direct Linux host execution via chroot/namespaces and macOS-hosted Linux VM execution via Apple's Virtualization.framework.
## Context Links
- [Agent index](https://grok-wiki.com/public/wiki/hermeticbuild-actiond-796c0ee40e63/llms.txt)
- [Human interactive wiki](https://grok-wiki.com/public/wiki/hermeticbuild-actiond-796c0ee40e63)
- [GitHub repository](https://github.com/hermeticbuild/actiond)
## Repository Metadata
- Repository: hermeticbuild/actiond
- Generated: 2026-05-25T17:47:35.912Z
- Updated: 2026-05-25T18:16:09.115Z
- Runtime: Pi · Claude Code · claude-sonnet-4-6
- Format: Technical
- Pages: 6
## Page Index
- 01. [Technical Orientation](https://grok-wiki.com/public/wiki/hermeticbuild-actiond-796c0ee40e63/pages/01-technical-orientation.md) - What actiond is, its two execution modes (linux-actiond and darwin-actiond serve-vm), the top-level binary layout, the REAPI subset it implements, and how the rest of this reference is organized.
- 02. [REAPI Server, CAS & Data Flow](https://grok-wiki.com/public/wiki/hermeticbuild-actiond-796c0ee40e63/pages/02-reapi-server-cas-data-flow.md) - The HTTP/2 gRPC server that implements the REAPI surface (Execution, CAS, ByteStream, ActionCache, Capabilities), how blobs are stored and addressed in the content-addressable store, and the end-to-end data flow from Bazel client through to action output collection.
- 03. [Linux Host Execution — Chroot, Namespaces & Cgroups](https://grok-wiki.com/public/wiki/hermeticbuild-actiond-796c0ee40e63/pages/03-linux-host-execution-chroot-namespaces-cgroups.md) - How linux-actiond runs actions directly on a Linux host: execroot construction, read-only bind mounts for CAS inputs, private mount and network namespace setup, loopback-only networking, uid/gid drop, PR_SET_NO_NEW_PRIVS, and best-effort cgroup v2 resource limits.
- 04. [macOS VM Execution — darwin-actiond, vsock & Guest Worker](https://grok-wiki.com/public/wiki/hermeticbuild-actiond-796c0ee40e63/pages/04-macos-vm-execution-darwin-actiond-vsock-guest-worker.md) - How darwin-actiond serve-vm boots a minimal arm64 Linux VM via Virtualization.framework, proxies REAPI traffic over virtio-vsock to linux-actiond-guest, manages the guest-owned ext4 CAS on a virtio block device, and handles standalone binary payload extraction at startup.
- 05. [actiondfs — Lazy CAS-Backed Input Filesystem](https://grok-wiki.com/public/wiki/hermeticbuild-actiond-796c0ee40e63/pages/05-actiondfs-lazy-cas-backed-input-filesystem.md) - The custom Linux kernel filesystem module that exposes REAPI input trees to VM actions without per-file copies: lazy Directory proto resolution from the guest CAS, VM-lifetime parsed Directory cache keyed by digest, backing-file delegation for read/splice/mmap, strict vs. overlayfs compatibility paths for input-mutating actions, and the /proc/actiondfs_stats counter interface.
- 06. [Build System, Runtime Images & Testing](https://grok-wiki.com/public/wiki/hermeticbuild-actiond-796c0ee40e63/pages/06-build-system-runtime-images-testing.md) - How the full repo is built with Bazel and rules_zig, how glibc runtime SquashFS images are packaged and selected via the libc platform property, how standalone binaries embed compressed kernel and initramfs payloads, and the layered testing strategy: unit tests, Docker Linux e2e, VM e2e, and the LLVM tblgen smoke benchmark.
## Source File Index
- `ARCHITECTURE.md`
- `BUILD.bazel`
- `cmd/darwin_actiond/`
- `cmd/linux_actiond_guest/`
- `cmd/linux_actiond/`
- `e2e/llvm_tblgen_smoke.sh`
- `e2e/run_llvm_vm_smoke.sh`
- `kernel/actiondfs/actiondfs.c`
- `kernel/actiondfs/BUILD.bazel`
- `kernel/actiondfs/Kconfig`
- `kernel/actiondfs/Makefile`
- `MODULE.bazel`
- `README.md`
- `runtimes/BUILD.bazel`
- `runtimes/glibc_runtime_repo.bzl`
- `src/action_cache_service.zig`
- `src/action_executor.zig`
- `src/action_runner.zig`
- `src/body_sink.zig`
- `src/bytestream_service.zig`
- `src/cache_service.zig`
- `src/capabilities_service.zig`
- `src/cas.zig`
- `src/control_protocol.zig`
- `src/darwin_vm_host.zig`
- `src/darwin_vm.zig`
- `src/embedded_payload.zig`
- `src/execroot.zig`
- `src/execution_service.zig`
- `src/grpc_http2_server.zig`
- `src/grpc_vsock_bridge.zig`
- `src/guest_init.zig`
- `src/guest_worker.zig`
- `src/reapi_dispatch.zig`
- `src/reapi.zig`
- `src/root.zig`
- `src/runtime_mount.zig`
- `src/staged_cas_index.zig`
- `src/vsock.zig`
- `tools/e2e.sh`
- `vm/BUILD.bazel`
---
## 01. Technical Orientation
> What actiond is, its two execution modes (linux-actiond and darwin-actiond serve-vm), the top-level binary layout, the REAPI subset it implements, and how the rest of this reference is organized.
- Page Markdown: https://grok-wiki.com/public/wiki/hermeticbuild-actiond-796c0ee40e63/pages/01-technical-orientation.md
- Generated: 2026-05-25T17:41:53.639Z
### Source Files
- `README.md`
- `ARCHITECTURE.md`
- `MODULE.bazel`
- `src/root.zig`
- `cmd/darwin_actiond/`
- `cmd/linux_actiond/`
- `cmd/linux_actiond_guest/`
<details>
<summary>Relevant source files</summary>
The following files were used as context for generating this wiki page:
- [README.md](README.md)
- [ARCHITECTURE.md](ARCHITECTURE.md)
- [MODULE.bazel](MODULE.bazel)
- [src/root.zig](src/root.zig)
- [src/reapi_dispatch.zig](src/reapi_dispatch.zig)
- [src/host_server.zig](src/host_server.zig)
- [src/darwin_vm_host.zig](src/darwin_vm_host.zig)
- [src/embedded_payload.zig](src/embedded_payload.zig)
- [src/guest_init.zig](src/guest_init.zig)
- [src/grpc_vsock_bridge.zig](src/grpc_vsock_bridge.zig)
- [cmd/darwin_actiond/main.zig](cmd/darwin_actiond/main.zig)
- [cmd/linux_actiond/main.zig](cmd/linux_actiond/main.zig)
- [cmd/linux_actiond_guest/main.zig](cmd/linux_actiond_guest/main.zig)
</details>
# Technical Orientation
`actiond` is a local [Remote Execution API (REAPI)](https://github.com/bazelbuild/remote-apis) worker and cache designed for running Bazel actions inside a hermetic Linux sandbox. Its primary purpose is to let a developer's workstation act as a local remote-execution worker without exposing the host's resources to build actions.
This page describes what `actiond` is, the two execution backends it provides, how its binaries are laid out and deployed, which REAPI methods it implements, and how the major source modules relate to one another. Deeper per-subsystem detail (filesystem, CAS layout, execution lifecycle, performance) lives in [ARCHITECTURE.md](ARCHITECTURE.md) and in the subsystem-specific pages of this wiki.
---
## What actiond Is
`actiond` accepts standard Bazel REAPI traffic over gRPC/HTTP-2, stores CAS and ActionCache state locally, constructs a fresh Linux execroot with minimal data copies, runs each action under process-level isolation, and stores declared outputs back in the CAS. These goals are the same in both execution modes.
The project exists primarily because of the macOS path: Apple Silicon Macs cannot run Linux binaries natively, and Bazel's default macOS sandbox does not provide a Linux filesystem view. `darwin-actiond serve-vm` fills that gap by booting a tiny Linux VM using Apple's `Virtualization.framework` and routing all execution into it, while the Mac host owns only the TCP listener and VM lifecycle.
Sources: [README.md](), [ARCHITECTURE.md]()
---
## Execution Modes
### `linux-actiond` — Direct Linux Host
`linux-actiond serve` runs actions directly on a Linux host. Isolation is achieved entirely through Linux kernel primitives applied to each action process:
- `chroot` into a per-action work root
- private mount and network namespaces
- loopback-only networking (`lo` only, no outbound interface)
- read-only bind mounts for CAS inputs
- dropped uid/gid, `PR_SET_NO_NEW_PRIVS`
- best-effort cgroup v2 limits
The runtime SquashFS is mounted locally and glibc runtime directories are bind-mounted into the chroot when the action requests a specific `libc` platform property.
Sources: [ARCHITECTURE.md — Sandbox Model: Linux Host](), [src/host_server.zig:68-115]()
### `darwin-actiond serve-vm` — macOS Virtualization.framework VM
`darwin-actiond serve-vm` starts a long-lived arm64 Linux VM and proxies all REAPI traffic into it over `virtio-vsock`. The VM is intentionally minimal:
| Component | Details |
|---|---|
| Kernel | Custom arm64 Linux `Image`, built by `linux.bzl` via Bazel |
| Initramfs | Contains only `linux-actiond-guest` |
| Writable block device (`virtio-blk`) | Guest ext4 disk image mounted at `/cas` — owns CAS, ActionCache, and staged outputs |
| Read-only block device | `runtimes.sqfs` mounted at `/runtimes` |
| Control channel | `virtio-vsock` — carries gRPC payloads and a control protocol |
| Networking | No guest network device; each action gets a Linux network namespace with loopback only |
The VM is started once and reused across all actions. Per-action isolation is Linux-native (chroot, mount namespace, network namespace) inside the guest — not per-action VM cold boots.
Sources: [ARCHITECTURE.md — VM Shape](), [src/guest_init.zig:18-50](), [src/grpc_vsock_bridge.zig:22-58]()
---
## Binary Layout
```text
┌─────────────────────────────────────────────────────────┐
│ darwin-actiond (macOS Mach-O arm64) │
│ cmd: serve-vm → darwin_vm_host.serve() │
│ embedded: __ACTIOND,__kernel (Image.zst) │
│ __ACTIOND,__initramfs (initramfs.cpio.zst) │
│ __ACTIOND,__runtimes (runtimes-aarch64.sqfs)│
└───────────────────┬─────────────────────────────────────┘
│ vsock gRPC bridge
▼
┌─────────────────────────────────────────────────────────┐
│ linux-actiond-guest (Linux ELF, inside initramfs) │
│ mode --guest-init → guest_init.run() (PID 1) │
│ mode --guest-worker → guest_worker.run() │
└─────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────┐
│ linux-actiond (Linux ELF arm64 or x86_64) │
│ cmd: serve → host_server.serve() │
│ embedded: .actiond.runtimes (ELF section) │
└─────────────────────────────────────────────────────────┘
```
### `darwin-actiond`
Entry point: [`cmd/darwin_actiond/main.zig`](cmd/darwin_actiond/main.zig). The only supported subcommand is `serve-vm`. Argument parsing is handled by `darwin_vm_host.parseServeVmArgs`; the runtime call is `darwin_vm_host.serve`. The Mach-O binary carries compressed boot artifacts in dedicated sections; at startup these are extracted under `--root` and inflated before `VZLinuxBootLoader` is called.
Key CLI flags:
| Flag | Default | Purpose |
|---|---|---|
| `--listen` | `127.0.0.1:8980` | Public REAPI TCP address |
| `--root` | `/tmp/actiond-vm` | Worker root for extracted artifacts |
| `--cas-image` | `<root>/cas.ext4` | Writable guest ext4 disk image |
| `--kernel` | embedded | Path to raw or `.zst` kernel |
| `--initramfs` | embedded | Path to `.cpio.zst` initramfs |
| `--runtime-image` | embedded | Path to runtime SquashFS |
| `--memory-mib` | 512 | VM RAM |
| `--cpus` | 2 | VM vCPU count |
Sources: [cmd/darwin_actiond/main.zig](), [src/darwin_vm_host.zig:24-67]()
### `linux-actiond`
Entry point: [`cmd/linux_actiond/main.zig`](cmd/linux_actiond/main.zig). The only supported subcommand is `serve`. Calls `host_server.serve`. The standalone binary embeds the runtime SquashFS in the `.actiond.runtimes` ELF section, extracted automatically when neither `--runtime-image` nor `--runtime-root` is supplied.
Key CLI flags:
| Flag | Default | Purpose |
|---|---|---|
| `--listen` | `127.0.0.1:8980` | Public REAPI TCP address |
| `--root` | `/tmp/actiond` | Worker root (CAS, ActionCache, work dirs) |
| `--runtime-image` | embedded | Runtime SquashFS path |
| `--runtime-root` | — | Pre-mounted runtime directory |
Sources: [cmd/linux_actiond/main.zig](), [src/host_server.zig:19-55]()
### `linux-actiond-guest`
Entry point: [`cmd/linux_actiond_guest/main.zig`](cmd/linux_actiond_guest/main.zig). The same binary serves two roles inside the VM, selected by argv:
- If invoked as `init` (or with `--guest-init`): runs `guest_init.run()`, which acts as PID 1. It mounts `proc`, `sysfs`, `cgroup2`, `devtmpfs`, `tmpfs`, scans for virtio block devices to mount the CAS ext4 and runtime SquashFS, then `exec`s itself as `--guest-worker`.
- With `--guest-worker`: runs `guest_worker.run()`, the actual REAPI + execution server loop inside the guest.
Sources: [cmd/linux_actiond_guest/main.zig](), [src/guest_init.zig:47-60]()
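The argv-based role selection described above can be sketched as follows. This is a minimal illustration, not the code in `cmd/linux_actiond_guest/main.zig`: `GuestMode`, `basename`, and `selectMode` are hypothetical names, and checking flags before the program name is an assumption made so that a process still named `init` can re-exec itself with `--guest-worker`.

```zig
const std = @import("std");

/// Hypothetical names for illustration; not the identifiers used in the source.
const GuestMode = enum { guest_init, guest_worker, unknown };

/// Returns the final path component of `path`.
fn basename(path: []const u8) []const u8 {
    var start: usize = 0;
    for (path, 0..) |c, i| {
        if (c == '/') start = i + 1;
    }
    return path[start..];
}

/// Selects a role from argv. Flags are checked before the program name
/// (an assumption) so that `init --guest-worker` resolves to the worker.
fn selectMode(argv: []const []const u8) GuestMode {
    if (argv.len == 0) return .unknown;
    for (argv[1..]) |arg| {
        if (std.mem.eql(u8, arg, "--guest-worker")) return .guest_worker;
        if (std.mem.eql(u8, arg, "--guest-init")) return .guest_init;
    }
    if (std.mem.eql(u8, basename(argv[0]), "init")) return .guest_init;
    return .unknown;
}

test "mode is chosen from argv" {
    try std.testing.expectEqual(GuestMode.guest_init, selectMode(&.{"/init"}));
    try std.testing.expectEqual(GuestMode.guest_worker, selectMode(&.{ "/init", "--guest-worker" }));
}
```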
---
## REAPI Subset Implemented
The server implements the exact methods Bazel needs for remote execution. The full set of dispatched method paths is declared as constants in [`src/reapi_dispatch.zig`](src/reapi_dispatch.zig):
| Service | Method | gRPC path |
|---|---|---|
| `Capabilities` | `GetCapabilities` | `/build.bazel.remote.execution.v2.Capabilities/GetCapabilities` |
| `ContentAddressableStorage` | `FindMissingBlobs` | `/build.bazel.remote.execution.v2.ContentAddressableStorage/FindMissingBlobs` |
| `ContentAddressableStorage` | `BatchUpdateBlobs` | `/build.bazel.remote.execution.v2.ContentAddressableStorage/BatchUpdateBlobs` |
| `ContentAddressableStorage` | `BatchReadBlobs` | `/build.bazel.remote.execution.v2.ContentAddressableStorage/BatchReadBlobs` |
| `ContentAddressableStorage` | `GetTree` | `/build.bazel.remote.execution.v2.ContentAddressableStorage/GetTree` |
| `ByteStream` | `Read` | `/google.bytestream.ByteStream/Read` |
| `ByteStream` | `Write` | `/google.bytestream.ByteStream/Write` |
| `ActionCache` | `GetActionResult` | `/build.bazel.remote.execution.v2.ActionCache/GetActionResult` |
| `ActionCache` | `UpdateActionResult` | `/build.bazel.remote.execution.v2.ActionCache/UpdateActionResult` |
| `Execution` | `Execute` (server-streaming) | `/build.bazel.remote.execution.v2.Execution/Execute` |
| *(internal)* | `CAS/DeleteBlobs` | `/actiond.internal.v1.CAS/DeleteBlobs` |
All methods run over an HTTP/2 server (`grpc_http2_server.zig`) that dispatches independent streams concurrently. Server-streaming responses use a `body_sink` so large payloads are chunked rather than buffered into a single response frame.
The server does **not** implement resource exhaustion throttling, inter-worker federation, or the optional `WaitExecution` / `CancelOperation` RPCs.
Sources: [src/reapi_dispatch.zig:25-45](), [ARCHITECTURE.md — REAPI Surface]()
---
## Source Module Map
The shared library lives under `src/` and is the single `actiond` Zig module consumed by all three binaries. `src/root.zig` re-exports every module, making them available as `actiond.<module>` in consumers.
```text
src/
├── root.zig # Library root – re-exports all modules
│
├── --- Transport / HTTP-2 ---
├── grpc_http2_server.zig # HTTP/2 accept loop, stream dispatch
├── http2_frame.zig # Frame read/write
├── http2_header.zig # Header encoding/decoding
├── http2_hpack.zig # HPACK compression
├── grpc_record.zig # gRPC length-prefix framing
├── body_sink.zig # Chunked response body abstraction
│
├── --- REAPI Dispatch ---
├── reapi_dispatch.zig # Method router → service handlers
├── reapi.zig # Protobuf message types (REAPI)
├── protobuf_wire.zig # Wire-format encoder/decoder
├── capabilities_service.zig
├── cache_service.zig # CAS FindMissing/BatchUpdate/BatchRead
├── bytestream_service.zig # ByteStream Read/Write
├── bytestream.zig
├── action_cache_service.zig # ActionCache Get/Update
├── action_cache.zig
├── execution_service.zig # Execute (server-streaming)
├── tree_service.zig # GetTree
│
├── --- Storage ---
├── cas.zig # Content-addressed blob store
├── staged_cas_index.zig # actiondfs staged output tracking
│
├── --- Execution ---
├── action_executor.zig # Per-action fork/exec, namespace setup
├── action_runner.zig # High-level action dispatch
├── execroot.zig # Execroot construction (bind mounts / overlayfs)
├── runtime_mount.zig # SquashFS runtime selection & bind-mount
│
├── --- VM Host (macOS) ---
├── darwin_vm_host.zig # serve-vm entry, VM lifecycle, vsock bridge setup
├── darwin_vm.zig # Virtualization.framework wrapper (Zig side)
├── grpc_vsock_bridge.zig # TCP ↔ vsock raw gRPC byte bridge
├── vsock.zig # vsock port constants, connect helpers
├── control_protocol.zig # darwin↔guest control messages
├── control_transport_fd.zig
│
├── --- VM Guest (Linux) ---
├── guest_init.zig # PID-1 / init: mount filesystems, exec worker
├── guest_worker.zig # Guest REAPI + execution server
│
├── --- Standalone Packaging ---
├── embedded_payload.zig # Mach-O section / ELF section payload extraction
│
└── version.zig # Zig and Bazel version strings
```
Sources: [src/root.zig]()
---
## Build System and Artifact Packaging
The repo is fully Bazelized. All Zig binaries are built by `rules_zig` (version `0.16.0`). The C/Objective-C bridge for `Virtualization.framework` (`vz_bridge.m`) is compiled via the `@llvm` toolchain. The VM kernel is produced by the `linux.bzl` ruleset from the Linux source archive declared in `MODULE.bazel`.
Key build targets:
```bash
# macOS standalone (embeds kernel + initramfs + runtimes)
bazel build //cmd/darwin_actiond:darwin-actiond-standalone

# Linux standalone (embeds runtimes SquashFS)
bazel build //cmd/linux_actiond:linux-actiond-standalone \
  --platforms=//platforms:linux_aarch64

# VM artifacts only
bazel build //vm:linux_kernel_zst //vm:initramfs //runtimes:runtimes_squashfs
```
The Darwin standalone uses Mach-O section embedding (`__ACTIOND,__kernel`, `__ACTIOND,__initramfs`, `__ACTIOND,__runtimes`). The Linux standalone uses ELF section embedding (`.actiond.runtimes`). Both use `embedded_payload.extractFromSelf` at runtime to locate and extract the payload.
The kernel is shipped compressed (`Image.zst`) and inflated once to `root/boot/kernel-<sha256>.Image` before `VZLinuxBootLoader` is invoked, because `VZLinuxBootLoader` requires a raw arm64 `Image` and the arm64 format is not self-decompressing.
Sources: [ARCHITECTURE.md — Build Artifacts, Why the Kernel Is Inflated Before Boot](), [src/embedded_payload.zig:8-18](), [MODULE.bazel:1-14]()
---
## VM Request Routing (Sequence Summary)
```mermaid
sequenceDiagram
participant Bazel
participant darwin-actiond (macOS)
participant grpc_vsock_bridge
participant linux-actiond-guest (VM)
participant guest ext4 CAS (/cas)
Bazel->>darwin-actiond (macOS): gRPC / REAPI (TCP 8980)
darwin-actiond (macOS)->>grpc_vsock_bridge: forward raw gRPC bytes
grpc_vsock_bridge->>linux-actiond-guest (VM): virtio-vsock (grpc_port)
linux-actiond-guest (VM)->>guest ext4 CAS (/cas): CAS / AC reads & writes
linux-actiond-guest (VM)-->>darwin-actiond (macOS): ExecuteResponse / CAS responses
darwin-actiond (macOS)-->>Bazel: gRPC response
```
The macOS host holds the public TCP listener and VM lifecycle but does not hold a CAS mirror. All content-addressed storage, ActionCache, and output promotion happen inside the guest against its native ext4 filesystem on the virtio block device.
Sources: [ARCHITECTURE.md — Darwin VM Request Routing](), [src/grpc_vsock_bridge.zig:34-58]()
---
## How This Reference Is Organized
| Page | Contents |
|---|---|
| **01. Technical Orientation** *(this page)* | What actiond is, execution modes, binary layout, REAPI surface, module map |
| **02. REAPI Server, CAS & Data Flow** | HTTP/2 gRPC server, REAPI services, content-addressed blob layout, end-to-end data flow from Bazel client to output collection |
| **03. Linux Host Execution** | Execroot construction, read-only bind mounts, mount and network namespaces, uid/gid drop, `PR_SET_NO_NEW_PRIVS`, cgroup v2 limits |
| **04. macOS VM Execution** | Virtualization.framework integration, vsock proxying to `linux-actiond-guest`, guest ext4 CAS, standalone payload extraction |
| **05. actiondfs** | Lazy CAS-backed input filesystem for VM actions, Directory cache, backing-file delegation, strict vs. overlayfs paths, `/proc/actiondfs_stats` |
| **06. Build System, Runtime Images & Testing** | Bazel and `rules_zig` targets, glibc runtime SquashFS images, standalone embedding, unit tests, e2e harnesses, LLVM tblgen smoke benchmark |
---
## Summary
`actiond` is a focused REAPI worker implemented in Zig 0.16 that provides hermetic Linux action execution either directly on a Linux host (`linux-actiond serve`) or through an Apple Virtualization.framework VM on macOS (`darwin-actiond serve-vm`). The three binaries — `darwin-actiond`, `linux-actiond`, and `linux-actiond-guest` — share a single Zig library under `src/` and implement the ten REAPI methods Bazel needs for remote execution. Standalone binaries embed their runtime artifacts (kernel, initramfs, SquashFS) in native executable sections and extract them once at worker startup, keeping distribution and startup self-contained. The VM path is the project's primary motivation: it gives Apple Silicon Mac workstations a local Linux remote-execution worker without exposing host macOS resources to build actions.
---
## 02. REAPI Server, CAS & Data Flow
> The HTTP/2 gRPC server that implements the REAPI surface (Execution, CAS, ByteStream, ActionCache, Capabilities), how blobs are stored and addressed in the content-addressable store, and the end-to-end data flow from Bazel client through to action output collection.
- Page Markdown: https://grok-wiki.com/public/wiki/hermeticbuild-actiond-796c0ee40e63/pages/02-reapi-server-cas-data-flow.md
- Generated: 2026-05-25T17:44:59.489Z
### Source Files
- `src/grpc_http2_server.zig`
- `src/reapi.zig`
- `src/reapi_dispatch.zig`
- `src/execution_service.zig`
- `src/cache_service.zig`
- `src/bytestream_service.zig`
- `src/action_cache_service.zig`
- `src/capabilities_service.zig`
- `src/cas.zig`
- `src/body_sink.zig`
<details>
<summary>Relevant source files</summary>
The following files were used as context for generating this wiki page:
- [src/grpc_http2_server.zig](src/grpc_http2_server.zig)
- [src/reapi.zig](src/reapi.zig)
- [src/reapi_dispatch.zig](src/reapi_dispatch.zig)
- [src/execution_service.zig](src/execution_service.zig)
- [src/cache_service.zig](src/cache_service.zig)
- [src/bytestream_service.zig](src/bytestream_service.zig)
- [src/action_cache_service.zig](src/action_cache_service.zig)
- [src/capabilities_service.zig](src/capabilities_service.zig)
- [src/cas.zig](src/cas.zig)
- [src/body_sink.zig](src/body_sink.zig)
- [src/action_cache.zig](src/action_cache.zig)
</details>
# REAPI Server, CAS & Data Flow
This page describes how actiond implements the [Remote Execution API (REAPI)](https://github.com/bazelbuild/remote-apis): the native HTTP/2 gRPC server that listens for Bazel client connections, the content-addressable store (CAS) that persists blobs and directory trees, and the end-to-end data path from an incoming `Execute` request through action execution and result collection.
The entire stack is written in Zig without generated protobuf stubs or a third-party gRPC runtime. Every layer—HTTP/2 framing, HPACK header compression, protobuf encoding/decoding, CAS storage, and action dispatch—is implemented natively.
---
## HTTP/2 gRPC Server
### Connection lifecycle
`grpc_http2_server.serve` binds a TCP address and enters an accept loop. Each accepted connection gets its own OS thread via `std.Thread.spawn` + `thread.detach()`. The thread validates the HTTP/2 client connection preface, exchanges `SETTINGS` frames, and then processes frames in a read loop.
```zig
// src/grpc_http2_server.zig (abridged; `...` marks elided code)
pub fn serveDispatcher(...) !void {
    ignoreSigpipe();
    const address = try std.Io.net.IpAddress.parseLiteral(config.listen);
    var listener = try address.listen(io, .{ .reuse_address = true });
    while (true) {
        // One detached OS thread per accepted connection.
        const stream = listener.accept(io) catch ...;
        const thread = std.Thread.spawn(.{}, connectionThread, .{ io, allocator, dispatcher, stream }) ...;
        thread.detach();
    }
}
```
The server advertises the following settings to each peer: push disabled, 128 max concurrent streams, a 1 GiB inbound initial window, and a 1 MiB max frame size.
Sources: [src/grpc_http2_server.zig:388-420]()
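As a rough illustration of how those advertised values look on the wire, the sketch below encodes them as an HTTP/2 SETTINGS payload (a 2-byte identifier and a 4-byte value per entry, both big-endian). The identifiers come from RFC 9113; `Setting`, `advertised`, and `encodeSettings` are hypothetical names, and mapping the 1 GiB inbound window to `SETTINGS_INITIAL_WINDOW_SIZE` is an assumption rather than something confirmed from the source.

```zig
const std = @import("std");

/// Hypothetical type; identifiers follow RFC 9113, section 6.5.2.
const Setting = struct { id: u16, value: u32 };

const advertised = [_]Setting{
    .{ .id = 0x2, .value = 0 }, // SETTINGS_ENABLE_PUSH: disabled
    .{ .id = 0x3, .value = 128 }, // SETTINGS_MAX_CONCURRENT_STREAMS
    .{ .id = 0x4, .value = 1 << 30 }, // SETTINGS_INITIAL_WINDOW_SIZE: 1 GiB (assumed mapping)
    .{ .id = 0x5, .value = 1 << 20 }, // SETTINGS_MAX_FRAME_SIZE: 1 MiB
};

/// Writes each setting as 6 bytes: big-endian id, then big-endian value.
fn encodeSettings(out: *[advertised.len * 6]u8) void {
    for (advertised, 0..) |s, i| {
        std.mem.writeInt(u16, out[i * 6 ..][0..2], s.id, .big);
        std.mem.writeInt(u32, out[i * 6 + 2 ..][0..4], s.value, .big);
    }
}

test "settings payload layout" {
    var payload: [advertised.len * 6]u8 = undefined;
    encodeSettings(&payload);
    // First entry: ENABLE_PUSH = 0.
    try std.testing.expectEqualSlices(u8, &[_]u8{ 0, 2, 0, 0, 0, 0 }, payload[0..6]);
}
```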
### Frame processing and stream state
Each connection maintains a list of `StreamState` objects keyed by HTTP/2 stream ID. HEADERS frames are decoded via an HPACK decoder (`http2_hpack.Decoder`); the `:path` pseudo-header becomes the gRPC method name stored on the stream. DATA frames accumulate the gRPC request body. When END_STREAM is received the stream is removed from the list and a `ResponseTask` is spawned on a new thread—enabling genuine concurrent responses for multiple in-flight streams on the same connection.
```text
Connection read loop
├── SETTINGS → ack + apply peer settings
├── PING → ack
├── WINDOW_UPDATE → update flow-control windows
├── HEADERS → HPACK decode, extract :path, detect END_STREAM
├── CONTINUATION → append header block fragment
├── DATA → feed body; if client-streaming, pass directly to WriteGrpcStream
├── RST_STREAM → discard stream
└── GOAWAY → close connection
```
A `ResponseTracker` (atomic counter) limits the connection to `default_max_concurrent_streams` (128) active response threads at once, spinning until a slot is free.
Sources: [src/grpc_http2_server.zig:454-620]()
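A counter-based limiter of this kind might look like the sketch below. It is not the `ResponseTracker` from the source: `SlotLimiter`, `tryAcquire`, and `release` are hypothetical names, and the compare-and-swap loop is one common way to cap an atomic counter. A caller would retry `tryAcquire` (spinning or sleeping) until it returns true.

```zig
const std = @import("std");

/// Hypothetical slot limiter; names are illustrative, not the actiond API.
const SlotLimiter = struct {
    active: std.atomic.Value(u32) = std.atomic.Value(u32).init(0),
    max: u32,

    /// Claims a slot if one is free. Returns false when the limit is reached.
    fn tryAcquire(self: *SlotLimiter) bool {
        var current = self.active.load(.monotonic);
        while (current < self.max) {
            // cmpxchgWeak returns null on success, or the value it observed.
            current = self.active.cmpxchgWeak(current, current + 1, .acquire, .monotonic) orelse return true;
        }
        return false;
    }

    /// Returns a slot claimed earlier by tryAcquire.
    fn release(self: *SlotLimiter) void {
        _ = self.active.fetchSub(1, .release);
    }
};

test "limiter caps active slots" {
    var limiter = SlotLimiter{ .max = 2 };
    try std.testing.expect(limiter.tryAcquire());
    try std.testing.expect(limiter.tryAcquire());
    try std.testing.expect(!limiter.tryAcquire());
    limiter.release();
    try std.testing.expect(limiter.tryAcquire());
}
```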
### Method routing and streaming kinds
`methodKind` classifies each gRPC method path into one of three patterns:
| Pattern | Methods |
|---|---|
| `unary` | `GetCapabilities`, `FindMissingBlobs`, `BatchUpdateBlobs`, `BatchReadBlobs`, `GetActionResult`, `UpdateActionResult` |
| `server_streaming` | `ByteStream/Read`, `CAS/GetTree`, `Execution/Execute` |
| `client_streaming` | `ByteStream/Write` |
For `server_streaming` methods the server opens the response DATA channel immediately (sends `:status 200` headers) and pushes records via a `body_sink.Writer` as they are produced. For `client_streaming` methods (`ByteStream/Write`) the server eagerly creates a `ByteStreamWriteClientStream` once the method is known and pipes each incoming DATA frame to `WriteGrpcStream.append` without buffering the entire upload in memory.
Sources: [src/grpc_http2_server.zig:712-734](), [src/grpc_http2_server.zig:150-180]()
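The three-way classification can be approximated with plain string comparison, as in this sketch. The paths are the ones listed in this reference; `MethodKind` and `classify` are hypothetical names, and treating every unlisted path as `unary` is a simplifying assumption (the real `methodKind` may reject unknown methods instead).

```zig
const std = @import("std");

const MethodKind = enum { unary, server_streaming, client_streaming };

const server_streaming_paths = [_][]const u8{
    "/google.bytestream.ByteStream/Read",
    "/build.bazel.remote.execution.v2.ContentAddressableStorage/GetTree",
    "/build.bazel.remote.execution.v2.Execution/Execute",
};
const client_streaming_path = "/google.bytestream.ByteStream/Write";

/// Sketch only: classifies a gRPC `:path`, defaulting to unary (an assumption).
fn classify(path: []const u8) MethodKind {
    if (std.mem.eql(u8, path, client_streaming_path)) return .client_streaming;
    for (server_streaming_paths) |candidate| {
        if (std.mem.eql(u8, path, candidate)) return .server_streaming;
    }
    return .unary;
}

test "classify streaming kinds" {
    try std.testing.expectEqual(MethodKind.client_streaming, classify("/google.bytestream.ByteStream/Write"));
    try std.testing.expectEqual(MethodKind.server_streaming, classify("/build.bazel.remote.execution.v2.Execution/Execute"));
}
```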
### Flow control
`FlowControl` tracks per-stream and connection-level send windows using a spin-locked list. `reserveData` blocks the sender thread (spin/sleep) until the peer's window allows the next chunk. When DATA frames arrive the server sends matching `WINDOW_UPDATE` frames for both the connection and the stream to keep the peer's send window full.
Sources: [src/grpc_http2_server.zig:237-310]()
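The core arithmetic behind a window reservation is small: a sender may transmit at most the minimum of the requested size, the stream window, and the connection window, and both windows are debited by what was sent. The sketch below models only that arithmetic, single-threaded and without the spin lock or blocking described above; `SendWindows` and `reserve` are hypothetical names, not the `FlowControl` API.

```zig
const std = @import("std");

/// Hypothetical single-threaded model of HTTP/2 send windows.
/// Windows are signed because a SETTINGS change can drive them negative.
const SendWindows = struct {
    connection: i64,
    stream: i64,

    /// Returns how many of `want` bytes may be sent now and debits both windows.
    /// A result of 0 means the caller has to wait for a WINDOW_UPDATE.
    fn reserve(self: *SendWindows, want: usize) usize {
        const avail = @min(self.connection, self.stream);
        if (avail <= 0) return 0;
        const avail_bytes: usize = @intCast(avail);
        const n = @min(want, avail_bytes);
        const debit: i64 = @intCast(n);
        self.connection -= debit;
        self.stream -= debit;
        return n;
    }
};

test "reserve is bounded by the smaller window" {
    var windows = SendWindows{ .connection = 100, .stream = 10 };
    try std.testing.expectEqual(@as(usize, 10), windows.reserve(64));
    try std.testing.expectEqual(@as(usize, 0), windows.reserve(1));
}
```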
---
## REAPI Dispatch
`reapi_dispatch.Server` is the single dispatch object passed through the gRPC layer. It holds references to the CAS store, the optional action cache store, an optional work-root directory for execution, and `ExecuteOptions`.
```zig
pub const Server = struct {
    store: cas.Store,
    action_cache_store: ?action_cache.Store = null,
    cleanup_store: ?cas.Store = null,
    cleanup_visible_store: ?cas.Store = null,
    cleanup_staged_index: ?*staged_cas_index.Index = null,
    work_root: ?std.Io.Dir = null,
    execution_options: action_executor.ExecuteOptions = .{},
};
```
`handleUnary`, `handleServerStreaming`, `handleServerStreamingResponse`, and `handleClientStreaming` each decode the gRPC-framed protobuf payload, dispatch to the appropriate service function, and encode the response back into a gRPC record.
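For readers unfamiliar with gRPC framing, each message on the wire is prefixed by a 5-byte header: a 1-byte compressed flag followed by a 4-byte big-endian payload length. The sketch below shows that framing in isolation. `encodeRecord` and `decodeRecord` are hypothetical helpers written for this page, not functions from `grpc_record.zig`, and compression is not handled.

```zig
const std = @import("std");

const header_len = 5; // 1-byte compressed flag + 4-byte big-endian length

/// Writes one uncompressed gRPC message into `out` and returns the framed bytes.
fn encodeRecord(out: []u8, payload: []const u8) ![]u8 {
    if (payload.len > std.math.maxInt(u32)) return error.PayloadTooLarge;
    if (out.len < header_len + payload.len) return error.NoSpaceLeft;
    out[0] = 0; // 0 = not compressed
    std.mem.writeInt(u32, out[1..5], @intCast(payload.len), .big);
    @memcpy(out[header_len..][0..payload.len], payload);
    return out[0 .. header_len + payload.len];
}

/// Returns the payload of the first complete record, or null if more bytes are needed.
fn decodeRecord(buf: []const u8) ?[]const u8 {
    if (buf.len < header_len) return null;
    const len = std.mem.readInt(u32, buf[1..5], .big);
    if (buf.len - header_len < len) return null;
    return buf[header_len..][0..len];
}

test "record round trip" {
    var buf: [16]u8 = undefined;
    const frame = try encodeRecord(&buf, "abc");
    try std.testing.expectEqualSlices(u8, &[_]u8{ 0, 0, 0, 0, 3, 'a', 'b', 'c' }, frame);
    try std.testing.expectEqualStrings("abc", decodeRecord(frame).?);
}
```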
### Method constants
Method path strings are declared as constants in `reapi_dispatch.zig` and compared as plain strings at dispatch time. No reflection or code generation is involved.
```zig
pub const cas_find_missing_blobs = "/build.bazel.remote.execution.v2.ContentAddressableStorage/FindMissingBlobs";
pub const execution_execute = "/build.bazel.remote.execution.v2.Execution/Execute";
pub const bytestream_read = "/google.bytestream.ByteStream/Read";
pub const bytestream_write = "/google.bytestream.ByteStream/Write";
// ... etc.
pub const internal_cas_delete_blobs = "/actiond.internal.v1.CAS/DeleteBlobs";
```
The `internal_cas_delete_blobs` method is an actiond-specific extension not present in the upstream REAPI specification. It is used by the cleanup subsystem to evict staged blobs.
Sources: [src/reapi_dispatch.zig:24-42]()
---
## Content-Addressable Store (CAS)
### Digest
`cas.Digest` is a pair of a 32-byte SHA-256 hash and a `u64` byte count. It is the canonical identity for every blob. The empty digest (`Digest.fromBytes("")`) is treated as always present.
```zig
pub const Digest = struct {
    hash: [32]u8,
    size_bytes: u64,
    ...
    pub fn fromBytes(bytes: []const u8) Digest { /* SHA-256 + len */ }
    pub fn toReapi(self: Digest, hash_out: *[64]u8) reapi.Digest { ... }
    pub fn fromReapi(value: reapi.Digest) !Digest { ... }
};
```
Sources: [src/cas.zig:55-100]()
### On-disk layout
Blobs are stored under a sharded two-level directory tree to keep individual directories manageable:
```
blobs/sha256/
  2c/
    2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824
  ...
trees/sha256/
  2c/
    2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824/
      (materialized directory tree)
```
The first two hex characters of the hash form the shard directory. Blob files are written with mode `0o444` (read-only to prevent accidental modification). Tree entries are directories containing the fully materialized file hierarchy.
Sources: [src/cas.zig:13-26](), [src/cas.zig:test "CAS subpaths use two-character digest prefix sharding"]()
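The sharding rule can be reproduced in a few lines. The sketch below hashes a byte string with SHA-256 and builds the `blobs/sha256/<shard>/<hash>` subpath; `hexLower` and `blobSubpath` are hypothetical helpers for illustration, not functions from `cas.zig`. The hash in the layout example above is the SHA-256 of the ASCII string `hello`, which the test uses as a known value.

```zig
const std = @import("std");
const Sha256 = std.crypto.hash.sha2.Sha256;

/// Lowercase hex encoding of a 32-byte hash.
fn hexLower(hash: [32]u8) [64]u8 {
    const digits = "0123456789abcdef";
    var out: [64]u8 = undefined;
    for (hash, 0..) |byte, i| {
        out[i * 2] = digits[byte >> 4];
        out[i * 2 + 1] = digits[byte & 0x0f];
    }
    return out;
}

/// Builds "blobs/sha256/<first 2 hex chars>/<64 hex chars>" for `bytes`.
fn blobSubpath(buf: []u8, bytes: []const u8) ![]u8 {
    var hash: [32]u8 = undefined;
    Sha256.hash(bytes, &hash, .{});
    const hex = hexLower(hash);
    return std.fmt.bufPrint(buf, "blobs/sha256/{s}/{s}", .{ hex[0..2], hex[0..] });
}

test "subpath uses a two-character shard" {
    var buf: [128]u8 = undefined;
    const path = try blobSubpath(&buf, "hello");
    try std.testing.expectEqualStrings(
        "blobs/sha256/2c/2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824",
        path,
    );
}
```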
### BlobWriter and atomic publish
`BlobWriter` is the write path for all incoming blobs:
1. On Linux, attempts to open an anonymous temporary file with `O_TMPFILE` inside `blobs/sha256/`. On other platforms or on failure, falls back to a named temp file (`blobs/sha256/.tmp-<id>`).
2. Data is streamed in via `writeAll`, updating a SHA-256 hasher and byte counter incrementally.
3. `finish` computes the final digest, validates it against an optional expected value, creates the shard directory if needed, and atomically publishes the file:
   - **Linux anonymous path**: `linkat(AT_EMPTY_PATH, ...)` (or via `/proc/self/fd/...` fallback).
   - **Named temp**: `renamePreserve` (a cross-device-safe rename that preserves the source on `PathAlreadyExists`).
If the target path already exists the write is silently dropped—deduplication is inherent.
Sources: [src/cas.zig:334-430]()
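The hashing half of this write path can be sketched as follows. `StreamingDigest` and its `Result` type are hypothetical names used only for illustration; the sketch covers the incremental SHA-256 and byte counting from step 2 and the expected-digest check from step 3, and leaves out the temp file and the atomic publish.

```zig
const std = @import("std");
const Sha256 = std.crypto.hash.sha2.Sha256;

/// Hypothetical stand-in for the hashing side of a streaming blob writer.
pub const StreamingDigest = struct {
    hasher: Sha256,
    size_bytes: u64,

    pub const Result = struct { hash: [32]u8, size_bytes: u64 };

    pub fn init() StreamingDigest {
        return .{ .hasher = Sha256.init(.{}), .size_bytes = 0 };
    }

    /// Feed one chunk: update the hash and the running byte count.
    pub fn update(self: *StreamingDigest, chunk: []const u8) void {
        self.hasher.update(chunk);
        self.size_bytes += chunk.len;
    }

    /// Finalize and, if an expected digest was declared, verify it.
    pub fn finish(self: *StreamingDigest, expected: ?Result) error{DigestMismatch}!Result {
        var hash: [32]u8 = undefined;
        self.hasher.final(&hash);
        const actual = Result{ .hash = hash, .size_bytes = self.size_bytes };
        if (expected) |want| {
            const same_hash = std.mem.eql(u8, &want.hash, &actual.hash);
            if (!same_hash or want.size_bytes != actual.size_bytes) return error.DigestMismatch;
        }
        return actual;
    }
};
```

Because the digest is only known at `finish`, the data has to live in a temporary location until then, which is why publication is a separate link or rename step.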
### putFile and putFilePromote
`putFile` copies an external file into the CAS by opening it and feeding bytes through a `BlobWriter`. `putFilePromote` (Linux only) attempts a faster path: hash the file in place, `fchmod` it to `0o444`, then `renamePreserve` it into the CAS. If the rename crosses a filesystem boundary (`CrossDevice`) or a permission error occurs it falls back to copy. Detailed counters are exposed via `PutFileStats` for observability.
Sources: [src/cas.zig:222-295]()
### materializeTree
`materializeTree` reconstructs a full directory hierarchy from a root `reapi.Directory` digest:
1. Reads the root `Directory` protobuf from the CAS.
2. For each `FileNode`: attempts `hardLink` from the blob path; copies if hard-linking fails.
3. For each `DirectoryNode`: recurses.
4. On Linux the entire tree is built in a temporary directory first, then `rename`d into place atomically. On cross-device mounts it builds directly at the final path.
Sources: [src/cas.zig:296-332]()
---
## Service Implementations
### Capabilities (`capabilities_service.zig`)
Returns a static `ServerCapabilities` message. No state is consulted:
| Field | Value |
|---|---|
| Digest functions | SHA-256 |
| Action cache update | enabled |
| Max batch size | 4 MiB |
| Execution enabled | true |
| API versions | 2.0 – 2.3 |
Sources: [src/capabilities_service.zig:3-16]()
### CAS Batch Operations (`cache_service.zig`)
- **`FindMissingBlobs`**: iterates the request digest list, calls `store.has`, returns digests for which the answer is false.
- **`BatchUpdateBlobs`**: for each item, validates the data against the declared digest via `cas.Digest.fromBytes`, then calls `store.putKnownBytes`.
- **`BatchReadBlobs`**: reads each blob with `store.readAlloc`; per-item status codes are returned in the response.
- **`deleteBlobs` / `deleteBlobsWhenVisible`**: internal cleanup helpers that call `store.deleteBlob` conditionally.
Sources: [src/cache_service.zig:1-80]()
### ByteStream Read (`bytestream_service.zig`)
`writeReadGrpcRecords` is the preferred server-streaming path. It opens the blob file with `store.openBlob`, optionally seeks to `read_offset`, and reads in 1 MiB chunks (`max_read_response_data_bytes`). Each chunk is wrapped in a `grpc_record` (the 5-byte gRPC message prefix, a 1-byte compression flag plus a 4-byte big-endian length, followed by the protobuf `ReadResponse`) and written immediately to the `body_sink.Writer`, which on the server is the live HTTP/2 DATA writer. This avoids loading large blobs entirely into memory.
Sources: [src/bytestream_service.zig:68-120]()
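As a rough illustration of the framing, the sketch below builds the standard 5-byte gRPC message prefix and picks the next chunk length. The function names are hypothetical; only the 1 MiB chunk size and the constant name `max_read_response_data_bytes` come from the text above, and the prefix length refers to the serialized `ReadResponse`, not to the raw data chunk.

```zig
const std = @import("std");

/// Chunk size named in the text: at most 1 MiB of blob data per ReadResponse.
pub const max_read_response_data_bytes: u64 = 1024 * 1024;

/// Hypothetical helper: the 5-byte gRPC length-prefixed-message header.
/// Byte 0 is the compression flag (0 = uncompressed); bytes 1..4 carry the
/// payload length as a big-endian u32.
pub fn encodeGrpcPrefix(payload_len: u32) [5]u8 {
    var prefix: [5]u8 = undefined;
    prefix[0] = 0;
    prefix[1] = @truncate(payload_len >> 24);
    prefix[2] = @truncate(payload_len >> 16);
    prefix[3] = @truncate(payload_len >> 8);
    prefix[4] = @truncate(payload_len);
    return prefix;
}

/// Hypothetical helper: how many blob bytes go into the next record.
pub fn nextChunkLen(remaining: u64) u64 {
    return @min(remaining, max_read_response_data_bytes);
}
```

Emitting one record per chunk bounds the server's buffer to a single chunk, regardless of blob size.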
### ByteStream Write (`bytestream_service.zig`)
`WriteGrpcStream` implements the streaming upload path:
1. Incoming bytes (potentially split mid-record) are buffered in `pending`.
2. Complete gRPC records are parsed; the first record's `resource_name` is used to extract and validate the target digest via `bytestream.parseBlobResource`.
3. A `cas.BlobWriter` is opened lazily on the first record.
4. Each `WriteRequest.data` chunk is passed to `BlobWriter.writeAll` (hashing and writing simultaneously).
5. When `finish_write = true` is seen, `WriteGrpcStream.finish` calls `BlobWriter.finish`, which validates the final hash and atomically publishes the file.
Timing is logged for large uploads (≥128 records, ≥1 MiB, or ≥64 KiB + slow).
Sources: [src/bytestream_service.zig:178-280]()
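Step 1 and step 2 above (buffering bytes that may split mid-record, then peeling off complete records) can be sketched like this. `Record` and `nextRecord` are hypothetical names; the sketch assumes uncompressed messages and ignores the compression flag, and it does not decode the `WriteRequest` protobuf.

```zig
const std = @import("std");

/// Hypothetical result type: one complete gRPC message and how many
/// bytes of the pending buffer it occupied (prefix included).
pub const Record = struct {
    payload: []const u8,
    consumed: usize,
};

/// Returns the first complete record in `pending`, or null when the
/// buffer still ends mid-prefix or mid-payload and more bytes are needed.
pub fn nextRecord(pending: []const u8) ?Record {
    const prefix_len = 5;
    if (pending.len < prefix_len) return null;

    // Bytes 1..4 hold the payload length as a big-endian u32.
    const len: usize = (@as(usize, pending[1]) << 24) |
        (@as(usize, pending[2]) << 16) |
        (@as(usize, pending[3]) << 8) |
        @as(usize, pending[4]);

    const total = prefix_len + len;
    if (pending.len < total) return null;
    return .{ .payload = pending[prefix_len..total], .consumed = total };
}
```

A caller would loop on `nextRecord`, drop `consumed` bytes from the front of its buffer after each hit, and stop at the first `null` until the next DATA frame arrives.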
### ActionCache (`action_cache_service.zig`, `action_cache.zig`)
`action_cache.Store` maps action digest → serialized `reapi.ActionResult`. Entries are stored as files at `ac/sha256/XX/FULL_HASH` (same sharding scheme as the CAS). `updateActionResult` serializes the result via `reapi.encodeAlloc` and writes it atomically. `getActionResult` reads and deserializes it.
Sources: [src/action_cache.zig:7-60](), [src/action_cache_service.zig:12-35]()
### Execution (`execution_service.zig`)
The `execute` function orchestrates the full remote execution flow:
1. **Cache lookup**: if `!skip_cache_lookup` and an `action_cache_store` is configured, attempt `store.get` with the action digest. On a hit, return a `CompletedOperation` with `cached_result: true` immediately—no execution.
2. **do_not_cache check**: reads the `Action` proto from the CAS to check the `do_not_cache` flag before running.
3. **Work directory**: creates a unique subdirectory under `work_root` using `exec/<hash>-<size>-<monotonic_id>`.
4. **Execute**: calls `action_executor.executeActionWithOptions`, which materializes inputs, runs the command, and collects outputs.
5. **Store result**: if `!do_not_cache`, the `ActionResult` is persisted to the `action_cache_store`.
6. **Return**: always returns a completed `Operation` (the server never long-polls; `done = true` immediately). The `ExecuteResponse` is packed into a `google.protobuf.Any` inside the `Operation.response` field.
The work directory is deleted after execution regardless of outcome.
Sources: [src/execution_service.zig:30-80]()
---
## Protobuf Types (`reapi.zig`)
All REAPI protobuf messages are hand-coded Zig structs with `encode`/`decode` methods that call into `protobuf_wire.zig` directly. There is no generated code. Key types include:
| Type | Purpose |
|---|---|
| `Digest` | `{hash: []const u8, size_bytes: i64}` — wire representation |
| `Action` | `command_digest`, `input_root_digest`, `do_not_cache`, `platform` |
| `Command` | `arguments`, `environment_variables`, `output_paths`, `platform` |
| `Directory` | `files: []FileNode`, `directories: []DirectoryNode` (must be canonically sorted) |
| `ActionResult` | `output_files`, `output_directories`, `exit_code`, `stdout_digest`, `stderr_digest`, `execution_metadata` |
| `ExecuteResponse` | `result`, `cached_result`, `status` |
| `Operation` | `name`, `done`, `response: ?Any` |
`Directory.validateCanonical` is called on both encode and decode to enforce the REAPI requirement that entries are lexicographically sorted with no duplicates.
Sources: [src/reapi.zig:1-15](), [src/reapi.zig:398-440]()
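The ordering rule can be expressed as a strict-ascending check over entry names. This is a minimal sketch, not the body of `Directory.validateCanonical`: the function name is hypothetical, and it checks a single name list, whereas the real validator works on `FileNode` and `DirectoryNode` entries.

```zig
const std = @import("std");

/// Hypothetical check: true when `names` is in strictly ascending
/// byte-wise lexicographic order, which rules out duplicates as well.
pub fn namesAreCanonical(names: []const []const u8) bool {
    var i: usize = 1;
    while (i < names.len) : (i += 1) {
        // Each name must compare strictly greater than its predecessor.
        if (std.mem.order(u8, names[i - 1], names[i]) != .lt) return false;
    }
    return true;
}
```

Requiring strict ordering means two semantically equal directories always serialize to the same bytes, and therefore to the same digest.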
---
## End-to-End Data Flow
The sequence below shows the full path for a Bazel `Execute` RPC when the action is not cached.
```mermaid
sequenceDiagram
participant Bazel as Bazel Client
participant H2 as grpc_http2_server
participant Dispatch as reapi_dispatch.Server
participant Exec as execution_service
participant AC as action_cache.Store
participant CAS as cas.Store
Bazel->>H2: HTTP/2 HEADERS (:path=/Execution/Execute)
Bazel->>H2: HTTP/2 DATA (gRPC ExecuteRequest) + END_STREAM
H2->>H2: Spawn ResponseTask thread
H2->>Dispatch: handleServerStreamingResponse(method, body, Http2BodyWriter)
Dispatch->>Exec: execute(request, blob_store, ac_store, work_root)
Exec->>AC: get(action_digest)
AC-->>Exec: FileNotFound (cache miss)
Exec->>CAS: readAlloc(action_digest) → Action proto
Exec->>CAS: materializeTree(input_root_digest) → work_dir/
Exec->>Exec: action_executor.executeActionWithOptions()
Exec->>CAS: putFilePromote(output file) → Digest
Exec->>AC: put(action_digest, ActionResult)
Exec-->>Dispatch: CompletedOperation{done:true}
Dispatch->>H2: encode Operation as gRPC record
H2->>Bazel: HTTP/2 DATA (gRPC Operation)
H2->>Bazel: HTTP/2 HEADERS (grpc-status: 0) + END_STREAM
```
For a `ByteStream/Write` upload the flow is:
```mermaid
sequenceDiagram
participant Bazel as Bazel Client
participant H2 as grpc_http2_server
participant WS as WriteGrpcStream
participant CAS as cas.Store / BlobWriter
Bazel->>H2: HTTP/2 HEADERS (:path=/ByteStream/Write)
loop DATA chunks
Bazel->>H2: HTTP/2 DATA (gRPC WriteRequest chunk)
H2->>WS: append(bytes)
WS->>CAS: BlobWriter.writeAll(data) [hash incrementally]
end
Bazel->>H2: HTTP/2 DATA + END_STREAM (finish_write=true)
H2->>WS: finish()
WS->>CAS: BlobWriter.finish() → rename/linkat into blobs/sha256/XX/HASH
WS-->>H2: WriteResponse{committed_size}
H2->>Bazel: HTTP/2 DATA (WriteResponse) + trailers
```
---
## Body Sink: Streaming Abstraction
`body_sink.Writer` is a trait-object (vtable pointer pair) that decouples service code from the HTTP/2 transport:
```zig
pub const Writer = struct {
    ctx: *anyopaque,
    write_all: *const fn (*anyopaque, std.Io, std.mem.Allocator, []const u8) anyerror!void,
    ...
};
```
- `ArrayListWriter`: accumulates bytes into a `std.ArrayListUnmanaged(u8)` (used in tests and buffered paths).
- `Http2BodyWriter` (defined in `grpc_http2_server.zig`): calls `SharedHttp2Writer.sendData`, which respects HTTP/2 flow control and writes DATA frames directly to the TCP connection.
The `handleServerStreamingResponse` dispatch path in `reapi_dispatch.Server` passes the `Http2BodyWriter` through to services like `bytestream_service.writeReadGrpcRecords` and `tree_service.writeGetTreeGrpcRecords`, enabling incremental streaming without buffering entire responses.
Sources: [src/body_sink.zig:1-40](), [src/grpc_http2_server.zig:541-570]()
---
## Summary
actiond's REAPI server is a self-contained HTTP/2 gRPC implementation in Zig. The server spawns one thread per TCP connection and one thread per in-flight gRPC stream, enabling concurrent handling of up to 128 simultaneous streams per connection. The CAS uses SHA-256 content addressing with a two-character shard prefix layout; blobs are published atomically via rename or `linkat` to provide deduplication and crash safety. Service dispatch is a simple string-comparison switch in `reapi_dispatch.Server`; each service module interacts directly with `cas.Store` or `action_cache.Store` without intermediate abstraction. The `Execute` path is synchronous and always returns a completed `Operation`—cached hits return immediately, live executions block until the action finishes and the result is stored.
Sources: [src/grpc_http2_server.zig:388-420](), [src/reapi_dispatch.zig:45-140](), [src/cas.zig:115-170](), [src/execution_service.zig:30-80]()
---
## 03. Linux Host Execution — Chroot, Namespaces & Cgroups
> How linux-actiond runs actions directly on a Linux host: execroot construction, read-only bind mounts for CAS inputs, private mount and network namespace setup, loopback-only networking, uid/gid drop, PR_SET_NO_NEW_PRIVS, and best-effort cgroup v2 resource limits.
- Page Markdown: https://grok-wiki.com/public/wiki/hermeticbuild-actiond-796c0ee40e63/pages/03-linux-host-execution-chroot-namespaces-cgroups.md
- Generated: 2026-05-25T17:44:44.556Z
### Source Files
- `src/action_executor.zig`
- `src/action_runner.zig`
- `src/execroot.zig`
- `src/runtime_mount.zig`
- `src/staged_cas_index.zig`
- `src/cas.zig`
<details>
<summary>Relevant source files</summary>
The following files were used as context for generating this wiki page:
- [src/action_executor.zig](src/action_executor.zig)
- [src/action_runner.zig](src/action_runner.zig)
- [src/execroot.zig](src/execroot.zig)
- [src/runtime_mount.zig](src/runtime_mount.zig)
- [src/staged_cas_index.zig](src/staged_cas_index.zig)
</details>
# Linux Host Execution — Chroot, Namespaces & Cgroups
`linux-actiond` runs REAPI actions directly on a Linux host without a virtual machine layer. Each action executes inside a tight sandbox: a per-action chroot directory serves as the execution root, a private mount namespace and a dedicated network namespace isolate the action from the host and from other concurrent actions, all CAS inputs are exposed as read-only bind mounts so content is never copied unnecessarily, and a cgroup v2 leaf enforces resource limits that callers declare via REAPI platform properties. The sandbox user drops to UID/GID 65534 with all Linux capabilities cleared before `execve`.
This page covers the full lifecycle from input fetching through execroot construction, namespace setup, privilege drop, and cgroup management, as implemented in the `action_executor`, `action_runner`, `execroot`, and `runtime_mount` modules.
---
## Overview: Execution Path
```text
executeActionWithOptions (action_executor.zig)
│
├─ Fetch Action + Command protos from CAS
├─ collectInputs → flat list of Input{path, digest, is_executable}
├─ Materializer.materializeInputs
│ ├─ [no-chroot] copy blobs directly into work_root
│ └─ [chroot] create placeholder files + BindMount{source=CAS blob, target=workspace/path}
├─ prepareChrootBaseDirs → create tmp/, var/tmp/ inside work_root
├─ appendRuntimeMounts → bind-mount libc/etc from runtime_root or squashfs image
│
└─ runCommandWithOptions (action_runner.zig)
├─ Cgroup.create → /sys/fs/cgroup/actiond/action-N
├─ prepareChrootWritableDirs → fchown/fchmod for sandbox uid/gid
├─ forkAction
│ ├─ [child] setpgid, write cgroup.procs, PR_SET_NO_NEW_PRIVS
│ ├─ [child] unshare(NEWNS | NEWNET)
│ ├─ [child] childBringUpLoopback
│ ├─ [child] mount / MS_PRIVATE|MS_REC
│ ├─ [child] mount actiondfs / bind mounts (read-only)
│ ├─ [child] chroot + chdir
│ ├─ [child] dropPrivileges (setgroups/setresgid/setresuid/capset=0)
│ └─ [child] execve
└─ collectChildResult, poll stdout/stderr, waitpid
```
---
## Execroot Construction
### Work Directory Layout
`executeActionWithOptions` allocates a per-action `work_root` directory. When a runtime root is provided (`options.runtime_root_path != null`), `work_root` is the chroot target and the execroot lives at `work_root/workspace/`, which appears as `/workspace` inside the sandbox. Without a runtime, the work root itself becomes the execroot and no chroot is used.
```
work_root/            ← chroot target (= / inside sandbox)
  workspace/          ← execroot (= /workspace inside sandbox)
    <input files>     ← placeholder empty files for bind-mounted inputs
  tmp/                ← created by prepareChrootBaseDirs
  var/tmp/            ← created by prepareChrootBaseDirs
  lib/                ← bind-mounted from runtime_root/libc/<version>/<arch>/root/lib
  lib64/              ← ...
  usr/lib/            ← ...
  etc/                ← bind-mounted from runtime_root/common/root/etc (or libc etc)
```
The constant `chroot_execroot_prefix = "/workspace/"` documents the in-sandbox path that maps to `work_root/workspace/`.
Sources: [src/action_executor.zig:28](), [src/action_executor.zig:1611-1614]()
### Input Materialization Modes
`execroot.Materializer.materializeInputs` operates in two modes controlled by whether a `chroot_root_path` is set:
| Mode | How CAS blobs appear in execroot |
|---|---|
| **No-chroot** | Blobs are copied byte-for-byte into the work directory via `store.copyToFile` |
| **Chroot (bind-mount)** | A zero-byte placeholder file is created at `workspace/<path>`, then a `BindMount{source: CAS blob path, target: workspace/<path>}` is produced for the kernel |
In chroot mode, the CAS blob itself (`/cas/blobs/sha256/XX/YYYY…`) becomes the bind mount source. This avoids a copy and makes every non-executable input immediately read-only inside the sandbox with no additional remount step.
**Executable inputs that are argv[0]** are an exception: they are copied rather than bind-mounted. CAS blobs are stored with mode `0o444`, and a bind mount exposes the source inode's permission bits unchanged, so a bind-mounted blob would carry no execute bit for the sandbox uid and the kernel would refuse to exec it.
Tree artifact inputs (directory entries declared as `DirectoryInput`) are bound as whole directory trees: the CAS tree staging directory (`cas/trees/XX/YYYY…`) is bind-mounted directly onto `workspace/<path>`.
Sources: [src/execroot.zig:65-115](), [src/execroot.zig:145-175]()
---
## Namespace Setup
### Namespace Flags
`action_runner.zig` declares the namespace flags used for every action:
```zig
fn actionNamespaceFlags() usize {
    const linux = std.os.linux;
    return linux.CLONE.NEWNS | linux.CLONE.NEWNET;
}
```
`CLONE.NEWNS` gives the child process a private copy of the mount table, preventing any `mount` or `umount` from affecting the host. `CLONE.NEWNET` gives it a fresh network stack. No user namespace (`CLONE_NEWUSER`) or PID namespace is created.
Sources: [src/action_runner.zig:14-17](), [src/action_runner.zig:671-672]() (test assertion)
### Child Sandbox Steps (in order)
After `fork()`, the child follows a strict sequence before signalling readiness:
1. **`setpgid(0, 0)`** — Creates a new process group so `kill(-pid, SIGKILL)` terminates all descendants.
2. **Write `"0\n"` to `cgroup.procs`** — Moves the child into the action's cgroup leaf before any work begins.
3. **`prctl(PR_SET_NO_NEW_PRIVS, 1, …)`** — Permanently forbids the process and all descendants from gaining new privileges via `setuid`/`setgid` binaries or file capabilities, even after dropping to an unprivileged uid.
4. **`close_range` (UNSHARE flag)** — Closes all file descriptors above fd 3 (the setup signal pipe) using the kernel `close_range` syscall with `CLOSE_RANGE_UNSHARE`.
5. **`unshare(NEWNS | NEWNET)`** — Enters private mount and network namespaces.
6. **`childBringUpLoopback()`** — Opens an `AF_INET SOCK_DGRAM` socket, reads the `lo` interface flags via `SIOCGIFFLAGS`, sets `IFF_UP`, and writes back via `SIOCSIFFLAGS`. This is the only network interface available inside the sandbox.
7. **`mount(null, "/", null, MS_PRIVATE | MS_REC, 0)`** — Makes all existing mount points private so no shared subtrees can propagate into or out of the namespace.
8. **actiondfs mounts** — If the `actiondfs` kernel module is in use, mount the custom filesystem.
9. **Bind mounts (read-only)** — For each `BindMount`, `mount(source, target, null, MS_BIND, 0)` followed by `mount(null, target, null, MS_BIND | MS_REMOUNT | MS_RDONLY | MS_NOSUID | MS_NODEV, 0)`.
10. **`chroot(work_root)`** — Locks the sandbox into the execroot.
11. **`chdir(chroot_cwd)`** — Sets working directory to `command.working_directory` prefixed with `/workspace`, or `/workspace` itself if empty.
12. **`childDropPrivileges(uid=65534, gid=65534)`** — Calls `setgroups(0, …)`, `setresgid`, `setresuid`, then `capset` with all-zero effective/permitted/inheritable sets.
13. **Write `"1"` to setup pipe** — Parent reads this as the signal that setup succeeded.
14. **`execve`** — Runs the action command.
Sources: [src/action_runner.zig:484-536](), [src/action_runner.zig:560-600]()
### Read-Only Bind Mount Sequence
```zig
fn childBindMountReadOnly(mount: BindMount) void {
    const linux = std.os.linux;
    childSyscallName(linux.mount(mount.source.ptr, mount.target.ptr, null, linux.MS.BIND, 0), "mount_bind");
    childSyscallName(linux.mount(
        null,
        mount.target.ptr,
        null,
        linux.MS.BIND | linux.MS.REMOUNT | linux.MS.RDONLY | linux.MS.NOSUID | linux.MS.NODEV,
        0,
    ), "mount_bind_ro");
}
```
The two-step pattern (bind, then remount read-only) is required because the Linux kernel ignores `MS_RDONLY` (and most other mount flags) on the initial `MS_BIND` call; the flags only take effect on the subsequent `MS_BIND | MS_REMOUNT`.
Sources: [src/action_runner.zig:607-620]()
---
## Runtime Mounts (libc / etc)
When an action requires a specific glibc version (declared via the `libc` REAPI platform property), `action_executor` appends additional bind mounts from a pre-discovered `RuntimeMountCache`:
| Target inside sandbox | Sourced from |
|---|---|
| `/lib` | `runtime_root/libc/<version>/<arch>/root/lib` (or `usr/lib` fallback) |
| `/lib64` | `runtime_root/libc/<version>/<arch>/root/lib64` (or `usr/lib64` fallback) |
| `/usr/lib` | `runtime_root/libc/<version>/<arch>/root/usr/lib` |
| `/etc` | `runtime_root/libc/<version>/<arch>/root/etc` (overrides common) |
Supported `libc` property values: `glibc2.31`, `glibc2.35`, `glibc2.39`. Any other non-empty, non-`none` value returns `error.UnsupportedLibcRuntime`.
For actions without a `libc` property, only the common `etc` from `runtime_root/common/root/etc` is mounted.
The runtime root itself can be a pre-mounted directory or an on-demand squashfs image (mounted read-only via a loop device in `runtime_mount.zig`).
Sources: [src/action_executor.zig:33-71](), [src/action_executor.zig:519-560](), [src/runtime_mount.zig:64-100]()
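The property handling described above amounts to a small lookup. The sketch below is an assumption about shape only: the `LibcRuntime` enum and the `parseLibcProperty` name are hypothetical, while the accepted values and `error.UnsupportedLibcRuntime` come from the text.

```zig
const std = @import("std");

/// Hypothetical enum for the runtimes listed in the text.
pub const LibcRuntime = enum { none, glibc2_31, glibc2_35, glibc2_39 };

/// Maps the `libc` platform property to a runtime selection.
/// An absent (empty) value or "none" means only the common etc is mounted.
pub fn parseLibcProperty(value: []const u8) error{UnsupportedLibcRuntime}!LibcRuntime {
    if (value.len == 0 or std.mem.eql(u8, value, "none")) return .none;
    if (std.mem.eql(u8, value, "glibc2.31")) return .glibc2_31;
    if (std.mem.eql(u8, value, "glibc2.35")) return .glibc2_35;
    if (std.mem.eql(u8, value, "glibc2.39")) return .glibc2_39;
    return error.UnsupportedLibcRuntime;
}
```

Rejecting unknown values up front keeps an action from silently running against a different libc than the one it declared.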
---
## Cgroup v2 Resource Limits
### Cgroup Lifecycle
`Cgroup.create` is called before `fork`. It:
1. Opens (or creates) `/sys/fs/cgroup/actiond/`.
2. Writes `+cpu +memory +pids` to `cgroup.subtree_control` (best-effort; failures are silently ignored).
3. Creates `/sys/fs/cgroup/actiond/action-{monotonic_id}/`.
4. Writes resource limit files as requested.
5. Returns the path to `cgroup.procs` for the child to self-join.
After the action completes, `Cgroup.deinit` writes `"1"` to `cgroup.kill` (killing all remaining processes in the cgroup) and then removes the cgroup directory.
If `/sys/fs/cgroup` is not accessible or any creation step fails, a zero-value `Cgroup{}` is returned and execution continues without limits (best-effort semantics).
### Platform Properties → Limits
| Platform property name(s) | Cgroup file written | Format |
|---|---|---|
| `limits.memory.bytes`, `memory`, `memory_bytes`, `resources:memory:bytes` | `memory.max` | bytes (`128M`, `1G`, raw int) |
| `limits.cpu.cores`, `cpu`, `cores`, `resources:cpu:cores` | `cpu.max` | `<quota> <period>` where period = 100 000 µs |
| `limits.pids.max`, `pids.max`, `pids` | `pids.max` | integer |
```zig
pub const CgroupLimits = struct {
    memory_max_bytes: ?u64 = null,
    cpu_max_cores: ?u32 = null,
    pids_max: ?u32 = null,
    ...
};
```
The child process writes `"0\n"` to `cgroup.procs` early in its setup sequence, immediately after `setpgid` and before `PR_SET_NO_NEW_PRIVS`, so that the rest of the setup work is also accounted for under the limits.
Sources: [src/action_runner.zig:20-50](), [src/action_runner.zig:178-240]()
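The table's value formats can be sketched as two small helpers. Both names are hypothetical. The `cpu.max` helper follows the `<quota> <period>` form with a 100 000 µs period as stated above; the memory parser assumes that `K`, `M`, and `G` suffixes mean powers of 1024, which the text does not confirm.

```zig
const std = @import("std");

/// Period from the table above: 100 000 µs.
pub const cpu_period_us: u64 = 100_000;

/// Hypothetical helper: formats the cpu.max value for a whole number of cores.
pub fn formatCpuMax(cores: u32, buf: []u8) ![]u8 {
    const quota = @as(u64, cores) * cpu_period_us;
    return std.fmt.bufPrint(buf, "{d} {d}", .{ quota, cpu_period_us });
}

/// Hypothetical helper: parses "128M", "1G", or a raw integer into bytes.
/// ASSUMPTION: suffixes are binary multiples (K = 1024, M = 1024^2, G = 1024^3).
pub fn parseMemoryBytes(text: []const u8) !u64 {
    if (text.len == 0) return error.InvalidCharacter;
    const multiplier: u64 = switch (text[text.len - 1]) {
        'K', 'k' => 1 << 10,
        'M', 'm' => 1 << 20,
        'G', 'g' => 1 << 30,
        else => 1,
    };
    const digits = if (multiplier == 1) text else text[0 .. text.len - 1];
    const value = try std.fmt.parseInt(u64, digits, 10);
    // Checked multiply so an oversized request fails instead of wrapping.
    return std.math.mul(u64, value, multiplier);
}
```

For example, a request for 2 cores becomes a quota of 200 000 µs per 100 000 µs period, which lets the cgroup use two full CPUs' worth of time.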
---
## Privilege Drop
`childDropPrivileges` executes after `chroot` and `chdir`, immediately before the setup-complete signal:
```zig
fn childDropPrivileges(uid: u32, gid: u32) void {
    const linux = std.os.linux;
    var empty_groups = [_]linux.gid_t{0};
    childSyscallName(linux.setgroups(0, &empty_groups), "setgroups");
    childSyscallName(linux.setresgid(@intCast(gid), @intCast(gid), @intCast(gid)), "setresgid");
    childSyscallName(linux.setresuid(@intCast(uid), @intCast(uid), @intCast(uid)), "setresuid");
    var header = linux.cap_user_header_t{ .version = linux_capability_version_3, .pid = 0 };
    const data = [_]linux.cap_user_data_t{
        .{ .effective = 0, .permitted = 0, .inheritable = 0 },
        .{ .effective = 0, .permitted = 0, .inheritable = 0 },
    };
    _ = linux.capset(&header, &data[0]);
}
```
The default sandbox uid and gid are `65534` (the conventional `nobody` account). `setgroups(0, …)` clears all supplementary groups. `capset` with all-zero data drops every capability from the effective, permitted, and inheritable sets. Combined with `PR_SET_NO_NEW_PRIVS` (set earlier), this prevents the action and its descendants from regaining privileges through `setuid`/`setgid` binaries or file capabilities.
Sources: [src/action_runner.zig:623-643](), [src/action_runner.zig:60-64]()
---
## actiondfs: FUSE-Free CAS Filesystem
When `use_actiondfs = true`, `action_executor` builds an `ActiondfsWorkspace` instead of materializing individual bind mounts per file. The `actiondfs` kernel module presents the CAS graph as a live filesystem. Two sub-modes exist:
| Mode | When used | Kernel mounts |
|---|---|---|
| `actiondfs_strict` | Default (inputs are not mutated) | `mount("actiondfs", workspace, "actiondfs", RDONLY\|NOSUID\|NODEV\|NOATIME, root=…,cas=…,stage=…)` |
| `actiondfs_overlay` | `mutates_inputs=true` platform property | actiondfs on `/lower`, then `overlay` on `/workspace` with actiondfs as lowerdir and a writable upperdir |
The `stage_dir` is a local directory for kernel-side caching of resolved CAS content. In overlay mode, the action's writes land in the overlay `upperdir` and are collected after the action exits by mounting the overlay again.
Sources: [src/action_executor.zig:456-545](), [src/action_executor.zig:170-210]()
---
## Setup Signal Protocol
The parent and child coordinate via a `setup_pipe` pair (fd 3 in the child). After all namespace, mount, chroot, and privilege setup succeeds, the child writes a single byte `"1"` and closes fd 3. The parent reads this signal before beginning to drain stdout/stderr. If setup fails at any point, the child calls `linux.exit(127)` without writing the signal; the parent sees EOF without the byte and treats the setup as failed.
Sources: [src/action_runner.zig:490-494](), [src/action_runner.zig:660-668]()
---
## Summary
Linux host execution in `actiond` composes five isolation layers:
1. **Chroot** — a per-action `work_root` chroot, with the execroot at `/workspace`, isolates the filesystem view.
2. **Private mount namespace** (`CLONE_NEWNS`) — bind mounts, runtime mounts, and the chroot are invisible to the host and to other actions.
3. **Private network namespace** (`CLONE_NEWNET`) — only the loopback interface (`lo`) is brought up; all external networking is absent.
4. **Privilege drop** — uid/gid 65534 with empty groups, zero Linux capabilities, and `PR_SET_NO_NEW_PRIVS` set before unshare.
5. **Cgroup v2** — best-effort memory, CPU, and PID limits declared via REAPI platform properties, with automatic cleanup and forced kill on completion.
CAS inputs are delivered as read-only bind mounts (source = CAS blob path, target = workspace placeholder) rather than copies, making input staging O(inputs) in metadata operations rather than O(bytes). The `actiondfs` kernel module extends this further by deferring even the bind mounts until the kernel receives a file access.
Sources: [src/action_runner.zig:484-536](), [src/action_executor.zig:390-430]()
---
## 04. macOS VM Execution — darwin-actiond, vsock & Guest Worker
> How darwin-actiond serve-vm boots a minimal arm64 Linux VM via Virtualization.framework, proxies REAPI traffic over virtio-vsock to linux-actiond-guest, manages the guest-owned ext4 CAS on a virtio block device, and handles standalone binary payload extraction at startup.
- Page Markdown: https://grok-wiki.com/public/wiki/hermeticbuild-actiond-796c0ee40e63/pages/04-macos-vm-execution-darwin-actiond-vsock-guest-worker.md
- Generated: 2026-05-25T17:42:35.914Z
### Source Files
- `src/darwin_vm.zig`
- `src/darwin_vm_host.zig`
- `src/grpc_vsock_bridge.zig`
- `src/vsock.zig`
- `src/guest_init.zig`
- `src/guest_worker.zig`
- `src/control_protocol.zig`
- `src/embedded_payload.zig`
<details>
<summary>Relevant source files</summary>
The following files were used as context for generating this wiki page:
- [cmd/darwin_actiond/vz_bridge.m](cmd/darwin_actiond/vz_bridge.m)
- [src/darwin_vm.zig](src/darwin_vm.zig)
- [src/darwin_vm_host.zig](src/darwin_vm_host.zig)
- [src/embedded_payload.zig](src/embedded_payload.zig)
- [src/grpc_vsock_bridge.zig](src/grpc_vsock_bridge.zig)
- [src/vsock.zig](src/vsock.zig)
- [src/control_protocol.zig](src/control_protocol.zig)
- [src/control_transport_fd.zig](src/control_transport_fd.zig)
- [src/guest_init.zig](src/guest_init.zig)
- [src/guest_worker.zig](src/guest_worker.zig)
</details>
# macOS VM Execution — darwin-actiond, vsock & Guest Worker
`darwin-actiond serve-vm` is the macOS execution path for actiond. It boots a minimal arm64 Linux virtual machine using Apple's [Virtualization.framework](https://developer.apple.com/documentation/virtualization), proxies REAPI (Remote Execution API) gRPC traffic from TCP listeners on the host into the VM over virtio-vsock, and lets a guest `linux-actiond` process own the CAS and execute build actions natively on Linux. This design keeps the Linux execution environment hermetic and avoids any dependency on macOS for action sandboxing or filesystem isolation, while still providing the REAPI surface at a host TCP address that Bazel or other clients connect to directly.
The overall execution split is: the macOS host manages VM lifecycle, payload extraction, and bidirectional byte-pumping; the Linux guest owns the REAPI ActionCache, CAS (on a dedicated ext4 block device), and all action execution via `linux-actiond --guest-worker`.
---
## Architecture Overview
```text
┌────────────────────────── macOS host ────────────────────────────────────────┐
│ darwin-actiond serve-vm │
│ │
│ ┌──────────────┐ ┌──────────────────────────────────────────────────┐ │
│ │ TCP listener│ │ Virtualization.framework VM (arm64 Linux) │ │
│ │ :8980 (gRPC)│<->│ ┌───────────────────────────────────────────┐ │ │
│ └──────────────┘ │ │ virtio-vsock port 5001 (gRPC HTTP/2) │ │ │
│ grpc_vsock_bridge │ │ virtio-vsock port 5000 (control channel)│ │ │
│ ┌──────────────┐ │ ├───────────────────────────────────────────┤ │ │
│ │ control_ │ │ │ linux-actiond --guest-worker │ │ │
│ │ transport_fd │<->│ │ • REAPI Execute / CAS / ActionCache │ │ │
│ └──────────────┘ │ │ • action executor + actiondfs │ │ │
│ │ ├───────────────────────────────────────────┤ │ │
│ Block devices: │ │ /cas (ext4 virtio block, R/W) │ │ │
│ cas.ext4 ───────│->│ /runtimes (squashfs virtio block, R/O) │ │ │
│ runtimes.sqfs ───-│->│ /work, /tmp (tmpfs) │ │ │
│ │ └───────────────────────────────────────────┘ │ │
│ └──────────────────────────────────────────────────┘ │
└──────────────────────────────────────────────────────────────────────────────┘
```
---
## `serve` Startup Sequence
`darwin_vm_host.serve` orchestrates everything before the VM is started. The function is the single entry point called by the `darwin-actiond serve-vm` subcommand.
### 1. Working Directory
A root working directory (default `/tmp/actiond-vm`, overridable with `--root`) is created. All extracted artifacts, decompressed boot files, and the default CAS image file live under this root.
Sources: [src/darwin_vm_host.zig:121-130]()
### 2. CAS ext4 Image
The CAS image file is resolved (defaulting to `<root>/cas.ext4`). If the file does not already exist, it is created as a sparse file of the configured size (default 32 GiB). The guest init process mounts this file as an ext4 block device at `/cas`; the format must already be present (the host never calls `mkfs`—the image must be pre-formatted or already used from a prior run).
```zig
// src/darwin_vm_host.zig:136-140
const owned_cas_image_path = if (options.cas_image == null)
    try std.fs.path.join(allocator, &.{ options.root, "cas.ext4" })
else
    "";
...
try ensureCasImageFile(io, cas_image_path, options.cas_image_size_mib);
```
Sources: [src/darwin_vm_host.zig:133-148](), [src/darwin_vm_host.zig:308-332]()
### 3. Embedded Payload Extraction
When `--kernel`, `--initramfs`, or `--runtime-image` are not explicitly provided, `serve` calls `embedded_payload.extractFromSelf` to pull these artifacts out of the running `darwin-actiond` binary itself. On macOS the binary is a Mach-O and the payloads are stored in a dedicated `__ACTIOND` segment with sections `__kernel`, `__initramfs`, and `__runtimes`. On Linux the runtimes payload is stored in a `.actiond.runtimes` ELF section.
| Artifact | Mach-O section | Payload constant |
|---|---|---|
| Linux kernel | `__ACTIOND,__kernel` | `embedded_payload.kernel_name` = `"linux_kernel"` |
| Initramfs | `__ACTIOND,__initramfs` | `embedded_payload.initramfs_name` = `"initramfs.cpio.zst"` |
| Runtime squashfs | `__ACTIOND,__runtimes` | `embedded_payload.runtimes_name` = `"runtimes.sqfs"` |
Each payload is extracted once: a SHA-256 hash of the binary range is computed and the extracted file is cached at `<root>/embedded/<name>-<hash>`, so subsequent starts skip the copy. Extraction is followed by an `fsync` to guarantee the file is durable before use.
Sources: [src/embedded_payload.zig:1-45](), [src/embedded_payload.zig:75-130]()
### 4. Boot Artifact Decompression
Both the kernel and initramfs may be stored in zstd-compressed form (magic bytes `0x28 0xb5 0x2f 0xfd`). `prepareBootKernel` and `prepareBootInitramfs` detect the magic, decompress via `ZSTD_decompress`, and write a raw file to `<root>/boot/<artifact>-<sha256>.<ext>`. The raw file is reused on subsequent starts if already present. The Virtualization.framework `VZLinuxBootLoader` requires the uncompressed ARM64 `Image` format and an uncompressed CPIO initramfs.
```
Stored as: linux_kernel (zstd-compressed)
Served as: boot/kernel-<sha256>.Image (raw, written once)
```
Sources: [src/darwin_vm_host.zig:258-305](), [src/darwin_vm_host.zig:355-420]()
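The magic-byte check that decides whether a boot artifact needs decompression can be sketched as below. `isZstdFrame` is a hypothetical name; the four magic bytes are the ones quoted above, and the actual decompression call and cache-file handling are not shown.

```zig
const std = @import("std");

/// zstd frame magic as it appears on disk: 0x28 0xb5 0x2f 0xfd.
const zstd_magic = [_]u8{ 0x28, 0xb5, 0x2f, 0xfd };

/// Hypothetical helper: true when `bytes` begins with the zstd frame magic,
/// meaning the artifact must be decompressed before it is handed to the VM.
pub fn isZstdFrame(bytes: []const u8) bool {
    return std.mem.startsWith(u8, bytes, &zstd_magic);
}
```

Only the first four bytes of the artifact need to be read to make this decision, so an already-raw kernel or initramfs can be passed through without further inspection.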
---
## Virtualization.framework Bridge (`vz_bridge.m`)
The actual VM creation is done in Objective-C via `cmd/darwin_actiond/vz_bridge.m`. This file is the boundary between Zig and Virtualization.framework and exposes three C-callable functions that `darwin_vm.zig` declares as `extern`.
### VM Configuration
`actiond_vm_start` assembles a `VZVirtualMachineConfiguration` with:
| Component | Value / API |
|---|---|
| Boot loader | `VZLinuxBootLoader` with kernel, initramfs, cmdline `console=hvc0 init=/init panic=-1 quiet` |
| CPU count | Configurable (`--cpus`, default 2) |
| Memory | Configurable (`--memory-mib`, default 512 MiB) |
| CAS block device | `VZVirtioBlockDeviceConfiguration` (read-write, from `cas.ext4`) |
| Runtime block device | `VZVirtioBlockDeviceConfiguration` (read-only, from `runtimes.sqfs`), omitted if not provided |
| vsock | `VZVirtioSocketDeviceConfiguration` (single device) |
| Serial console | `VZVirtioConsoleDeviceSerialPortConfiguration` forwarded to host stderr |
| Network | None (`networkDevices = @[]`) |
| Entropy | `VZVirtioEntropyDeviceConfiguration` |
| Platform | `VZGenericPlatformConfiguration` |
The VM runs on a private `dispatch_queue_t` named `dev.actiond.vm`. Start is synchronous with a configurable timeout (default 30 s).
Sources: [cmd/darwin_actiond/vz_bridge.m:59-156]()
### vsock Connection
`actiond_vm_connect(handle, port, timeout_ms)` dispatches `connectToPort:completionHandler:` on `VZVirtioSocketDevice`, waits on a dispatch semaphore, and returns a `dup()`-ed file descriptor to the caller. This means each connection request from the Zig side becomes an OS-level file descriptor that can be read/written with ordinary POSIX calls.
```objc
// cmd/darwin_actiond/vz_bridge.m:177-195
VZVirtioSocketDevice *socketDevice = (VZVirtioSocketDevice *)socketDevices[0];
[socketDevice connectToPort:port completionHandler:^(VZVirtioSocketConnection *connection, NSError *errorOrNil) {
    ...
    // Duplicate the descriptor so it stays valid after the connection object is closed.
    result_fd = dup(fd);
    [connection close];
    // Wake the caller that is blocked on the semaphore.
    dispatch_semaphore_signal(semaphore);
}];
```
Sources: [cmd/darwin_actiond/vz_bridge.m:157-220]()
---
## Zig VM Wrapper (`darwin_vm.zig`)
`darwin_vm.Machine` wraps the opaque handle returned by `actiond_vm_start`. It is macOS-only: on any other target (`builtin.os.tag != .macos`, a compile-time check), `start` returns `error.UnsupportedHost`. It exposes:
- `Machine.start(options)` — calls `actiond_vm_start` with absolute paths; returns a `Machine` value.
- `Machine.connectControlPort(port)` — retry loop calling `actiond_vm_connect` with per-attempt and total timeouts. Sleeps 100 ms between attempts if the guest is not ready yet.
- `Machine.opener()` — returns a `control_transport_fd.Opener` that connects to `vsock.control_port` (5000) for the internal control channel.
- `Machine.deinit()` — calls `actiond_vm_stop` then `actiond_vm_release`.
Sources: [src/darwin_vm.zig:21-113]()
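The retry behaviour of `connectControlPort` can be sketched as follows. This is a minimal Python model, not the Zig code: `connect_once` is a hypothetical stand-in for `actiond_vm_connect` that returns a descriptor or `None`, and the 1000 ms per-attempt default is an assumption. The 100 ms sleep between attempts and the split between per-attempt and total timeouts come from the description above.

```python
import time
from typing import Callable, Optional


def connect_with_retry(
    connect_once: Callable[[int], Optional[int]],
    total_timeout_ms: int,
    attempt_timeout_ms: int = 1000,  # assumed per-attempt budget
    retry_sleep_ms: int = 100,
    now: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Retry connect_once until it yields a descriptor or the total budget runs out."""
    deadline = now() + total_timeout_ms / 1000.0
    while True:
        remaining_ms = int((deadline - now()) * 1000)
        if remaining_ms <= 0:
            raise TimeoutError("guest did not accept the vsock connection in time")
        # Never let a single attempt outlive the overall deadline.
        fd = connect_once(min(attempt_timeout_ms, remaining_ms))
        if fd is not None:
            return fd
        sleep(retry_sleep_ms / 1000.0)  # guest not ready yet
```

Injecting `now` and `sleep` keeps the loop testable without waiting on a real clock.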
---
## vsock Port Layout
The guest `linux-actiond` listens on two virtio-vsock ports, declared as constants in `vsock.zig`:
| Port | Constant | Purpose |
|---|---|---|
| **5000** | `vsock.control_port` | Internal control channel (actiondfs stats, future host→guest RPCs) |
| **5001** | `vsock.grpc_port` | Raw gRPC HTTP/2 — forwarded from the host TCP listener |
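The guest-side `vsock.Listener` and `vsock.Connection` are Linux-only (`AF.VSOCK`, `SOCK.STREAM`). The host side uses `darwin_vm.Machine.connectControlPort(port)`, which calls into Virtualization.framework's vsock connect API. Despite its name, the function takes the port as an argument and is used for both the control port and the gRPC port.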
Sources: [src/vsock.zig:1-35]()
---
## gRPC–vsock Bridge (`grpc_vsock_bridge.zig`)
`grpc_vsock_bridge.serve(io, listen, machine)` is the host-side TCP listener for all REAPI traffic. It:
1. Listens on a TCP address (default `127.0.0.1:8980`).
2. For each accepted TCP connection, spawns a detached thread via `connectionThread`.
3. `connectionThread` opens a vsock connection to the guest at `vsock.grpc_port` (5001) via `machine.connectControlPort(vsock.grpc_port)`.
4. Two sub-threads pump bytes bidirectionally: `client→guest` and `guest→client`, each in a `pumpAndShutdown` loop using a 64 KiB buffer.
5. When a direction's read side sees EOF, it calls `shutdown(SHUT_WR)` on the descriptor it writes to (the TCP socket or the vsock connection), allowing a clean half-close.
6. After both pump threads join, connection timing is logged (`vm bridge timing` log lines) if the connection transferred ≥ 64 KiB or lasted ≥ 10 ms.
This bridge is transparent: the guest sees ordinary gRPC HTTP/2 frames; the Bazel client sees an ordinary TCP endpoint. No gRPC or HTTP/2 parsing is done in the bridge.
```zig
// src/grpc_vsock_bridge.zig:35-50
const guest_fd = machine.connectControlPort(vsock.grpc_port) catch |err| { ... };
const client_to_guest = std.Thread.spawn(.{}, pumpAndShutdown, .{ client_fd, guest_fd, &client_to_guest_stats });
const guest_to_client = std.Thread.spawn(.{}, pumpAndShutdown, .{ guest_fd, client_fd, &guest_to_client_stats });
```
Sources: [src/grpc_vsock_bridge.zig:19-95]()
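The two-thread pump with half-close can be modelled with ordinary Python sockets. In this sketch the `client` and `guest` sockets stand in for the accepted TCP connection and the vsock descriptor; the `bridge` helper and the shape of the stats dictionaries are assumptions, while the 64 KiB buffer and the `shutdown(SHUT_WR)` on EOF follow the steps above.

```python
import socket
import threading

BUF_SIZE = 64 * 1024  # 64 KiB, matching the bridge's pump buffer


def pump_and_shutdown(src: socket.socket, dst: socket.socket, stats: dict) -> None:
    """Copy src -> dst until EOF, then half-close the write side of dst."""
    while True:
        chunk = src.recv(BUF_SIZE)
        if not chunk:  # EOF on the read side
            break
        dst.sendall(chunk)
        stats["bytes"] = stats.get("bytes", 0) + len(chunk)
    try:
        dst.shutdown(socket.SHUT_WR)  # propagate EOF, keep the other direction open
    except OSError:
        pass  # peer already gone


def bridge(client: socket.socket, guest: socket.socket):
    """Pump both directions on separate threads and return per-direction stats."""
    client_to_guest, guest_to_client = {}, {}
    threads = [
        threading.Thread(target=pump_and_shutdown, args=(client, guest, client_to_guest)),
        threading.Thread(target=pump_and_shutdown, args=(guest, client, guest_to_client)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return client_to_guest, guest_to_client
```

Because neither pump inspects the bytes, the same loop works for any stream protocol; gRPC only matters to the two endpoints.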
---
## Control Channel & Protocol (`control_protocol.zig`, `control_transport_fd.zig`)
### Wire Format
The control protocol uses a simple binary framing over the vsock connection on port 5000. Every message has an 18-byte header:
| Bytes | Field | Notes |
|---|---|---|
| 0–3 | Magic | ASCII `ACTD` |
| 4 | Version | `1` |
| 5 | Tag | Encodes `CallKind` (request) or `Status` (response) |
| 6–9 | `method_len` | `u32` little-endian |
| 10–17 | `body_len` | `u64` little-endian |
The payload immediately follows the header: `method_len` bytes of method string, then `body_len` bytes of body. Currently only `CallKind.unary` (tag `0`) is defined.
Sources: [src/control_protocol.zig:1-55]()
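The header layout in the table maps directly onto a packed little-endian struct. The sketch below encodes and decodes a frame in Python to make the byte offsets concrete; the function names are illustrative only, and treating the method string as UTF-8 is an assumption.

```python
import struct

# magic (4 bytes), version (u8), tag (u8), method_len (u32 LE), body_len (u64 LE)
HEADER = struct.Struct("<4sBBIQ")
MAGIC = b"ACTD"
VERSION = 1
CALL_KIND_UNARY = 0


def encode_request(method: str, body: bytes = b"") -> bytes:
    """Build header + method bytes + body for a unary call."""
    m = method.encode("utf-8")
    return HEADER.pack(MAGIC, VERSION, CALL_KIND_UNARY, len(m), len(body)) + m + body


def decode_frame(frame: bytes):
    """Return (tag, method, body), validating magic, version, and lengths."""
    magic, version, tag, method_len, body_len = HEADER.unpack_from(frame, 0)
    if magic != MAGIC or version != VERSION:
        raise ValueError("not an ACTD v1 frame")
    start = HEADER.size
    end = start + method_len + body_len
    if len(frame) < end:
        raise ValueError("truncated frame")
    method = frame[start:start + method_len].decode("utf-8")
    body = frame[start + method_len:end]
    return tag, method, body
```

With the `<` prefix, `struct` applies no padding, so `HEADER.size` is exactly the 18 bytes listed in the table.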
### Host-Side Client (`control_transport_fd.Client`)
`control_transport_fd.Client` maintains a pool of up to 32 cached vsock file descriptors (one per slot). Each `call()` acquires a slot lock, lazily opens a connection via the `Opener` if needed, writes a request frame, and reads the response. On error the slot's cached fd is closed and discarded (no silent reconnect on the next call from the same slot, but the slot is freed for re-use).
Sources: [src/control_transport_fd.zig:30-100]()
### Currently Supported Method
The only method dispatched over the control channel is:
```
/actiond.internal.Guest/ActiondfsStats
```
The request carries an empty body. The guest responds with the contents of `/proc/actiondfs_stats` (the actiondfs kernel module's procfs counter file), followed by CAS put-file stats appended by `cas.appendPutFileStats`.
Sources: [src/control_protocol.zig:13](), [src/guest_worker.zig:130-170]()
---
## Guest Init (`guest_init.zig`)
`guest_init.run()` is the Linux PID 1 (`/init`) process executed inside the VM. It is Linux-only at compile time. Its responsibilities:
1. Mount core virtual filesystems in order:
| Source | Target | FS type |
|---|---|---|
| `proc` | `/proc` | `proc` |
| `sysfs` | `/sys` | `sysfs` |
| `cgroup2` | `/sys/fs/cgroup` | `cgroup2` |
| `devtmpfs` | `/dev` | `devtmpfs` |
| `tmpfs` | `/tmp` | `tmpfs` |
| `tmpfs` | `/work` | `tmpfs` |
2. Mount the CAS ext4 block device at `/cas`. The init scans a fixed list of candidate devices (`/dev/vda`, `/dev/vdb`, `/dev/vdc`, `/dev/vdd`, `/dev/sda`, `/dev/sdb`, `/dev/nvme0n1`) then falls back to reading `/sys/block`. It retries up to 50 times with 100 ms waits. Mount flags: `MS_NOSUID | MS_NODEV | MS_NOATIME`, data option `errors=remount-ro`.
3. Mount the runtime squashfs image at `/runtimes` (read-only). Same candidate scan and retry logic.
4. `exec` into `linux-actiond --guest-worker` via `std.process.replace`. The init process is replaced in-place, so the guest worker runs as PID 1 after mount completion.
```zig
// src/guest_init.zig:54
pub const worker_argv = [_][]const u8{ "/actiond", "--guest-worker" };
```
Sources: [src/guest_init.zig:40-75](), [src/guest_init.zig:75-175]()
---
## Guest Worker (`guest_worker.zig`)
`guest_worker.run()` is the Linux-only REAPI server inside the VM. It:
1. Opens `/cas` and `/work/actions` directories.
2. Initialises the CAS layout (`cas.Store.ensureLayout`) and ActionCache under `/cas/ac`.
3. Cleans and re-creates the actiondfs staging directory at `/cas/actiondfs-stage`.
4. Prepares action execution options using the guest-native CAS paths:
- `cas_blob_root_path = "/cas/blobs/sha256"`
- `actiondfs_stage_root_path = "/cas/actiondfs-stage"`
- `runtime_root_path = "/runtimes"`
5. Listens on `vsock.control_port` (5000) for control-channel connections and accepts them in a loop, dispatching each in a detached thread.
6. Spawns a detached gRPC thread that listens on `vsock.grpc_port` (5001) and serves full REAPI HTTP/2 traffic (Execute, CAS, ActionCache, ByteStream, Capabilities) through `grpc_http2_server.handleConnectionFd`.
```zig
// src/guest_worker.zig:44-50
const listener = try vsock.listen(vsock.control_port);
...
const grpc_thread = try std.Thread.spawn(.{}, grpcListenerThread, .{ io, allocator, server });
```
Sources: [src/guest_worker.zig:21-65](), [src/guest_worker.zig:67-100]()
---
## actiondfs Stats Polling
When `--actiondfs-stats-path` is provided to `serve-vm`, `darwin_vm_host.serve` spawns a background thread that calls `writeActiondfsStatsSnapshot` once per second. Each snapshot:
1. Sends a `control_protocol.Request` with method `/actiond.internal.Guest/ActiondfsStats` over the control channel (vsock port 5000).
2. Receives the response body (the contents of guest `/proc/actiondfs_stats`, followed by the appended CAS put-file stats).
3. Writes the body to the specified path on the host filesystem.
This provides a host-visible window into the guest actiondfs kernel module's lifetime counters (directory cache hits/misses, CAS blob opens, splice reads, mmap calls, stale retry events, etc.) without requiring any host filesystem access from within the guest.
Sources: [src/darwin_vm_host.zig:192-230](), [src/guest_worker.zig:128-165]()
---
## Standalone Binary Mode
When any of `--kernel`, `--initramfs`, or `--runtime-image` is omitted, `serve-vm` falls back to the copy embedded in the binary, so `darwin-actiond` can run as a single self-contained executable. The build toolchain embeds all three artifacts as named sections inside the Mach-O binary (`__ACTIOND` segment). At startup `embedded_payload.extractFromSelf` parses the running executable's Mach-O load commands, locates each section by name, SHA-256 hashes the byte range, and copies it out to `<root>/embedded/<name>-<hash>` only if the file is not already present. On warm starts, a matching size check on the existing file skips the copy.
The same mechanism works for Linux ELF binaries for the `runtimes.sqfs` payload (in `.actiond.runtimes`). The kernel and initramfs sections are not embedded in ELF binaries because guest-side binaries do not boot VMs.
Sources: [src/embedded_payload.zig:47-130](), [src/darwin_vm_host.zig:143-170]()
---
## Sequence: Client REAPI Request End-to-End
```mermaid
sequenceDiagram
participant Bazel as Bazel client (TCP)
participant Bridge as grpc_vsock_bridge (macOS)
participant VZ as Virtualization.framework vsock
participant Guest as linux-actiond --guest-worker
participant CAS as /cas (ext4 block)
Bazel->>Bridge: TCP connect :8980
Bridge->>VZ: connectToPort(5001)
VZ-->>Bridge: fd (dup of vsock connection)
Bazel->>Bridge: gRPC HTTP/2 frame(s)
Bridge->>VZ: write bytes (64 KiB buffer)
VZ->>Guest: vsock read
Guest->>CAS: CAS lookup / write
CAS-->>Guest: blob data
Guest-->>VZ: gRPC HTTP/2 response
VZ-->>Bridge: read bytes
Bridge-->>Bazel: TCP write
Bazel->>Bridge: TCP EOF (SHUT_WR)
Bridge->>VZ: shutdown(SHUT_WR)
VZ->>Guest: vsock EOF
Guest-->>VZ: shutdown(SHUT_WR)
VZ-->>Bridge: read EOF
Bridge-->>Bazel: shutdown(SHUT_WR)
```
---
## Configuration Reference
All flags are parsed by `darwin_vm_host.parseServeVmArgs`. Every flag supports both `--flag value` and `--flag=value` forms.
| Flag | Default | Purpose |
|---|---|---|
| `--listen` | `127.0.0.1:8980` | TCP address for the host-side gRPC bridge |
| `--root` | `/tmp/actiond-vm` | Working directory for artifacts and extracted payloads |
| `--cas-image` | `<root>/cas.ext4` | Path to the ext4 CAS disk image |
| `--cas-image-size-mib` | `32768` (32 GiB) | Size when creating a new CAS image |
| `--kernel` | *(embedded)* | Path to raw or zstd-compressed arm64 Linux Image |
| `--initramfs` | *(embedded)* | Path to raw or zstd-compressed CPIO initramfs |
| `--runtime-image` | *(embedded)* | Path to squashfs runtimes image |
| `--memory-mib` | `512` | Guest RAM in MiB |
| `--cpus` | `2` | Guest vCPU count |
| `--start-timeout-ms` | `30000` | VM start timeout (ms) |
| `--connect-timeout-ms` | `60000` | Total vsock connect timeout (ms) |
| `--actiondfs-stats-path` | *(disabled)* | Host path for periodic actiondfs stats snapshots |
Sources: [src/darwin_vm_host.zig:27-50](), [src/darwin_vm_host.zig:52-120]()
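To illustrate the two accepted flag forms, here is a small Python model of the parsing rules. It is a sketch rather than a port of `parseServeVmArgs`: the function name is hypothetical, values are kept as strings, and rejecting unknown flags is an assumption. The defaults are copied from the table above, with `None` standing for "embedded" or "disabled".

```python
FLAG_DEFAULTS = {
    "--listen": "127.0.0.1:8980",
    "--root": "/tmp/actiond-vm",
    "--cas-image": None,  # derived from --root when not given
    "--cas-image-size-mib": "32768",
    "--kernel": None,
    "--initramfs": None,
    "--runtime-image": None,
    "--memory-mib": "512",
    "--cpus": "2",
    "--start-timeout-ms": "30000",
    "--connect-timeout-ms": "60000",
    "--actiondfs-stats-path": None,
}


def parse_serve_vm_flags(argv):
    """Accept both `--flag value` and `--flag=value`; return a dict of options."""
    opts = dict(FLAG_DEFAULTS)
    i = 0
    while i < len(argv):
        flag, sep, value = argv[i].partition("=")
        if flag not in FLAG_DEFAULTS:
            raise ValueError(f"unknown flag: {flag}")
        if not sep:  # `--flag value` form: the value is the next argument
            i += 1
            if i == len(argv):
                raise ValueError(f"missing value for {flag}")
            value = argv[i]
        opts[flag] = value
        i += 1
    if opts["--cas-image"] is None:
        opts["--cas-image"] = opts["--root"] + "/cas.ext4"
    return opts
```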
---
## Summary
`darwin-actiond serve-vm` is a thin macOS host process whose main job is VM lifecycle management and byte forwarding. It uses Virtualization.framework (via an Objective-C bridge in `vz_bridge.m`) to boot a minimal arm64 Linux guest with a writable ext4 CAS disk and an optional read-only squashfs runtime image, both attached as virtio block devices. The guest listens on two vsock ports, and the host connects to them: port 5001 carries raw gRPC HTTP/2 frames pumped transparently between the Bazel TCP client and the guest `linux-actiond` worker, while port 5000 carries an internal `ACTD`-framed control protocol used for host-initiated queries such as actiondfs stats collection. Inside the guest, a minimal PID-1 init mounts all required filesystems and `exec`s into `linux-actiond --guest-worker`, which then owns all REAPI state for the lifetime of the VM. In standalone binary mode the kernel, initramfs, and runtimes squashfs are extracted from Mach-O sections at first run, enabling a single self-contained executable to boot the full execution environment.
---
## 05. actiondfs — Lazy CAS-Backed Input Filesystem
> The custom Linux kernel filesystem module that exposes REAPI input trees to VM actions without per-file copies: lazy Directory proto resolution from the guest CAS, VM-lifetime parsed Directory cache keyed by digest, backing-file delegation for read/splice/mmap, strict vs. overlayfs compatibility paths for input-mutating actions, and the /proc/actiondfs_stats counter interface.
- Page Markdown: https://grok-wiki.com/public/wiki/hermeticbuild-actiond-796c0ee40e63/pages/05-actiondfs-lazy-cas-backed-input-filesystem.md
- Generated: 2026-05-25T17:47:35.908Z
### Source Files
- `kernel/actiondfs/actiondfs.c`
- `kernel/actiondfs/BUILD.bazel`
- `kernel/actiondfs/Kconfig`
- `kernel/actiondfs/Makefile`
- `src/staged_cas_index.zig`
- `src/cas.zig`
- `ARCHITECTURE.md`
<details>
<summary>Relevant source files</summary>
The following files were used as context for generating this wiki page:
- [kernel/actiondfs/actiondfs.c](kernel/actiondfs/actiondfs.c)
- [kernel/actiondfs/BUILD.bazel](kernel/actiondfs/BUILD.bazel)
- [kernel/actiondfs/Kconfig](kernel/actiondfs/Kconfig)
- [kernel/actiondfs/Makefile](kernel/actiondfs/Makefile)
- [ARCHITECTURE.md](ARCHITECTURE.md)
- [src/action_executor.zig](src/action_executor.zig)
</details>
# actiondfs — Lazy CAS-Backed Input Filesystem
`actiondfs` is a custom Linux kernel filesystem module built into the actiond VM kernel. It exposes the REAPI input tree for each action as a native Linux filesystem mount — without copying any file content from the Content-Addressable Storage (CAS). Directory metadata is resolved lazily on first lookup from REAPI `Directory` protobuf blobs stored in the guest CAS, and file content reads are delegated directly to the backing CAS blob via the kernel's `backing_file_open` family of helpers.
The module exists because the VM guest needs a copy-free, isolation-correct way to present thousands of input files to compiler actions. Instead of materializing each input as a bind mount (the Linux-host strategy) or an overlayfs copy-up, `actiondfs` presents the entire input tree as a virtual mount: inode metadata comes from the parsed protobuf graph, and `read`, `splice_read`, and `mmap` calls pass through to the real CAS file without allocating intermediate page cache pages in the actiondfs inode.
---
## Module Overview
The module lives entirely in [`kernel/actiondfs/actiondfs.c`](kernel/actiondfs/actiondfs.c) and is compiled as a kernel built-in under `CONFIG_ACTIONDFS_FS`.
```
kernel/actiondfs/
  actiondfs.c    # complete implementation (~3870 lines)
  Kconfig        # CONFIG_ACTIONDFS_FS bool
  Makefile       # obj-$(CONFIG_ACTIONDFS_FS) += actiondfs.o
  BUILD.bazel    # exports srcs filegroup for linux.bzl kernel build
```
Key constants defined at the top of the source:
| Constant | Value | Purpose |
|---|---|---|
| `ACTIONDFS_MAGIC` | `0x41444653` | Filesystem magic number |
| `ACTIONDFS_MAX_DIRECTORY_PROTO_SIZE` | 64 MiB | Protobuf read guard |
| `ACTIONDFS_DIR_CACHE_BITS` | 12 (4096 buckets) | VM-lifetime dir-cache hash table |
| `ACTIONDFS_BLOB_PATH_CACHE_BITS` | 14 (16384 buckets) | CAS path resolution cache |
| `ACTIONDFS_BLOB_PATH_CACHE_MAX` | 16384 | Eviction threshold |
| `ACTIONDFS_STALE_RETRY_ATTEMPTS` | 128 | Max retries on `-ESTALE` |
| `ACTIONDFS_STALE_RETRY_MS` | 2 | Sleep between stale retries |
| `ACTIONDFS_PROC_STATS` | `"actiondfs_stats"` | `/proc` entry name |
| `ACTIONDFS_EMPTY_SHA256` | well-known hash | Empty-file short-circuit |
Sources: [kernel/actiondfs/actiondfs.c:43-57]()
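A few of these constants are easier to read once the arithmetic is spelled out. The Python snippet below only derives values from the table; the variable names are illustrative, and the worst-case sleep figure assumes one `ACTIONDFS_STALE_RETRY_MS` sleep per retry.

```python
ACTIONDFS_MAGIC = 0x41444653
ACTIONDFS_DIR_CACHE_BITS = 12
ACTIONDFS_BLOB_PATH_CACHE_BITS = 14
ACTIONDFS_STALE_RETRY_ATTEMPTS = 128
ACTIONDFS_STALE_RETRY_MS = 2

# The magic number is four ASCII bytes when read big-endian.
magic_ascii = ACTIONDFS_MAGIC.to_bytes(4, "big").decode("ascii")

# A hash table with N bits has 2**N buckets.
dir_cache_buckets = 1 << ACTIONDFS_DIR_CACHE_BITS
blob_path_cache_buckets = 1 << ACTIONDFS_BLOB_PATH_CACHE_BITS

# Upper bound on time spent sleeping across stale retries (assumes one sleep per retry).
max_stale_sleep_ms = ACTIONDFS_STALE_RETRY_ATTEMPTS * ACTIONDFS_STALE_RETRY_MS

# 64 MiB protobuf read guard, in bytes.
max_directory_proto_bytes = 64 * 1024 * 1024
```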
---
## Mount Interface
Each per-action mount passes options as a comma-separated string:
```
root=<sha256>,root_size=<bytes>,cas=/cas/blobs/sha256[,stage=/path/to/stage]
```
| Option | Required | Description |
|---|---|---|
| `root=` | yes | SHA-256 hex of the REAPI input root `Directory` digest |
| `root_size=` | no | Expected size of the root proto blob in bytes |
| `cas=` | yes | Absolute path to the CAS blob directory (sharded `xx/sha256hash`) |
| `stage=` | no | Absolute path to the per-action stage directory; enables write support |
When `stage=` is absent, the superblock is mounted `SB_RDONLY`. When `stage=` is present, `sbi->staged_writes = true` and write operations on new files/directories are forwarded to the stage directory.
The root node starts with `loaded = false`; no protobuf read happens at mount time. The root `Directory` is parsed on first access via `actiondfs_ensure_loaded`.
Sources: [kernel/actiondfs/actiondfs.c:2614-2661](), [kernel/actiondfs/actiondfs.c:3695-3726]()
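The option string is simple enough to model in a few lines. The following Python sketch parses it the way the table describes; it is not the kernel parser. The function name, the lowercase-hex check on `root=`, and the exact error handling are assumptions, and the sketch does not handle paths that contain commas.

```python
import re

_SHA256_HEX = re.compile(r"^[0-9a-f]{64}$")  # assumption: lowercase hex
_KNOWN = ("root", "root_size", "cas", "stage")


def parse_mount_options(data: str) -> dict:
    """Parse `root=...,root_size=...,cas=...[,stage=...]` into a dict."""
    opts = {}
    for item in data.split(","):
        key, sep, value = item.partition("=")
        if not sep or key not in _KNOWN:
            raise ValueError(f"bad mount option: {item!r}")
        opts[key] = value
    if "root" not in opts or "cas" not in opts:
        raise ValueError("root= and cas= are required")
    if not _SHA256_HEX.match(opts["root"]):
        raise ValueError("root= must be a SHA-256 hex digest")
    for key in ("cas", "stage"):
        if key in opts and not opts[key].startswith("/"):
            raise ValueError(f"{key}= must be an absolute path")
    if "root_size" in opts:
        opts["root_size"] = int(opts["root_size"])
    # Without stage= the mount is read-only (SB_RDONLY in the module).
    opts["read_only"] = "stage" not in opts
    return opts
```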
---
## Data Structures
### Per-mount node tree (`actiondfs_node`)
Every VFS inode has a corresponding `actiondfs_node` stored in `inode->i_private`:
```c
struct actiondfs_node {
    char *name;
    enum actiondfs_node_origin origin;          // INPUT or STAGED
    char *stage_rel;                            // relative path into stage dir (STAGED only)
    u64 ino;
    umode_t mode;
    u64 size;
    char hash[65];                              // SHA-256 hex of CAS blob (INPUT only)
    struct file *blob_file;                     // cached backing file (INPUT only)
    struct mutex blob_lock;
    bool loaded;                                // false until CAS proto is parsed
    struct actiondfs_node *parent;
    struct actiondfs_cached_dir *cached_dir;    // pointer into VM-lifetime cache
    struct actiondfs_materialized_child *materialized_children;
    struct actiondfs_node **file_children;
    struct actiondfs_node **dir_children;
    ...
};
```
`origin` is either `ACTIONDFS_NODE_INPUT` (read-only CAS blob) or `ACTIONDFS_NODE_STAGED` (read-write file in the stage directory).
### VM-lifetime directory cache (`actiondfs_cached_dir`)
```c
struct actiondfs_cached_dir {
    struct hlist_node hnode;
    char hash[65];
    u64 size;
    struct actiondfs_cached_child *file_children;   // sorted by name
    struct actiondfs_cached_child *dir_children;    // sorted by name
    size_t file_count, dir_count;
    ...
};
```
Each `actiondfs_cached_child` stores only the name, mode, size, and hash of a child — no per-mount node pointers. Child nodes are "materialized" (allocated as `actiondfs_node`) on demand, keyed by `(is_dir, index)` in the per-mount `materialized_children` list.
Sources: [kernel/actiondfs/actiondfs.c:64-100](), [kernel/actiondfs/actiondfs.c:103-115]()
---
## Lazy Directory Resolution
When a lookup or `readdir` arrives on an unloaded directory node, `actiondfs_ensure_loaded` fires:
```c
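static int actiondfs_ensure_loaded(struct super_block *sb,
                                   struct actiondfs_node *dir)
{
    /* The declaration of sbi (per-mount state for sb) is elided in this excerpt. */
    int err = 0;

    if (dir->loaded)            /* fast path: already parsed, no lock taken */
        return 0;
    mutex_lock(&sbi->load_lock);
    if (!dir->loaded) {         /* re-check under the lock */
        actiondfs_stat_inc(ACTIONDFS_STAT_DIR_LOADS);
        err = actiondfs_load_reapi_directory_locked(sbi, dir);
    }
    mutex_unlock(&sbi->load_lock);
    return err;
}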
```
Inside `actiondfs_load_reapi_directory_locked`, the path diverges based on whether the node is the root:
- **Root directory**: parsed directly into per-mount `file_children` / `dir_children` arrays; not cached. The `ACTIONDFS_STAT_ROOT_DIR_PARSES` counter is incremented. Child directory nodes are added with `loaded = false`.
- **Non-root directory**: `actiondfs_get_cached_dir` is called, which checks the VM-lifetime `actiondfs_dir_cache` hash table first. On a miss, the CAS blob is read and parsed into a new `actiondfs_cached_dir`, then inserted. On a race (concurrent miss), the duplicate is discarded and the winner's entry is returned.
The inline protobuf parser handles REAPI `Directory` fields 1 (`FileNode`) and 2 (`DirectoryNode`) and skips unknown fields. Symlinks are not supported: encountering field 3 (`SymlinkNode`) fails the load with `-EOPNOTSUPP`.
Sources: [kernel/actiondfs/actiondfs.c:2558-2610](), [kernel/actiondfs/actiondfs.c:2470-2557]()
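To make the parsing rules concrete, here is a minimal user-space sketch of the same idea in Python: a hand-rolled protobuf wire-format reader that collects `FileNode` and `DirectoryNode` entries, skips unknown fields, and rejects symlinks. It is not a port of the kernel parser. The field numbers used for `FileNode`/`DirectoryNode` (`name = 1`, `digest = 2`) and `Digest` (`hash = 1`, `size_bytes = 2`) follow the public REAPI proto definitions; the function names and the dictionary output are assumptions of the sketch.

```python
def read_varint(buf: bytes, pos: int):
    """Decode one base-128 varint starting at pos; return (value, new_pos)."""
    result = shift = 0
    while True:
        if pos >= len(buf):
            raise ValueError("truncated varint")
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7
        if shift >= 64:
            raise ValueError("varint too long")


def fields(buf: bytes):
    """Yield (field_number, wire_type, value) for each field in a message."""
    pos = 0
    while pos < len(buf):
        key, pos = read_varint(buf, pos)
        number, wire = key >> 3, key & 7
        if wire == 0:                      # varint
            value, pos = read_varint(buf, pos)
        elif wire == 1:                    # fixed 64-bit
            value, pos = buf[pos:pos + 8], pos + 8
        elif wire == 2:                    # length-delimited
            length, pos = read_varint(buf, pos)
            value, pos = buf[pos:pos + length], pos + length
        elif wire == 5:                    # fixed 32-bit
            value, pos = buf[pos:pos + 4], pos + 4
        else:
            raise ValueError(f"unsupported wire type {wire}")
        if pos > len(buf):
            raise ValueError("truncated field")
        yield number, wire, value


def parse_node(buf: bytes) -> dict:
    """Parse a FileNode or DirectoryNode into name, hash, and size."""
    node = {"name": "", "hash": "", "size": 0}
    for number, wire, value in fields(buf):
        if number == 1 and wire == 2:
            node["name"] = value.decode("utf-8")
        elif number == 2 and wire == 2:    # nested Digest message
            for n, w, v in fields(value):
                if n == 1 and w == 2:
                    node["hash"] = v.decode("ascii")
                elif n == 2 and w == 0:
                    node["size"] = v
    return node


def parse_directory(buf: bytes):
    """Return (files, dirs); unknown fields are skipped, symlinks are rejected."""
    files, dirs = [], []
    for number, wire, value in fields(buf):
        if number == 1 and wire == 2:
            files.append(parse_node(value))
        elif number == 2 and wire == 2:
            dirs.append(parse_node(value))
        elif number == 3:
            raise NotImplementedError("symlinks unsupported (the module returns -EOPNOTSUPP)")
    return files, dirs
```

Each returned directory entry carries only a name and digest, which is what allows child directories to stay unloaded until they are first looked up.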
---
## VM-Lifetime Caches
### Directory metadata cache
```c
static DEFINE_HASHTABLE(actiondfs_dir_cache, ACTIONDFS_DIR_CACHE_BITS);
static DEFINE_MUTEX(actiondfs_dir_cache_lock);
```
- Keyed by the first 16 hex characters of the SHA-256 digest, interpreted as `unsigned long`.
- Holds `actiondfs_cached_dir` entries for the lifetime of the VM (the module is built into the kernel); entries are never evicted.
- Since Directory protos are content-addressed and immutable, the same source directory or tree artifact used across many actions reuses one parse.
- Root directories are intentionally excluded because they are unique per action.
### Blob path cache
```c
static DEFINE_HASHTABLE(actiondfs_blob_path_cache, ACTIONDFS_BLOB_PATH_CACHE_BITS);
static LIST_HEAD(actiondfs_blob_path_cache_list);
static DEFINE_MUTEX(actiondfs_blob_path_cache_lock);
static size_t actiondfs_blob_path_cache_count;
```
- Caches resolved `struct path` for CAS blobs by digest (max 16384 entries).
- **Hot path uses RCU** (`hash_for_each_possible_rcu`); misses and evictions use the mutex.
- Eviction selects the entry with the lowest hit count from `actiondfs_blob_path_cache_list`.
- When a stale handle is detected (`-ESTALE`), the entry is dropped via `actiondfs_drop_cached_blob_path` and the open is retried.
Sources: [kernel/actiondfs/actiondfs.c:256-274](), [kernel/actiondfs/actiondfs.c:2279-2430]()
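The directory cache's keying and race handling can be modelled in user space as below. This Python class is a sketch under stated assumptions, not the kernel code: it assumes the 16 hex characters are parsed as a hexadecimal number, that entries sharing a key are told apart by comparing the full hash, and that the blob is read and parsed outside the cache lock (which is what makes the "concurrent miss" race possible). The class and counter names are illustrative.

```python
import threading


class DirCache:
    """Never-evicting cache of parsed directories, keyed by a digest prefix."""

    def __init__(self):
        self._lock = threading.Lock()
        self._buckets = {}  # key -> list of (full_hash, parsed_dir)
        self.hits = self.misses = self.races = 0

    @staticmethod
    def key(sha256_hex: str) -> int:
        # First 16 hex characters, read as a 64-bit unsigned value.
        return int(sha256_hex[:16], 16)

    def _find(self, sha256_hex: str):
        for full_hash, parsed in self._buckets.get(self.key(sha256_hex), []):
            if full_hash == sha256_hex:  # the key alone may collide
                return parsed
        return None

    def get(self, sha256_hex: str, load):
        """Return the cached parse, calling load(hash) on a miss."""
        with self._lock:
            found = self._find(sha256_hex)
            if found is not None:
                self.hits += 1
                return found
            self.misses += 1
        parsed = load(sha256_hex)  # read + parse without holding the lock
        with self._lock:
            winner = self._find(sha256_hex)
            if winner is not None:  # another thread inserted first
                self.races += 1
                return winner       # our duplicate is discarded
            self._buckets.setdefault(self.key(sha256_hex), []).append((sha256_hex, parsed))
            return parsed
```

Since a digest names immutable content, the cache never needs invalidation; the only cost of never evicting is memory that grows with the number of distinct directories seen during the VM's lifetime.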
---
## Backing File Delegation
For `ACTIONDFS_NODE_INPUT` files, all data-path operations are forwarded to the real CAS blob file using the kernel `backing_file_*` API:
### `read_iter`
```c
file = actiondfs_get_node_blob_file(sbi, node, iocb->ki_filp);
nread = backing_file_read_iter(file, to, iocb, iocb->ki_flags, &ctx);
```
`actiondfs_get_node_blob_file` caches the `backing_file_open` result in `node->blob_file` under `node->blob_lock`, so repeated reads on the same node reuse the same backing `struct file`.
### `splice_read`
```c
nread = backing_file_splice_read(file, &backing_iocb, pipe, wanted, flags, &ctx);
```
Splice uses a copy of the iocb keyed to the actiondfs file but with position tracking on the backing iocb. This allows `sendfile`/pipe-based reads without an intermediate copy.
### `mmap`
```c
err = backing_file_mmap(file, vma, &ctx);
```
The VMA is attached to the backing CAS blob. Page faults go directly to the CAS filesystem's page cache, not through actiondfs folios. This is the critical path for compiler `execve` and library mapping — the actiondfs path shown in `/proc/PID/maps` remains the actiondfs path while data comes from the native ext4 page cache.
All three operations retry on `-ESTALE` up to `ACTIONDFS_STALE_RETRY_ATTEMPTS` (128) times, sleeping `ACTIONDFS_STALE_RETRY_MS` (2 ms) between retries and clearing the blob path cache and node blob cache on each stale detection.
Sources: [kernel/actiondfs/actiondfs.c:1712-1820](), [kernel/actiondfs/actiondfs.c:1820-1920]()
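The stale-handle retry policy is a bounded loop around the backing operation. The Python sketch below shows the shape of that loop; `op` and `drop_caches` are hypothetical callables standing in for the backing-file call and the cache invalidation, and counting the limit as "one initial attempt plus up to 128 retries" is an assumption.

```python
import errno
import time

STALE_RETRY_ATTEMPTS = 128
STALE_RETRY_MS = 2


def with_stale_retry(op, drop_caches, sleep=time.sleep):
    """Run op(); on ESTALE, drop cached handles, sleep briefly, and try again."""
    for attempt in range(STALE_RETRY_ATTEMPTS + 1):
        try:
            return op()
        except OSError as e:
            # Any other error, or running out of retries, is passed to the caller.
            if e.errno != errno.ESTALE or attempt == STALE_RETRY_ATTEMPTS:
                raise
        drop_caches()  # forget the cached blob path and backing file
        sleep(STALE_RETRY_MS / 1000.0)
```

Dropping the cached handle before each retry matters: retrying with the same stale handle would fail the same way every time.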
---
## Strict vs. Overlay Compatibility Paths
The executor selects between two modes based on the `mutates_inputs` execution requirement:
```
┌─────────────────────────────────────────────────────┐
│ action_executor.zig                                 │
│                                                     │
│ mutates_inputs absent / "false" / "0"               │
│   → ActionInputMode.actiondfs_strict                │
│   → mount actiondfs at /workspace                   │
│     (stage= path for output capture)                │
│                                                     │
│ mutates_inputs = "1" / "yes" / truthy               │
│   → ActionInputMode.actiondfs_overlay               │
│   → mount actiondfs (no stage=) at /lower           │
│   → mount overlayfs over /lower as /workspace       │
│     (upperdir = per-action stage dir)               │
└─────────────────────────────────────────────────────┘
```
**Strict mode** mount data (from `src/action_executor.zig`):
```
root={hash},root_size={bytes},cas={cas_blob_root},stage={stage_path}
```
**Overlay mode** mount data:
```
root={hash},root_size={bytes},cas={cas_blob_root} ← actiondfs (no stage=, SB_RDONLY)
lowerdir={lower_path},upperdir={stage_path},workdir={work_path} ← overlayfs on top
```
In strict mode, write operations on new files route through `actiondfs` to the stage directory, while attempts to write to CAS-backed input nodes return `-EROFS`. In overlay mode, overlayfs handles copy-up for input mutation so actions that overwrite inputs get a writable upper layer without patching `actiondfs`.
Sources: [src/action_executor.zig:876-944](), [src/action_executor.zig:350-357]()
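The mode selection and the resulting mount data can be sketched as two small Python functions. This is an illustration, not the executor's code: the exact set of truthy spellings for `mutates_inputs` is not given on this page, so the sketch assumes `"1"`, `"yes"`, and `"true"`, treats everything else as strict, and uses hypothetical default paths for the lower and work directories.

```python
TRUTHY = ("1", "yes", "true")  # assumed spellings; the real parser may differ


def select_input_mode(exec_requirements: dict) -> str:
    """Pick strict or overlay mode from the mutates_inputs requirement."""
    value = exec_requirements.get("mutates_inputs", "").strip().lower()
    return "actiondfs_overlay" if value in TRUTHY else "actiondfs_strict"


def mount_plan(mode, root_hash, root_size, cas_root, stage_path,
               lower_path="/lower", work_path="/work/overlay-work"):
    """Return a list of (fstype, target, mount_data) tuples for the action."""
    base = f"root={root_hash},root_size={root_size},cas={cas_root}"
    if mode == "actiondfs_strict":
        # Single mount; stage= enables writes for new files and outputs.
        return [("actiondfs", "/workspace", f"{base},stage={stage_path}")]
    # Overlay mode: read-only actiondfs below, stock overlayfs on top.
    overlay = f"lowerdir={lower_path},upperdir={stage_path},workdir={work_path}"
    return [
        ("actiondfs", lower_path, base),
        ("overlay", "/workspace", overlay),
    ]
```

Keeping input mutation in overlayfs means the strict path never has to implement copy-up, at the cost of one extra mount for the actions that need it.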
---
## Stage Layer (Write Operations)
When a `stage=` path is present, `actiondfs` supports create, mkdir, unlink, rmdir, rename, write, and truncate for nodes with `origin == ACTIONDFS_NODE_STAGED`. Write operations to `ACTIONDFS_NODE_INPUT` nodes always return `-EROFS`.
The stage layer uses VFS pass-through:
- **Create/mkdir**: Checks that the name does not collide with an input node. Ensures the parent path exists in the stage directory via `actiondfs_stage_ensure_dir`. Then calls `vfs_create` / `vfs_mkdir` on the real stage dentry.
- **Write**: Opens the stage file via `backing_file_open` with `O_WRONLY`, delegates to `backing_file_write_iter`, updates `node->size`.
- **Rename**: Calls `vfs_rename` on both real stage dentries; updates `node->stage_rel` and `node->parent` on success.
- **copy_file_range**: Directly calls `vfs_copy_file_range` between the real backing files (CAS blob → stage file or stage → stage), bypassing actiondfs folios entirely.
During `readdir` on a staged directory, the stage directory is iterated via `iterate_dir`; entries that match input children are suppressed so the merged view presents each name once.
Sources: [kernel/actiondfs/actiondfs.c:3101-3200](), [kernel/actiondfs/actiondfs.c:2986-3050]()
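The duplicate suppression in the merged `readdir` amounts to a set-membership check. A minimal Python sketch, assuming input children are listed first and that `.` and `..` from the stage directory are dropped (the ordering and the function name are assumptions of the sketch):

```python
def merged_readdir(input_children, stage_entries):
    """Merge input names with stage-directory names, emitting each name once."""
    names = list(input_children)
    seen = set(names)
    for name in stage_entries:
        if name in (".", "..") or name in seen:
            continue  # suppress entries that shadow an input child
        seen.add(name)
        names.append(name)
    return names
```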
---
## VFS Operations Table
| Operation | Input node | Staged node |
|---|---|---|
| `read_iter` | backing_file_read_iter → CAS blob | backing_file_read_iter → stage file |
| `write_iter` | `-EROFS` | backing_file_write_iter → stage file |
| `mmap` | backing_file_mmap → CAS blob VMA | backing_file_mmap → stage file VMA |
| `splice_read` | backing_file_splice_read → CAS | backing_file_splice_read → stage |
| `copy_file_range` | vfs_copy_file_range (real files) | vfs_copy_file_range (real files) |
| `lookup` | parse cached dir, materialize node | stat stage path, build staged node |
| `create` | `-EROFS` | vfs_create in stage dir |
| `mkdir` | `-EROFS` | vfs_mkdir in stage dir |
| `unlink` | `-EROFS` | vfs_unlink in stage dir |
| `rename` | `-EROFS` | vfs_rename, update stage_rel |
| `setattr (size)` | `-EROFS` | vfs_truncate on stage file |
Sources: [kernel/actiondfs/actiondfs.c:3614-3650]()
---
## `/proc/actiondfs_stats` Counter Interface
At module init, a procfs entry is registered:
```c
proc_create_single(ACTIONDFS_PROC_STATS, 0444, NULL, actiondfs_stats_show);
```
The handler iterates `actiondfs_stats[]` (an array of `atomic64_t`) and prints one `name value` pair per line. Counters accumulate for the VM's lifetime and are never reset. The repository's `AGENTS.md` notes that this file is readable inside a running guest at `/proc/actiondfs_stats`.
Selected counter groups:
| Group | Counters |
|---|---|
| Mount activity | `mounts` |
| Directory loading | `dir_loads`, `root_dir_parses`, `cached_dir_requests`, `dir_cache_hits`, `dir_cache_misses`, `dir_cache_races`, `cached_dir_builds`, `cached_dir_bytes` |
| Lookup/readdir | `lookups`, `lookup_hits`, `lookup_negative`, `cached_lookups`, `cached_lookup_hits`, `cached_materialized`, `cached_reused`, `readdirs`, `readdir_entries` |
| Blob path cache | `blob_path_cache_hits`, `blob_path_cache_misses`, `blob_path_cache_inserts`, `blob_path_cache_evictions`, `blob_path_cache_races` |
| Data I/O | `backing_reads`, `backing_read_bytes`, `splice_reads`, `splice_read_bytes`, `mmaps`, `mmap_bytes`, `mmap_failures` |
| Stale retries | `blob_open_stale_retries`, `backing_read_stale_retries`, `splice_read_stale_retries` |
| Stage layer | `stage_read_calls`, `stage_write_calls`, `stage_create_calls`, `stage_mkdir_calls`, `stage_rename_calls`, `stage_copy_file_range_*`, … |
Sources: [kernel/actiondfs/actiondfs.c:137-280](), [kernel/actiondfs/actiondfs.c:3848-3862]()
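Because the file is plain `name value` lines, a snapshot is easy to consume from a script. The Python sketch below parses a snapshot and derives a cache hit rate; the helper names are illustrative, and lines that do not match the `name value` shape (for example, any extra stats appended by the control channel in a different format) are simply skipped.

```python
def parse_stats(text: str) -> dict:
    """Parse `name value` lines into a dict of integer counters."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue  # not a simple counter line
        name, value = parts
        try:
            stats[name] = int(value)
        except ValueError:
            continue
    return stats


def hit_rate(stats: dict, hits_key: str, misses_key: str):
    """Return hits / (hits + misses), or None when there were no requests."""
    hits = stats.get(hits_key, 0)
    total = hits + stats.get(misses_key, 0)
    return hits / total if total else None
```

Since the counters are never reset, comparing two snapshots taken before and after a build gives per-build figures.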
---
## Lifecycle Sequence
```mermaid
sequenceDiagram
participant Ex as action_executor.zig
participant K as Linux kernel (actiondfs)
participant CAS as /cas/blobs/sha256
Ex->>K: mount("actiondfs", "/workspace", "root=...,cas=...,stage=...")
Note over K: fill_super: allocate sbi, root node (loaded=false)
Ex->>K: open("/workspace/src/foo.cc")
K->>K: actiondfs_lookup → actiondfs_ensure_loaded(root)
K->>CAS: read root Directory proto blob
K->>K: parse FileNodes/DirectoryNodes → file_children[]/dir_children[]
K->>K: actiondfs_lookup → find "src" dir node (loaded=false)
K->>K: actiondfs_ensure_loaded("src") → actiondfs_get_cached_dir(hash)
K->>CAS: read src/ Directory proto (on cache miss)
K->>K: store actiondfs_cached_dir in VM-lifetime hash table
K->>K: materialize "foo.cc" node from cached child record
K->>CAS: backing_file_open(sha256=...) for foo.cc
K-->>Ex: fd pointing at actiondfs inode (backing = CAS blob)
Ex->>K: mmap(fd) → backing_file_mmap → CAS page cache
```
---
## Build Integration
`Kconfig` declares `CONFIG_ACTIONDFS_FS` as a `bool` (not a tristate), so the module is always compiled in or always absent — it cannot be a loadable module. `Makefile` generates `actiondfs.o` when the config symbol is set. The `BUILD.bazel` `srcs` filegroup is consumed by `linux.bzl` which drives the kernel build from within Bazel. The actual kernel image with `CONFIG_ACTIONDFS_FS=y` is built as `//vm:linux_kernel_zst`.
The Linux host path (`linux-actiond` without a VM) never sets `use_actiondfs`; it continues to use read-only bind mounts from the CAS. `actiondfs` is only present in the Bazel-built VM kernel, so ordinary host kernels do not need the module.
Sources: [kernel/actiondfs/Kconfig:1-6](), [kernel/actiondfs/Makefile:1](), [ARCHITECTURE.md]()
---
## Summary
`actiondfs` eliminates per-file copies and bind-mount overhead for VM action inputs by implementing a minimal Linux filesystem. It keeps parsed non-root directory metadata in a VM-lifetime, content-addressed Directory cache (a 4096-bucket hash table), keeps resolved CAS blob paths in a bounded blob-path cache (at most 16384 entries, read under RCU), and delegates all file data operations to the underlying CAS blob through the kernel `backing_file_*` API. For most actions it is mounted directly at `/workspace` with a stage path for output capture. For input-mutating actions it is mounted as an overlayfs lowerdir so stock overlayfs handles copy-up without any changes to actiondfs internals. Observable behavior is exposed through `atomic64_t` counters at `/proc/actiondfs_stats`.
---
## 06. Build System, Runtime Images & Testing
> How the full repo is built with Bazel and rules_zig, how glibc runtime SquashFS images are packaged and selected via the libc platform property, how standalone binaries embed compressed kernel and initramfs payloads, and the layered testing strategy: unit tests, Docker Linux e2e, VM e2e, and the LLVM tblgen smoke benchmark.
- Page Markdown: https://grok-wiki.com/public/wiki/hermeticbuild-actiond-796c0ee40e63/pages/06-build-system-runtime-images-testing.md
- Generated: 2026-05-25T17:45:37.676Z
### Source Files
- `MODULE.bazel`
- `BUILD.bazel`
- `vm/BUILD.bazel`
- `runtimes/BUILD.bazel`
- `runtimes/glibc_runtime_repo.bzl`
- `tools/e2e.sh`
- `e2e/run_llvm_vm_smoke.sh`
- `e2e/llvm_tblgen_smoke.sh`
<details>
<summary>Relevant source files</summary>
The following files were used as context for generating this wiki page:
- [MODULE.bazel](MODULE.bazel)
- [BUILD.bazel](BUILD.bazel)
- [vm/BUILD.bazel](vm/BUILD.bazel)
- [runtimes/BUILD.bazel](runtimes/BUILD.bazel)
- [runtimes/glibc_runtime_repo.bzl](runtimes/glibc_runtime_repo.bzl)
- [src/BUILD.bazel](src/BUILD.bazel)
- [src/embedded_payload.zig](src/embedded_payload.zig)
- [cmd/darwin_actiond/BUILD.bazel](cmd/darwin_actiond/BUILD.bazel)
- [cmd/linux_actiond/BUILD.bazel](cmd/linux_actiond/BUILD.bazel)
- [cmd/linux_actiond_guest/BUILD.bazel](cmd/linux_actiond_guest/BUILD.bazel)
- [platforms/BUILD.bazel](platforms/BUILD.bazel)
- [e2e/BUILD.bazel](e2e/BUILD.bazel)
- [test/BUILD.bazel](test/BUILD.bazel)
- [tools/e2e.sh](tools/e2e.sh)
- [e2e/llvm_tblgen_smoke.sh](e2e/llvm_tblgen_smoke.sh)
- [e2e/run_llvm_vm_smoke.sh](e2e/run_llvm_vm_smoke.sh)
- [.bazelrc](.bazelrc)
</details>
# Build System, Runtime Images & Testing
This page explains how `hermeticbuild/actiond` is built with Bazel and `rules_zig`, how glibc runtime SquashFS images are assembled and selected at execution time, how standalone binaries embed compressed payloads, and the full layered testing strategy from unit tests through the LLVM tblgen VM smoke benchmark.
Understanding these layers together matters because they share the same artifacts: a change to the kernel config, the runtime image content, or the embedded payload format can ripple through the build, packaging, and test systems alike.
---
## Build System
### Bazel with rules_zig
The entire repository is built with Bazel. The primary language is Zig, configured via [`rules_zig`](https://github.com/aherrmann/rules_zig). The Zig toolchain is pinned to version `0.16.0` and is declared to require the `//platforms:local_execution` constraint, ensuring Zig compile actions are never sent to a remote executor.
```python
# MODULE.bazel
bazel_dep(name = "rules_zig", version = "0.15.1")
zig.toolchain(
extra_exec_compatible_with = ["//platforms:local_execution"],
zig_version = "0.16.0",
)
```
The `//platforms:local_execution_platform` platform carries `exec_properties = {"no-remote-exec": "1"}` and is registered as an execution platform.
Sources: [MODULE.bazel:3-6](), [platforms/BUILD.bazel:21-31]()
### Core Bazel dependencies
| Dependency | Purpose |
|---|---|
| `rules_zig` | Zig toolchain, `zig_binary`, `zig_library`, `zig_test` |
| `linux.bzl` | VM kernel build (`linux()` macro, compact pre-built repos) |
| `llvm` | `llvm-objcopy`, `llvm-strip`, `llvm-tblgen` and LLVM toolchains |
| `zstd` | Zstandard compression (patched to remove `pthread` dependency) |
| `apple_support` / Obj-C | macOS `Virtualization.framework` bridge in `cmd/darwin_actiond` |
| `bazel_lib` | `platform_transition_binary` for cross-compiling guest binary |
| `rules_cc` / `rules_shell` | C/ObjC libraries, shell test rules |
Sources: [MODULE.bazel:3-12]()
### Remote config
`.bazelrc` defines a `remote` config (enabled with `--config=remote`) that sets `--jobs=500` and adds `@llvm//:rbe_platform` as an extra execution platform. Use it for large builds, especially the kernel and LLVM packages:
```
common:remote --jobs=500
common:remote --extra_execution_platforms=@llvm//:rbe_platform
```
Sources: [.bazelrc:5-6]()
---
## VM Kernel
The Linux kernel is built by `linux.bzl` from the `linux-6.18.2` source archive declared in `MODULE.bazel`. A custom kernel filesystem module, `actiondfs` (located at `kernel/actiondfs/`), is compiled in through the `extra_srcs` and `extra_kconfigs` attributes of the `linux_kernel` target. The kernel uses `allnoconfig` with a custom `//vm:linux.config` to minimize size.
```python
# vm/BUILD.bazel
linux(
name = "linux_kernel",
compact_repos = {"aarch64": "@actiond_vm_compact"},
config = "@actiond_vm_kconfig//:kconfig",
config_mode = "allnoconfig",
extra_kconfigs = {"//kernel/actiondfs:Kconfig": "fs/actiondfs"},
extra_srcs = ["//kernel/actiondfs:srcs"],
image_format = "Image",
source_repo = "@linux_6_18_2",
)
```
The raw `Image` is then zstd-compressed to produce `//vm:linux_kernel_zst`. A pre-built compact repo (`@actiond_vm_compact`) is provided so that the kernel does not need to be built from source in typical CI runs.
### initramfs
The VM initramfs is built at `//vm:initramfs`. It is a `cpio.zst` archive containing a single statically-linked aarch64 binary—the guest agent `linux-actiond-guest-aarch64`—produced from `cmd/linux_actiond_guest`. The binary is cross-compiled to `//platforms:linux_aarch64` via `platform_transition_binary` and stripped with `llvm-strip`.
Sources: [vm/BUILD.bazel:1-42](), [cmd/linux_actiond_guest/BUILD.bazel]()
---
## Glibc Runtime SquashFS Images
### Repository rule: `glibc_deb_runtime`
Each versioned glibc runtime is declared using the `glibc_deb_runtime` repository rule from `runtimes/glibc_runtime_repo.bzl`. The rule:
1. Downloads `.deb` packages from Ubuntu's package archives (supporting `.xz`, `.zst`, and `.gz` payload compression).
2. Extracts `data.tar.*` into a `root/` directory.
3. Strips documentation, locales, man pages, and gconv tables to minimize image size.
4. Writes a `runtime_manifest.json` describing the runtime's name, architecture, source `.deb` URLs, mount points, and ELF interpreter paths.
```json
// runtime_manifest.json (generated)
{
"name": "glibc2.35",
"arch": "aarch64",
"debs": ["https://ports.ubuntu.com/...libc6_2.35-..._arm64.deb"],
"mounts": [
["root/lib", "/lib"],
["root/lib64", "/lib64"],
["root/usr/lib", "/usr/lib"],
["root/etc", "/etc"]
],
"interpreters": ["/lib/ld-linux-aarch64.so.1"]
}
```
Sources: [runtimes/glibc_runtime_repo.bzl:5-75]()
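As an illustration of how a consumer might turn the manifest into a mount plan, the following Python sketch resolves each `mounts` pair against the directory that holds the runtime. It is not the server's implementation: the function name `mount_plan` and the `runtime_dir` parameter are hypothetical, and only the `mounts` field shape is taken from the example manifest above.

```python
import json
from pathlib import PurePosixPath
from typing import List, Tuple


def mount_plan(manifest_text: str, runtime_dir: str) -> List[Tuple[str, str]]:
    """Resolve manifest mount pairs to (absolute source, target) tuples.

    Hypothetical helper: `runtime_dir` is assumed to be the directory that
    contains both `root/` and `runtime_manifest.json`.
    """
    manifest = json.loads(manifest_text)
    base = PurePosixPath(runtime_dir)
    plan = []
    for source, target in manifest["mounts"]:
        # Targets are paths inside the action root, so they must be absolute.
        if not target.startswith("/"):
            raise ValueError(f"mount target must be absolute: {target!r}")
        plan.append((str(base / source), target))
    return plan
```

Note that `json.loads` rejects comments, so a manifest passed to this helper must be plain JSON without the leading `//` annotation line used in the listing above.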
### Declared runtime versions
Six glibc runtimes are declared in `MODULE.bazel`:
| Name | glibc | Arch | Source |
|---|---|---|---|
| `glibc_2_31_aarch64` | 2.31 | aarch64 | Ubuntu Ports (focal) |
| `glibc_2_31_x86_64` | 2.31 | x86_64 | Ubuntu Archive (focal) |
| `glibc_2_35_aarch64` | 2.35 | aarch64 | Ubuntu Ports (jammy) |
| `glibc_2_35_x86_64` | 2.35 | x86_64 | Ubuntu Archive (jammy) |
| `glibc_2_39_aarch64` | 2.39 | aarch64 | Ubuntu Ports (noble) |
| `glibc_2_39_x86_64` | 2.39 | x86_64 | Ubuntu Archive (noble) |
Sources: [MODULE.bazel:93-148]()
### SquashFS packaging and architecture selection
`runtimes/BUILD.bazel` defines two `genrule` targets—`runtimes_squashfs_aarch64` and `runtimes_squashfs_x86_64`—that pack all three versioned runtimes for their respective architecture into a single SquashFS image using `//tools:sqfs_pack`. The image directory layout is:
```text
runtimes-tree/
├── common/
│ └── root/
│ ├── etc/hosts
│ └── etc/nsswitch.conf
└── libc/
├── glibc2.31/<arch>/root/ + runtime_manifest.json
├── glibc2.35/<arch>/root/ + runtime_manifest.json
└── glibc2.39/<arch>/root/ + runtime_manifest.json
```
The public `//runtimes:runtimes_squashfs` alias uses a `select()` to pick the correct architecture-specific image:
```python
# runtimes/BUILD.bazel
alias(
name = "runtimes_squashfs",
actual = select({
":target_aarch64": ":runtimes_squashfs_aarch64",
":target_arm64": ":runtimes_squashfs_aarch64",
":target_x86_64": ":runtimes_squashfs_x86_64",
"//conditions:default": ":runtimes_squashfs_aarch64",
}),
)
```
Sources: [runtimes/BUILD.bazel:1-30]()
### libc platform property
Clients select the desired glibc version at action dispatch time via the `libc` exec property. The stress e2e workspace uses:
```
--remote_default_exec_properties=libc=glibc2.35
```
The server reads this property, locates the corresponding subtree within the mounted SquashFS image, and applies the overlay mounts from `runtime_manifest.json` before executing the action.
Sources: [tools/e2e.sh:152-153]()
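The lookup from exec property to image subtree can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the server's code: the helper name `runtime_subtree` is hypothetical, the path shape `libc/<name>/<arch>` follows the image layout shown earlier, and returning `None` when no `libc` property is set is an assumption about the fallback behavior.

```python
from pathlib import PurePosixPath
from typing import Mapping, Optional

# Runtime names taken from the image layout described on this page.
KNOWN_RUNTIMES = ("glibc2.31", "glibc2.35", "glibc2.39")


def runtime_subtree(image_mount: str,
                    exec_properties: Mapping[str, str],
                    arch: str) -> Optional[str]:
    """Return the directory inside the mounted image for the requested libc.

    Hypothetical helper. Assumption: an action without a `libc` property
    gets no runtime subtree at all (represented here as None).
    """
    libc = exec_properties.get("libc")
    if libc is None:
        return None
    if libc not in KNOWN_RUNTIMES:
        raise ValueError(f"unknown libc runtime: {libc!r}")
    return str(PurePosixPath(image_mount) / "libc" / libc / arch)
```

Rejecting unknown names up front, rather than probing the filesystem, keeps a typo in the client's exec properties from silently running an action without the intended runtime.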
---
## Standalone Binary Packaging
For deployment without separate runtime files, `actiond` supports **standalone** binaries that have the kernel, initramfs, and runtimes SquashFS embedded directly inside the executable. Extraction logic lives in `src/embedded_payload.zig`.
### Darwin (macOS/Mach-O)
The `darwin-standalone-payload-sections` `cc_library` uses `-Wl,-sectcreate` linker flags to inject three Mach-O sections into the `__ACTIOND` segment:
```python
# cmd/darwin_actiond/BUILD.bazel
cc_library(
name = "darwin-standalone-payload-sections",
linkopts = [
"-Wl,-sectcreate,__ACTIOND,__kernel,$(location //vm:linux_kernel_zst)",
"-Wl,-sectcreate,__ACTIOND,__initramfs,$(location //vm:initramfs)",
"-Wl,-sectcreate,__ACTIOND,__runtimes,$(location //runtimes:runtimes_squashfs_aarch64)",
],
...
)
```
The resulting binary is codesigned with `darwin-actiond.entitlements`.
Sources: [cmd/darwin_actiond/BUILD.bazel:43-64]()
### Linux (ELF)
The Linux standalone binary uses `llvm-objcopy` to inject the runtimes SquashFS as an ELF section named `.actiond.runtimes`:
```python
# cmd/linux_actiond/BUILD.bazel
"$(execpath @llvm//tools:llvm-objcopy)" \
--add-section .actiond.runtimes="$(location //runtimes:runtimes_squashfs)" "$@"
```
Sources: [cmd/linux_actiond/BUILD.bazel:13-26]()
### Payload extraction (`src/embedded_payload.zig`)
At runtime, `embedded_payload.zig` reads the binary's own on-disk format:
- **Mach-O**: walks `LC_SEGMENT_64` commands to find sections in `__ACTIOND` by name.
- **ELF**: parses the section header table to locate `.actiond.runtimes`.
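To make the ELF case concrete, here is a small Python sketch of a section lookup by name. It is an illustration of the general technique, not a port of `embedded_payload.zig`: the function name `find_elf_section` is hypothetical, and the sketch assumes a 64-bit little-endian ELF image held fully in memory. The Mach-O walk is not shown.

```python
import struct
from typing import Optional

ELF_HEADER = struct.Struct("<16sHHIQQQIHHHHHH")  # Elf64_Ehdr, little-endian
SECTION_HEADER = struct.Struct("<IIQQQQIIQQ")    # Elf64_Shdr, little-endian


def find_elf_section(image: bytes, wanted: str) -> Optional[bytes]:
    """Return the bytes of the section named `wanted`, or None if absent."""
    if len(image) < ELF_HEADER.size or image[:4] != b"\x7fELF":
        return None
    if image[4] != 2 or image[5] != 1:  # only ELFCLASS64 + ELFDATA2LSB
        return None
    (_ident, _type, _machine, _version, _entry, _phoff, shoff, _flags,
     _ehsize, _phentsize, _phnum, shentsize, shnum, shstrndx) = \
        ELF_HEADER.unpack_from(image, 0)
    if shoff == 0 or shentsize < SECTION_HEADER.size or shstrndx >= shnum:
        return None
    if shoff + shnum * shentsize > len(image):
        return None

    def header(index: int):
        return SECTION_HEADER.unpack_from(image, shoff + index * shentsize)

    # The section-name string table is itself a section (index e_shstrndx).
    _, _, _, _, str_off, str_size, *_ = header(shstrndx)
    names = image[str_off:str_off + str_size]

    for index in range(shnum):
        sh_name, _, _, _, sh_offset, sh_size, *_ = header(index)
        end = names.find(b"\0", sh_name)
        if end == -1:
            continue  # name offset outside the string table
        if names[sh_name:end].decode("ascii", "replace") == wanted:
            return image[sh_offset:sh_offset + sh_size]
    return None
```

Returning `None` for a missing section or a non-ELF input mirrors the "no embedded payloads" case, where a non-standalone binary simply has nothing to extract.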
Extracted payloads are written to `<root>/embedded/<name>-<sha256hex>` with content-addressed deduplication: if a file of the correct size already exists at that path, extraction is skipped.
```zig
const mach_o_segment_name = "__ACTIOND";
const mach_o_kernel_section = "__kernel";
const mach_o_initramfs_section = "__initramfs";
const mach_o_runtimes_section = "__runtimes";
const elf_runtimes_section = ".actiond.runtimes";
```
Sources: [src/embedded_payload.zig:10-21](), [src/embedded_payload.zig:43-105]()
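The content-addressed extraction described above can be sketched in Python. This is a hedged illustration, not the Zig implementation: `extract_payload` is a hypothetical name, the digest is assumed to be the SHA-256 of the payload bytes exactly as embedded, and the temp-file-plus-rename write is an added assumption to keep a partially written file from being mistaken for a complete one.

```python
import hashlib
import os
import tempfile


def extract_payload(root: str, name: str, payload: bytes) -> str:
    """Write `payload` to <root>/embedded/<name>-<sha256hex> unless present.

    Hypothetical sketch: a file of the right size at the content-addressed
    path is treated as already extracted and is left untouched.
    """
    digest = hashlib.sha256(payload).hexdigest()
    directory = os.path.join(root, "embedded")
    os.makedirs(directory, exist_ok=True)
    final_path = os.path.join(directory, f"{name}-{digest}")

    try:
        if os.path.getsize(final_path) == len(payload):
            return final_path  # cache hit: skip extraction
    except FileNotFoundError:
        pass

    # Assumption: write to a temp file in the same directory, then rename,
    # so readers never observe a half-written payload.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=f"{name}.tmp-")
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.write(payload)
        os.replace(tmp_path, final_path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    return final_path
```

Because the digest is part of the file name, upgrading the binary produces new paths instead of overwriting old ones, so the size check is enough to decide whether to skip.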
---
## Artifact Relationships
```text
MODULE.bazel (glibc_deb_runtime declarations)
│
▼
@glibc_2_3{1,5,9}_{aarch64,x86_64}
.deb download → root/ tree + runtime_manifest.json
│
▼
runtimes/BUILD.bazel → runtimes_squashfs_{aarch64,x86_64}.sqfs
│
┌──────────────────┼───────────────────────────┐
│ │ │
vm/BUILD.bazel cmd/linux_actiond cmd/darwin_actiond
(vm_bundle) (standalone: ELF section) (standalone: Mach-O section)
├── linux_kernel_zst
├── initramfs.cpio.zst
└── runtimes_squashfs
```
---
## Testing Strategy
### Layer 1: Unit tests
Unit tests cover the Zig library source directly. The `zig_test` target in `src/BUILD.bazel` shares the same source list as the `zig_library`:
```
bazel test //src:unit_tests
```
`embedded_payload.zig` contains inline tests for Mach-O extraction, ELF extraction, and the null case where a binary has no embedded payloads.
Sources: [src/BUILD.bazel:30-63](), [src/embedded_payload.zig:262-410]()
### Layer 2: Repository build checks
`tools/e2e.sh build` runs the full build and test sweep:
```bash
run_bazel test //src:unit_tests
run_bazel build //cmd/linux_actiond_guest:linux-actiond-guest-aarch64
run_bazel build //vm:initramfs
run_bazel build //runtimes:runtimes_squashfs
run_bazel build //vm:linux_kernel --nobuild
run_bazel build //tools:e2e_action_tool_linux_{aarch64,x86_64}
run_bazel build //...
run_bazel test //...
```
Sources: [tools/e2e.sh:175-184]()
### Layer 3: Docker Linux e2e
For macOS developers who need to validate the Linux code path, `tools/docker/run_linux_e2e.sh` wraps the `linux` mode inside a Docker container. `tools/e2e.sh linux` itself requires a Linux host.
### Layer 4: Stress workspace e2e
The `test/` directory is a standalone Bazel workspace. The `stress_workload` macro generates:
- Bare file inputs (default 160)
- Source directory inputs (8 directories × 32 files)
- Nested individual file inputs (8 groups × 96 files)
- Multiple actions sharing the same output directory (to measure tree-artifact reuse)
The e2e harness builds a platform-specific `e2e_action_tool` binary, copies it into `test/tool/action-tool`, then executes:
```bash
bazel build //:stress_all \
--remote_executor="grpc://${endpoint}" \
--remote_cache="grpc://${endpoint}" \
--remote_default_exec_properties=libc=glibc2.35 \
--noremote_accept_cached \
--spawn_strategy=remote \
--genrule_strategy=remote
```
Sources: [tools/e2e.sh:143-170](), [test/BUILD.bazel]()
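As a rough sizing aid, the input counts listed above can be added up with a few lines of Python. The parameter names are hypothetical and do not come from the `stress_workload` macro, and the source does not say whether every generated action consumes all of these inputs, so the result is the number of generated input files rather than a per-action figure.

```python
def stress_input_count(bare_files: int = 160,
                       source_dirs: int = 8,
                       files_per_dir: int = 32,
                       nested_groups: int = 8,
                       files_per_group: int = 96) -> int:
    """Total generated input files for the listed stress workload shape.

    Hypothetical parameter names; defaults mirror the counts on this page.
    """
    return (bare_files
            + source_dirs * files_per_dir
            + nested_groups * files_per_group)


# 160 + 8 * 32 + 8 * 96 = 1184 input files with the listed counts
print(stress_input_count())
```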
#### Linux mode
Starts `linux-actiond serve --listen=... --root=... --runtime-image=<sqfs>` on the local Linux host, then runs the stress workspace against it. Standalone mode passes no `--runtime-image`; the binary self-extracts.
Sources: [tools/e2e.sh:186-230]()
#### VM mode
macOS-only. Creates an ext4 disk image for the CAS (`//server/cas.ext4`), then starts:
```bash
darwin-actiond serve-vm \
--listen=... --root=... \
--kernel=Image.zst --initramfs=initramfs.cpio.zst \
--runtime-image=runtimes.sqfs \
--cas-image=cas.ext4 \
--memory-mib=1024 --cpus=4
```
Waits up to 90 seconds for the port to become ready, then runs the stress workspace.
Sources: [tools/e2e.sh:232-284]()
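The readiness wait can be illustrated with a small Python sketch that polls a TCP port until a deadline. `tools/e2e.sh` is a shell script and its actual polling mechanism is not shown here, so the function name `wait_for_port` and the polling interval are assumptions; only the 90-second budget comes from the description above.

```python
import socket
import time


def wait_for_port(host: str, port: int,
                  timeout_s: float = 90.0,
                  interval_s: float = 0.25) -> bool:
    """Return True once a TCP connect succeeds, False after `timeout_s`."""
    deadline = time.monotonic() + timeout_s
    while True:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return False
        try:
            # Cap each attempt so one slow connect cannot overrun the deadline.
            with socket.create_connection((host, port),
                                          timeout=min(1.0, remaining)):
                return True
        except OSError:
            # Not listening yet (or unreachable): back off briefly and retry.
            time.sleep(min(interval_s, max(0.0, deadline - time.monotonic())))
```

Using `time.monotonic()` rather than wall-clock time keeps the deadline stable if the host clock is adjusted while the VM boots.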
### Layer 5: LLVM tblgen smoke benchmark
The most realistic and performance-significant test builds `@llvm-project//llvm:llvm-tblgen` (≈5,341 configured actions) via the VM executor. The workflow is split into two phases to isolate actiondfs performance:
```
Phase 1: Warmup (--noremote_accept_cached)
Build //e2e:llvm_exec_warmup [wraps llvm-min-tblgen; ~2,403 actions]
→ populates the VM CAS and action cache
Phase 2: Measured (--remote_accept_cached)
Build @llvm-project//llvm:llvm-tblgen
→ only the incremental actions beyond warmup are executed from scratch
```
All builds use `@llvm//platforms:linux_arm64_musl` as both target and host platform, producing Linux aarch64 musl binaries. This avoids glibc runtime actions and allows exec tools to run inside the VM guest.
`--noremote_cache_compression` is required because actiond does not yet implement remote cache compression.
```bash
# e2e/llvm_tblgen_smoke.sh (inner smoke)
bazel build @llvm-project//llvm:llvm-tblgen \
--platforms=@llvm//platforms:linux_arm64_musl \
--host_platform=@llvm//platforms:linux_arm64_musl \
--remote_executor=grpc://127.0.0.1:8998 \
--noremote_cache_compression \
--noremote_accept_cached \
--spawn_strategy=remote
```
Sources: [e2e/llvm_tblgen_smoke.sh:30-73]()
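To show how the two phases differ, the following Python sketch builds the Bazel command line for each one. It follows the phase outline above (warmup rejects cached results, the measured run accepts them) and reuses only flags and targets that appear on this page; the helper name `phase_command` is hypothetical, and the exact flag set used by the real scripts may differ.

```python
from typing import List

# Flags shared by both phases, taken from the commands shown on this page.
COMMON_FLAGS = [
    "--platforms=@llvm//platforms:linux_arm64_musl",
    "--host_platform=@llvm//platforms:linux_arm64_musl",
    "--noremote_cache_compression",
    "--spawn_strategy=remote",
]


def phase_command(phase: str,
                  endpoint: str = "grpc://127.0.0.1:8998") -> List[str]:
    """Return the argv for the 'warmup' or 'measured' phase (hypothetical)."""
    if phase == "warmup":
        target = "//e2e:llvm_exec_warmup"
        cache_flag = "--noremote_accept_cached"   # force real execution
    elif phase == "measured":
        target = "@llvm-project//llvm:llvm-tblgen"
        cache_flag = "--remote_accept_cached"     # reuse warmup results
    else:
        raise ValueError(f"unknown phase: {phase!r}")
    return ["bazel", "build", target,
            f"--remote_executor={endpoint}", cache_flag, *COMMON_FLAGS]
```

The resulting list can be passed directly to `subprocess.run(...)`, which avoids shell quoting issues with the `@`-prefixed labels.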
#### run_llvm_vm_smoke.sh
`e2e/run_llvm_vm_smoke.sh` automates the full benchmark cycle, from building the server to writing the timing summary:
1. Builds the `darwin-actiond-standalone` server package with `--config=remote`.
2. Creates a fresh ext4 CAS image.
3. Starts the VM and waits for the guest to be ready (polls `actiondfs_stats.txt`).
4. Runs `llvm_tblgen_smoke.sh` and records the measured server log slice.
5. Parses timing data via `test/parse_timings.py` and writes a Markdown summary.
6. Optionally runs a mac-host local baseline (same target, no remote executor) for comparison.
7. Records the output root at `/tmp/actiond-last-llvm-vm-smoke-path`.
Sources: [e2e/run_llvm_vm_smoke.sh:1-55](), [e2e/run_llvm_vm_smoke.sh:57-130]()
### Testing summary
| Level | Command | What it exercises |
|---|---|---|
| Unit tests | `bazel test //src:unit_tests` | Zig library correctness, payload extraction |
| Build checks | `tools/e2e.sh build` | All targets build; no broken deps |
| Linux e2e | `tools/docker/run_linux_e2e.sh` | Linux chroot executor + runtime mount |
| VM e2e | `tools/e2e.sh vm` | Virtualization.framework VM, vsock, actiondfs, CAS |
| Standalone e2e | `ACTIOND_E2E_STANDALONE=1 tools/e2e.sh vm` | Self-extracting payload at startup |
| LLVM smoke | `e2e/run_llvm_vm_smoke.sh` | End-to-end REAPI performance against real LLVM build |
---
## Closing Summary
The actiond build pipeline is organized as a single Bazel workspace that produces the kernel, initramfs, runtime images, and host-side server binaries from hermetic, content-addressed inputs. Glibc runtimes are downloaded from Ubuntu `.deb` archives, packed into SquashFS images with per-version directory trees, and selected at build time by CPU architecture and at execution time by the `libc` exec property. Standalone binaries carry their payloads as native binary sections: the macOS binary embeds the kernel, initramfs, and runtimes in the Mach-O `__ACTIOND` segment, while the Linux binary embeds only the runtimes image in the `.actiond.runtimes` ELF section. Payloads are extracted to a content-addressed cache under the server root, and extraction is skipped when a file of the correct size already exists there. The testing strategy layers Zig unit tests, full repo build validation, Docker-wrapped Linux e2e, stress-workspace e2e in Linux and macOS VM modes (the latter using Virtualization.framework), and a two-phase LLVM tblgen smoke benchmark, which is the most realistic performance test in the suite.
---