# macOS VM Execution — darwin-actiond, vsock & Guest Worker

> How darwin-actiond serve-vm boots a minimal arm64 Linux VM via Virtualization.framework, proxies REAPI traffic over virtio-vsock to linux-actiond-guest, manages the guest-owned ext4 CAS on a virtio block device, and handles standalone binary payload extraction at startup.

- Repository: hermeticbuild/actiond
- GitHub: https://github.com/hermeticbuild/actiond
- Human wiki: https://grok-wiki.com/public/wiki/hermeticbuild-actiond-796c0ee40e63
- Complete Markdown: https://grok-wiki.com/public/wiki/hermeticbuild-actiond-796c0ee40e63/llms-full.txt

## Source Files

- `cmd/darwin_actiond/vz_bridge.m`
- `src/darwin_vm.zig`
- `src/darwin_vm_host.zig`
- `src/grpc_vsock_bridge.zig`
- `src/vsock.zig`
- `src/guest_init.zig`
- `src/guest_worker.zig`
- `src/control_protocol.zig`
- `src/control_transport_fd.zig`
- `src/embedded_payload.zig`

---

`darwin-actiond serve-vm` is the macOS execution path for actiond. It boots a minimal arm64 Linux virtual machine using Apple's [Virtualization.framework](https://developer.apple.com/documentation/virtualization), proxies REAPI (Remote Execution API) gRPC traffic from TCP listeners on the host into the VM over virtio-vsock, and lets a guest `linux-actiond` process own the CAS and execute build actions natively on Linux. This design keeps the Linux execution environment hermetic and avoids any dependency on macOS for action sandboxing or filesystem isolation, while still providing the REAPI surface at a host TCP address that Bazel or other clients connect to directly.

The overall execution split is: the macOS host manages VM lifecycle, payload extraction, and bidirectional byte-pumping; the Linux guest owns the REAPI ActionCache, CAS (on a dedicated ext4 block device), and all action execution via `linux-actiond --guest-worker`.

---

## Architecture Overview

```text
┌────────────────────────── macOS host ────────────────────────────────────────┐
│  darwin-actiond serve-vm                                                      │
│                                                                               │
│  ┌──────────────┐   ┌──────────────────────────────────────────────────┐     │
│  │  TCP listener│   │  Virtualization.framework VM (arm64 Linux)       │     │
│  │  :8980 (gRPC)│<->│  ┌───────────────────────────────────────────┐   │     │
│  └──────────────┘   │  │  virtio-vsock port 5001  (gRPC HTTP/2)    │   │     │
│   grpc_vsock_bridge │  │  virtio-vsock port 5000  (control channel)│   │     │
│  ┌──────────────┐   │  ├───────────────────────────────────────────┤   │     │
│  │ control_     │   │  │  linux-actiond --guest-worker             │   │     │
│  │ transport_fd │<->│  │   • REAPI Execute / CAS / ActionCache     │   │     │
│  └──────────────┘   │  │   • action executor + actiondfs           │   │     │
│  Block devices:     │  ├───────────────────────────────────────────┤   │     │
│   cas.ext4   ───────│->│  /cas  (ext4 virtio block, R/W)           │   │     │
│   runtimes.sqfs ────│->│  /runtimes (squashfs virtio block, R/O)   │   │     │
│                     │  │  /work, /tmp (tmpfs)                      │   │     │
│                     │  └───────────────────────────────────────────┘   │     │
│                     └──────────────────────────────────────────────────┘     │
└──────────────────────────────────────────────────────────────────────────────┘
```

---

## `serve` Startup Sequence

`darwin_vm_host.serve` is the single entry point called by the `darwin-actiond serve-vm` subcommand. Before it starts the VM, it performs the preparation steps below in order.

### 1. Working Directory

A root working directory (default `/tmp/actiond-vm`, overridable with `--root`) is created. All extracted artifacts, decompressed boot files, and the default CAS image file live under this root.

Sources: [src/darwin_vm_host.zig:121-130]()

### 2. CAS ext4 Image

The CAS image path defaults to `<root>/cas.ext4`. If the file does not exist yet, it is created as a sparse file of the configured size (default 32 GiB). The file is attached to the VM as a virtio block device, and the guest init mounts that device as ext4 at `/cas`. The host never runs `mkfs`. The image must therefore already contain an ext4 filesystem, either pre-formatted or left over from a prior run.

```zig
// src/darwin_vm_host.zig:136-140
const owned_cas_image_path = if (options.cas_image == null)
    try std.fs.path.join(allocator, &.{ options.root, "cas.ext4" })
else "";
...
try ensureCasImageFile(io, cas_image_path, options.cas_image_size_mib);
```

Sources: [src/darwin_vm_host.zig:133-148](), [src/darwin_vm_host.zig:308-332]()

### 3. Embedded Payload Extraction

When `--kernel`, `--initramfs`, or `--runtime-image` are not explicitly provided, `serve` calls `embedded_payload.extractFromSelf` to pull these artifacts out of the running `darwin-actiond` binary itself. On macOS the binary is a Mach-O and the payloads are stored in a dedicated `__ACTIOND` segment with sections `__kernel`, `__initramfs`, and `__runtimes`. On Linux the runtimes payload is stored in a `.actiond.runtimes` ELF section.

| Artifact | Mach-O section | Payload constant |
|---|---|---|
| Linux kernel | `__ACTIOND,__kernel` | `embedded_payload.kernel_name` = `"linux_kernel"` |
| Initramfs | `__ACTIOND,__initramfs` | `embedded_payload.initramfs_name` = `"initramfs.cpio.zst"` |
| Runtime squashfs | `__ACTIOND,__runtimes` | `embedded_payload.runtimes_name` = `"runtimes.sqfs"` |

Each payload is extracted only once. The SHA-256 of the payload's byte range is computed, and the extracted file is cached at `<root>/embedded/<name>-<hash>`, so subsequent starts skip the copy. Each extraction is followed by an `fsync` so the file is durable before it is used.

Sources: [src/embedded_payload.zig:1-45](), [src/embedded_payload.zig:75-130]()
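A minimal sketch of the cache-naming scheme follows. It assumes the `<hash>` suffix is the full lowercase hex SHA-256 digest of the payload bytes. The helper name `cachedPayloadPath` is hypothetical and is not the actual `embedded_payload` API.

```zig
const std = @import("std");

const Sha256 = std.crypto.hash.sha2.Sha256;

/// Hypothetical helper: formats `<root>/embedded/<name>-<hex sha256>` into `buf`.
/// Assumption: the suffix is the lowercase hex of the full 32-byte digest.
pub fn cachedPayloadPath(
    buf: []u8,
    root: []const u8,
    name: []const u8,
    payload: []const u8,
) ![]u8 {
    var digest: [Sha256.digest_length]u8 = undefined;
    Sha256.hash(payload, &digest, .{});
    // 32 digest bytes -> 64 lowercase hex characters.
    const hex = std.fmt.bytesToHex(digest, .lower);
    return std.fmt.bufPrint(buf, "{s}/embedded/{s}-{s}", .{ root, name, hex[0..] });
}
```

Because the digest is part of the file name, a rebuilt binary with a different payload gets a new cache entry automatically. A stale file never needs to be detected.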

### 4. Boot Artifact Decompression

Both the kernel and initramfs may be stored in zstd-compressed form (magic bytes `0x28 0xb5 0x2f 0xfd`). `prepareBootKernel` and `prepareBootInitramfs` detect the magic, decompress via `ZSTD_decompress`, and write a raw file to `<root>/boot/<artifact>-<sha256>.<ext>`. The raw file is reused on subsequent starts if already present. The Virtualization.framework `VZLinuxBootLoader` requires the uncompressed ARM64 `Image` format and an uncompressed CPIO initramfs.

```text
Stored as:  linux_kernel      (zstd-compressed)
Served as:  boot/kernel-<sha256>.Image  (raw, written once)
```

Sources: [src/darwin_vm_host.zig:258-305](), [src/darwin_vm_host.zig:355-420]()
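The magic check can be sketched as follows. The zstd frame magic number is `0xFD2FB528`, stored little-endian, so a compressed file begins with the bytes `28 b5 2f fd`. The function name `isZstd` is illustrative and is not necessarily what `prepareBootKernel` uses internally.

```zig
const std = @import("std");

/// zstd frame magic 0xFD2FB528, as it appears on disk (little-endian).
const zstd_magic = [4]u8{ 0x28, 0xb5, 0x2f, 0xfd };

/// Illustrative check: true when `bytes` starts with a zstd frame header,
/// meaning the artifact must be decompressed before VZLinuxBootLoader sees it.
/// Inputs shorter than four bytes are treated as not compressed.
pub fn isZstd(bytes: []const u8) bool {
    return std.mem.startsWith(u8, bytes, &zstd_magic);
}
```

If `isZstd` is false, the artifact can be used as-is. Otherwise it is decompressed once and the raw result is cached under `<root>/boot/`.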

---

## Virtualization.framework Bridge (`vz_bridge.m`)

VM creation itself is done in Objective-C in `cmd/darwin_actiond/vz_bridge.m`. This file is the boundary between Zig and Virtualization.framework. It exposes the C-callable functions that `darwin_vm.zig` declares as `extern`: `actiond_vm_start`, `actiond_vm_connect`, `actiond_vm_stop`, and `actiond_vm_release`.

### VM Configuration

`actiond_vm_start` assembles a `VZVirtualMachineConfiguration` with:

| Component | Value / API |
|---|---|
| Boot loader | `VZLinuxBootLoader` with kernel, initramfs, cmdline `console=hvc0 init=/init panic=-1 quiet` |
| CPU count | Configurable (`--cpus`, default 2) |
| Memory | Configurable (`--memory-mib`, default 512 MiB) |
| CAS block device | `VZVirtioBlockDeviceConfiguration` (read-write, from `cas.ext4`) |
| Runtime block device | `VZVirtioBlockDeviceConfiguration` (read-only, from `runtimes.sqfs`), omitted if not provided |
| vsock | `VZVirtioSocketDeviceConfiguration` (single device) |
| Serial console | `VZVirtioConsoleDeviceSerialPortConfiguration` forwarded to host stderr |
| Network | None (`networkDevices = @[]`) |
| Entropy | `VZVirtioEntropyDeviceConfiguration` |
| Platform | `VZGenericPlatformConfiguration` |

The VM runs on a private `dispatch_queue_t` named `dev.actiond.vm`. Start is synchronous with a configurable timeout (default 30 s).

Sources: [cmd/darwin_actiond/vz_bridge.m:59-156]()

### vsock Connection

`actiond_vm_connect(handle, port, timeout_ms)` does three things:

- It dispatches `connectToPort:completionHandler:` on the `VZVirtioSocketDevice`.
- It waits on a dispatch semaphore.
- It returns a `dup()`-ed file descriptor.

The completion handler duplicates the connection's descriptor and then closes the `VZVirtioSocketConnection`. The caller therefore receives an independent OS-level fd that it owns and can read and write with ordinary POSIX calls.

```objc
// cmd/darwin_actiond/vz_bridge.m:177-195
VZVirtioSocketDevice *socketDevice = (VZVirtioSocketDevice *)socketDevices[0];
[socketDevice connectToPort:port completionHandler:^(VZVirtioSocketConnection *connection, NSError *errorOrNil) {
    ...
    result_fd = dup(fd);
    [connection close];
    dispatch_semaphore_signal(semaphore);
}];
```

Sources: [cmd/darwin_actiond/vz_bridge.m:157-220]()

---

## Zig VM Wrapper (`darwin_vm.zig`)

`darwin_vm.Machine` wraps the opaque handle returned by `actiond_vm_start`. It is macOS-only: on any other target (`builtin.os.tag != .macos`), `start` returns `error.UnsupportedHost`. It exposes:

- `Machine.start(options)` — calls `actiond_vm_start` with absolute paths; returns a `Machine` value.
- `Machine.connectControlPort(port)` — retry loop calling `actiond_vm_connect` with per-attempt and total timeouts. Sleeps 100 ms between attempts if the guest is not ready yet.
- `Machine.opener()` — returns a `control_transport_fd.Opener` that connects to `vsock.control_port` (5000) for the internal control channel.
- `Machine.deinit()` — calls `actiond_vm_stop` then `actiond_vm_release`.

Sources: [src/darwin_vm.zig:21-113]()
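The retry behavior of `connectControlPort` can be sketched as a deadline-bounded loop. In this sketch, `ctx` is a hypothetical interface with three methods:

- `connectOnce` stands in for `actiond_vm_connect`.
- `nowMs` is a monotonic millisecond clock.
- `sleepMs` waits between attempts.

Injecting the clock this way is a design choice for illustration only. The real function's structure may differ.

```zig
const std = @import("std");

/// Hypothetical sketch in the spirit of `Machine.connectControlPort`.
/// Assumed `ctx` interface:
///   ctx.connectOnce() -> error union with an i32 fd payload (stand-in for actiond_vm_connect)
///   ctx.nowMs()       -> u64 monotonic milliseconds
///   ctx.sleepMs(ms)   -> void
pub fn connectWithRetry(ctx: anytype, total_timeout_ms: u64, retry_delay_ms: u64) error{Timeout}!i32 {
    const deadline = ctx.nowMs() + total_timeout_ms;
    while (true) {
        // One attempt; a fuller version would also pass a per-attempt timeout.
        if (ctx.connectOnce()) |fd| {
            return fd;
        } else |err| {
            // Guest not listening yet (or a transient failure): wait and retry.
            _ = err;
        }
        // Give up if one more wait would overshoot the total budget.
        if (ctx.nowMs() + retry_delay_ms > deadline) return error.Timeout;
        ctx.sleepMs(retry_delay_ms);
    }
}
```

The defaults are a 60 000 ms total connect timeout and a 100 ms delay between attempts. With those values, this loop allows roughly 600 retries while the guest boots.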

---

## vsock Port Layout

The guest `linux-actiond` listens on two virtio-vsock ports, declared as constants in `vsock.zig`:

| Port | Constant | Purpose |
|---|---|---|
| **5000** | `vsock.control_port` | Internal control channel (actiondfs stats, future host→guest RPCs) |
| **5001** | `vsock.grpc_port` | Raw gRPC HTTP/2 — forwarded from the host TCP listener |

The guest-side `vsock.Listener` and `vsock.Connection` are Linux-only (`AF.VSOCK`, `SOCK.STREAM`). The host side uses `darwin_vm.Machine.connectControlPort(port)` which calls into Virtualization.framework's vsock connect API.

Sources: [src/vsock.zig:1-35]()

---

## gRPC–vsock Bridge (`grpc_vsock_bridge.zig`)

`grpc_vsock_bridge.serve(io, listen, machine)` is the host-side TCP listener for all REAPI traffic. It:

1. Listens on a TCP address (default `127.0.0.1:8980`).
2. For each accepted TCP connection, spawns a detached thread via `connectionThread`.
3. `connectionThread` opens a vsock connection to the guest at `vsock.grpc_port` (5001) via `machine.connectControlPort(vsock.grpc_port)`.
4. Two sub-threads pump bytes bidirectionally: `client→guest` and `guest→client`, each in a `pumpAndShutdown` loop using a 64 KiB buffer.
5. When a pump's source reaches EOF, it calls `shutdown(SHUT_WR)` on its destination socket. This propagates a clean half-close to the other side.
6. After both pump threads join, connection timing is logged (`vm bridge timing` log lines) if the connection transferred ≥ 64 KiB or lasted ≥ 10 ms.

This bridge is transparent: the guest sees ordinary gRPC HTTP/2 frames; the Bazel client sees an ordinary TCP endpoint. No gRPC or HTTP/2 parsing is done in the bridge.

```zig
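// src/grpc_vsock_bridge.zig:35-50 (excerpt; error handling elided)
const guest_fd = machine.connectControlPort(vsock.grpc_port) catch |err| { ... };
// One pump thread per direction; each half-closes its destination on EOF.
// std.Thread.spawn returns an error union, so each spawn needs handling.
const client_to_guest = std.Thread.spawn(.{}, pumpAndShutdown, .{ client_fd, guest_fd, &client_to_guest_stats }) catch |err| { ... };
const guest_to_client = std.Thread.spawn(.{}, pumpAndShutdown, .{ guest_fd, client_fd, &guest_to_client_stats }) catch |err| { ... };
// Wait for both directions to finish before timing is logged.
client_to_guest.join();
guest_to_client.join();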
```

Sources: [src/grpc_vsock_bridge.zig:19-95]()
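The timing-log condition from step 6 can be expressed as a small predicate. This sketch assumes that "transferred ≥ 64 KiB" means the sum of both directions. `PumpStats` and `shouldLogTiming` are hypothetical names, not the bridge's actual types.

```zig
const std = @import("std");

/// Hypothetical per-direction counter, loosely modeled on the `*_stats`
/// values passed to `pumpAndShutdown` (field name is an assumption).
pub const PumpStats = struct {
    bytes: u64 = 0,
};

const log_min_bytes: u64 = 64 * 1024;
const log_min_ns: u64 = 10 * std.time.ns_per_ms;

/// True when a finished connection should produce a `vm bridge timing` line:
/// at least 64 KiB moved (assumed: both directions combined) or at least 10 ms elapsed.
pub fn shouldLogTiming(client_to_guest: PumpStats, guest_to_client: PumpStats, elapsed_ns: u64) bool {
    return client_to_guest.bytes + guest_to_client.bytes >= log_min_bytes or
        elapsed_ns >= log_min_ns;
}
```

A threshold like this keeps short, small connections (for example, capability probes) out of the log. Bulk transfers and slow calls are still recorded.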

---

## Control Channel & Protocol (`control_protocol.zig`, `control_transport_fd.zig`)

### Wire Format

The control protocol uses a simple binary framing over the vsock connection on port 5000. Every message has an 18-byte header:

| Bytes | Field | Notes |
|---|---|---|
| 0–3 | Magic | ASCII `ACTD` |
| 4 | Version | `1` |
| 5 | Tag | Encodes `CallKind` (request) or `Status` (response) |
| 6–9 | `method_len` | `u32` little-endian |
| 10–17 | `body_len` | `u64` little-endian |

The payload immediately follows the header: `method_len` bytes of method string, then `body_len` bytes of body. Currently only `CallKind.unary` (tag `0`) is defined.

Sources: [src/control_protocol.zig:1-55]()
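The header layout in the table can be encoded and decoded with fixed offsets. The sketch below follows the field positions and little-endian widths listed above. The type and function names are illustrative, and so is the `UnsupportedVersion` error; none of them are taken from `control_protocol.zig`.

```zig
const std = @import("std");

pub const header_len = 18;
pub const magic = "ACTD";
pub const version: u8 = 1;

/// Illustrative decoded header (names are not from control_protocol.zig).
pub const Header = struct {
    tag: u8, // CallKind for requests, Status for responses
    method_len: u32,
    body_len: u64,
};

/// Bytes 0-3 magic, 4 version, 5 tag, 6-9 method_len (LE), 10-17 body_len (LE).
pub fn encodeHeader(h: Header) [header_len]u8 {
    var out: [header_len]u8 = undefined;
    @memcpy(out[0..4], magic);
    out[4] = version;
    out[5] = h.tag;
    std.mem.writeInt(u32, out[6..10], h.method_len, .little);
    std.mem.writeInt(u64, out[10..18], h.body_len, .little);
    return out;
}

pub const DecodeError = error{ BadMagic, UnsupportedVersion };

pub fn decodeHeader(bytes: *const [header_len]u8) DecodeError!Header {
    if (!std.mem.eql(u8, bytes[0..4], magic)) return error.BadMagic;
    if (bytes[4] != version) return error.UnsupportedVersion;
    return .{
        .tag = bytes[5],
        .method_len = std.mem.readInt(u32, bytes[6..10], .little),
        .body_len = std.mem.readInt(u64, bytes[10..18], .little),
    };
}
```

After decoding, a reader would read exactly `method_len` bytes of method string and then `body_len` bytes of body. Because both lengths are known up front, no delimiter scanning is needed.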

### Host-Side Client (`control_transport_fd.Client`)

`control_transport_fd.Client` maintains a pool of up to 32 slots, each caching at most one vsock file descriptor. Each `call()` proceeds as follows:

1. It acquires a slot lock.
2. It lazily opens a connection through the `Opener` if the slot has no cached fd.
3. It writes a request frame and reads the response.

On error, the slot's cached fd is closed and discarded, and the failed call is not transparently retried. The slot is released, and a later `call()` that acquires it opens a fresh connection.

Sources: [src/control_transport_fd.zig:30-100]()
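The slot mechanics can be sketched with one mutex and one optional fd per slot. In this sketch, `acquire` takes the first slot whose lock `tryLock` succeeds, and `discard` forgets a broken fd so the next user of that slot reconnects. `SlotPool` and its methods are hypothetical. The real client may choose slots or block differently.

```zig
const std = @import("std");

pub const max_slots = 32;

/// Hypothetical model of the client's slot pool: a lock and an optional
/// cached descriptor per slot.
pub const SlotPool = struct {
    locks: [max_slots]std.Thread.Mutex = [_]std.Thread.Mutex{.{}} ** max_slots,
    fds: [max_slots]?i32 = [_]?i32{null} ** max_slots,

    /// Returns the index of the first slot whose lock could be taken without
    /// blocking, or null if all slots are busy.
    pub fn acquire(self: *SlotPool) ?usize {
        for (&self.locks, 0..) |*lock, i| {
            if (lock.tryLock()) return i;
        }
        return null;
    }

    pub fn release(self: *SlotPool, slot: usize) void {
        self.locks[slot].unlock();
    }

    /// After a failed call: clear the cached fd and hand it back so the
    /// caller can close it. The next user of this slot will reconnect lazily.
    pub fn discard(self: *SlotPool, slot: usize) ?i32 {
        const old = self.fds[slot];
        self.fds[slot] = null;
        return old;
    }
};
```

Per-slot locks let up to 32 control calls proceed concurrently, each over its own vsock connection. Unrelated calls do not have to serialize on one shared stream.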

### Currently Supported Method

The only method dispatched over the control channel is:

```text
/actiond.internal.Guest/ActiondfsStats
```

The request body is empty. The guest responds with the contents of `/proc/actiondfs_stats` (the actiondfs kernel module's procfs counter file), followed by CAS put-file stats appended by `cas.appendPutFileStats`.

Sources: [src/control_protocol.zig:13](), [src/guest_worker.zig:130-170]()
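Guest-side dispatch for this single method can be sketched as a string match followed by concatenation of the two stats sources. In this sketch, `proc_stats` and `cas_stats` stand in for the procfs file contents and the `cas.appendPutFileStats` output. The function name and error set are hypothetical.

```zig
const std = @import("std");

pub const actiondfs_stats_method = "/actiond.internal.Guest/ActiondfsStats";

pub const DispatchError = error{ UnknownMethod, NoSpaceLeft };

/// Hypothetical unary handler: routes by method name and builds the response
/// body as <procfs stats text> followed by <CAS put-file stats>.
pub fn handleControlCall(
    out: []u8,
    method: []const u8,
    proc_stats: []const u8,
    cas_stats: []const u8,
) DispatchError![]u8 {
    if (!std.mem.eql(u8, method, actiondfs_stats_method)) return error.UnknownMethod;
    const total = proc_stats.len + cas_stats.len;
    if (total > out.len) return error.NoSpaceLeft;
    @memcpy(out[0..proc_stats.len], proc_stats);
    @memcpy(out[proc_stats.len..total], cas_stats);
    return out[0..total];
}
```

Adding another host-to-guest RPC would mean another method-string branch here. The frame format itself would not change.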

---

## Guest Init (`guest_init.zig`)

`guest_init.run()` is the Linux PID 1 (`/init`) process executed inside the VM. It is Linux-only at compile time. Its responsibilities:

1. Mount core virtual filesystems in order:

| Source | Target | FS type |
|---|---|---|
| `proc` | `/proc` | `proc` |
| `sysfs` | `/sys` | `sysfs` |
| `cgroup2` | `/sys/fs/cgroup` | `cgroup2` |
| `devtmpfs` | `/dev` | `devtmpfs` |
| `tmpfs` | `/tmp` | `tmpfs` |
| `tmpfs` | `/work` | `tmpfs` |

2. Mount the CAS ext4 block device at `/cas`. The init scans a fixed list of candidate devices (`/dev/vda`, `/dev/vdb`, `/dev/vdc`, `/dev/vdd`, `/dev/sda`, `/dev/sdb`, `/dev/nvme0n1`) then falls back to reading `/sys/block`. It retries up to 50 times with 100 ms waits. Mount flags: `MS_NOSUID | MS_NODEV | MS_NOATIME`, data option `errors=remount-ro`.

3. Mount the runtime squashfs image at `/runtimes` (read-only). Same candidate scan and retry logic.

4. `exec` into `linux-actiond --guest-worker` via `std.process.replace`. The init process is replaced in-place, so the guest worker runs as PID 1 after mount completion.

```zig
// src/guest_init.zig:54
pub const worker_argv = [_][]const u8{ "/actiond", "--guest-worker" };
```

Sources: [src/guest_init.zig:40-75](), [src/guest_init.zig:75-175]()
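The device scan in step 2 can be sketched as a bounded retry over the fixed candidate list. Here `ctx.accepts(path)` is a hypothetical predicate for "this device exists and mounts with the expected filesystem", and `ctx.waitMs` stands in for the 100 ms wait. The `/sys/block` fallback is omitted.

```zig
const std = @import("std");

pub const cas_candidates = [_][]const u8{
    "/dev/vda", "/dev/vdb", "/dev/vdc",     "/dev/vdd",
    "/dev/sda", "/dev/sdb", "/dev/nvme0n1",
};

pub const max_attempts = 50;

/// Hypothetical sketch: up to 50 passes over the candidates, waiting 100 ms
/// between passes. Assumed ctx interface: accepts([]const u8) bool, waitMs(u64) void.
pub fn findBlockDevice(ctx: anytype, candidates: []const []const u8) ?[]const u8 {
    var attempt: usize = 0;
    while (attempt < max_attempts) : (attempt += 1) {
        for (candidates) |path| {
            if (ctx.accepts(path)) return path;
        }
        // Devices may still be probing right after boot; wait and rescan.
        ctx.waitMs(100);
    }
    return null;
}
```

With 50 attempts at 100 ms each, the scan tolerates about five seconds of device-probe latency before init gives up.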

---

## Guest Worker (`guest_worker.zig`)

`guest_worker.run()` is the Linux-only REAPI server inside the VM. It:

1. Opens `/cas` and `/work/actions` directories.
2. Initialises the CAS layout (`cas.Store.ensureLayout`) and ActionCache under `/cas/ac`.
3. Cleans and re-creates the actiondfs staging directory at `/cas/actiondfs-stage`.
4. Prepares action execution options using the guest-native CAS paths:
   - `cas_blob_root_path = "/cas/blobs/sha256"`
   - `actiondfs_stage_root_path = "/cas/actiondfs-stage"`
   - `runtime_root_path = "/runtimes"`
5. Listens on `vsock.control_port` (5000) for control-channel connections and accepts them in a loop, dispatching each in a detached thread.
6. Spawns a detached gRPC thread that listens on `vsock.grpc_port` (5001) and serves full REAPI HTTP/2 traffic (Execute, CAS, ActionCache, ByteStream, Capabilities) through `grpc_http2_server.handleConnectionFd`.

```zig
// src/guest_worker.zig:44-50
const listener = try vsock.listen(vsock.control_port);
...
const grpc_thread = try std.Thread.spawn(.{}, grpcListenerThread, .{ io, allocator, server });
```

Sources: [src/guest_worker.zig:21-65](), [src/guest_worker.zig:67-100]()

---

## actiondfs Stats Polling

When `--actiondfs-stats-path` is provided to `serve-vm`, `darwin_vm_host.serve` spawns a background thread that calls `writeActiondfsStatsSnapshot` once per second. Each snapshot:

1. Sends a `control_protocol.Request` with method `/actiond.internal.Guest/ActiondfsStats` over the control channel (vsock port 5000).
2. Receives the response body (contents of guest `/proc/actiondfs_stats`).
3. Writes the body to the specified path on the host filesystem.

This provides a host-visible window into the guest actiondfs kernel module's lifetime counters (directory cache hits/misses, CAS blob opens, splice reads, mmap calls, stale retry events, etc.) without requiring any host filesystem access from within the guest.

Sources: [src/darwin_vm_host.zig:192-230](), [src/guest_worker.zig:128-165]()

---

## Standalone Binary Mode

When run without explicit `--kernel`, `--initramfs`, and `--runtime-image` flags, `darwin-actiond serve-vm` works as a self-contained binary. The build toolchain embeds all three artifacts as named sections in the `__ACTIOND` segment of the Mach-O binary. At startup, `embedded_payload.extractFromSelf` performs these steps:

1. It parses the running executable's Mach-O load commands.
2. It locates each section by name.
3. It computes the SHA-256 of the section's byte range.
4. It copies the section to `<root>/embedded/<name>-<hash>` only if that file is not already present.

On warm starts, a matching file size lets the copy be skipped.

The same mechanism handles the `runtimes.sqfs` payload in Linux ELF binaries (section `.actiond.runtimes`). The kernel and initramfs are not embedded in ELF binaries because guest-side binaries do not boot VMs.

Sources: [src/embedded_payload.zig:47-130](), [src/darwin_vm_host.zig:143-170]()
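Locating a payload comes down to comparing fixed-width names. Mach-O section records such as `section_64` store `segname` and `sectname` as 16-byte, NUL-padded fields. The sketch below shows that comparison over a list of already-parsed sections. `SectionInfo` is a hypothetical stand-in for the real load-command structs, and the helper names are illustrative.

```zig
const std = @import("std");

/// Compares a 16-byte, NUL-padded Mach-O name field against `want`.
pub fn fixedNameEquals(field: *const [16]u8, want: []const u8) bool {
    const end = std.mem.indexOfScalar(u8, field, 0) orelse field.len;
    return std.mem.eql(u8, field[0..end], want);
}

/// Hypothetical stand-in for the parts of a section_64 record the lookup needs.
pub const SectionInfo = struct {
    segname: [16]u8,
    sectname: [16]u8,
    offset: u32, // file offset of the section's bytes
    size: u64,
};

/// Returns the section named `seg,sect` (e.g. "__ACTIOND", "__kernel"), if any.
pub fn findSection(sections: []const SectionInfo, seg: []const u8, sect: []const u8) ?SectionInfo {
    for (sections) |s| {
        if (fixedNameEquals(&s.segname, seg) and fixedNameEquals(&s.sectname, sect)) return s;
    }
    return null;
}

/// Builds a NUL-padded name field (asserts `name.len <= 16` in safe builds).
pub fn fixedName(name: []const u8) [16]u8 {
    var out = [_]u8{0} ** 16;
    @memcpy(out[0..name.len], name);
    return out;
}
```

Once a section is found, its `offset` and `size` give the exact byte range to hash and copy out of the running executable.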

---

## Sequence: Client REAPI Request End-to-End

```mermaid
sequenceDiagram
    participant Bazel as Bazel client (TCP)
    participant Bridge as grpc_vsock_bridge (macOS)
    participant VZ as Virtualization.framework vsock
    participant Guest as linux-actiond --guest-worker
    participant CAS as /cas (ext4 block)

    Bazel->>Bridge: TCP connect :8980
    Bridge->>VZ: connectToPort(5001)
    VZ-->>Bridge: fd (dup of vsock connection)
    Bazel->>Bridge: gRPC HTTP/2 frame(s)
    Bridge->>VZ: write bytes (64 KiB buffer)
    VZ->>Guest: vsock read
    Guest->>CAS: CAS lookup / write
    CAS-->>Guest: blob data
    Guest-->>VZ: gRPC HTTP/2 response
    VZ-->>Bridge: read bytes
    Bridge-->>Bazel: TCP write
    Bazel->>Bridge: TCP EOF (SHUT_WR)
    Bridge->>VZ: shutdown(SHUT_WR)
    VZ->>Guest: vsock EOF
    Guest-->>VZ: shutdown(SHUT_WR)
    VZ-->>Bridge: read EOF
    Bridge-->>Bazel: shutdown(SHUT_WR)
```

---

## Configuration Reference

All flags are parsed by `darwin_vm_host.parseServeVmArgs`. Every flag supports both `--flag value` and `--flag=value` forms.

| Flag | Default | Purpose |
|---|---|---|
| `--listen` | `127.0.0.1:8980` | TCP address for the host-side gRPC bridge |
| `--root` | `/tmp/actiond-vm` | Working directory for artifacts and extracted payloads |
| `--cas-image` | `<root>/cas.ext4` | Path to the ext4 CAS disk image |
| `--cas-image-size-mib` | `32768` (32 GiB) | Size when creating a new CAS image |
| `--kernel` | *(embedded)* | Path to raw or zstd-compressed arm64 Linux Image |
| `--initramfs` | *(embedded)* | Path to raw or zstd-compressed CPIO initramfs |
| `--runtime-image` | *(embedded)* | Path to squashfs runtimes image |
| `--memory-mib` | `512` | Guest RAM in MiB |
| `--cpus` | `2` | Guest vCPU count |
| `--start-timeout-ms` | `30000` | VM start timeout (ms) |
| `--connect-timeout-ms` | `60000` | Total vsock connect timeout (ms) |
| `--actiondfs-stats-path` | *(disabled)* | Host path for periodic actiondfs stats snapshots |

Sources: [src/darwin_vm_host.zig:27-50](), [src/darwin_vm_host.zig:52-120]()
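Accepting both `--flag value` and `--flag=value` takes only a prefix check plus one look at the next character. That check also prevents `--cas-image` from matching `--cas-image-size-mib`. The helper `takeFlag` is hypothetical and is not the actual `parseServeVmArgs` code.

```zig
const std = @import("std");

/// Hypothetical helper: if `args[i.*]` is flag `name` in either form, returns
/// its value and advances `i.*` past everything consumed; otherwise returns null.
pub fn takeFlag(args: []const []const u8, i: *usize, name: []const u8) error{MissingValue}!?[]const u8 {
    const arg = args[i.*];
    if (!std.mem.startsWith(u8, arg, name)) return null;
    const rest = arg[name.len..];
    if (rest.len == 0) {
        // `--flag value`: the value is the next argument.
        if (i.* + 1 >= args.len) return error.MissingValue;
        i.* += 2;
        return args[i.* - 1];
    }
    if (rest[0] == '=') {
        // `--flag=value`: the value follows the '='.
        i.* += 1;
        return rest[1..];
    }
    // Only a shared prefix (e.g. `--cas-image` vs `--cas-image-size-mib`).
    return null;
}
```

Numeric flags such as `--cpus` or `--memory-mib` would then pass the returned slice to `std.fmt.parseInt`.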

---

## Summary

`darwin-actiond serve-vm` is a thin macOS host process whose main jobs are VM lifecycle management and byte forwarding. It uses Virtualization.framework, via an Objective-C bridge in `vz_bridge.m`, to boot a minimal arm64 Linux guest. The guest gets a writable ext4 CAS disk and an optional read-only squashfs runtime image, both attached as virtio block devices.

The guest listens on two vsock ports, and the host connects to them:

- Port 5001 carries raw gRPC HTTP/2 frames, pumped transparently between the Bazel TCP client and the guest `linux-actiond` worker.
- Port 5000 carries the internal `ACTD`-framed control protocol, used for host-initiated queries such as actiondfs stats collection.

Inside the guest, a minimal PID 1 init mounts the required filesystems and `exec`s into `linux-actiond --guest-worker`. That worker owns all REAPI state for the lifetime of the VM. In standalone binary mode, the kernel, initramfs, and runtimes squashfs are extracted from Mach-O sections on first run, so a single self-contained executable can boot the full execution environment.
