CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

Project Overview

vmsilo is a lightweight virtualization system inspired by Qubes OS. It runs isolated VMs using crosvm (Chrome OS VMM) with different security domains (banking, shopping, untrusted, etc.). VMs are configured declaratively via a NixOS module.

Environment

You are running under NixOS. If you need any tools not in the environment, use nix-shell.

Development rules and guidelines

  • Do not commit design or implementation plans to git.
  • Update documentation (README.md and CLAUDE.md) along with code. Keep these concise.

Build Commands

# Build the default rootfs image
nix build .#

Code Style

This project uses treefmt-nix with nixfmt for formatting. Run before committing:

# Format all Nix files
nix fmt

There are no tests in this project.

Architecture

VM Launch Flow (NixOS module)

Each VM runs under its own dynamic service user (vmsilo-<name>) via DynamicUser=yes. A privileged ExecStartPre=+ script grants the dynamic user access to devices and sockets (ACLs, chown). Console relay and proxy services run as the configured desktop user.

VMs start automatically when first accessed via socket activation:

  1. vm-run banking firefox connects to /run/vmsilo/banking/command.socket
  2. Socket activation triggers vmsilo-banking@.service (proxy template)
  3. Proxy requires vmsilo-banking-vm.service, which starts crosvm
  4. Proxy waits for guest vsock:5000, then forwards command
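Step 4 above can be sketched as a connect-with-retry loop. This is a hypothetical helper, not the module's actual proxy code, and it uses a Unix socket in place of the guest vsock for illustration:

```rust
use std::io;
use std::os::unix::net::UnixStream;
use std::path::Path;
use std::thread::sleep;
use std::time::{Duration, Instant};

/// Retry connecting until the peer accepts or the deadline passes.
/// Sketch only: the real proxy waits on guest vsock port 5000, not a
/// Unix socket, and this helper name is invented for illustration.
fn wait_for_socket(path: &Path, timeout: Duration) -> io::Result<UnixStream> {
    let deadline = Instant::now() + timeout;
    loop {
        match UnixStream::connect(path) {
            Ok(stream) => return Ok(stream),
            // Give up once the deadline has passed, returning the last error.
            Err(e) if Instant::now() >= deadline => return Err(e),
            // Otherwise back off briefly and retry.
            Err(_) => sleep(Duration::from_millis(50)),
        }
    }
}
```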

The configured user can manage VM services via polkit (no sudo required for vm-start/vm-stop).

Key Files

  • rootfs-nixos/default.nix - NixOS-based rootfs builder (outputs qcow2, kernel, initrd)
  • rootfs-nixos/configuration.nix - Guest NixOS configuration (systemd, vsock listener, idle watchdog)
  • modules/options.nix - NixOS module interface (programs.vmsilo options)
  • modules/config.nix - NixOS module implementation (VM scripts, systemd units, networking)
  • patches/vmsilo-decorations-combined-v6.5.5.patch - KWin patch for VM window decoration colors
  • flake.nix - Flake exposing nixosModules.default and lib.makeRootfsNixos

Generated Scripts (NixOS module)

  • vm-run <name> <cmd> - Run command in VM (starts VM on-demand via socket activation)
  • vm-start <name> - Start VM via systemd (uses polkit, no sudo needed)
  • vm-stop <name> - Stop VM via systemd (uses polkit, no sudo needed)
  • vm-start-debug <name> - Start VM directly for debugging (requires sudo, bypasses socket activation)
  • vm-shell <name> - Connect to VM serial console (default) or SSH with --ssh

Bash completion: Enabled by default (enableBashIntegration = true). VM names are queried dynamically from systemd, so completions update in existing shells after nixos-rebuild switch.

vm-shell options:

  • Default: Connect to serial console (no SSH keys required). Escape with CTRL+].
  • --ssh: Use SSH over vsock (requires SSH keys configured in guestConfig)
  • --ssh --root: SSH as root

Note: SSH mode requires SSH keys configured in per-VM guestConfig:

guestConfig = {
  users.users.user.openssh.authorizedKeys.keys = [ "ssh-ed25519 AAAA..." ];
  users.users.root.openssh.authorizedKeys.keys = [ "ssh-ed25519 AAAA..." ];
};

Sockets and Devices

Files in /run/vmsilo/<name>/ (per-VM subdirectory owned by cfg.user):

Path                           Type        Purpose
<name>/command.socket          Socket      Socket activation for vm-run commands
<name>/crosvm-control.socket   Socket      crosvm control socket (VM management)
<name>/console-backend.socket  Socket      Serial console backend (crosvm connects here)
<name>/console                 PTY         User-facing serial console (for vm-shell)
<name>/wayland-0               Bind mount  Wayland socket (via BindPaths, if GPU enabled)
<name>/pulse-native            Bind mount  PulseAudio socket (via BindPaths, if sound enabled)

The console relay service (vmsilo-<name>-console-relay.service) bridges crosvm to a PTY, allowing users to connect/disconnect without disrupting crosvm.

Service User Isolation: Each service runs under its own user:

Service                      User                 Method
vmsilo-<name>-vm             vmsilo-<name>        DynamicUser=yes
vmsilo-<name>-console-relay  cfg.user             Static (desktop user)
vmsilo-<name>@ (proxy)       cfg.user             Static (desktop user)
vm-switch-<netname>          vm-switch-<netname>  DynamicUser=yes

Groups for device/socket access: kvm (KVM), vfio (VFIO container), vmsilo-video (Wayland ACL), vmsilo-audio (PulseAudio ACL), vmsilo-net-<netname> (vhost-user sockets). The VM service's ExecStartPre=+ runs as root to set ACLs, chown VFIO devices, and set TAP interface ownership.

Desktop Integration: The module generates .desktop files for all applications in guestPrograms, allowing VM apps to appear in the host's desktop menu. Apps are organized into submenus named "VM: <name>" (e.g., "VM: banking" containing Firefox, Konsole). Each app launches via vm-run. Icons are copied from guest packages to ensure proper display.

Window Decoration Colors: Each VM's color option is passed to crosvm via --wayland-security-context app_id=vmsilo:<name>:<color>. A KWin patch (patches/) reads this security context and applies the color to window decorations (title bar, frame). Server-side decorations are forced so colors are always visible. Text color auto-contrasts based on luminance.
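The luminance-based text contrast can be sketched as follows. The BT.601 luma formula and the 128 threshold are assumptions for illustration; the KWin patch may use a different formula or cutoff:

```rust
/// Parse a "#rrggbb" color into (r, g, b). Sketch only.
fn parse_hex(color: &str) -> Option<(u8, u8, u8)> {
    let hex = color.strip_prefix('#')?;
    if hex.len() != 6 {
        return None;
    }
    let v = u32::from_str_radix(hex, 16).ok()?;
    Some(((v >> 16) as u8, (v >> 8) as u8, v as u8))
}

/// Pick black or white text based on perceived brightness (BT.601 luma).
/// The actual patch may differ; the fallback mirrors the "red" default.
fn text_color(bg: &str) -> &'static str {
    let (r, g, b) = parse_hex(bg).unwrap_or((255, 0, 0));
    let luma = 0.299 * r as f64 + 0.587 * g as f64 + 0.114 * b as f64;
    if luma > 128.0 { "#000000" } else { "#ffffff" }
}
```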

Host networking: VMs are offline by default. Set hostNetworking = true for internet access.

Disposable VMs: Set disposable = true to auto-shutdown after idle:

  • Guest runs vm-idle-watchdog service
  • Polls for active vsock-cmd@* instances every 5 seconds
  • Shuts down after idleTimeout seconds of inactivity
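The watchdog's decision loop can be sketched like this. `is_busy` stands in for polling systemd for active vsock-cmd@* instances, and `should_shut_down` is an invented name; the guest service's real structure may differ:

```rust
use std::time::{Duration, Instant};

/// Sketch of the idle-watchdog loop: activity resets the idle timer,
/// and sustained inactivity past `idle_timeout` triggers shutdown.
/// `max_polls` bounds the loop so the sketch is testable; the real
/// watchdog runs indefinitely and polls every 5 seconds.
fn should_shut_down(
    mut is_busy: impl FnMut() -> bool,
    idle_timeout: Duration,
    poll_interval: Duration,
    max_polls: usize,
) -> bool {
    let mut idle_since = Instant::now();
    for _ in 0..max_polls {
        if is_busy() {
            idle_since = Instant::now(); // activity resets the timer
        } else if idle_since.elapsed() >= idle_timeout {
            return true; // the guest would power off here
        }
        std::thread::sleep(poll_interval);
    }
    false
}
```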

VM-to-VM Networking: VMs can communicate via vmNetwork option:

  • Each network has one router VM and multiple client VMs
  • Uses vhost-user-net backed by vm-switch daemon
  • MAC files written to /run/vm-switch/<network>/<vm>/<type>.mac
  • vm-switch monitors MAC files and creates sockets
  • crosvm connects via --vhost-user type=net,socket=...

rootfs-nixos Package

Builds a full NixOS system into a qcow2 image:

  • Outputs: { qcow2, kernel, initrd } for direct crosvm boot
  • Features: Systemd stage-1, overlayfs root (read-only ext4 + tmpfs upper), wayland-proxy-virtwl as systemd service
  • Self-contained: No host /nix sharing - packages configured at build time via guestPrograms
  • Socket activation: Uses vsock-cmd.socket + vsock-cmd@.service for command handling
  • Idle watchdog: Optional vm-idle-watchdog.service for disposable VMs

# Build with custom packages
vmsilo.lib.makeRootfsNixos "x86_64-linux" {
  guestPrograms = [ pkgs.firefox pkgs.konsole ];
  guestConfig = {
    # Additional NixOS configuration
    fileSystems."/home/user" = { device = "/dev/vdb"; fsType = "ext4"; };
  };
}

vm-switch Daemon

Provides L2 switching for VM-to-VM networks:

  • Location: vm-switch/ Rust crate
  • Build: nix build .#vm-switch
  • Purpose: Handles vhost-user protocol for VM network interfaces
  • Systemd: One service per vmNetwork (vm-switch-<netname>.service), runs as dynamic user vm-switch-<netname> with group vmsilo-net-<netname>

CLI flags:

-d, --config-dir <PATH>    Config/MAC file directory (default: /run/vm-switch)
    --log-level <LEVEL>    error, warn, info, debug, trace (default: warn)
    --no-sandbox           Disable namespace sandboxing
    --seccomp-mode <MODE>  kill (default), trap, log, disabled

Testing locally:

# Build and run manually
nix build .#vm-switch
./result/bin/vm-switch -d /tmp/test-switch --log-level debug

# In another terminal, create test MAC files
mkdir -p /tmp/test-switch/router
echo "52:00:00:00:00:01" > /tmp/test-switch/router/router.mac
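Parsing the colon-separated MAC files above can be sketched as below; the actual type in src/mac.rs may differ in detail:

```rust
/// Parse a colon-separated MAC address as written to the vm-switch
/// config dir (e.g. "52:00:00:00:00:01"). Sketch only.
fn parse_mac(s: &str) -> Option<[u8; 6]> {
    let mut mac = [0u8; 6];
    let mut parts = s.trim().split(':');
    for byte in mac.iter_mut() {
        // Each octet must be valid hex; missing octets fail via `?`.
        *byte = u8::from_str_radix(parts.next()?, 16).ok()?;
    }
    // Reject addresses with trailing extra octets.
    if parts.next().is_some() {
        return None;
    }
    Some(mac)
}
```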

Process model: Main process forks one child per VM. Children are vhost-user net backends that handle virtio TX/RX for their VM. Main orchestrates lifecycle, config watching, and buffer exchange between children. Children exit when the vhost-user client (crosvm) disconnects; main automatically restarts them so crosvm can reconnect.

Startup sequence:

  1. Parse args, apply namespace sandbox (single-threaded, before tokio)
  2. Apply main seccomp filter
  3. Start tokio runtime, create ConfigWatcher + BackendManager, enter async event loop (SIGCHLD via tokio select branch)

Key source files:

  • src/main.rs - Entry point, sandbox/seccomp setup, async event loop, SIGCHLD handling
  • src/manager.rs - BackendManager: fork children, buffer exchange, crash cleanup
  • src/args.rs - CLI argument parsing (clap)
  • src/config.rs - VM configuration types (VmRole, VmConfig)
  • src/watcher.rs - Config directory file watcher (inotify + debouncer)
  • src/mac.rs - MAC address type and parsing
  • src/control.rs - Main-child IPC over Unix seqpacket sockets + SCM_RIGHTS
  • src/ring.rs - Lock-free SPSC ring buffer in shared memory (memfd)
  • src/frame.rs - Ethernet frame parsing, MAC validation
  • src/sandbox.rs - Namespace isolation (user, PID, mount, IPC, network)
  • src/seccomp.rs - BPF syscall filters (main and child whitelists)
  • src/child/process.rs - Child entry point: control channel, vhost daemon, child seccomp
  • src/child/forwarder.rs - PacketForwarder: L2 routing via ring buffers
  • src/child/vhost.rs - ChildVhostBackend: virtio TX/RX callbacks
  • src/child/poll.rs - Event polling for control channel + ingress buffers
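The per-frame decision in the forwarder (L2 routing) can be sketched as a switch-style lookup. The `Forward` enum and map type are illustrative, not the real API in src/child/forwarder.rs:

```rust
use std::collections::HashMap;

/// Illustrative forwarding outcome: deliver to one peer or flood to all.
#[derive(Debug, PartialEq)]
enum Forward<'a> {
    Unicast(&'a str),
    Flood,
}

/// Sketch of an L2 switch decision: broadcast/multicast frames and
/// unknown unicast destinations are flooded; known MACs go to one peer.
fn route<'a>(dest_mac: [u8; 6], peers: &'a HashMap<[u8; 6], String>) -> Forward<'a> {
    // Group bit: low-order bit of the first octet marks broadcast/multicast.
    if dest_mac[0] & 0x01 != 0 {
        return Forward::Flood;
    }
    match peers.get(&dest_mac) {
        Some(name) => Forward::Unicast(name),
        None => Forward::Flood, // learn-nothing fallback, like a real switch
    }
}
```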

Control protocol (main <-> child IPC via SOCK_SEQPACKET + SCM_RIGHTS):

Direction      Message                                       FDs             Purpose
Main -> Child  GetBuffer { peer_name, peer_mac }             -               Ask child to create ingress buffer for a peer
Child -> Main  BufferReady { peer_name }                     memfd, eventfd  Ingress buffer created, here are the FDs
Main -> Child  PutBuffer { peer_name, peer_mac, broadcast }  memfd, eventfd  Give child a peer's buffer as egress target
Main -> Child  RemovePeer { peer_name }                      -               Clean up buffers for disconnected/crashed peer
Main -> Child  Ping                                          -               Heartbeat request (sent every 1s)
Child -> Main  Ready                                         -               Child initialized and ready
Child -> Main  Pong                                          -               Heartbeat response (must arrive within 100ms)

Messages serialized with postcard. FDs passed via ancillary data.
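The message set above can be sketched as a single Rust enum. The real types derive serde traits for postcard serialization (omitted here), and the FDs travel as ancillary data outside the serialized payload:

```rust
/// Sketch of the main<->child control messages from the table above.
/// Field layout is illustrative; src/control.rs is authoritative.
#[allow(dead_code)]
#[derive(Debug, PartialEq)]
enum ControlMsg {
    // Main -> Child
    GetBuffer { peer_name: String, peer_mac: [u8; 6] },
    PutBuffer { peer_name: String, peer_mac: [u8; 6], broadcast: bool },
    RemovePeer { peer_name: String },
    Ping,
    // Child -> Main (BufferReady's memfd/eventfd ride along via SCM_RIGHTS)
    BufferReady { peer_name: String },
    Ready,
    Pong,
}
```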

Buffer exchange flow:

  1. Main sends GetBuffer to Child1 ("create ingress buffer for Child2")
  2. Child1 creates SPSC ring buffer (memfd + eventfd), becomes Consumer, replies BufferReady
  3. Main forwards those FDs to Child2 via PutBuffer -- Child2 becomes Producer
  4. Packets now flow: Child2 writes to Producer -> shared memfd -> Child1 reads from Consumer

SPSC ring buffer (ring.rs): Lock-free single-producer/single-consumer queue backed by memfd_create() + mmap(MAP_SHARED). 64 slots, ~598KB total. Head/tail use atomic operations (no locks in datapath). Eventfd signals empty-to-non-empty transitions.
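The head/tail protocol can be sketched as below. This in-process version is a simplification: the real ring lives in a shared memfd mapping, is accessed from two processes through `&self`, and signals an eventfd on the empty-to-non-empty transition; here frames are heap `Vec`s and `&mut self` keeps the sketch in safe Rust:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

const SLOTS: usize = 64; // matches the documented slot count

/// Minimal SPSC ring sketch. Head/tail are monotonically increasing
/// counters; slot index is `counter % SLOTS`. Acquire/Release pairs
/// order slot contents against the index publication.
struct SpscRing {
    head: AtomicUsize, // next slot to write (owned by producer)
    tail: AtomicUsize, // next slot to read (owned by consumer)
    slots: Vec<Option<Vec<u8>>>,
}

impl SpscRing {
    fn new() -> Self {
        Self {
            head: AtomicUsize::new(0),
            tail: AtomicUsize::new(0),
            slots: (0..SLOTS).map(|_| None).collect(),
        }
    }

    /// Producer side: returns the frame back when the ring is full.
    fn push(&mut self, frame: Vec<u8>) -> Result<(), Vec<u8>> {
        let head = self.head.load(Ordering::Relaxed);
        let tail = self.tail.load(Ordering::Acquire);
        if head - tail == SLOTS {
            return Err(frame); // full
        }
        self.slots[head % SLOTS] = Some(frame);
        self.head.store(head + 1, Ordering::Release); // publish the slot
        Ok(())
    }

    /// Consumer side: None when empty (the real ring would then
    /// block on the eventfd instead of spinning).
    fn pop(&mut self) -> Option<Vec<u8>> {
        let tail = self.tail.load(Ordering::Relaxed);
        let head = self.head.load(Ordering::Acquire);
        if head == tail {
            return None; // empty
        }
        let frame = self.slots[tail % SLOTS].take();
        self.tail.store(tail + 1, Ordering::Release); // free the slot
        frame
    }
}
```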

Sandbox (applied before tokio, requires single-threaded):

  1. User namespace - Maps real UID to 0 inside, enables unprivileged namespace creation
  2. PID namespace - Fork into new PID ns; main becomes PID 1
  3. Mount namespace - Minimal tmpfs root with /config (bind-mount of config dir), /dev (null, zero, urandom), /proc, /tmp. Pivot root, unmount old.
  4. IPC namespace - Isolates System V IPC
  5. Network namespace - Empty (no interfaces). Communication only via inherited FDs.

Seccomp filtering (BPF syscall whitelist):

  • --seccomp-mode=kill (default): Terminate on blocked syscall
  • --seccomp-mode=trap: Send SIGSYS (debug with strace)
  • --seccomp-mode=log: Log violations but allow
  • --seccomp-mode=disabled: Skip filtering

Two filter tiers (child is a strict subset of main):

  • Main: Allows fork, socket creation, inotify, openat (config watching + child management)
  • Child: No fork, no socket creation, no file open. Applied after vhost setup completes. Allows clone3 for vhost-user threads.
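The strict-subset relationship between the two tiers can be expressed as a simple set check; the syscall names below are examples only, not the actual filter contents:

```rust
use std::collections::HashSet;

/// Illustrative invariant: every syscall the child may make must also
/// be allowed for main, and main must allow strictly more.
fn is_strict_subset(child: &HashSet<&str>, main: &HashSet<&str>) -> bool {
    child.is_subset(main) && child.len() < main.len()
}
```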

Dependencies

  • Custom crosvm fork: git.dsg.is/davidlowsec/crosvm.git
  • wayland-proxy-virtwl: Wayland/X11 proxying between host and guests
  • NixOS 25.11 base

NixOS Module Usage

Note: The configured user must have an explicit UID set (users.users.<name>.uid = <number>).

{ config, pkgs, ... }: {
  # User must have explicit UID for vmsilo
  users.users.david.uid = 1000;

  programs.vmsilo = {
    enable = true;
    user = "david";
    isolatedPciDevices = [ "01:00.0" ];  # PCI devices to isolate with vfio-pci

    # Global crosvm configuration
    crosvm = {
      logLevel = "info";     # error, warn, info, debug, trace
      extraArgs = [ ];       # Args before "run" subcommand
      extraRunArgs = [ ];    # Args after "run" subcommand
    };

    # VM-switch daemon configuration
    vm-switch = {
      logLevel = "info";     # error, warn, info, debug, trace
      extraArgs = [ ];       # Extra CLI arguments
    };

    nixosVms = [{
      id = 3;
      name = "banking";
      color = "#2ecc71";      # Window decoration color (default: "red")
      memory = 4096;
      hostNetworking = true;  # Enable internet access (default: false)
      disposable = true;   # Auto-shutdown when idle
      idleTimeout = 120;   # Shutdown after 2 minutes idle

      # Per-VM crosvm overrides (optional)
      crosvm = {
        logLevel = "debug";  # Override global log level for this VM
        extraArgs = [ ];     # Appended to global extraArgs
        extraRunArgs = [ ];  # Appended to global extraRunArgs
      };

      # Per-VM packages and NixOS config
      guestPrograms = [ pkgs.firefox pkgs.chromium ];
      guestConfig = {
        users.users.user.openssh.authorizedKeys.keys = [ "ssh-ed25519 AAAA..." ];
      };

      # Disk configuration (uses crosvm --block)
      # Free-form attrsets with path as positional arg, rest passed to crosvm
      additionalDisks = [{
        path = "/tmp/data.qcow2";
        ro = false;
        sparse = true;
        block-size = 4096;
        id = "data";
      }];

      # Custom boot (optional - defaults to built rootfs)
      # rootDisk = { path = "/path/to/root.qcow2"; ro = true; };
      # kernel = /path/to/bzImage;
      # initramfs = /path/to/initrd;
      rootDiskReadonly = true;  # default true

      # Extra kernel parameters
      kernelParams = [ "debug" ];

      # GPU config (crosvm --gpu=)
      # false=disabled, true=default, attrset=custom
      gpu = { context-types = "cross-domain:virgl2"; };

      # Sound config (crosvm --virtio-snd=)
      # false=disabled, true=default PulseAudio, attrset=custom
      sound = { backend = "pulse"; capture = true; };

      # PCI passthrough (devices from isolatedPciDevices)
      # Attrset with path (BDF or sysfs) and optional kv pairs
      pciDevices = [{ path = "01:00.0"; iommu = "on"; }];

      # Shared directories (crosvm --shared-dir)
      # Attrset with path, tag, and optional kv pairs (colon separator)
      sharedDirectories = [{ path = "/tmp/shared"; tag = "shared"; uid = 1000; }];

      # vhost-user devices (auto-populated from vmNetwork, can add custom)
      vhostUser = [{ type = "net"; socket = "/path/to/socket"; }];
    }];
  };
  # Access built package via config.programs.vmsilo.package
}