
qubes-lite

A lightweight virtualization system inspired by Qubes OS. It runs isolated VMs in different security domains using crosvm (the Chrome OS virtual machine monitor).

All credit goes to Thomas Leonard (@talex5), who wrote the wayland proxy and got all this stuff working: https://gitlab.com/talex5/qubes-lite

This is a hacky side project. If you need a serious and secure operating system, use Qubes.

Comparison to Qubes

The main benefits compared to Qubes are:

  • Fast, modern graphics. Wayland calls are proxied to the host.
  • Better power management. Qubes is based on Xen, whose support for modern laptop power management is significantly worse than mainline Linux's.
  • NixOS-based declarative VM config.

The cost for that is security. Qubes is laser-focused on security and hard compartmentalisation. This makes it by far the most secure general-purpose operating system there is.

Ways in which we are less secure than Qubes (list is not even remotely exhaustive):

  • The host system is not isolated from the network at all. The user must exercise discipline to avoid accessing untrusted network resources from the host. Even with that discipline, handling VM network traffic makes the host attack surface much larger.
  • There is no attempt to isolate the host system from hardware peripherals. Qubes segregates USB and network into VMs.
  • Currently the clipboard is shared between the host and all VMs. This will be fixed at some point; the plan is to implement a two-level clipboard like Qubes'.
  • Proxying wayland calls means the attack surface from VM to host is far larger than with Qubes' raw framebuffer copy approach.
  • Probably a million other things.

If you are trying to defend against a determined, well-resourced attacker targeting you specifically then you should be running Qubes.

Quick Start

Add to your flake inputs:

{
  inputs.qubes-lite.url = "git+https://git.dsg.is/davidlowsec/qubes-lite.git";
}

Import the module and configure VMs in your NixOS configuration:

{ config, pkgs, ... }: {
  imports = [ inputs.qubes-lite.nixosModules.default ];

  # User must have explicit UID for qubes-lite
  users.users.david.uid = 1000;

  programs.qubes-lite = {
    enable = true;
    user = "david";
    hostNetworking = {
      network = "172.16.200.0/24";
      nat = {
        enable = true;
        interface = "eth0";
      };
    };

    nixosVms = [
      {
        id = 3;
        name = "banking";
        memory = 4096;
        cpus = 4;
        hostNetworking = true;  # Enable internet access
        disposable = true;   # Auto-shutdown when idle
        idleTimeout = 120;   # 2 minutes
        guestPrograms = with pkgs; [ firefox xfce4-terminal ];
      }
      {
        id = 5;
        name = "shopping";
        memory = 2048;
        cpus = 2;
        hostNetworking = true;
        disposable = true;
        guestPrograms = with pkgs; [ firefox xfce4-terminal ];
      }
      {
        id = 7;
        name = "personal";
        memory = 4096;
        cpus = 4;
        # hostNetworking defaults to false (offline VM)
        guestPrograms = with pkgs; [ libreoffice ];
      }
    ];
  };
}

Configuration Options

programs.qubes-lite

Option Type Default Description
enable bool false Enable qubes-lite VM management
user string required User who owns TAP interfaces and runs VMs (must have explicit UID)
hostNetworking.network string "172.16.200.0/24" Network CIDR for host networking
hostNetworking.nat.enable bool false Enable NAT for VM internet access
hostNetworking.nat.interface string "" External interface for NAT (required if nat.enable)
nixosVms list of VM configs [] List of NixOS-based VMs to create
enableBashIntegration bool true Enable bash completion for vm-* commands
crosvm.logLevel string "info" Log level for crosvm (error, warn, info, debug, trace)
crosvm.extraArgs list of strings [] Extra args passed to crosvm before "run" subcommand
crosvm.extraRunArgs list of strings [] Extra args passed to crosvm after "run" subcommand
vm-switch.logLevel string "info" Log level for vm-switch daemon (error, warn, info, debug, trace)
vm-switch.extraArgs list of strings [] Extra command line arguments for vm-switch daemon
isolatedPciDevices list of strings [] PCI devices to isolate with vfio-pci

VM Configuration (nixosVms items)

Option Type Default Description
id int required VM ID (odd number 3-255). Used for IP and vsock CID
name string required VM name for scripts and TAP interface
memory int 1024 Memory allocation in MB
cpus int 2 Number of virtual CPUs
hostNetworking bool false Enable host networking (tap interface) and NATed internet access
disposable bool false Auto-shutdown when idle (after idleTimeout seconds)
idleTimeout int 60 Seconds to wait before shutdown (when disposable=true)
additionalDisks list of disk configs [] Additional disks to attach (see Disk Configuration)
rootDisk disk config or null null Custom root disk (defaults to built rootfs)
kernel path or null null Custom kernel image
initramfs path or null null Custom initramfs
rootDiskReadonly bool true Whether root disk is read-only
kernelParams list of strings [] Extra kernel command line parameters
gpu bool or attrset true GPU config (false=disabled, true=default, attrset=custom)
sound bool or attrset false Sound config (false=disabled, true=default PulseAudio, attrset=custom)
sharedDirectories list of attrsets [] Shared directories with path, tag, and optional kv pairs
pciDevices list of attrsets [] PCI devices to passthrough (path + optional kv pairs)
guestPrograms list of packages [] VM-specific packages
guestConfig attrs {} VM-specific NixOS configuration
vmNetwork attrs of network config {} VM networks for VM-to-VM communication (see VM-to-VM Networking)
vhostUser list of attrsets [] vhost-user devices (auto-populated from vmNetwork)
crosvm.logLevel string or null null Per-VM log level override (uses global if null)
crosvm.extraArgs list of strings [] Per-VM extra args (appended to global crosvm.extraArgs)
crosvm.extraRunArgs list of strings [] Per-VM extra run args (appended to global crosvm.extraRunArgs)

Disk Configuration (additionalDisks items)

Free-form attrsets passed directly to crosvm --block. The path attribute is required and used as a positional argument.

additionalDisks = [{
  path = "/tmp/data.qcow2";  # required, positional
  ro = false;                # read-only
  sparse = true;             # enable discard/trim
  block-size = 4096;         # reported block size
  id = "data";               # device identifier
  direct = false;            # O_DIRECT mode
}];
# Results in: --block /tmp/data.qcow2,ro=false,sparse=true,block-size=4096,id=data,direct=false

GPU Configuration

gpu = false;  # Disabled
gpu = true;   # Default: context-types=cross-domain:virgl2
gpu = { context-types = "cross-domain:virgl2"; width = 1920; height = 1080; };  # Custom

Sound Configuration

sound = false;  # Disabled
sound = true;   # Default PulseAudio config
sound = { backend = "pulse"; capture = true; };  # Custom

Shared Directories

sharedDirectories = [{
  path = "/tmp/shared";  # Host path (required)
  tag = "shared";        # Tag visible to guest (required)
  type = "fs";           # Optional: fs type
  uid = 1000;            # Optional: UID mapping
}];
# Results in: --shared-dir /tmp/shared:shared:type=fs:uid=1000

PCI Passthrough Configuration

pciDevices = [{
  path = "01:00.0";  # BDF format, auto-converted to sysfs path
  iommu = "on";      # Optional: enable IOMMU
}];
# Results in: --vfio /sys/bus/pci/devices/0000:01:00.0/,iommu=on

vhost-user Devices

vhostUser = [{
  type = "net";
  socket = "/path/to/socket";
}];
# Results in: --vhost-user type=net,socket=/path/to/socket

Auto-populated from vmNetwork configuration with type = "net".

Commands

After rebuilding NixOS, the following commands are available:

vm-run <name> <command>

Example: vm-run banking firefox

This is the primary way to interact with VMs. The command:

  1. Connects to the VM's socket at /run/qubes-lite/<name>.sock
  2. Triggers socket activation to start the VM if not running
  3. Sends the command to the guest

Start/Stop VMs

vm-start <name>    # Start VM via systemd (uses polkit, no sudo needed)
vm-stop <name>     # Stop VM via systemd (uses polkit, no sudo needed)

Start VM for debugging

vm-start-debug <name>

Starts crosvm directly in the foreground (requires sudo), bypassing socket activation. Useful for debugging VM boot issues since crosvm output is visible.

Shell access

vm-shell <name>              # Connect to serial console (default)
vm-shell --ssh <name>        # SSH into VM as user
vm-shell --ssh --root <name> # SSH into VM as root

The default serial console mode requires no configuration. Press CTRL+] to escape.

SSH mode requires SSH keys configured in per-VM guestConfig (see Advanced Configuration).

Socket activation

VMs run as system services (for PCI passthrough and sandboxing) and start automatically on first access via systemd socket activation:

# Check socket status
systemctl status qubes-lite-banking.socket

# Check VM service status
systemctl status qubes-lite-banking-vm.service

Sockets are enabled by default and start on boot.

Network Architecture

IP Addressing

VMs use /31 point-to-point links:

  • VM IP: <network-base>.<id> (e.g., 172.16.200.3)
  • Host TAP IP: <network-base>.<id-1> (e.g., 172.16.200.2)

The host TAP IP acts as the gateway for the VM.

ID Requirements

VM IDs must be:

  • Odd numbers (3, 5, 7, 9, ...)
  • In range 3-255
  • Unique across all VMs

This ensures non-overlapping /31 networks and valid vsock CIDs.
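
The addressing rules above can be computed directly from the VM ID. A minimal sketch (the helper name vm_addresses is illustrative, not part of the codebase):

```rust
use std::net::Ipv4Addr;

/// Compute the VM and host-TAP addresses for a VM ID on a /24 base
/// network, following the /31 point-to-point scheme: the VM gets
/// <base>.<id> and the host TAP (its gateway) gets <base>.<id - 1>.
fn vm_addresses(base: Ipv4Addr, id: u8) -> Option<(Ipv4Addr, Ipv4Addr)> {
    // IDs must be odd and in 3..=255 so the /31 pairs never overlap.
    if id < 3 || id % 2 == 0 {
        return None;
    }
    let [a, b, c, _] = base.octets();
    Some((Ipv4Addr::new(a, b, c, id), Ipv4Addr::new(a, b, c, id - 1)))
}

fn main() {
    let base = Ipv4Addr::new(172, 16, 200, 0);
    let (vm, tap) = vm_addresses(base, 3).unwrap();
    assert_eq!(vm, Ipv4Addr::new(172, 16, 200, 3));
    assert_eq!(tap, Ipv4Addr::new(172, 16, 200, 2));
    // Even IDs are rejected: they belong to the host side of a /31 pair.
    assert_eq!(vm_addresses(base, 4), None);
    println!("{} -> gateway {}", vm, tap);
}
```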

NAT

When hostNetworking.nat.enable = true, the module configures:

  • IP forwarding (net.ipv4.ip_forward = 1)
  • NAT masquerading on hostNetworking.nat.interface
  • Internal IPs set to VM IPs

Advanced Configuration

SSH Access Configuration

To use vm-shell --ssh, configure SSH keys in per-VM guestConfig:

programs.qubes-lite = {
  nixosVms = [{
    id = 3;
    name = "dev";
    guestConfig = {
      users.users.user.openssh.authorizedKeys.keys = [ "ssh-ed25519 AAAA..." ];
      users.users.root.openssh.authorizedKeys.keys = [ "ssh-ed25519 AAAA..." ];  # for --ssh --root
    };
  }];
};

SSH uses vsock transport, so it works even for VMs without hostNetworking.

Shared VM Configuration

Define a common config variable and merge it with per-VM config:

{ config, pkgs, ... }:
let
  commonGuestConfig = {
    services.openssh.enable = true;
    users.users.user.extraGroups = [ "wheel" ];
    users.users.user.openssh.authorizedKeys.keys = [ "ssh-ed25519 AAAA..." ];
  };
  commonGuestPrograms = with pkgs; [ firefox xfce4-terminal ];
in {
  programs.qubes-lite = {
    nixosVms = [
      {
        id = 3;
        name = "dev";
        hostNetworking = true;
        guestPrograms = commonGuestPrograms;
        guestConfig = commonGuestConfig // {
          # Additional dev-specific config
          fileSystems."/home/user" = {
            device = "/dev/vdb";
            fsType = "ext4";
          };
        };
        additionalDisks = [{ path = "/dev/mapper/main-dev-home"; ro = false; }];
      }
      {
        id = 5;
        name = "banking";
        hostNetworking = true;
        guestPrograms = commonGuestPrograms;
        # Disable SSH for banking VM
        guestConfig = commonGuestConfig // {
          services.openssh.enable = false;
        };
      }
    ];
  };
}

Offline VMs

VMs are offline by default (hostNetworking = false). For sensitive data that should never touch the network:

{
  id = 13;
  name = "vault";
  memory = 2048;
  guestPrograms = with pkgs; [ keepassxc ];
}

VM-to-VM Networking

VMs can communicate with each other via vmNetwork. This creates isolated virtual networks using vhost-user-net backed by vm-switch:

nixosVms = [
  {
    id = 3;
    name = "router";
    hostNetworking = true;  # Router typically has internet access
    vmNetwork.internal = { router = true; };
  }
  {
    id = 5;
    name = "banking";
    vmNetwork.internal = {};  # Client on "internal" network
  }
  {
    id = 7;
    name = "shopping";
    vmNetwork.internal = {};  # Another client
  }
];

Each network requires exactly one router VM. Clients can only communicate with the router, not with each other directly. The router VM typically has hostNetworking = true to provide internet access to clients.

MAC addresses are auto-generated from the network and VM names. Override with explicit macAddress:

vmNetwork.internal = {
  router = true;
  macAddress = "52:ab:cd:ef:12:34";
};
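
The exact derivation the module uses is not documented here, but the idea can be sketched as hashing the network and VM names and then forcing a valid unicast MAC. Everything in this sketch (the hash choice, the function name) is an assumption for illustration only:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Derive a stable MAC from a (network, vm) name pair. Illustrative
/// only: hash the names, then set the locally-administered bit (0x02)
/// and clear the multicast bit (0x01) in the first octet so the result
/// is a valid unicast MAC that cannot collide with vendor-assigned ones.
fn derive_mac(network: &str, vm: &str) -> [u8; 6] {
    let mut h = DefaultHasher::new();
    (network, vm).hash(&mut h);
    let bytes = h.finish().to_be_bytes();
    let mut mac = [0u8; 6];
    mac.copy_from_slice(&bytes[..6]);
    mac[0] = (mac[0] | 0x02) & !0x01;
    mac
}

fn main() {
    let mac = derive_mac("internal", "router");
    // Locally administered (0x02 set), unicast (0x01 clear).
    assert_eq!(mac[0] & 0x03, 0x02);
    println!(
        "{:02x}:{:02x}:{:02x}:{:02x}:{:02x}:{:02x}",
        mac[0], mac[1], mac[2], mac[3], mac[4], mac[5]
    );
}
```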

Disposable VMs

VMs that auto-shutdown when idle to save memory:

{
  id = 9;
  name = "untrusted";
  memory = 4096;
  disposable = true;   # Enable auto-shutdown
  idleTimeout = 60;    # Shutdown 60 seconds after last command exits
}

The guest runs an idle watchdog that monitors for active command sessions. When no commands are running and idleTimeout seconds have passed since the last activity, the VM shuts down cleanly.
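
The watchdog's shutdown decision reduces to a small pure check; a sketch under assumed names (the real guest-side implementation is not shown here):

```rust
use std::time::{Duration, Instant};

/// Decide whether a disposable VM should shut down: no command
/// sessions are active and the idle timeout has fully elapsed since
/// the last session exited.
fn should_shutdown(active_sessions: usize, last_activity: Instant, idle_timeout: Duration) -> bool {
    active_sessions == 0 && last_activity.elapsed() >= idle_timeout
}

fn main() {
    let last = Instant::now();
    // A zero timeout with no active sessions shuts down immediately.
    assert!(should_shutdown(0, last, Duration::ZERO));
    // An active session always keeps the VM alive.
    assert!(!should_shutdown(1, last, Duration::ZERO));
    println!("watchdog decision ok");
}
```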

PCI Passthrough

Pass PCI devices (USB controllers, network cards) directly to VMs for hardware isolation.

Prerequisites

  1. Enable the IOMMU with a kernel parameter:

    boot.kernelParams = [ "intel_iommu=on" ];  # or amd_iommu=on for AMD
    
  2. Identify devices to isolate:

    lspci -nn  # Note the BDF (e.g., 01:00.0)
    
  3. Check IOMMU groups (all devices in a group must go to the same VM):

    for d in /sys/kernel/iommu_groups/*/devices/*; do
      echo "Group $(basename $(dirname $(dirname $d))): $(basename $d)"
    done | sort -t: -k1 -n
    

Configuration

programs.qubes-lite = {
  # Devices to isolate from host (claimed by vfio-pci)
  isolatedPciDevices = [ "01:00.0" "02:00.0" ];

  nixosVms = [{
    id = 3;
    name = "sys-usb";
    memory = 1024;
    pciDevices = [{ path = "01:00.0"; }];  # USB controller
  }
  {
    id = 5;
    name = "sys-net";
    memory = 1024;
    pciDevices = [{ path = "02:00.0"; }];  # Network card
  }];
};

# Recommended: blacklist native drivers for reliability
boot.blacklistedKernelModules = [ "xhci_hcd" ];  # for USB controllers

How It Works

  1. Early boot: vfio-pci claims isolated devices before other drivers load
  2. Activation: If devices are already bound, they're rebound to vfio-pci
  3. VM start: IOMMU groups are validated, then devices are passed via --vfio

Building

# Build the default rootfs image
nix build .#

Architecture

Each NixOS VM gets:

  • A dedicated qcow2 rootfs image with packages baked in
  • Overlayfs root (read-only ext4 lower + tmpfs upper)
  • wayland-proxy-virtwl for GPU passthrough
  • Socket-activated command listener (vsock-cmd.socket + vsock-cmd@.service)
  • Optional idle watchdog for disposable VMs
  • Systemd-based init

The host provides:

  • Persistent TAP interfaces via NixOS networking
  • NAT for internet access (optional)
  • Socket activation for commands (/run/qubes-lite/<name>-command.socket)
  • Console PTY for serial access (/run/qubes-lite/<name>-console)
  • VM services run as root for PCI passthrough and sandboxing (crosvm drops privileges)
  • Polkit rules for the configured user to manage VM services without sudo
  • CLI tools: vm-run, vm-start, vm-stop, vm-start-debug, vm-shell
  • Desktop integration with .desktop files for guest applications

vm-switch

The vm-switch daemon (vm-switch/ Rust crate) provides L2 switching for VM-to-VM networks. One instance runs per vmNetwork, managed by systemd (vm-switch-<netname>.service).

Process model: The main process watches a config directory for MAC files and forks one child process per VM. Each child is a vhost-user net backend serving a single VM's network interface.

                        Main Process
                    (config watch, orchestration)
                     /          |          \
               fork /      fork |      fork \
                  v             v             v
            Child: router  Child: banking  Child: shopping
            (vhost-user)   (vhost-user)    (vhost-user)
                  |             |              |
            [unix socket]  [unix socket]  [unix socket]
                  |             |              |
              crosvm        crosvm         crosvm
            (router VM)   (banking VM)   (shopping VM)

Packet forwarding uses lock-free SPSC ring buffers in shared memory (memfd_create + mmap). When a VM transmits a frame, its child process validates the source MAC address and routes the frame to the correct destination:

  • Unicast: pushed into the destination child's ingress ring buffer
  • Broadcast/multicast: pushed into all peers' ingress buffers

Ring buffers use atomic head/tail pointers (no locks in the datapath) with eventfd signaling for empty-to-non-empty transitions.
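
The datapath structure can be sketched as a bounded SPSC ring with acquire/release atomics. This is an illustrative stdlib model, not the real implementation: the actual rings hold 64 frame-sized slots in a shared memfd mapping and carry full Ethernet frames, not u64 payloads.

```rust
use std::sync::atomic::{AtomicU64, AtomicUsize, Ordering};

/// Minimal single-producer/single-consumer ring: the producer owns
/// `head`, the consumer owns `tail`, and neither side takes a lock.
struct SpscRing {
    slots: Vec<AtomicU64>,
    head: AtomicUsize, // next slot the producer will write
    tail: AtomicUsize, // next slot the consumer will read
}

impl SpscRing {
    fn new(capacity: usize) -> Self {
        Self {
            slots: (0..capacity).map(|_| AtomicU64::new(0)).collect(),
            head: AtomicUsize::new(0),
            tail: AtomicUsize::new(0),
        }
    }

    /// Producer side. Returns false if the ring is full.
    fn push(&self, value: u64) -> bool {
        let head = self.head.load(Ordering::Relaxed);
        if head - self.tail.load(Ordering::Acquire) == self.slots.len() {
            return false; // full
        }
        self.slots[head % self.slots.len()].store(value, Ordering::Relaxed);
        // Publish the slot; pairs with the consumer's Acquire load of head.
        // In vm-switch, a push that makes the ring non-empty would also
        // write the peer's eventfd to wake its event loop.
        self.head.store(head + 1, Ordering::Release);
        true
    }

    /// Consumer side. Returns None if the ring is empty.
    fn pop(&self) -> Option<u64> {
        let tail = self.tail.load(Ordering::Relaxed);
        if tail == self.head.load(Ordering::Acquire) {
            return None; // empty
        }
        let value = self.slots[tail % self.slots.len()].load(Ordering::Relaxed);
        self.tail.store(tail + 1, Ordering::Release);
        Some(value)
    }
}

fn main() {
    let ring = SpscRing::new(4);
    assert!(ring.push(42));
    assert_eq!(ring.pop(), Some(42));
    assert_eq!(ring.pop(), None);
    println!("spsc ok");
}
```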

Buffer exchange protocol: The main process orchestrates buffer setup between children via a control channel (SOCK_SEQPACKET + SCM_RIGHTS for passing memfd/eventfd file descriptors):

  1. Main tells Child A: "create an ingress buffer for Child B" (GetBuffer)
  2. Child A creates the ring buffer and returns the FDs (BufferReady)
  3. Main forwards those FDs to Child B as an egress target (PutBuffer)
  4. Child B can now write frames directly into Child A's memory; frames never pass through the main process
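
Modeled as plain data, the exchange looks roughly like this. The message names match the steps above; the i32 fields are stand-ins for the real memfd/eventfd descriptors transferred via SCM_RIGHTS, and the helper is illustrative:

```rust
/// Control messages exchanged over the SOCK_SEQPACKET channel.
/// In the real daemon the i32 fields are file descriptors passed with
/// SCM_RIGHTS; here they are plain numbers for illustration.
#[derive(Debug, Clone, PartialEq)]
enum ControlMsg {
    /// Main -> Child A: create an ingress ring for frames from `peer`.
    GetBuffer { peer: String },
    /// Child A -> Main: ring created; here are its memfd and eventfd.
    BufferReady { peer: String, memfd: i32, eventfd: i32 },
    /// Main -> Child B: write frames for `peer` into this ring.
    PutBuffer { peer: String, memfd: i32, eventfd: i32 },
}

/// Step 3 as the main process performs it: turn the ring owner's
/// BufferReady into the PutBuffer delivered to the producing peer,
/// re-targeted at the owner. Anything else is not forwarded.
fn forward_to_peer(msg: ControlMsg, owner: &str) -> Option<ControlMsg> {
    match msg {
        ControlMsg::BufferReady { memfd, eventfd, .. } => Some(ControlMsg::PutBuffer {
            peer: owner.to_string(),
            memfd,
            eventfd,
        }),
        _ => None,
    }
}

fn main() {
    // "banking" reports its new ingress ring for frames from "router";
    // main hands the FDs to "router", re-targeted at "banking".
    let ready = ControlMsg::BufferReady { peer: String::from("router"), memfd: 7, eventfd: 8 };
    println!("{:?}", forward_to_peer(ready, "banking"));
}
```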

Sandboxing: The daemon runs in a multi-layer sandbox applied at startup (before any async runtime or threads):

Layer Mechanism Effect
User namespace CLONE_NEWUSER Unprivileged outside, appears as UID 0 inside
PID namespace CLONE_NEWPID Main is PID 1; children invisible to host
Mount namespace CLONE_NEWNS + pivot_root Minimal tmpfs root: /config, /dev (null/zero/urandom), /proc, /tmp
IPC namespace CLONE_NEWIPC Isolated System V IPC
Network namespace CLONE_NEWNET No interfaces; communication only via inherited FDs
Seccomp (main) BPF whitelist Allows fork, socket creation, inotify for config watching
Seccomp (child) Tighter BPF whitelist No fork, no socket creation, no file open; applied after vhost setup

Seccomp modes: --seccomp-mode=kill (default), trap (SIGSYS for debugging), log, disabled.

Disable sandboxing for debugging with --no-sandbox and --seccomp-mode=disabled.