qubes-lite
A lightweight virtualization system inspired by Qubes OS. Runs isolated VMs in different security domains using crosvm (the Chrome OS VMM).
All credit goes to Thomas Leonard (@talex5), who wrote the wayland proxy and got all this stuff working: https://gitlab.com/talex5/qubes-lite
This is a hacky side project. If you need a serious and secure operating system, use Qubes.
Comparison to Qubes
The main benefits compared to Qubes are:
- Fast, modern graphics. Wayland calls are proxied to the host.
- Better power management. Qubes is based on Xen, whose support for modern laptop power management is significantly worse than Linux's.
- NixOS-based declarative VM config.
The cost for that is security. Qubes is laser-focused on security and hard compartmentalisation. This makes it by far the most secure general-purpose operating system there is.
Ways in which we are less secure than Qubes (list is not even remotely exhaustive):
- The host system is not isolated from the network at all. The user must exercise discipline not to access untrusted network resources from the host. Even with that discipline, handling VM network traffic makes the host attack surface much larger.
- There is no attempt to isolate the host system from hardware peripherals. Qubes segregates USB and network into VMs.
- Currently the clipboard is shared between the host and all VMs. This will be fixed at some point; the plan is to implement a two-level clipboard like Qubes'.
- Proxying wayland calls means the attack surface from VM to host is way larger than Qubes' raw framebuffer copy approach.
- Probably a million other things.
If you are trying to defend against a determined, well-resourced attacker targeting you specifically then you should be running Qubes.
Quick Start
Add to your flake inputs:
{
inputs.qubes-lite.url = "git+https://git.dsg.is/davidlowsec/qubes-lite.git";
}
Import the module and configure VMs in your NixOS configuration:
{ config, pkgs, ... }: {
imports = [ inputs.qubes-lite.nixosModules.default ];
# User must have explicit UID for qubes-lite
users.users.david.uid = 1000;
programs.qubes-lite = {
enable = true;
user = "david";
hostNetworking = {
network = "172.16.200.0/24";
nat = {
enable = true;
interface = "eth0";
};
};
nixosVms = [
{
id = 3;
name = "banking";
memory = 4096;
cpus = 4;
hostNetworking = true; # Enable internet access
disposable = true; # Auto-shutdown when idle
idleTimeout = 120; # 2 minutes
guestPrograms = with pkgs; [ firefox xfce4-terminal ];
}
{
id = 5;
name = "shopping";
memory = 2048;
cpus = 2;
hostNetworking = true;
disposable = true;
guestPrograms = with pkgs; [ firefox xfce4-terminal ];
}
{
id = 7;
name = "personal";
memory = 4096;
cpus = 4;
# hostNetworking defaults to false (offline VM)
guestPrograms = with pkgs; [ libreoffice ];
}
];
};
}
Configuration Options
programs.qubes-lite
| Option | Type | Default | Description |
|---|---|---|---|
| `enable` | bool | `false` | Enable qubes-lite VM management |
| `user` | string | required | User who owns TAP interfaces and runs VMs (must have explicit UID) |
| `hostNetworking.network` | string | `"172.16.200.0/24"` | Network CIDR for host networking |
| `hostNetworking.nat.enable` | bool | `false` | Enable NAT for VM internet access |
| `hostNetworking.nat.interface` | string | `""` | External interface for NAT (required if `nat.enable`) |
| `nixosVms` | list of VM configs | `[]` | List of NixOS-based VMs to create |
| `enableBashIntegration` | bool | `true` | Enable bash completion for `vm-*` commands |
| `crosvm.logLevel` | string | `"info"` | Log level for crosvm (error, warn, info, debug, trace) |
| `crosvm.extraArgs` | list of strings | `[]` | Extra args passed to crosvm before the `run` subcommand |
| `crosvm.extraRunArgs` | list of strings | `[]` | Extra args passed to crosvm after the `run` subcommand |
| `vm-switch.logLevel` | string | `"info"` | Log level for the vm-switch daemon (error, warn, info, debug, trace) |
| `vm-switch.extraArgs` | list of strings | `[]` | Extra command line arguments for the vm-switch daemon |
| `isolatedPciDevices` | list of strings | `[]` | PCI devices to isolate with vfio-pci |
VM Configuration (nixosVms items)
| Option | Type | Default | Description |
|---|---|---|---|
| `id` | int | required | VM ID (odd number 3-255). Used for IP and vsock CID |
| `name` | string | required | VM name for scripts and TAP interface |
| `memory` | int | `1024` | Memory allocation in MB |
| `cpus` | int | `2` | Number of virtual CPUs |
| `hostNetworking` | bool | `false` | Enable host networking (TAP interface) and NATed internet access |
| `disposable` | bool | `false` | Auto-shutdown when idle (after `idleTimeout` seconds) |
| `idleTimeout` | int | `60` | Seconds to wait before shutdown (when `disposable = true`) |
| `additionalDisks` | list of disk configs | `[]` | Additional disks to attach (see Disk Configuration) |
| `rootDisk` | disk config or null | `null` | Custom root disk (defaults to built rootfs) |
| `kernel` | path or null | `null` | Custom kernel image |
| `initramfs` | path or null | `null` | Custom initramfs |
| `rootDiskReadonly` | bool | `true` | Whether root disk is read-only |
| `kernelParams` | list of strings | `[]` | Extra kernel command line parameters |
| `gpu` | bool or attrset | `true` | GPU config (`false` = disabled, `true` = default, attrset = custom) |
| `sound` | bool or attrset | `false` | Sound config (`false` = disabled, `true` = default PulseAudio, attrset = custom) |
| `sharedDirectories` | list of attrsets | `[]` | Shared directories with path, tag, and optional kv pairs |
| `pciDevices` | list of attrsets | `[]` | PCI devices to pass through (path + optional kv pairs) |
| `guestPrograms` | list of packages | `[]` | VM-specific packages |
| `guestConfig` | attrs | `{}` | VM-specific NixOS configuration |
| `vmNetwork` | attrs of network config | `{}` | VM networks for VM-to-VM communication (see VM-to-VM Networking) |
| `vhostUser` | list of attrsets | `[]` | vhost-user devices (auto-populated from `vmNetwork`) |
| `crosvm.logLevel` | string or null | `null` | Per-VM log level override (uses global if null) |
| `crosvm.extraArgs` | list of strings | `[]` | Per-VM extra args (appended to global `crosvm.extraArgs`) |
| `crosvm.extraRunArgs` | list of strings | `[]` | Per-VM extra run args (appended to global `crosvm.extraRunArgs`) |
Disk Configuration (additionalDisks items)
Free-form attrsets passed directly to crosvm --block. The path attribute is required and used as a positional argument.
additionalDisks = [{
path = "/tmp/data.qcow2"; # required, positional
ro = false; # read-only
sparse = true; # enable discard/trim
block-size = 4096; # reported block size
id = "data"; # device identifier
direct = false; # O_DIRECT mode
}];
# Results in: --block /tmp/data.qcow2,ro=false,sparse=true,block-size=4096,id=data,direct=false
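To make the flattening rule concrete, here is a sketch in Rust (the function name and signature are illustrative, not the module's actual code): the required `path` comes first and every other attribute becomes a `key=value` pair, joined with commas.

```rust
// Illustrative only: shows the attrset -> flag flattening rule above,
// not the module's actual implementation.
fn block_arg(path: &str, opts: &[(&str, &str)]) -> String {
    let mut arg = String::from(path); // required `path` is positional
    for (key, value) in opts {
        arg.push_str(&format!(",{}={}", key, value));
    }
    arg
}

fn main() {
    let arg = block_arg(
        "/tmp/data.qcow2",
        &[("ro", "false"), ("sparse", "true"), ("block-size", "4096")],
    );
    assert_eq!(arg, "/tmp/data.qcow2,ro=false,sparse=true,block-size=4096");
    println!("--block {}", arg);
}
```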
GPU Configuration
gpu = false; # Disabled
gpu = true; # Default: context-types=cross-domain:virgl2
gpu = { context-types = "cross-domain:virgl2"; width = 1920; height = 1080; }; # Custom
Sound Configuration
sound = false; # Disabled
sound = true; # Default PulseAudio config
sound = { backend = "pulse"; capture = true; }; # Custom
Shared Directories
sharedDirectories = [{
path = "/tmp/shared"; # Host path (required)
tag = "shared"; # Tag visible to guest (required)
type = "fs"; # Optional: fs type
uid = 1000; # Optional: UID mapping
}];
# Results in: --shared-dir /tmp/shared:shared:type=fs:uid=1000
PCI Passthrough Configuration
pciDevices = [{
path = "01:00.0"; # BDF format, auto-converted to sysfs path
iommu = "on"; # Optional: enable IOMMU
}];
# Results in: --vfio /sys/bus/pci/devices/0000:01:00.0/,iommu=on
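The BDF-to-sysfs conversion can be sketched as follows (a hypothetical helper; the module does this on the Nix side). When the BDF omits the PCI domain, domain `0000` is assumed, matching the "Results in" line above.

```rust
// Sketch of the BDF -> sysfs-path conversion described above.
fn bdf_to_sysfs(bdf: &str) -> String {
    let full = if bdf.matches(':').count() == 1 {
        format!("0000:{}", bdf) // bare bus:device.function, assume domain 0000
    } else {
        bdf.to_string() // already domain-qualified
    };
    format!("/sys/bus/pci/devices/{}/", full)
}

fn main() {
    assert_eq!(bdf_to_sysfs("01:00.0"), "/sys/bus/pci/devices/0000:01:00.0/");
    assert_eq!(bdf_to_sysfs("0000:01:00.0"), "/sys/bus/pci/devices/0000:01:00.0/");
}
```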
vhost-user Devices
vhostUser = [{
type = "net";
socket = "/path/to/socket";
}];
# Results in: --vhost-user type=net,socket=/path/to/socket
Auto-populated from vmNetwork configuration with type = "net".
Commands
After rebuilding NixOS, the following commands are available:
Run command in VM (recommended)
vm-run <name> <command>
Example: vm-run banking firefox
This is the primary way to interact with VMs. The command:
- Connects to the VM's socket at `/run/qubes-lite/<name>.sock`
- Triggers socket activation to start the VM if not running
- Sends the command to the guest
Start/Stop VMs
vm-start <name> # Start VM via systemd (uses polkit, no sudo needed)
vm-stop <name> # Stop VM via systemd (uses polkit, no sudo needed)
Start VM for debugging
vm-start-debug <name>
Starts crosvm directly in the foreground (requires sudo), bypassing socket activation. Useful for debugging VM boot issues since crosvm output is visible.
Shell access
vm-shell <name> # Connect to serial console (default)
vm-shell --ssh <name> # SSH into VM as user
vm-shell --ssh --root <name> # SSH into VM as root
The default serial console mode requires no configuration. Press CTRL+] to escape.
SSH mode requires SSH keys configured in per-VM guestConfig (see Advanced Configuration).
Socket activation
VMs run as system services (for PCI passthrough and sandboxing) and start automatically on first access via systemd socket activation:
# Check socket status
systemctl status qubes-lite-banking.socket
# Check VM service status
systemctl status qubes-lite-banking-vm.service
Sockets are enabled by default and start on boot.
Network Architecture
IP Addressing
VMs use /31 point-to-point links:
- VM IP: `<network-base>.<id>` (e.g., `172.16.200.3`)
- Host TAP IP: `<network-base>.<id-1>` (e.g., `172.16.200.2`)
The host TAP IP acts as the gateway for the VM.
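The addressing rule can be sketched as a small Rust helper (the name and signature are illustrative, not part of the module):

```rust
// Sketch of the /31 addressing rule above: the VM gets <base>.<id> and
// the host TAP end (the VM's gateway) gets <base>.<id-1>.
fn vm_link_ips(network_base: &str, id: u8) -> (String, String) {
    assert!(id % 2 == 1 && id >= 3, "VM id must be an odd number in 3-255");
    let vm_ip = format!("{}.{}", network_base, id);
    let host_ip = format!("{}.{}", network_base, id - 1);
    (vm_ip, host_ip)
}

fn main() {
    let (vm, host) = vm_link_ips("172.16.200", 3);
    assert_eq!(vm, "172.16.200.3");
    assert_eq!(host, "172.16.200.2");
}
```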
ID Requirements
VM IDs must be:
- Odd numbers (3, 5, 7, 9, ...)
- In range 3-255
- Unique across all VMs
This ensures non-overlapping /31 networks and valid vsock CIDs.
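These rules reduce to a simple validation routine; a sketch (illustrative, not the module's code):

```rust
use std::collections::HashSet;

// Enforces the ID rules above: odd, in range 3-255, unique across all VMs.
// (The u8 type already caps the value at 255.)
fn validate_vm_ids(ids: &[u8]) -> Result<(), String> {
    let mut seen = HashSet::new();
    for &id in ids {
        if id < 3 || id % 2 == 0 {
            return Err(format!("VM id {id} must be an odd number in 3-255"));
        }
        if !seen.insert(id) {
            return Err(format!("VM id {id} is not unique"));
        }
    }
    Ok(())
}

fn main() {
    assert!(validate_vm_ids(&[3, 5, 7]).is_ok());
    assert!(validate_vm_ids(&[4]).is_err());    // even
    assert!(validate_vm_ids(&[1]).is_err());    // below 3
    assert!(validate_vm_ids(&[3, 3]).is_err()); // duplicate
}
```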
NAT
When hostNetworking.nat.enable = true, the module configures:
- IP forwarding (`net.ipv4.ip_forward = 1`)
- NAT masquerading on `hostNetworking.nat.interface`
- Internal IPs set to the VM IPs
Advanced Configuration
SSH Access Configuration
To use vm-shell --ssh, configure SSH keys in per-VM guestConfig:
programs.qubes-lite = {
nixosVms = [{
id = 3;
name = "dev";
guestConfig = {
users.users.user.openssh.authorizedKeys.keys = [ "ssh-ed25519 AAAA..." ];
users.users.root.openssh.authorizedKeys.keys = [ "ssh-ed25519 AAAA..." ]; # for --ssh --root
};
}];
};
SSH uses vsock transport, so it works even for VMs without hostNetworking.
Shared VM Configuration
Define a common config variable and merge it with per-VM config:
{ config, pkgs, ... }:
let
commonGuestConfig = {
services.openssh.enable = true;
users.users.user.extraGroups = [ "wheel" ];
users.users.user.openssh.authorizedKeys.keys = [ "ssh-ed25519 AAAA..." ];
};
commonGuestPrograms = with pkgs; [ firefox xfce4-terminal ];
in {
programs.qubes-lite = {
nixosVms = [
{
id = 3;
name = "dev";
hostNetworking = true;
guestPrograms = commonGuestPrograms;
guestConfig = commonGuestConfig // {
# Additional dev-specific config
fileSystems."/home/user" = {
device = "/dev/vdb";
fsType = "ext4";
};
};
additionalDisks = [{ path = "/dev/mapper/main-dev-home"; ro = false; }];
}
{
id = 5;
name = "banking";
hostNetworking = true;
guestPrograms = commonGuestPrograms;
# Disable SSH for banking VM
guestConfig = commonGuestConfig // {
services.openssh.enable = false;
};
}
];
};
}
Offline VMs
VMs are offline by default (hostNetworking = false). For sensitive data that should never touch the network:
{
id = 13;
name = "vault";
memory = 2048;
guestPrograms = with pkgs; [ keepassxc ];
}
VM-to-VM Networking
VMs can communicate with each other via vmNetwork. This creates isolated virtual networks using vhost-user-net backed by vm-switch:
nixosVms = [
{
id = 3;
name = "router";
hostNetworking = true; # Router typically has internet access
vmNetwork.internal = { router = true; };
}
{
id = 5;
name = "banking";
vmNetwork.internal = {}; # Client on "internal" network
}
{
id = 7;
name = "shopping";
vmNetwork.internal = {}; # Another client
}
];
Each network requires exactly one router VM. Clients can only communicate with the router, not with each other directly. The router VM typically has hostNetworking = true to provide internet access to clients.
MAC addresses are auto-generated from the network and VM names. Override with explicit macAddress:
vmNetwork.internal = {
router = true;
macAddress = "52:ab:cd:ef:12:34";
};
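The exact derivation scheme is not documented here, so the following is purely illustrative: any deterministic scheme hashes the network and VM names into six octets, then sets the locally-administered bit and clears the multicast bit in the first octet. (`DefaultHasher` is not stable across Rust releases; a real implementation would use a fixed hash function.)

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Purely illustrative: the actual derivation used by vm-switch is not
// documented above. A valid generated MAC must set the locally-administered
// bit (0x02) and clear the multicast bit (0x01) in the first octet.
fn derive_mac(network: &str, vm: &str) -> String {
    let mut hasher = DefaultHasher::new();
    (network, vm).hash(&mut hasher);
    let bytes = hasher.finish().to_be_bytes();
    let first = (bytes[0] | 0x02) & !0x01; // locally administered, unicast
    format!(
        "{:02x}:{:02x}:{:02x}:{:02x}:{:02x}:{:02x}",
        first, bytes[1], bytes[2], bytes[3], bytes[4], bytes[5]
    )
}

fn main() {
    let mac = derive_mac("internal", "router");
    assert_eq!(mac, derive_mac("internal", "router")); // deterministic
    assert_ne!(mac, derive_mac("internal", "banking")); // per-VM
}
```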
Disposable VMs
VMs that auto-shutdown when idle to save memory:
{
id = 9;
name = "untrusted";
memory = 4096;
disposable = true; # Enable auto-shutdown
idleTimeout = 60; # Shutdown 60 seconds after last command exits
}
The guest runs an idle watchdog that monitors for active command sessions. When no commands are running and idleTimeout seconds have passed since the last activity, the VM shuts down cleanly.
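The shutdown decision reduces to a simple predicate; a sketch (names are illustrative, not the watchdog's actual code):

```rust
use std::time::{Duration, Instant};

// Sketch of the watchdog decision above: shut down only when no command
// session is active AND idleTimeout has elapsed since the last activity.
fn should_shutdown(
    active_sessions: usize,
    last_activity: Instant,
    idle_timeout: Duration,
) -> bool {
    active_sessions == 0 && last_activity.elapsed() >= idle_timeout
}

fn main() {
    let now = Instant::now();
    // An active session always keeps the VM alive.
    assert!(!should_shutdown(1, now, Duration::ZERO));
    // With no sessions and a zero timeout, shutdown fires immediately.
    assert!(should_shutdown(0, now, Duration::ZERO));
}
```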
PCI Passthrough
Pass PCI devices (USB controllers, network cards) directly to VMs for hardware isolation.
Prerequisites
1. Enable IOMMU in your bootloader:

   boot.kernelParams = [ "intel_iommu=on" ]; # or amd_iommu=on for AMD

2. Identify devices to isolate:

   lspci -nn   # note the BDF (e.g., 01:00.0)

3. Check IOMMU groups (all devices in a group must go to the same VM):

   for d in /sys/kernel/iommu_groups/*/devices/*; do
     echo "Group $(basename $(dirname $(dirname $d))): $(basename $d)"
   done | sort -t: -k1 -n
Configuration
programs.qubes-lite = {
# Devices to isolate from host (claimed by vfio-pci)
isolatedPciDevices = [ "01:00.0" "02:00.0" ];
nixosVms = [{
id = 3;
name = "sys-usb";
memory = 1024;
pciDevices = [{ path = "01:00.0"; }]; # USB controller
}
{
id = 5;
name = "sys-net";
memory = 1024;
pciDevices = [{ path = "02:00.0"; }]; # Network card
}];
};
# Recommended: blacklist native drivers for reliability
boot.blacklistedKernelModules = [ "xhci_hcd" ]; # for USB controllers
How It Works
- Early boot: vfio-pci claims isolated devices before other drivers load
- Activation: If devices are already bound, they're rebound to vfio-pci
- VM start: IOMMU groups are validated, then devices are passed via `--vfio`
Building
# Build the default rootfs image
nix build .#
Architecture
Each NixOS VM gets:
- A dedicated qcow2 rootfs image with packages baked in
- Overlayfs root (read-only ext4 lower + tmpfs upper)
- wayland-proxy-virtwl for GPU passthrough
- Socket-activated command listener (`vsock-cmd.socket` + `vsock-cmd@.service`)
- Optional idle watchdog for disposable VMs
- Systemd-based init
The host provides:
- Persistent TAP interfaces via NixOS networking
- NAT for internet access (optional)
- Socket activation for commands (`/run/qubes-lite/<name>-command.socket`)
- Console PTY for serial access (`/run/qubes-lite/<name>-console`)
- VM services run as root for PCI passthrough and sandboxing (crosvm drops privileges)
- Polkit rules for the configured user to manage VM services without sudo
- CLI tools: `vm-run`, `vm-start`, `vm-stop`, `vm-start-debug`, `vm-shell`
- Desktop integration with .desktop files for guest applications
vm-switch
The vm-switch daemon (vm-switch/ Rust crate) provides L2 switching for VM-to-VM networks. One instance runs per vmNetwork, managed by systemd (vm-switch-<netname>.service).
Process model: The main process watches a config directory for MAC files and forks one child process per VM. Each child is a vhost-user net backend serving a single VM's network interface.
Main Process
(config watch, orchestration)
/ | \
fork / fork | fork \
v v v
Child: router Child: banking Child: shopping
(vhost-user) (vhost-user) (vhost-user)
| | |
[unix socket] [unix socket] [unix socket]
| | |
crosvm crosvm crosvm
(router VM) (banking VM) (shopping VM)
Packet forwarding uses lock-free SPSC ring buffers in shared memory (memfd_create + mmap). When a VM transmits a frame, its child process validates the source MAC address and routes the frame to the correct destination:
- Unicast: pushed into the destination child's ingress ring buffer
- Broadcast/multicast: pushed into all peers' ingress buffers
Ring buffers use atomic head/tail pointers (no locks in the datapath) with eventfd signaling for empty-to-non-empty transitions.
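A minimal in-process sketch of such an SPSC ring follows. The real rings live in a memfd mapping shared between processes and carry fixed-size frame slots; the `Vec<u8>` slots and type names here are illustrative.

```rust
use std::cell::UnsafeCell;
use std::sync::atomic::{AtomicUsize, Ordering};

const SLOTS: usize = 64; // matches the 64-slot rings described above

// Minimal SPSC ring: the producer advances `tail`, the consumer advances
// `head`, and those indices are the only state touched by both sides.
struct SpscRing {
    head: AtomicUsize, // next slot the consumer reads
    tail: AtomicUsize, // next slot the producer writes
    slots: Vec<UnsafeCell<Option<Vec<u8>>>>,
}

// Sound because each slot is written only by the producer before the
// Release store of `tail`, and read only by the consumer after the
// matching Acquire load.
unsafe impl Sync for SpscRing {}

impl SpscRing {
    fn new() -> Self {
        SpscRing {
            head: AtomicUsize::new(0),
            tail: AtomicUsize::new(0),
            slots: (0..SLOTS).map(|_| UnsafeCell::new(None)).collect(),
        }
    }

    /// Producer side; returns false when the ring is full.
    fn push(&self, frame: Vec<u8>) -> bool {
        let tail = self.tail.load(Ordering::Relaxed);
        let head = self.head.load(Ordering::Acquire);
        if tail.wrapping_sub(head) == SLOTS {
            return false; // full
        }
        unsafe { *self.slots[tail % SLOTS].get() = Some(frame) };
        self.tail.store(tail.wrapping_add(1), Ordering::Release);
        // The real ring signals an eventfd here on the
        // empty-to-non-empty transition (tail == head + 1).
        true
    }

    /// Consumer side; returns None when the ring is empty.
    fn pop(&self) -> Option<Vec<u8>> {
        let head = self.head.load(Ordering::Relaxed);
        let tail = self.tail.load(Ordering::Acquire);
        if head == tail {
            return None; // empty
        }
        let frame = unsafe { (*self.slots[head % SLOTS].get()).take() };
        self.head.store(head.wrapping_add(1), Ordering::Release);
        frame
    }
}

fn main() {
    let ring = SpscRing::new();
    assert!(ring.pop().is_none());
    assert!(ring.push(vec![0xAA, 0xBB]));
    assert_eq!(ring.pop(), Some(vec![0xAA, 0xBB]));
    for i in 0..SLOTS {
        assert!(ring.push(vec![i as u8]));
    }
    assert!(!ring.push(vec![0])); // 65th frame: ring full
}
```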
Buffer exchange protocol: The main process orchestrates buffer setup between children via a control channel (SOCK_SEQPACKET + SCM_RIGHTS for passing memfd/eventfd file descriptors):
- Main tells Child A: "create an ingress buffer for Child B" (`GetBuffer`)
- Child A creates the ring buffer and returns the FDs (`BufferReady`)
- Main forwards those FDs to Child B as an egress target (`PutBuffer`)
- Child B can now write frames directly into Child A's memory, with no copies through the main process
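The control messages might be modeled like this. The variant names come from the protocol description above; the fields and peer-identifier type are assumptions.

```rust
// Variant names (GetBuffer/BufferReady/PutBuffer) come from the protocol
// above; the fields are illustrative assumptions. File descriptors
// (memfd + eventfd) travel out-of-band as SCM_RIGHTS ancillary data on
// the SOCK_SEQPACKET control channel, not inside the message payload.
#[derive(Debug, PartialEq)]
enum ControlMsg {
    GetBuffer { peer: String },   // main -> child A: create an ingress ring for `peer`
    BufferReady { peer: String }, // child A -> main: ring created, FDs attached
    PutBuffer { peer: String },   // main -> child B: your egress target for `peer`
}

fn main() {
    // One round of the exchange, as seen by the main process.
    let request = ControlMsg::GetBuffer { peer: "banking".into() };
    let reply = ControlMsg::BufferReady { peer: "banking".into() };
    assert_ne!(request, reply);
    let forward = ControlMsg::PutBuffer { peer: "banking".into() };
    match forward {
        ControlMsg::PutBuffer { peer } => assert_eq!(peer, "banking"),
        _ => unreachable!(),
    }
}
```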
Sandboxing: The daemon runs in a multi-layer sandbox applied at startup (before any async runtime or threads):
| Layer | Mechanism | Effect |
|---|---|---|
| User namespace | `CLONE_NEWUSER` | Unprivileged outside, appears as UID 0 inside |
| PID namespace | `CLONE_NEWPID` | Main is PID 1; children invisible to host |
| Mount namespace | `CLONE_NEWNS` + `pivot_root` | Minimal tmpfs root: /config, /dev (null/zero/urandom), /proc, /tmp |
| IPC namespace | `CLONE_NEWIPC` | Isolated System V IPC |
| Network namespace | `CLONE_NEWNET` | No interfaces; communication only via inherited FDs |
| Seccomp (main) | BPF whitelist | Allows fork, socket creation, inotify for config watching |
| Seccomp (child) | Tighter BPF whitelist | No fork, no socket creation, no file open; applied after vhost setup |
Seccomp modes: --seccomp-mode=kill (default), trap (SIGSYS for debugging), log, disabled.
Disable sandboxing for debugging with --no-sandbox and --seccomp-mode=disabled.
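To illustrate the shape of such a whitelist, here is a sketch that assembles a classic-BPF program as plain data. Nothing is installed; a real filter is loaded with seccomp(2) or prctl(2) and would also validate `seccomp_data.arch` first. The struct mirrors the kernel's `sock_filter`; the syscall numbers in the demo are illustrative.

```rust
// Illustrative seccomp whitelist program. Layout:
//   [load nr][JEQ nr0]...[JEQ nrN-1][RET KILL][RET ALLOW]
#[derive(Debug, PartialEq)]
struct SockFilter { code: u16, jt: u8, jf: u8, k: u32 }

const BPF_LD_W_ABS: u16 = 0x20;  // A = seccomp_data[k]
const BPF_JMP_JEQ_K: u16 = 0x15; // if A == k: pc += 1 + jt else pc += 1 + jf
const BPF_RET_K: u16 = 0x06;     // return k
const SECCOMP_RET_ALLOW: u32 = 0x7fff_0000;
const SECCOMP_RET_KILL_PROCESS: u32 = 0x8000_0000;
const OFF_NR: u32 = 0; // offsetof(struct seccomp_data, nr)

fn whitelist(allowed_syscalls: &[u32]) -> Vec<SockFilter> {
    let n = allowed_syscalls.len();
    assert!(n <= u8::MAX as usize, "jump offsets are 8-bit");
    let mut prog = vec![SockFilter { code: BPF_LD_W_ABS, jt: 0, jf: 0, k: OFF_NR }];
    for (i, &nr) in allowed_syscalls.iter().enumerate() {
        // On match, jump over the remaining compares and the KILL to reach ALLOW.
        prog.push(SockFilter { code: BPF_JMP_JEQ_K, jt: (n - i) as u8, jf: 0, k: nr });
    }
    prog.push(SockFilter { code: BPF_RET_K, jt: 0, jf: 0, k: SECCOMP_RET_KILL_PROCESS });
    prog.push(SockFilter { code: BPF_RET_K, jt: 0, jf: 0, k: SECCOMP_RET_ALLOW });
    prog
}

fn main() {
    // e.g. x86_64 read=0, write=1 -- the numbers are illustrative.
    let prog = whitelist(&[0, 1]);
    assert_eq!(prog.len(), 5); // load + 2 compares + KILL + ALLOW
    assert_eq!(prog[1].jt, 2); // first compare skips [JEQ][KILL] to reach ALLOW
    assert_eq!(prog[2].jt, 1); // last compare skips [KILL]
}
```

The child filter would simply be built from a shorter allowed-syscall list than the main filter, which is what makes it a strict subset.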