Commit graph

91 commits

Author SHA1 Message Date
faad5006c9 cleanup: remove vmsilo-start-* scripts, rename vmsilo-usb to vm-usb, fix vm-run output
- Remove vmsilo-start-* user-facing symlinks from package.nix (internal
  VM launcher scripts are only used by systemd ExecStart, not by users)
- Rename vmsilo-usb to vm-usb to match the vm-* naming convention
- Increase socat -t timeout in vm-run from default 0.5s to 5s to fix
  missing output from console commands (cloud-hypervisor proxy startup
  latency exceeded the default timeout window)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-22 16:51:57 +00:00
08709827fb feat: replace crosvm USB passthrough with usbip-over-vsock
Replace crosvm xhci-based USB passthrough with usbip-rs over vsock,
enabling USB passthrough for both crosvm and cloud-hypervisor VMs.

Guest runs a persistent usbip-rs client listener on vsock port 5002.
Host runs one sandboxed usbip-rs host connect process per attached
device as a systemd template service (vmsilo-<vm>-usb@<devpath>).

Eliminates the JSON state file, file locking, and crosvm-specific
shell helper library in favor of systemd as the source of truth.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-22 16:15:27 +00:00
8b6a5594c5 docs: update CLAUDE.md for module restructuring
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-22 14:44:58 +00:00
bf23b518b8 feat: use cloud-hypervisor custom flake input instead of nixpkgs
Now that git.dsg.is/dsg/cloud-hypervisor.git has a flake.nix,
use it as a proper flake input with inputs.nixpkgs.follows.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-21 09:36:45 +00:00
298ad6bcee docs: update CLAUDE.md for cloud-hypervisor architecture
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-21 09:36:45 +00:00
62ceb91b90 Update docs for vmsilo-dbus-proxy rename and notification support
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-20 17:21:39 +00:00
1856d4b72e Document automatic DNS in architecture notes
2026-03-19 14:31:39 +00:00
43c99ec162 Document USB device passthrough in README and CLAUDE.md
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-17 13:10:54 +00:00
0a07f7f14e Switch ephemeral overlay from qcow2 to raw sparse image
qcow2 causes O_DIRECT failures on ext4 due to crosvm doing unaligned
access when parsing the qcow2 header. Since we don't use any qcow2
features (the disk is created fresh and deleted on stop), a raw sparse
file via truncate works just as well and also removes the qemu package
dependency from the VM service.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-08 00:49:44 +00:00
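A minimal sketch of the raw sparse image idea from commit 0a07f7f14e. The commit itself uses the `truncate` utility; the Python function name, file name, and size below are hypothetical and only illustrate that extending a file by truncation sets its apparent size without writing data.

```python
import os
import tempfile

def create_sparse_image(path: str, size_bytes: int) -> None:
    """Create a raw image file with the given apparent size.

    Truncating an empty file upward sets its length without writing data;
    filesystems that support holes store it sparsely.
    """
    with open(path, "wb") as f:
        f.truncate(size_bytes)

def demo(size_bytes: int = 64 * 1024**2) -> int:
    """Create a throwaway image and return its apparent size in bytes."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "overlay.raw")  # hypothetical file name
        create_sparse_image(path, size_bytes)
        return os.stat(path).st_size
```

Because the file is created fresh at VM start and deleted at stop, no image-format metadata is needed, which is the point the commit makes about dropping qcow2.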
3b640b1662 netvm: support network.netvm = "host" for host-routed networking
Route VM traffic through the host directly instead of requiring a
separate netvm VM. Uses the same nftables NAT and forward firewall
rules as VM-based netvms, applied on the host using TAP interface
names. Removes the hostNetworking.nat options in favor of the
unified netvm approach.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-07 15:28:47 +00:00
b247b46066 Start autoStart GPU VMs with session instead of at boot
GPU VMs need a Wayland socket, so starting them at multi-user.target
(boot) fails. The session-bind user service now also starts autoStart
GPU VMs when the graphical session begins. Non-GPU VMs still start at
boot.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-25 22:14:53 +00:00
f1041db662 Refactor GPU option from raw crosvm attrset to high-level feature config
Replace the low-level gpu attrset (mapped directly to --gpu args) with a
submodule of supported features: wayland (cross-domain), opengl (virgl2),
and vulkan (venus). Vulkan automatically adds --gpu-render-server.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-22 13:06:41 +00:00
d357d47050 Split guest rootfs configuration into focused submodules
Breaks the monolithic rootfs-nixos/configuration.nix (582 lines) into
7 files under rootfs-nixos/guest/ for better readability and separation
of concerns: boot, users, networking, wayland, command, system, plus a
shared kernel-param-helper.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-20 19:40:51 +00:00
8de5e55801 Replace sound tri-state with playback/capture booleans
Instead of passing raw crosvm attrsets, sound is now configured
with two booleans: sound.playback (default true) and sound.capture
(default false, implies playback).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-20 00:30:39 +00:00
714500e5f0 CLAUDE.md: rewrite for conciseness and accuracy (327 → 114 lines)
Fix stale modules/config.nix reference (module is split across many
files), remove stale docs/ reference, remove duplicated implementation
details (balloond CLI flags, socket tables, full NixOS config example,
etc.), and add Key Patterns and Gotchas sections.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-18 23:43:17 +00:00
2880cbcdc4 balloond: tune defaults: critical floor 256m, bias 400m
Now that critical_guest_available is a hard floor, lower it from 400m
to 256m (guests can safely operate with 256 MiB free). Increase
guest_available_bias from 300m to 400m for stronger graduated
resistance as the balloon fills, keeping the comfortable equilibrium
point around 656 MiB.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-18 21:31:13 +00:00
4791e391f0 balloond: enforce critical_guest_available as hard floor and add CLI arg
The critical_guest_available threshold was not actually a floor —
when both host and guest were below their critical thresholds, the
clamp bounds inverted and were skipped entirely, letting the
equalization formula push guests to dangerously low memory (34 MiB
observed with a 400 MiB "critical" threshold).

Add a hard floor: after computing the balloon delta, clamp positive
deltas so inflation never pushes guest free memory below
critical_guest_available, regardless of host pressure.

Also add --critical-guest-available CLI arg (default 400m) and the
corresponding NixOS option (criticalGuestAvailable) so both knobs
are tunable. To target e.g. 300 MiB as a floor:

  --critical-guest-available 250m --guest-available-bias 50m

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-18 21:29:03 +00:00
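The hard floor described in commit 4791e391f0 can be sketched roughly as follows. This is not balloond's actual code: the function name, the MiB units, and the simplifying assumption that inflating the balloon by N MiB lowers guest available memory by N MiB are all illustrative.

```python
def clamp_inflation(delta_mib: int, guest_available_mib: int,
                    critical_guest_available_mib: int) -> int:
    """Limit a balloon delta so inflation respects the guest floor.

    A positive delta inflates the balloon (takes memory from the guest);
    a negative delta deflates it (returns memory to the guest).
    """
    if delta_mib <= 0:
        # Deflation only helps the guest, so the floor never restricts it.
        return delta_mib
    # Memory the guest can give up before reaching the floor (never negative).
    headroom = max(0, guest_available_mib - critical_guest_available_mib)
    return min(delta_mib, headroom)
```

With a 400 MiB floor, a guest that has 600 MiB available can lose at most 200 MiB, and a guest already below the floor is never inflated further, whatever the host pressure.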
e52d79706a docs: document PSI-adaptive polling options
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-18 19:21:10 +00:00
e6e14e0d33 Add per-VM copyChannel option to include NixOS channel in rootfs
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-18 18:47:05 +00:00
be71982c7d Replace ext4+qcow2 rootfs with erofs
The rootfs is read-only, used only as an overlayfs lower layer — ext4
with journaling was a poor fit. erofs is purpose-built for this: compressed
(lz4hc), compact metadata, faster random reads.

The new builder (make-erofs-image.nix) runs nixos-install and mkfs.erofs
under a single fakeroot session, eliminating the QEMU VM previously needed
during the build.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-18 18:21:33 +00:00
9bf6f510bd Stop GPU-enabled VMs on desktop session end
GPU VMs connect to the host Wayland socket, which is destroyed on logout.
Add per-VM systemd user services that bind to graphical-session.target and
stop the VM system service when the session deactivates. Also make
--wayland-security-context conditional on GPU being enabled.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-18 17:55:07 +00:00
61ca26690a Remove vm-switch and bufferbloat-test
The vm-switch experiment for VM-to-VM networking via vhost-user-net
didn't work out — it performs poorly under load, with busy connections
saturating the buffer and causing high latency for others.

Removes the vm-switch Rust crate, bufferbloat-test suite, all NixOS
module integration (options, services, networking, assertions, scripts),
and documentation.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-17 23:41:13 +00:00
a4b967862f feat(tray): add tray proxy for VM system tray integration
Proxy guest StatusNotifierItems to the host system tray over vsock:5001.
Guest-side watches SNI registrations on D-Bus, collects snapshots
(properties + DBusMenu tree), and streams them to the host. Host-side
creates synthetic SNI items with sanitized data and forwards user
interactions (clicks, scrolls, menu events) back to the guest. Includes
icon theme forwarding, VM color tinting on icon borders, CSS named color
resolution, and automatic service startup with the VM.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-17 23:29:42 +00:00
db711e9cff feat: add vmsilo-balloond dynamic balloon memory daemon
Implements a host-side daemon that equalizes free memory headroom between
host and guests via virtio-balloon, using ChromeOS's BalanceAvailablePolicy.
Includes VM discovery via inotify, crosvm control socket client, stall
detection, NixOS module integration, and CLI argument parsing.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-17 18:09:30 +00:00
7adf6ab1d3 Update CLAUDE.md for full login environment sourcing
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-16 23:54:34 +00:00
2b5bc1bc7c Update docs to reflect vsock-cmd user service conversion
CLAUDE.md and README.md now accurately describe:
- vmsilo-session-setup and vmsilo-session.target for graphical session
- vsock-cmd as user services gated on graphical-session.target
- vm-idle-watchdog querying user service instances

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-16 23:44:21 +00:00
84203ff1da feat: accept list of NixOS modules in guestConfig
guestConfig now accepts a single module or a list of modules via
coercedTo. Single modules are auto-wrapped in a list for backwards
compatibility.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-16 20:37:39 +00:00
fb3406fa9a docs: update guestConfig documentation for NixOS module type
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-16 20:25:44 +00:00
c80a8c807a Auto-assign sequential VM IDs instead of requiring manual specification
VM IDs (used for vsock CIDs and TAP name truncation) are now
automatically assigned starting from 3 based on list position,
removing the need for users to specify them. Also removes the
constraints that IDs must be odd numbers in range 3-255.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-16 20:10:06 +00:00
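A small sketch of position-based ID assignment as described in commit c80a8c807a. The function and VM names are hypothetical. Starting at 3 is taken from the commit message; it presumably reflects that vsock CIDs 0 to 2 are reserved.

```python
def assign_vm_ids(vm_names: list[str], first_id: int = 3) -> dict[str, int]:
    """Map each VM name to a sequential ID based on its list position."""
    return {name: first_id + i for i, name in enumerate(vm_names)}
```

For example, `assign_vm_ids(["banking", "work", "netvm"])` yields IDs 3, 4, and 5 in list order, so reordering the list changes the assignment.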
0d36da9f55 Move project under dsg gitlab namespace
2026-02-15 13:15:50 +00:00
ba6d01488e feat(modules): initialize shared home from template directory
New VM home directories are created by copying /var/lib/vmsilo/home-template
on first start, allowing users to seed dotfiles and configs. The template
directory is created via tmpfiles and owned by the configured user.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-15 12:23:46 +00:00
ba0f77acf7 feat(modules): add per-VM sharedHome option for virtiofs home directory
Shares a host directory as /home/user in guest VMs via virtiofs, enabled
by default. Accepts true (/shared/<vmname>), a custom path string, or
false to disable. Host directory is created with correct uid:gid ownership
at VM start.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-15 12:01:12 +00:00
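The three accepted forms of sharedHome in commit ba0f77acf7 could be resolved along these lines. This is an illustrative Python sketch, not the Nix module code; the function name is hypothetical, and only the `/shared/<vmname>` default comes from the commit message.

```python
from typing import Optional, Union

def resolve_shared_home(value: Union[bool, str], vm_name: str) -> Optional[str]:
    """Return the host directory to share as /home/user, or None if disabled."""
    if value is True:
        return f"/shared/{vm_name}"  # default location named in the commit
    if value is False:
        return None  # sharing disabled
    return value  # custom path string, used as given
```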
cbfab2fc6d feat(modules): add per-VM waylandProxy option (wayland-proxy-virtwl or sommelier)
Allows each VM to choose its Wayland proxy. Defaults to wayland-proxy-virtwl
(existing behavior). Setting waylandProxy = "sommelier" uses the ChromeOS
sommelier compositor instead.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-14 17:09:56 +00:00
9055ffe954 refactor(modules): change TAP naming from vm-<name><idx> to <name>-<idx>
Drop the vm- prefix, add dash separator between VM name and interface
index, and remove the 10-character VM name limit. Long names that would
exceed the 15-character interface name limit (IFNAMSIZ is 16 bytes
including the NUL terminator) are truncated, with the VM ID appended for
uniqueness (e.g., bankingsupe3-22 for id=3).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-14 16:32:37 +00:00
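One way to read the TAP naming rule in commit 9055ffe954 is sketched below. The exact truncation scheme is an assumption reconstructed from the single example in the message (`bankingsupe3-22` for id=3); the function name and the long VM name used here are hypothetical.

```python
IFNAME_MAX = 15  # usable characters; IFNAMSIZ is 16 including the NUL terminator

def tap_name(vm_name: str, vm_id: int, idx: int) -> str:
    """Build <name>-<idx>, truncating long names and appending the VM ID."""
    full = f"{vm_name}-{idx}"
    if len(full) <= IFNAME_MAX:
        return full
    # Assumed scheme: keep as much of the name as fits before "<id>-<idx>".
    suffix = f"{vm_id}-{idx}"
    return vm_name[: IFNAME_MAX - len(suffix)] + suffix
```

Under this assumption, `tap_name("bankingsuperlongname", 3, 22)` reproduces the name given in the commit message, while short names such as `banking` pass through unchanged as `banking-0`.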
8f0eefe51c feat(modules): add tap.bridge option for bridged TAP interfaces
Allow TAP interfaces to be added to a named host bridge
(networking.bridges) instead of being assigned a host IP.
Only one of tap.hostAddress or tap.bridge may be set.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-14 16:07:57 +00:00
e9288236fe refactor(vm-switch): separate TX/RX metrics for producer/consumer clarity
The old metrics structure conflated producer-side and consumer-side data
into a single "egress" map, making fields like codel_drops (always 0 on
producer side) confusing and hiding consumer-side sojourn, flows, and
queue depth entirely. Replace PeerMetrics with TxPeerMetrics (per-dest
DRR stats), RxSourceMetrics (per-source drop attribution + ring transit),
and RxMetrics (consumer FQ-CoDel aggregate with sojourn, flows, queue
depth). Rename egress→tx, add rx section, rename fq_sojourn→drr_sojourn
and sojourn→ring_transit for self-documenting JSON output.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-14 15:59:21 +00:00
642ccbf050 feat(vm-switch): add consumer-side metrics logging and update docs (Phase 4)
Add debug-level logging of consumer FQ-CoDel queue state (depth, active
flows) in collect_metrics(). Update CLAUDE.md to document the two-stage
FQ-CoDel architecture (producer DRR-only, consumer FQ-CoDel with real
sojourn measurement and per-source-peer drop attribution).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-14 15:59:21 +00:00
89ea656917 Update CLAUDE.md
2026-02-14 12:59:48 +00:00
32d6a4a98f feat(rootfs): move overlay upper layer from tmpfs to ephemeral qcow2
VM root overlay writes now go to a sparse qcow2 disk instead of tmpfs,
reducing host RAM usage. The host creates the qcow2 at VM start and
deletes it at stop. The guest formats it as ext4 with discard support.

Adds rootOverlay option (type: qcow2/tmpfs, size: default 10G) with
tmpfs available as fallback for the original behavior.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-14 12:58:02 +00:00
67e7a1b2c3 feat(nix-module): user-specified network interface names
Change network.interfaces from list to attrset where keys become
guest-visible interface names. Names are passed to the guest via
vmsilo.ifname=<name>,<mac> kernel parameters and applied at early
boot via udev rules.

- Add sortedInterfaceList helper for deterministic PCI slot assignment
- Update all interface iteration to use sorted attrset
- Add vmsilo-ifname-rules initrd service to generate udev rules
- MAC addresses now generated from vmName-userIfName hash

BREAKING: network.interfaces syntax changes from list to attrset:
  Before: interfaces = [{ type = "tap"; ... }];
  After:  interfaces = { wan = { type = "tap"; ... }; };

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-13 23:33:13 +00:00
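A rough sketch of how the `vmsilo.ifname=<name>,<mac>` kernel parameters from commit 67e7a1b2c3 might be turned into udev rules. The parameter format comes from the commit message; the function names and the exact rule text are assumptions based on standard udev match syntax, not the actual output of vmsilo-ifname-rules.

```python
def parse_ifname_params(cmdline: str) -> dict[str, str]:
    """Return {mac: name} for every vmsilo.ifname=<name>,<mac> parameter."""
    mapping: dict[str, str] = {}
    for token in cmdline.split():
        key, _, value = token.partition("=")
        if key != "vmsilo.ifname":
            continue
        name, _, mac = value.partition(",")
        if name and mac:
            mapping[mac.lower()] = name  # sysfs reports MACs in lowercase
    return mapping

def udev_rules(mapping: dict[str, str]) -> list[str]:
    """Render one rename rule per interface (assumed rule format)."""
    return [
        f'SUBSYSTEM=="net", ACTION=="add", ATTR{{address}}=="{mac}", NAME="{name}"'
        for mac, name in mapping.items()
    ]
```

Matching on the MAC address rather than the PCI slot keeps the guest-visible name stable even if slot assignment changes.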
fc9f423ed3 feat(vm-switch): Remove TSO/GSO and CSUM
Remove TSO/GSO/CSUM feature advertisement and handle virtio-net headers
at the vhost backend boundary:
- Strip 12-byte virtio-net header at TX before forwarding
- Prepend zero virtio-net header at RX when delivering to guest
- Add --mtu CLI flag for configurable frame size (default 1514)
- Update ring buffer to handle standard MTU-sized frames only

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-13 23:01:12 +00:00
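The header handling listed in commit fc9f423ed3 amounts to the following, shown as a Python sketch rather than the vm-switch Rust code. The function names are hypothetical; the 12-byte length and the all-zero RX header come from the commit message.

```python
VIRTIO_NET_HDR_LEN = 12  # header size stated in the commit message

def strip_tx_header(buf: bytes) -> bytes:
    """Drop the virtio-net header from a guest TX buffer, leaving the frame."""
    if len(buf) < VIRTIO_NET_HDR_LEN:
        raise ValueError("buffer shorter than virtio-net header")
    return buf[VIRTIO_NET_HDR_LEN:]

def prepend_rx_header(frame: bytes) -> bytes:
    """Prefix a frame with an all-zero virtio-net header for guest RX."""
    return bytes(VIRTIO_NET_HDR_LEN) + frame
```

An all-zero header signals no offloads, which matches removing the TSO/GSO/CSUM feature advertisement.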
794ae6e9fd docs: document guestConfig function pattern for interface names
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-13 22:52:44 +00:00
29f05f92aa chore: remove dead code, fix MAC generation bug, update docs
Remove unused Rust code found by review: EthernetFrame struct,
extract_dest_mac(), MAX_FRAME_SIZE, MAX_TSO_FRAME, PacketForwarder.our_ip,
EgressBuffer.routes, and redundant serde_json dev-dependency.

Fix MAC generation to use interface name (e.g. enp0s22) instead of stale
idx+16 offset that diverged when PCI slot was changed to idx+22.

Fix README.md: wrong socket path, non-existent vmNetwork.router option
(now receiveBroadcast/routes), stale PCI offset examples, incomplete
vm-switch networking example. Add --quiet flag to CLI docs in both
README.md and CLAUDE.md.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-13 20:26:33 +00:00
20cef7c314 docs: update CLAUDE.md for per-peer route support
Document routes config file, route table forwarding with LPM,
PutBuffer peer_routes field, net.rs source file, and NixOS
vmNetwork.routes option.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-13 19:29:50 +00:00
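Commit 20cef7c314 mentions route table forwarding with LPM (longest prefix match). A generic sketch of that lookup is shown below; it is not the vm-switch implementation, and the route table layout and peer names are hypothetical.

```python
import ipaddress
from typing import Optional

def lookup_peer(routes: dict[str, str], dest: str) -> Optional[str]:
    """Return the peer whose route has the longest prefix containing dest."""
    addr = ipaddress.ip_address(dest)
    best_len = -1
    best_peer = None
    for prefix, peer in routes.items():
        net = ipaddress.ip_network(prefix)
        # A more specific (longer) matching prefix wins over a shorter one.
        if addr in net and net.prefixlen > best_len:
            best_len = net.prefixlen
            best_peer = peer
    return best_peer
```

A linear scan is enough for a handful of per-peer routes; a trie would be the usual choice for large tables.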
26168e95b8 refactor: remove VmRole, unify config files to client.ip/client.mac
The router/client role distinction in vm-switch is no longer meaningful
with L3 IP-based forwarding. All VMs now use client.ip/client.mac/
client.sock uniformly; the only behavioral difference is the
receive_broadcast flag file.

- Remove VmRole enum and all role-based logic from Rust code
- Rename NixOS option vmNetwork.router to vmNetwork.receiveBroadcast
- Remove "exactly one router per network" assertion
- Update bufferbloat-test, documentation, and all tests

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-13 19:06:26 +00:00
281f12f8c0 docs: update CLAUDE.md for per-flow metrics
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-13 18:24:06 +00:00
a052846e4a fix(vm-switch): prevent worker kills from egress mutex contention
Under heavy TSO traffic, the vhost TX thread holds the egress mutex
while processing descriptor bursts. The child event loop blocks on
this mutex just to build the pollfd set, delaying Ping/Pong responses
past the 100ms deadline.

Add a fast-path control message check at the top of the event loop
that polls just the control fd (zero-timeout, no mutex) before
acquiring forwarder locks. Also remove the GetMetrics timeout kill
since the ping heartbeat is sufficient for liveness detection.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-13 16:52:00 +00:00
3f049f41fe feat(nix-module): expose FQ-CoDel tunables in vm-switch options
Replace removed initialCredits with bufferSize and add fqCodelTarget,
fqCodelInterval, fqCodelLimit, fqCodelQuantum options.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-13 16:52:00 +00:00
c6af0305fe docs: update CLAUDE.md for FQ-CoDel integration
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-13 16:52:00 +00:00
1f5acb64fa Update KWin patches to draw borders on popup windows
2026-02-13 16:47:40 +00:00