# Remaining Collectors Design Spec
Add 4 new collectors to pve-exporter covering the TODO items from the README: node status, VM pressure, HA status, and physical disks. SDN is excluded (config-only, not operationally useful). Kernel version and CPU model are excluded from node status (static, low value).
## Collectors
### 1. Node Status Collector

- File: `collector/node_status.go`
- API: `/nodes/{node}/status` (per-node fan-out)
- Type: NodeAwareCollector
| Metric | Type | Labels | Description |
|---|---|---|---|
| `pve_node_load1` | Gauge | `node` | 1-minute load average |
| `pve_node_load5` | Gauge | `node` | 5-minute load average |
| `pve_node_load15` | Gauge | `node` | 15-minute load average |
| `pve_node_swap_total_bytes` | Gauge | `node` | Total swap in bytes |
| `pve_node_swap_used_bytes` | Gauge | `node` | Used swap in bytes |
| `pve_node_swap_free_bytes` | Gauge | `node` | Free swap in bytes |
| `pve_node_rootfs_total_bytes` | Gauge | `node` | Root filesystem total in bytes |
| `pve_node_rootfs_used_bytes` | Gauge | `node` | Root filesystem used in bytes |
| `pve_node_rootfs_available_bytes` | Gauge | `node` | Root filesystem available in bytes |
| `pve_node_ksm_shared_bytes` | Gauge | `node` | KSM shared memory in bytes |
| `pve_node_boot_mode_info` | Gauge | `node`, `mode`, `secureboot` | Boot mode info (always 1) |
The `pve_node_` prefix disambiguates these from `cluster_resources` metrics, which use `id` labels like `node/node01`. The `node` label here is the bare node name (e.g., `node01`), consistent with how other NodeAwareCollectors label per-node data.

Rootfs uses `available` (not `free`) because the API field `avail` reflects usable space after reserved blocks, which is the operationally relevant value. Swap has no reserved blocks, so `free` is correct there.
API response structure:

```json
{
  "data": {
    "loadavg": ["3.12", "2.88", "2.79"],
    "swap": {"total": 8589930496, "used": 0, "free": 8589930496},
    "rootfs": {"used": 28747304960, "total": 100861726720, "avail": 66943684608},
    "ksm": {"shared": 0},
    "boot-info": {"mode": "efi", "secureboot": 0}
  }
}
```
Load averages are strings in the API; parse them with `strconv.ParseFloat`. The `secureboot` label is `"1"` or `"0"` (a string, not a bool).
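A minimal sketch of the string-to-float parsing described above; the helper name and error handling are illustrative, not the exporter's actual code:

```go
package main

import (
	"fmt"
	"strconv"
)

// parseLoadAvg converts the string-typed loadavg entries returned by
// /nodes/{node}/status into float64 gauge values. A non-numeric entry
// yields an error so the collector can skip emitting that sample.
func parseLoadAvg(raw []string) ([]float64, error) {
	out := make([]float64, 0, len(raw))
	for _, s := range raw {
		v, err := strconv.ParseFloat(s, 64)
		if err != nil {
			return nil, fmt.Errorf("parse loadavg %q: %w", s, err)
		}
		out = append(out, v)
	}
	return out, nil
}

func main() {
	loads, err := parseLoadAvg([]string{"3.12", "2.88", "2.79"})
	if err != nil {
		panic(err)
	}
	fmt.Println(loads) // [3.12 2.88 2.79]
}
```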
### 2. VM Pressure Collector

- File: `collector/vm_pressure.go`
- API: `/nodes/{node}/qemu` (per-node fan-out)
- Type: NodeAwareCollector
| Metric | Type | Labels | Description |
|---|---|---|---|
| `pve_vm_pressure_cpu_some_ratio` | Gauge | `id`, `node` | CPU pressure (some) |
| `pve_vm_pressure_cpu_full_ratio` | Gauge | `id`, `node` | CPU pressure (full) |
| `pve_vm_pressure_memory_some_ratio` | Gauge | `id`, `node` | Memory pressure (some) |
| `pve_vm_pressure_memory_full_ratio` | Gauge | `id`, `node` | Memory pressure (full) |
| `pve_vm_pressure_io_some_ratio` | Gauge | `id`, `node` | I/O pressure (some) |
| `pve_vm_pressure_io_full_ratio` | Gauge | `id`, `node` | I/O pressure (full) |
API response structure (per VM entry):

```json
{
  "vmid": 112,
  "status": "running",
  "pressurecpusome": 0,
  "pressurecpufull": 0,
  "pressurememorysome": 0,
  "pressurememoryfull": 0,
  "pressureiosome": 0,
  "pressureiofull": 0
}
```
- Only emit metrics for running VMs (stopped VMs lack pressure fields).
- QEMU only — LXC containers run in the host kernel namespace and do not expose per-container PSI metrics through the PVE API.
- `id` label: constructed as `fmt.Sprintf("qemu/%d", vmid)` to match the existing convention used by the `cluster_resources` and `node_config` collectors.
- API returns values 0–100. Divide by 100 to produce 0.0–1.0 ratios.
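The label and ratio conventions above can be sketched as follows; the struct and helper names are illustrative, and only one pressure field is shown:

```go
package main

import "fmt"

// vmEntry mirrors the subset of a /nodes/{node}/qemu entry used here
// (field names taken from the response structure above).
type vmEntry struct {
	VMID    int     `json:"vmid"`
	Status  string  `json:"status"`
	CPUSome float64 `json:"pressurecpusome"`
}

// pressureRatio converts the API's 0-100 percentage into a 0.0-1.0 gauge value.
func pressureRatio(pct float64) float64 { return pct / 100 }

// vmID builds the "qemu/<vmid>" id label used by the other collectors.
func vmID(vmid int) string { return fmt.Sprintf("qemu/%d", vmid) }

func main() {
	vm := vmEntry{VMID: 112, Status: "running", CPUSome: 4.5}
	if vm.Status == "running" { // stopped VMs lack pressure fields
		fmt.Println(vmID(vm.VMID), pressureRatio(vm.CPUSome))
	}
}
```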
### 3. HA Status Collector

- File: `collector/ha_status.go`
- API: `/cluster/ha/status/manager_status` + `/cluster/ha/resources` (cluster-level, no per-node fan-out)
- Type: Plain Collector (not NodeAwareCollector). Both API calls happen inside a single `Update()` method.
| Metric | Type | Labels | Description |
|---|---|---|---|
| `pve_ha_crm_master` | Gauge | `node` | 1 if node is CRM master, 0 otherwise (all nodes) |
| `pve_ha_node_status` | Gauge | `node`, `status` | Per-node HA status (always 1) |
| `pve_ha_lrm_timestamp_seconds` | Gauge | `node` | Last LRM heartbeat as unix timestamp |
| `pve_ha_lrm_mode` | Gauge | `node`, `mode` | LRM mode per node (always 1) |
| `pve_ha_service_config` | Gauge | `sid`, `type`, `max_restart`, `max_relocate`, `failback` | Service config (always 1) |
| `pve_ha_service_status` | Gauge | `sid`, `node`, `state` | Service runtime state (always 1) |
Service metrics are split into `_config` (from `/cluster/ha/resources`, static labels) and `_status` (from `manager_status.service_status`, runtime labels that change). This avoids stale series when a service migrates between nodes or changes state.
API response structure (`/cluster/ha/status/manager_status`):

```json
{
  "data": {
    "manager_status": {
      "master_node": "node03",
      "node_status": {
        "node01": "online",
        "node02": "online",
        "node03": "online",
        "node04": "online",
        "node05": "online"
      },
      "service_status": {
        "vm:106": {"node": "node04", "running": 1, "state": "started"}
      }
    },
    "lrm_status": {
      "node01": {"mode": "active", "state": "wait_for_agent_lock", "timestamp": 1774016351},
      "node02": {"mode": "active", "state": "wait_for_agent_lock", "timestamp": 1774016351},
      "node03": {"mode": "active", "state": "wait_for_agent_lock", "timestamp": 1774016351},
      "node04": {"mode": "active", "state": "active", "timestamp": 1774016350},
      "node05": {"mode": "active", "state": "wait_for_agent_lock", "timestamp": 1774016351}
    }
  }
}
```
API response structure (`/cluster/ha/resources`):

```json
{
  "data": [
    {"sid": "vm:106", "type": "vm", "state": "started", "max_restart": 2, "max_relocate": 2, "failback": 1}
  ]
}
```
- `pve_ha_crm_master`: iterate `node_status` keys from `manager_status`; emit 1 for `master_node`, 0 for all others. The `node_status` map contains all cluster nodes.
- `pve_ha_service_config`: from `/cluster/ha/resources`. Numeric config values (`max_restart`, `max_relocate`, `failback`) become string labels.
- `pve_ha_service_status`: from `manager_status.service_status`. The `state` label reflects runtime state (e.g., `started`, `stopped`, `migrate`).
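The `pve_ha_crm_master` emission logic can be sketched like this; the struct and function names are illustrative stand-ins for the collector's unexported types:

```go
package main

import "fmt"

// managerStatus holds the subset of /cluster/ha/status/manager_status
// needed for pve_ha_crm_master (field names from the response above).
type managerStatus struct {
	MasterNode string            `json:"master_node"`
	NodeStatus map[string]string `json:"node_status"`
}

// crmMasterValues returns 1 for the CRM master and 0 for every other
// node listed in node_status, so the series exists for all cluster nodes.
func crmMasterValues(ms managerStatus) map[string]float64 {
	out := make(map[string]float64, len(ms.NodeStatus))
	for node := range ms.NodeStatus {
		if node == ms.MasterNode {
			out[node] = 1
		} else {
			out[node] = 0
		}
	}
	return out
}

func main() {
	ms := managerStatus{
		MasterNode: "node03",
		NodeStatus: map[string]string{"node01": "online", "node02": "online", "node03": "online"},
	}
	fmt.Println(crmMasterValues(ms))
}
```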
### 4. Physical Disks Collector

- File: `collector/physical_disk.go`
- API: `/nodes/{node}/disks/list` (per-node fan-out)
- Type: NodeAwareCollector
| Metric | Type | Labels | Description |
|---|---|---|---|
| `pve_physical_disk_health` | Gauge | `node`, `devpath`, `model`, `serial`, `type` | 1 if SMART PASSED, 0 otherwise |
| `pve_physical_disk_wearout_remaining_ratio` | Gauge | `node`, `devpath` | Wearout remaining 0.0–1.0 |
| `pve_physical_disk_size_bytes` | Gauge | `node`, `devpath` | Disk size in bytes |
| `pve_physical_disk_info` | Gauge | `node`, `devpath`, `model`, `serial`, `type`, `used` | Disk info (always 1) |
| `pve_physical_disk_osd` | Gauge | `node`, `devpath`, `osd` | Disk-to-OSD mapping (always 1, one per OSD) |
API response structure (per disk entry):

```json
{
  "devpath": "/dev/nvme0n1",
  "health": "PASSED",
  "wearout": 100,
  "size": 7681501126656,
  "model": "VV007680KYFFL",
  "serial": "ADD3NA317I0104K2N",
  "type": "nvme",
  "used": "LVM",
  "osdid": "8",
  "osdid-list": ["8"]
}
```
- `wearout`: the API returns a 0–100 integer representing percentage remaining (100 = new, 0 = fully worn). Divide by 100 to get the ratio directly (no inversion needed). The API may return `"N/A"` (a string) for disks that don't support wear leveling; use `json.Number` or similar to handle this. Skip emitting `pve_physical_disk_wearout_remaining_ratio` when the value is not a valid number.
- `health`: compare the string to `"PASSED"` and emit 1 or 0. If `health` is empty or `"UNKNOWN"`, emit 0.
- `osd` label: format as `"osd.N"` (e.g., `osd.8`), matching the Ceph daemon naming convention.
- Multi-OSD disks: emit one `pve_physical_disk_osd` entry per item in `osdid-list`.
- Non-OSD disks (`osdid` is -1 / `osdid-list` is null): no `pve_physical_disk_osd` entry emitted.
## Implementation Pattern

All 4 collectors follow the established patterns:

- Self-register via `init()` + `registerCollector()`
- NodeAwareCollector interface for per-node endpoints (node_status, vm_pressure, physical_disk)
- Plain Collector for cluster-level endpoints (ha_status)
- NodeAwareCollector implementations guard the `nodes` slice with a `sync.Mutex` and copy it at the start of `Update()`, matching the pattern in `node_config.go` and `replication.go`
- Per-node fan-out uses `sync.WaitGroup` + a semaphore sized from `client.MaxConcurrent()`
- JSON response types as unexported structs in each collector file
- Unit tests with JSON fixtures in `collector/fixtures/`
- `testCollectorAdapter` pattern for `testutil.GatherAndCompare`
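The mutex-guarded nodes pattern and semaphore fan-out can be sketched as below; all names are illustrative stand-ins (the real collectors call the PVE API where this returns placeholder strings, and size the semaphore from `client.MaxConcurrent()`):

```go
package main

import (
	"fmt"
	"sync"
)

// nodeAware guards its nodes slice with a mutex: SetNodes and Update may
// run concurrently, so Update copies the slice under the lock and fans
// out over the copy.
type nodeAware struct {
	mu    sync.Mutex
	nodes []string
}

func (c *nodeAware) SetNodes(nodes []string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.nodes = append([]string(nil), nodes...)
}

func (c *nodeAware) Update() []string {
	c.mu.Lock()
	nodes := append([]string(nil), c.nodes...) // copy under the lock
	c.mu.Unlock()

	var wg sync.WaitGroup
	sem := make(chan struct{}, 2) // stand-in for client.MaxConcurrent()
	results := make([]string, len(nodes))
	for i, n := range nodes {
		wg.Add(1)
		go func(i int, n string) {
			defer wg.Done()
			sem <- struct{}{}        // acquire a fan-out slot
			defer func() { <-sem }() // release it
			results[i] = "scraped:" + n // placeholder for the per-node API call
		}(i, n)
	}
	wg.Wait()
	return results
}

func main() {
	c := &nodeAware{}
	c.SetNodes([]string{"node01", "node02"})
	fmt.Println(c.Update()) // [scraped:node01 scraped:node02]
}
```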
## Test Fixtures

Each collector needs fixture files in `collector/fixtures/`:

- `node_status.json`: full `/nodes/{node}/status` response
- `node_qemu_pressure.json`: `/nodes/{node}/qemu` response with running + stopped VMs (reuse or extend the existing `node_qemu.json` if pressure fields can be added)
- `ha_manager_status.json`: `/cluster/ha/status/manager_status` response
- `ha_resources.json`: `/cluster/ha/resources` response
- `node_disks.json`: `/nodes/{node}/disks/list` response
The HA collector test needs two routes mapped in the test server (one per endpoint), unlike NodeAwareCollector tests which share a single route pattern.
## Scope Exclusions
- SDN/Network: Excluded — API exposes config only, not operational state.
- Kernel version info: Excluded — static, low operational value.
- CPU model info: Excluded — static, low operational value.
## README Update
Remove the implemented items from the TODO section. Remove the SDN, kernel version, and CPU model entries entirely. If no TODO items remain, remove the section. Add metrics tables for all 4 new collectors following the existing table format.