docs: address spec review feedback

- Split pve_ha_service_info into _config and _status to avoid stale series
- Handle wearout "N/A" and health "UNKNOWN" edge cases for physical disks
- Clarify node label convention and rootfs available vs free naming
- Note QEMU-only scope for VM pressure (LXC lacks PSI in PVE API)
- Add full node_status/lrm_status examples showing all cluster nodes
- Document mutex-guarded nodes pattern and test fixture requirements

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Davíð Steinn Geirsson 2026-03-20 15:10:22 +00:00
parent b4ed302009
commit e10156323b


@ -27,6 +27,14 @@ not operationally useful). Kernel version and CPU model are excluded from node s
| `pve_node_ksm_shared_bytes` | Gauge | `node` | KSM shared memory in bytes |
| `pve_node_boot_mode_info` | Gauge | `node`, `mode`, `secureboot` | Boot mode info (always 1) |
The `pve_node_` prefix disambiguates these from `cluster_resources` metrics which use
`id` labels like `node/node01`. The `node` label here is the bare node name (e.g.,
`node01`), consistent with how other NodeAwareCollectors label per-node data.
Rootfs uses `available` (not `free`) because the API field `avail` reflects usable
space after reserved blocks, which is the operationally relevant value. Swap has no
reserved blocks so `free` is correct there.
**API response structure:**
```json
{
@ -73,14 +81,18 @@ Load averages are strings in the API, parse with `strconv.ParseFloat`.
```
- Only emit metrics for running VMs (stopped VMs lack pressure fields).
- QEMU only — LXC containers run in the host kernel namespace and do not expose
per-container PSI metrics through the PVE API.
- `id` label: constructed as `fmt.Sprintf("qemu/%d", vmid)` to match the existing
convention used by `cluster_resources` and `node_config` collectors.
- API returns values 0–100. Divide by 100 to produce 0.0–1.0 ratios.
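The conversions described in the bullets above can be sketched as small helpers. This is an assumption-laden sketch: the exact pressure field names in the PVE API response are not shown here, so the helpers only take the raw values.

```go
package main

import (
	"fmt"
	"strconv"
)

// pressureRatio scales the API's 0-100 pressure value to a 0.0-1.0 ratio.
func pressureRatio(v float64) float64 {
	return v / 100
}

// qemuID builds the `id` label following the "qemu/{vmid}" convention.
func qemuID(vmid int) string {
	return fmt.Sprintf("qemu/%d", vmid)
}

// parseLoadAvg handles fields the API returns as strings (e.g. load averages).
func parseLoadAvg(s string) (float64, error) {
	return strconv.ParseFloat(s, 64)
}

func main() {
	fmt.Println(qemuID(112))       // qemu/112
	fmt.Println(pressureRatio(50)) // 0.5
	v, _ := parseLoadAvg("1.42")
	fmt.Println(v) // 1.42
}
```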
### 3. HA Status Collector
**File:** `collector/ha_status.go`
**API:** `/cluster/ha/status/manager_status` + `/cluster/ha/resources` (cluster-level, no per-node fan-out)
**Type:** Plain Collector (not NodeAwareCollector). Both API calls happen inside a
single `Update()` method.
| Metric | Type | Labels | Description |
|--------|------|--------|-------------|
@ -88,7 +100,12 @@ Load averages are strings in the API, parse with `strconv.ParseFloat`.
| `pve_ha_node_status` | Gauge | `node`, `status` | Per-node HA status (always 1) |
| `pve_ha_lrm_timestamp_seconds` | Gauge | `node` | Last LRM heartbeat as unix timestamp |
| `pve_ha_lrm_mode` | Gauge | `node`, `mode` | LRM mode per node (always 1) |
| `pve_ha_service_config` | Gauge | `sid`, `type`, `max_restart`, `max_relocate`, `failback` | Service config (always 1) |
| `pve_ha_service_status` | Gauge | `sid`, `node`, `state` | Service runtime state (always 1) |
Service metrics are split into `_config` (from `/cluster/ha/resources`, static labels)
and `_status` (from `manager_status.service_status`, runtime labels that change). This
avoids stale series when a service migrates between nodes or changes state.
**API response structure (`/cluster/ha/status/manager_status`):**
```json
@ -96,13 +113,23 @@ Load averages are strings in the API, parse with `strconv.ParseFloat`.
"data": {
"manager_status": {
"master_node": "node03",
"node_status": {
"node01": "online",
"node02": "online",
"node03": "online",
"node04": "online",
"node05": "online"
},
"service_status": {
"vm:106": {"node": "node04", "running": 1, "state": "started"}
}
},
"lrm_status": {
"node01": {"mode": "active", "state": "wait_for_agent_lock", "timestamp": 1774016351},
"node02": {"mode": "active", "state": "wait_for_agent_lock", "timestamp": 1774016351},
"node03": {"mode": "active", "state": "wait_for_agent_lock", "timestamp": 1774016351},
"node04": {"mode": "active", "state": "active", "timestamp": 1774016350},
"node05": {"mode": "active", "state": "wait_for_agent_lock", "timestamp": 1774016351}
}
}
}
@ -117,8 +144,12 @@ Load averages are strings in the API, parse with `strconv.ParseFloat`.
}
```
- `pve_ha_crm_master`: iterate `node_status` keys from `manager_status`, emit 1 for
`master_node`, 0 for all others. The `node_status` map contains all cluster nodes.
- `pve_ha_service_config`: from `/cluster/ha/resources`. Numeric config values
(`max_restart`, `max_relocate`, `failback`) become string labels.
- `pve_ha_service_status`: from `manager_status.service_status`. The `state` label
reflects runtime state (e.g., `started`, `stopped`, `migrate`).
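The `pve_ha_crm_master` iteration described above reduces to a small value computation. A minimal sketch (metric registration and emission omitted; only the per-node values are computed):

```go
package main

import "fmt"

// crmMasterValues returns, for each node in the node_status map, 1 if it is
// the CRM master and 0 otherwise.
func crmMasterValues(masterNode string, nodeStatus map[string]string) map[string]float64 {
	out := make(map[string]float64, len(nodeStatus))
	for node := range nodeStatus {
		var v float64
		if node == masterNode {
			v = 1
		}
		out[node] = v
	}
	return out
}

func main() {
	vals := crmMasterValues("node03", map[string]string{
		"node01": "online", "node02": "online", "node03": "online",
	})
	fmt.Println(vals["node03"], vals["node01"]) // 1 0
}
```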
### 4. Physical Disks Collector
@ -150,8 +181,13 @@ Load averages are strings in the API, parse with `strconv.ParseFloat`.
}
```
- `wearout`: API returns a 0–100 integer representing percentage remaining (100 = new,
0 = fully worn). Divide by 100 to get ratio directly (no inversion needed). The API
may return `"N/A"` (string) for disks that don't support wear leveling — use
`json.Number` or similar to handle this. Skip emitting `pve_physical_disk_wearout_remaining_ratio`
when the value is not a valid number.
- `health`: compare string to `"PASSED"`, emit 1 or 0. If `health` is empty or
`"UNKNOWN"`, emit 0.
- `osd` label: format as `"osd.N"` (e.g., `osd.8`) matching Ceph daemon naming convention.
- Multi-OSD disks: emit one `pve_physical_disk_osd` entry per item in `osdid-list`.
- Non-OSD disks (`osdid` is -1 / `osdid-list` is null): no `pve_physical_disk_osd` entry emitted.
@ -163,11 +199,27 @@ All 4 collectors follow the established patterns:
- Self-register via `init()` + `registerCollector()`
- NodeAwareCollector interface for per-node endpoints (node_status, vm_pressure, physical_disk)
- Plain Collector for cluster-level endpoints (ha_status)
- NodeAwareCollector implementations guard the `nodes` slice with a `sync.Mutex` and copy
  it at the start of `Update()`, matching the pattern in `node_config.go` and `replication.go`
- Per-node fan-out uses `sync.WaitGroup` + semaphore from `client.MaxConcurrent()`
- JSON response types as unexported structs in each collector file
- Unit tests with JSON fixtures in `collector/fixtures/`
- `testCollectorAdapter` pattern for `testutil.GatherAndCompare`
### Test Fixtures
Each collector needs fixture files in `collector/fixtures/`:
- `node_status.json` — full `/nodes/{node}/status` response
- `node_qemu_pressure.json` — `/nodes/{node}/qemu` response with running + stopped VMs
  (reuse or extend existing `node_qemu.json` if pressure fields can be added)
- `ha_manager_status.json` — `/cluster/ha/status/manager_status` response
- `ha_resources.json` — `/cluster/ha/resources` response
- `node_disks.json` — `/nodes/{node}/disks/list` response
The HA collector test needs two routes mapped in the test server (one per endpoint),
unlike NodeAwareCollector tests which share a single route pattern.
## Scope Exclusions
- **SDN/Network**: Excluded — API exposes config only, not operational state.
@ -177,4 +229,5 @@ All 4 collectors follow the established patterns:
## README Update
Remove the implemented items from the TODO section. Remove the SDN, kernel version,
and CPU model entries entirely. If no TODO items remain, remove the section. Add
metrics tables for all 4 new collectors following the existing table format.