docs: update README with new collector metrics, remove TODO section
Add metrics tables for node_status, vm_pressure, ha_status, and physical_disk collectors. Remove the TODO section as all planned metrics are now implemented (SDN excluded by design).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
parent a88c696bfd
commit 771c3dc126
1 changed file with 48 additions and 33 deletions
README.md
@@ -130,42 +130,57 @@ Create a PVE API token with at least `PVEAuditor` role. Provide it via:
| `pve_subscription_status` | Gauge | `id`, `status` | Subscription status |
| `pve_subscription_next_due_timestamp_seconds` | Gauge | `id` | Next due date as Unix timestamp |

### Node Status

| Metric | Type | Labels | Description |
|--------|------|--------|-------------|
| `pve_node_load1` | Gauge | `node` | 1-minute load average |
| `pve_node_load5` | Gauge | `node` | 5-minute load average |
| `pve_node_load15` | Gauge | `node` | 15-minute load average |
| `pve_node_swap_total_bytes` | Gauge | `node` | Total swap in bytes |
| `pve_node_swap_used_bytes` | Gauge | `node` | Used swap in bytes |
| `pve_node_swap_free_bytes` | Gauge | `node` | Free swap in bytes |
| `pve_node_rootfs_total_bytes` | Gauge | `node` | Root filesystem total in bytes |
| `pve_node_rootfs_used_bytes` | Gauge | `node` | Root filesystem used in bytes |
| `pve_node_rootfs_available_bytes` | Gauge | `node` | Root filesystem available in bytes |
| `pve_node_ksm_shared_bytes` | Gauge | `node` | KSM shared memory in bytes |
| `pve_node_boot_mode_info` | Gauge | `node`, `mode`, `secureboot` | Boot mode info (always 1) |

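The `*_total_bytes` and `*_used_bytes` gauges above can be combined into a utilization ratio. The following is a minimal sketch, not part of the exporter: the node name `pve1`, the sample values, and the helpers `parse` and `used_ratio` are all hypothetical, and the parser only handles lines with a single `node` label.

```python
import re

# Illustrative sample only: node name and values are made up,
# not real exporter output.
SAMPLE = '''\
pve_node_rootfs_total_bytes{node="pve1"} 100000000000
pve_node_rootfs_used_bytes{node="pve1"} 25000000000
pve_node_swap_total_bytes{node="pve1"} 8000000000
pve_node_swap_used_bytes{node="pve1"} 2000000000
'''

# Matches lines of the form: metric_name{node="..."} value
LINE_RE = re.compile(r'^(\w+)\{node="([^"]*)"\}\s+(\S+)$')


def parse(text):
    """Return {(metric_name, node): value} for single-label sample lines."""
    samples = {}
    for line in text.splitlines():
        m = LINE_RE.match(line.strip())
        if m:
            samples[(m.group(1), m.group(2))] = float(m.group(3))
    return samples


def used_ratio(samples, prefix, node):
    """Return used/total for a family such as 'pve_node_rootfs'.

    Returns None when the total is zero or either sample is missing.
    """
    total = samples.get((f"{prefix}_total_bytes", node))
    used = samples.get((f"{prefix}_used_bytes", node))
    if not total or used is None:
        return None
    return used / total


samples = parse(SAMPLE)
print(used_ratio(samples, "pve_node_rootfs", "pve1"))  # 0.25
print(used_ratio(samples, "pve_node_swap", "pve1"))    # 0.25
```

Returning `None` for a zero total avoids a division error on nodes without swap configured.
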
### VM Pressure

| Metric | Type | Labels | Description |
|--------|------|--------|-------------|
| `pve_vm_pressure_cpu_some_ratio` | Gauge | `id`, `node` | CPU pressure (some) |
| `pve_vm_pressure_cpu_full_ratio` | Gauge | `id`, `node` | CPU pressure (full) |
| `pve_vm_pressure_memory_some_ratio` | Gauge | `id`, `node` | Memory pressure (some) |
| `pve_vm_pressure_memory_full_ratio` | Gauge | `id`, `node` | Memory pressure (full) |
| `pve_vm_pressure_io_some_ratio` | Gauge | `id`, `node` | I/O pressure (some) |
| `pve_vm_pressure_io_full_ratio` | Gauge | `id`, `node` | I/O pressure (full) |

### HA Status

| Metric | Type | Labels | Description |
|--------|------|--------|-------------|
| `pve_ha_crm_master` | Gauge | `node` | 1 if node is CRM master, 0 otherwise |
| `pve_ha_node_status` | Gauge | `node`, `status` | Per-node HA status (always 1) |
| `pve_ha_lrm_timestamp_seconds` | Gauge | `node` | Last LRM heartbeat as Unix timestamp |
| `pve_ha_lrm_mode` | Gauge | `node`, `mode` | LRM mode per node (always 1) |
| `pve_ha_service_config` | Gauge | `sid`, `type`, `max_restart`, `max_relocate`, `failback` | Service config (always 1) |
| `pve_ha_service_status` | Gauge | `sid`, `node`, `state` | Service runtime state (always 1) |

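Metrics marked "always 1" carry their information in the labels rather than in the sample value, while `pve_ha_crm_master` uses the value itself. The sketch below shows one way a consumer might read both kinds. It is an illustration under assumptions: the sample lines, the label values, and the helpers `parse_samples`, `service_states`, and `crm_master` are hypothetical, and the parser does not handle escaped quotes in label values.

```python
import re

# Illustrative sample only: service IDs, node names, and states are made up.
SAMPLE_HA = '''\
pve_ha_service_status{sid="vm:100",node="pve1",state="started"} 1
pve_ha_service_status{sid="ct:200",node="pve2",state="stopped"} 1
pve_ha_crm_master{node="pve1"} 1
pve_ha_crm_master{node="pve2"} 0
'''

SAMPLE_RE = re.compile(r'^(\w+)\{(.*)\}\s+(\S+)$')
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_samples(text):
    """Return a list of (metric_name, labels_dict, value) tuples."""
    out = []
    for line in text.splitlines():
        m = SAMPLE_RE.match(line.strip())
        if m:
            labels = dict(LABEL_RE.findall(m.group(2)))
            out.append((m.group(1), labels, float(m.group(3))))
    return out


def service_states(samples):
    """Map each HA service ID to its state, read from the `state` label."""
    return {
        labels["sid"]: labels["state"]
        for name, labels, value in samples
        if name == "pve_ha_service_status" and value == 1
    }


def crm_master(samples):
    """Return the node whose pve_ha_crm_master sample is 1, or None."""
    for name, labels, value in samples:
        if name == "pve_ha_crm_master" and value == 1:
            return labels["node"]
    return None


parsed = parse_samples(SAMPLE_HA)
print(service_states(parsed))  # {'vm:100': 'started', 'ct:200': 'stopped'}
print(crm_master(parsed))      # pve1
```
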
### Physical Disks

| Metric | Type | Labels | Description |
|--------|------|--------|-------------|
| `pve_physical_disk_health` | Gauge | `node`, `devpath`, `model`, `serial`, `type` | 1 if SMART PASSED, 0 otherwise |
| `pve_physical_disk_wearout_remaining_ratio` | Gauge | `node`, `devpath` | Wearout remaining (1.0 = new) |
| `pve_physical_disk_size_bytes` | Gauge | `node`, `devpath` | Disk size in bytes |
| `pve_physical_disk_info` | Gauge | `node`, `devpath`, `model`, `serial`, `type`, `used` | Disk info (always 1) |
| `pve_physical_disk_osd` | Gauge | `node`, `devpath`, `osd` | Disk-to-OSD mapping (always 1) |

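Since the health and wearout gauges share the `node` and `devpath` labels, they can be joined on that pair to flag disks worth a look. This is a rough sketch assuming the samples have already been collected into dictionaries keyed by `(node, devpath)`. The helper `disks_needing_attention`, the 0.2 threshold, and the sample values are hypothetical choices, not exporter behavior.

```python
def disks_needing_attention(health, wearout, min_remaining=0.2):
    """Return {(node, devpath): [reasons]} for disks that look unhealthy.

    health:  {(node, devpath): 1 or 0} from pve_physical_disk_health
    wearout: {(node, devpath): ratio} from
             pve_physical_disk_wearout_remaining_ratio (1.0 = new)
    min_remaining: assumed alert threshold, not defined by the exporter
    """
    flagged = {}
    for key, ok in health.items():
        reasons = []
        if ok == 0:
            reasons.append("SMART not PASSED")
        remaining = wearout.get(key)  # may be absent for some disks
        if remaining is not None and remaining < min_remaining:
            reasons.append(f"wearout remaining {remaining:.0%}")
        if reasons:
            flagged[key] = reasons
    return flagged


# Illustrative values only.
health = {
    ("pve1", "/dev/sda"): 1,
    ("pve1", "/dev/sdb"): 0,
    ("pve1", "/dev/nvme0n1"): 1,
}
wearout = {
    ("pve1", "/dev/sda"): 0.97,
    ("pve1", "/dev/nvme0n1"): 0.05,
}

for disk, reasons in disks_needing_attention(health, wearout).items():
    print(disk, reasons)
```
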
### Scrape Meta

| Metric | Type | Labels | Description |
|--------|------|--------|-------------|
| `pve_scrape_collector_duration_seconds` | Gauge | `collector` | Scrape duration per collector |
| `pve_scrape_collector_success` | Gauge | `collector` | 1 if collector succeeded |

## TODO: Future Metrics

The following metrics are available from the PVE API but not yet implemented:

### Per-node detailed status (`/nodes/{node}/status`)
- Load averages (1m, 5m, 15m)
- Swap usage (total, used, free)
- Root filesystem usage (total, used, available)
- KSM shared memory
- Kernel version info
- Boot mode and secure boot status
- CPU model info (model, sockets, cores, MHz)

### Per-VM pressure metrics (`/nodes/{node}/qemu`)
- `pressurecpusome`, `pressurecpufull`
- `pressurememorysome`, `pressurememoryfull`
- `pressureiosome`, `pressureiofull`

### HA detailed status (`/cluster/ha/status/current`)
- CRM master node and status
- Per-node LRM status (idle/active) and timestamps
- Per-service HA config (failback, max_restart, max_relocate)

### Physical disks (`/nodes/{node}/disks/list`)
- Disk health (SMART status)
- Wearout level
- Size and model info
- OSD mapping

### SDN/Network (`/cluster/resources` type=sdn)
- Zone status per node
- Zone type info