Add pve-exporter design spec

Full design for a Go Prometheus exporter for Proxmox VE, replacing
the Python prometheus-pve-exporter with corosync metrics added.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Davíð Steinn Geirsson 2026-03-20 10:59:14 +00:00
parent 4aa8a6d579
commit 154a46f3cf

# pve-exporter Design Spec
A Prometheus exporter for Proxmox VE written in Go. Replaces the Python
prometheus-pve-exporter with a single static binary, matching all existing
metric names for dashboard compatibility, and adding corosync cluster metrics.
## Goals
- Drop-in metric compatibility with prometheus-pve-exporter (same metric names
and labels where possible) so existing Grafana dashboards work unchanged
- Add corosync/quorum metrics not available in the Python exporter
- Single statically-linked binary for easy deployment via Ansible
- Cluster-wide scrape from a single instance (no per-node exporter deployment)
## Non-Goals
- Ceph metrics (collected separately via ceph-mgr)
- General-purpose PVE API client library
- Full parity with PVE's web UI
## Architecture
### Project Structure
```
pve-exporter/
├── main.go # Entry point, flag parsing, HTTP server
├── collector/
│ ├── collector.go # Collector interface, registry, PVECollector
│ ├── client.go # PVE API client (HTTP, auth, JSON parsing)
│ ├── cluster_status.go # pve_up, pve_node_info, pve_cluster_info
│ ├── cluster_resources.go # CPU, memory, disk, network, storage, guest/HA info
│ ├── corosync.go # pve_cluster_quorate, nodes_total, expected_votes, node_online
│ ├── version.go # pve_version_info
│ ├── backup.go # pve_not_backed_up_*
│ ├── node_config.go # pve_onboot_status
│ ├── replication.go # pve_replication_*
│ └── subscription.go # pve_subscription_*
├── go.mod
├── go.sum
├── Makefile
└── README.md
```
### Collector Framework
Follows the node_exporter pattern:
```go
type Collector interface {
    Update(client *Client, ch chan<- prometheus.Metric) error
}
```
Collectors self-register via `init()` + `registerCollector()`. The framework
runs all collectors in parallel (goroutines + WaitGroup) and emits per-collector
scrape duration and success metrics automatically.
Collectors that need the node list or shared `/cluster/resources` data implement
additional interfaces:
```go
type NodeAwareCollector interface {
    Collector
    SetNodes(nodes []string)
}

type ResourceAwareCollector interface {
    Collector
    SetResources(data []byte)
}
```
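The registration and fan-out machinery is not spelled out above, so here is a minimal sketch of the pattern under the stated design (names like `registerCollector`, `runAll`, and the `versionCollector` example are illustrative, not the actual implementation; `Metric` and `Client` stand in for the real types):

```go
package main

import "sync"

type Metric struct{} // stand-in for prometheus.Metric

type Client struct{} // stand-in for the PVE API client

type Collector interface {
	Update(client *Client, ch chan<- Metric) error
}

var (
	factoriesMu sync.Mutex
	factories   = make(map[string]func() Collector)
)

// registerCollector is called from each collector file's init().
func registerCollector(name string, factory func() Collector) {
	factoriesMu.Lock()
	defer factoriesMu.Unlock()
	factories[name] = factory
}

// Example collector self-registering at package init time.
type versionCollector struct{}

func (versionCollector) Update(client *Client, ch chan<- Metric) error {
	ch <- Metric{}
	return nil
}

func init() { registerCollector("version", func() Collector { return versionCollector{} }) }

// runAll executes every registered collector in its own goroutine and
// waits for all of them. The real framework would also time each call
// and emit pve_scrape_collector_duration_seconds / _success here.
func runAll(client *Client, ch chan<- Metric) {
	var wg sync.WaitGroup
	for name, factory := range factories {
		wg.Add(1)
		go func(name string, c Collector) {
			defer wg.Done()
			_ = c.Update(client, ch)
		}(name, factory())
	}
	wg.Wait()
}
```

The map of factories (rather than instances) lets the framework construct fresh collector state per scrape if it ever needs to.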
### Scrape Flow
1. Prometheus hits `/metrics`
2. `PVECollector.Collect()` fetches `/cluster/resources` first (needed by
multiple collectors and provides the node list)
3. Node list and resources data are passed to collectors that need them
4. All collectors run in parallel
5. Per-node API calls within collectors (subscription, replication, node_config)
are parallelized across nodes with bounded concurrency (5 concurrent requests)
6. Framework measures duration, catches errors, emits scrape meta-metrics
### API Client
```go
type Client struct {
    httpClient *http.Client
    hosts      []string // tried in order on failure
    token      string   // PVEAPIToken=user@realm!tokenid=uuid
}
```
- Tries hosts in order; on a connection or HTTP error, falls through to the next
host. Remembers the last working host and tries it first on subsequent scrapes.
- 1-second TCP connect timeout for fast failover to the next host.
- TLS certificate verification enabled by default. `--pve.tls-insecure` to
disable.
- Single `Get(path string) ([]byte, error)` method. No caching; each scrape
makes fresh API calls.
- Context-aware with scrape timeout propagated from Prometheus.
### Authentication
- `--pve.api-token` flag or `PVE_API_TOKEN` env var for token string
- `--pve.token-file` for reading the token from a file at startup (keeps the
token out of the process list; Ansible-friendly)
- Sent as `Authorization: PVEAPIToken=...` header
## CLI & HTTP
```
pve-exporter \
  --pve.host=https://node02:8006 \
  --pve.host=https://node01:8006 \
  --pve.token-file=/etc/pve-exporter/apikey \
  --web.listen-address=:9221 \
  --web.telemetry-path=/metrics
```
| Flag | Default | Description |
|------|---------|-------------|
| `--pve.host` | (required, repeatable) | PVE API base URLs, tried in order |
| `--pve.api-token` | — | API token string (mutually exclusive with token-file) |
| `--pve.token-file` | — | Path to file containing API token |
| `--pve.tls-insecure` | `false` | Disable TLS certificate verification |
| `--web.listen-address` | `:9221` | Address to listen on |
| `--web.telemetry-path` | `/metrics` | Path for metrics endpoint |
| `--log.level` | `info` | Log level (debug, info, warn, error) |
| `--log.format` | `logfmt` | Log format (logfmt, json) |
HTTP endpoints:
- `/metrics` — Prometheus metrics
- `/` — Landing page with link to metrics
Port 9221 matches the Python exporter for drop-in compatibility.
## Metrics
All metrics use namespace `pve`.
### cluster_status collector
API: `/cluster/status`
| Metric | Type | Labels |
|--------|------|--------|
| `pve_up` | Gauge | `id` |
| `pve_node_info` | Gauge | `id`, `level`, `name`, `nodeid` |
| `pve_cluster_info` | Gauge | `id`, `nodes`, `quorate`, `version` |
### corosync collector
API: `/cluster/status`, `/cluster/config/nodes`
| Metric | Type | Labels |
|--------|------|--------|
| `pve_cluster_quorate` | Gauge | — |
| `pve_cluster_nodes_total` | Gauge | — |
| `pve_cluster_expected_votes` | Gauge | — |
| `pve_node_online` | Gauge | `name`, `nodeid` |
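Extracting these gauges from `/cluster/status` is mostly JSON plumbing; the sketch below assumes the PVE convention of a `data` array whose entries carry `type: "cluster"` (with `quorate`, `nodes`) and `type: "node"` (with `name`, `nodeid`, `online`) — field names should be verified against a live response:

```go
package main

import "encoding/json"

// statusEntry covers the subset of /cluster/status fields this
// collector reads; PVE returns both cluster and node entries.
type statusEntry struct {
	Type    string `json:"type"`
	Name    string `json:"name"`
	NodeID  int    `json:"nodeid"`
	Online  int    `json:"online"`
	Quorate int    `json:"quorate"`
	Nodes   int    `json:"nodes"`
}

type corosyncStats struct {
	Quorate    float64
	NodesTotal float64
	NodeOnline map[string]float64 // keyed by node name
}

// parseClusterStatus pulls the quorum gauges out of a raw
// /cluster/status response body.
func parseClusterStatus(data []byte) (corosyncStats, error) {
	var wrapper struct {
		Data []statusEntry `json:"data"`
	}
	s := corosyncStats{NodeOnline: map[string]float64{}}
	if err := json.Unmarshal(data, &wrapper); err != nil {
		return s, err
	}
	for _, e := range wrapper.Data {
		switch e.Type {
		case "cluster":
			s.Quorate = float64(e.Quorate)
			s.NodesTotal = float64(e.Nodes)
		case "node":
			s.NodeOnline[e.Name] = float64(e.Online)
		}
	}
	return s, nil
}
```

`pve_cluster_expected_votes` would come from `/cluster/config/nodes` (or the votes fields, if present) rather than this response.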
### cluster_resources collector
API: `/cluster/resources`
| Metric | Type | Labels |
|--------|------|--------|
| `pve_cpu_usage_ratio` | Gauge | `id` |
| `pve_cpu_usage_limit` | Gauge | `id` |
| `pve_memory_usage_bytes` | Gauge | `id` |
| `pve_memory_size_bytes` | Gauge | `id` |
| `pve_disk_usage_bytes` | Gauge | `id` |
| `pve_disk_size_bytes` | Gauge | `id` |
| `pve_network_transmit_bytes_total` | Counter | `id` |
| `pve_network_receive_bytes_total` | Counter | `id` |
| `pve_disk_written_bytes_total` | Counter | `id` |
| `pve_disk_read_bytes_total` | Counter | `id` |
| `pve_uptime_seconds` | Gauge | `id` |
| `pve_storage_shared` | Gauge | `id` |
| `pve_guest_info` | Gauge | `id`, `node`, `name`, `type`, `template`, `tags` |
| `pve_storage_info` | Gauge | `id`, `node`, `storage`, `plugintype`, `content` |
| `pve_ha_state` | Gauge | `id`, `state` |
| `pve_lock_state` | Gauge | `id`, `state` |
### version collector
API: `/version`
| Metric | Type | Labels |
|--------|------|--------|
| `pve_version_info` | Gauge | `release`, `repoid`, `version` |
### backup collector
API: `/cluster/backup-info/not-backed-up`
| Metric | Type | Labels |
|--------|------|--------|
| `pve_not_backed_up_total` | Gauge | `id` |
| `pve_not_backed_up_info` | Gauge | `id` |
### node_config collector
API: `/nodes/{node}/qemu/{vmid}/config`, `/nodes/{node}/lxc/{vmid}/config`
| Metric | Type | Labels |
|--------|------|--------|
| `pve_onboot_status` | Gauge | `id`, `node`, `type` |
### replication collector
API: `/nodes/{node}/replication`
| Metric | Type | Labels |
|--------|------|--------|
| `pve_replication_info` | Gauge | `id`, `type`, `source`, `target`, `guest` |
| `pve_replication_duration_seconds` | Gauge | `id` |
| `pve_replication_last_sync_timestamp_seconds` | Gauge | `id` |
| `pve_replication_last_try_timestamp_seconds` | Gauge | `id` |
| `pve_replication_next_sync_timestamp_seconds` | Gauge | `id` |
| `pve_replication_failed_syncs` | Gauge | `id` |
### subscription collector
API: `/nodes/{node}/subscription`
| Metric | Type | Labels |
|--------|------|--------|
| `pve_subscription_info` | Gauge | `id`, `level` |
| `pve_subscription_status` | Gauge | `id`, `status` |
| `pve_subscription_next_due_timestamp_seconds` | Gauge | `id` |
### Scrape meta-metrics
| Metric | Type | Labels |
|--------|------|--------|
| `pve_scrape_collector_duration_seconds` | Gauge | `collector` |
| `pve_scrape_collector_success` | Gauge | `collector` |
## Dependencies
- `github.com/alecthomas/kingpin/v2` — CLI flags
- `github.com/prometheus/client_golang` — Prometheus client
- `github.com/prometheus/common` — logging (promslog)
- `github.com/prometheus/exporter-toolkit` — TLS, web config, landing page
## Testing Strategy
- Unit tests per collector with mock API responses (JSON fixtures)
- Integration test: start exporter, scrape `/metrics`, verify expected metric
names and labels are present
- Manual validation against live PVE cluster
## Future Metrics (TODO)
The following metrics are available from the PVE API but are deferred to future work:
### Per-node detailed status (`/nodes/{node}/status`)
- Load averages (1m, 5m, 15m)
- Swap usage (total, used, free)
- Root filesystem usage (total, used, available)
- KSM shared memory
- Kernel version info
- Boot mode and secure boot status
- CPU model info (model, sockets, cores, MHz)
### Per-VM pressure metrics (`/nodes/{node}/qemu`)
- `pressurecpusome`, `pressurecpufull`
- `pressurememorysome`, `pressurememoryfull`
- `pressureiosome`, `pressureiofull`
### HA detailed status (`/cluster/ha/status/current`)
- CRM master node and status
- Per-node LRM status (idle/active) and timestamps
- Per-service HA config (failback, max_restart, max_relocate)
### Physical disks (`/nodes/{node}/disks/list`)
- Disk health (SMART status)
- Wearout level
- Size and model info
- OSD mapping
### SDN/Network (`/cluster/resources` type=sdn)
- Zone status per node
- Zone type info