docs: add README with usage, metrics reference, and future metrics TODO

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Davíð Steinn Geirsson 2026-03-20 11:38:17 +00:00
parent 3bafb67aa0
commit 56fe551700

README.md
# pve-exporter
A Prometheus exporter for Proxmox VE written in Go. Produces a single static
binary for easy deployment.
Designed as a drop-in replacement for
[prometheus-pve-exporter](https://github.com/prometheus-community/prometheus-pve-exporter)
with matching metric names for dashboard compatibility, plus additional
Corosync cluster metrics.
## Installation
```bash
CGO_ENABLED=0 go build -o pve-exporter .
```
## Usage
```bash
pve-exporter \
  --pve.host=https://node01:8006 \
  --pve.host=https://node02:8006 \
  --pve.token-file=/etc/pve-exporter/apikey \
  --web.listen-address=:9221
```
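For running under systemd, a minimal unit might look like the following. The binary path, user, and token file location are assumptions; the project does not ship a unit file.

```ini
[Unit]
Description=Prometheus exporter for Proxmox VE
After=network-online.target
Wants=network-online.target

[Service]
User=pve-exporter
ExecStart=/usr/local/bin/pve-exporter \
  --pve.host=https://node01:8006 \
  --pve.token-file=/etc/pve-exporter/apikey \
  --web.listen-address=:9221
Restart=on-failure

[Install]
WantedBy=multi-user.target
```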
The exporter scrapes all cluster data from a single PVE API endpoint. Multiple
`--pve.host` values provide failover: hosts are tried in order, with a
1-second connect timeout so an unreachable host is skipped quickly.
### Flags
| Flag | Default | Description |
|------|---------|-------------|
| `--pve.host` | (required) | PVE API base URL (repeatable) |
| `--pve.api-token` | | API token string (`user@realm!tokenid=uuid`) |
| `--pve.token-file` | | Path to file containing API token |
| `--pve.tls-insecure` | `false` | Disable TLS certificate verification |
| `--pve.max-concurrent` | `5` | Max concurrent API requests for per-node fan-out |
| `--web.listen-address` | `:9221` | Address to listen on |
| `--web.telemetry-path` | `/metrics` | Path for metrics endpoint |
| `--log.level` | `info` | Log level (debug, info, warn, error) |
| `--log.format` | `logfmt` | Log format (logfmt, json) |
### Authentication
Create a PVE API token with at least the `PVEAuditor` role. Provide it via:
- `--pve.api-token=user@realm!tokenid=uuid` (visible in process list)
- `--pve.token-file=/path/to/file` (recommended)
- `PVE_API_TOKEN` environment variable
`--pve.api-token` and `--pve.token-file` are mutually exclusive.
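For example, a dedicated read-only token can be created with `pveum` on a PVE node. The user and token names below are placeholders; with `--privsep 0` the token inherits the user's permissions.

```bash
# Create a monitoring user and an API token without privilege separation
pveum user add monitoring@pve
pveum user token add monitoring@pve exporter --privsep 0
# Grant read-only access on the whole tree
pveum acl modify / --users monitoring@pve --roles PVEAuditor
```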
## Metrics
### Cluster Status
| Metric | Type | Labels | Description |
|--------|------|--------|-------------|
| `pve_node_info` | Gauge | `id`, `level`, `name`, `nodeid` | Node info (always 1) |
| `pve_cluster_info` | Gauge | `id`, `nodes`, `quorate`, `version` | Cluster info (always 1) |
### Corosync
| Metric | Type | Labels | Description |
|--------|------|--------|-------------|
| `pve_cluster_quorate` | Gauge | | 1 if cluster has quorum |
| `pve_cluster_nodes_total` | Gauge | | Total node count |
| `pve_cluster_expected_votes` | Gauge | | Sum of quorum votes from config |
| `pve_node_online` | Gauge | `name`, `nodeid` | 1 if node is online |
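A quorum-loss alert can be built directly on these metrics, e.g. (rule and alert names are illustrative):

```yaml
groups:
  - name: pve
    rules:
      - alert: PVEClusterNotQuorate
        expr: pve_cluster_quorate == 0
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Proxmox cluster has lost quorum"
```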
### Cluster Resources
| Metric | Type | Labels | Description |
|--------|------|--------|-------------|
| `pve_up` | Gauge | `id` | 1 if node/VM/CT is online/running |
| `pve_cpu_usage_ratio` | Gauge | `id` | CPU utilization ratio |
| `pve_cpu_usage_limit` | Gauge | `id` | Number of available CPUs |
| `pve_memory_usage_bytes` | Gauge | `id` | Used memory in bytes |
| `pve_memory_size_bytes` | Gauge | `id` | Total memory in bytes |
| `pve_disk_usage_bytes` | Gauge | `id` | Used disk space in bytes |
| `pve_disk_size_bytes` | Gauge | `id` | Total disk space in bytes |
| `pve_uptime_seconds` | Gauge | `id` | Uptime in seconds |
| `pve_network_transmit_bytes_total` | Counter | `id` | Network bytes sent |
| `pve_network_receive_bytes_total` | Counter | `id` | Network bytes received |
| `pve_disk_written_bytes_total` | Counter | `id` | Disk bytes written |
| `pve_disk_read_bytes_total` | Counter | `id` | Disk bytes read |
| `pve_guest_info` | Gauge | `id`, `node`, `name`, `type`, `template`, `tags` | VM/CT info (always 1) |
| `pve_storage_info` | Gauge | `id`, `node`, `storage`, `plugintype`, `content` | Storage info (always 1) |
| `pve_storage_shared` | Gauge | `id` | 1 if storage is shared |
| `pve_ha_state` | Gauge | `id`, `state` | HA service status |
| `pve_lock_state` | Gauge | `id`, `state` | Guest config lock state |
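Since the per-resource metrics carry only the `id` label, human-readable names come from joining against `pve_guest_info`, following the usual info-metric pattern:

```promql
# Memory usage ratio per guest, labelled with the guest name and node
(pve_memory_usage_bytes / pve_memory_size_bytes)
  * on(id) group_left(name, node) pve_guest_info
```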
### Version
| Metric | Type | Labels | Description |
|--------|------|--------|-------------|
| `pve_version_info` | Gauge | `release`, `repoid`, `version` | PVE version info (always 1) |
### Backup
| Metric | Type | Labels | Description |
|--------|------|--------|-------------|
| `pve_not_backed_up_total` | Gauge | `id` | 1 if guest has no backup job |
| `pve_not_backed_up_info` | Gauge | `id` | 1 if guest has no backup job |
### Node Config
| Metric | Type | Labels | Description |
|--------|------|--------|-------------|
| `pve_onboot_status` | Gauge | `id`, `node`, `type` | VM/CT onboot config value |
### Replication
| Metric | Type | Labels | Description |
|--------|------|--------|-------------|
| `pve_replication_info` | Gauge | `id`, `type`, `source`, `target`, `guest` | Replication job info (always 1) |
| `pve_replication_duration_seconds` | Gauge | `id` | Last replication duration |
| `pve_replication_last_sync_timestamp_seconds` | Gauge | `id` | Last successful sync time |
| `pve_replication_last_try_timestamp_seconds` | Gauge | `id` | Last sync attempt time |
| `pve_replication_next_sync_timestamp_seconds` | Gauge | `id` | Next scheduled sync time |
| `pve_replication_failed_syncs` | Gauge | `id` | Failed sync count |
### Subscription
| Metric | Type | Labels | Description |
|--------|------|--------|-------------|
| `pve_subscription_info` | Gauge | `id`, `level` | Subscription info (always 1) |
| `pve_subscription_status` | Gauge | `id`, `status` | Subscription status |
| `pve_subscription_next_due_timestamp_seconds` | Gauge | `id` | Next due date as Unix timestamp |
### Scrape Meta
| Metric | Type | Labels | Description |
|--------|------|--------|-------------|
| `pve_scrape_collector_duration_seconds` | Gauge | `collector` | Scrape duration per collector |
| `pve_scrape_collector_success` | Gauge | `collector` | 1 if collector succeeded |
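A matching Prometheus scrape job might look like this (the target address is an assumption):

```yaml
scrape_configs:
  - job_name: pve
    static_configs:
      - targets: ["exporter-host:9221"]
    # metrics_path defaults to /metrics, matching --web.telemetry-path
```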
## TODO: Future Metrics
The following metrics are available from the PVE API but not yet implemented:
### Per-node detailed status (`/nodes/{node}/status`)
- Load averages (1m, 5m, 15m)
- Swap usage (total, used, free)
- Root filesystem usage (total, used, available)
- KSM shared memory
- Kernel version info
- Boot mode and secure boot status
- CPU model info (model, sockets, cores, MHz)
### Per-VM pressure metrics (`/nodes/{node}/qemu`)
- `pressurecpusome`, `pressurecpufull`
- `pressurememorysome`, `pressurememoryfull`
- `pressureiosome`, `pressureiofull`
### HA detailed status (`/cluster/ha/status/current`)
- CRM master node and status
- Per-node LRM status (idle/active) and timestamps
- Per-service HA config (failback, max_restart, max_relocate)
### Physical disks (`/nodes/{node}/disks/list`)
- Disk health (SMART status)
- Wearout level
- Size and model info
- OSD mapping
### SDN/Network (`/cluster/resources` type=sdn)
- Zone status per node
- Zone type info