# pve-exporter

A Prometheus exporter for Proxmox VE, written in Go. It builds to a single static binary for easy deployment.

Designed as a drop-in replacement for `prometheus-pve-exporter`, with matching metric names for dashboard compatibility, plus additional corosync cluster metrics.
## Installation

```sh
CGO_ENABLED=0 go build -o pve-exporter .
```
## Usage

```sh
pve-exporter \
  --pve.host=https://node01:8006 \
  --pve.host=https://node02:8006 \
  --pve.token-file=/etc/pve-exporter/apikey \
  --web.listen-address=:9221
```

The exporter scrapes all cluster data from a single PVE API endpoint. Multiple `--pve.host` values provide failover: hosts are tried in order, with a 1-second connect timeout so an unreachable host is skipped quickly.
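Prometheus can then scrape the exporter directly. A minimal scrape configuration sketch (the job name and target hostname are placeholders, not defaults shipped with this project):

```yaml
scrape_configs:
  - job_name: pve
    static_configs:
      # The host running pve-exporter, on the default listen port.
      - targets: ["exporter-host:9221"]
```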
## Flags

| Flag | Default | Description |
|------|---------|-------------|
| `--pve.host` | (required) | PVE API base URL (repeatable) |
| `--pve.api-token` | | API token string (`user@realm!tokenid=uuid`) |
| `--pve.token-file` | | Path to file containing API token |
| `--pve.tls-insecure` | `false` | Disable TLS certificate verification |
| `--pve.max-concurrent` | `5` | Max concurrent API requests for per-node fan-out |
| `--web.listen-address` | `:9221` | Address to listen on |
| `--web.telemetry-path` | `/metrics` | Path for metrics endpoint |
| `--log.level` | `info` | Log level (`debug`, `info`, `warn`, `error`) |
| `--log.format` | `logfmt` | Log format (`logfmt`, `json`) |
## Authentication

Create a PVE API token with at least the PVEAuditor role. Provide it via one of:

- `--pve.api-token=user@realm!tokenid=uuid` (visible in the process list)
- `--pve.token-file=/path/to/file` (recommended)
- the `PVE_API_TOKEN` environment variable

`--pve.api-token` and `--pve.token-file` are mutually exclusive.
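For reference, the PVE API accepts the token in a single `Authorization` header of the form `PVEAPIToken=user@realm!tokenid=uuid`. A minimal sketch of how a client attaches it; the helper name, API path, and example token below are illustrative, not this exporter's internals:

```go
package main

import (
	"fmt"
	"net/http"
)

// newPVERequest builds a GET request for a PVE API path, attaching the
// token the way the PVE API expects:
//   Authorization: PVEAPIToken=user@realm!tokenid=uuid
func newPVERequest(baseURL, path, token string) (*http.Request, error) {
	req, err := http.NewRequest(http.MethodGet, baseURL+"/api2/json"+path, nil)
	if err != nil {
		return nil, err
	}
	req.Header.Set("Authorization", "PVEAPIToken="+token)
	return req, nil
}

func main() {
	// The token here is a placeholder, not a real credential.
	req, err := newPVERequest("https://node01:8006", "/cluster/resources",
		"monitoring@pve!exporter=00000000-0000-0000-0000-000000000000")
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Header.Get("Authorization"))
}
```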
## Metrics

### Cluster Status

| Metric | Type | Labels | Description |
|--------|------|--------|-------------|
| `pve_node_info` | Gauge | `id`, `level`, `name`, `nodeid` | Node info (always 1) |
| `pve_cluster_info` | Gauge | `id`, `nodes`, `quorate`, `version` | Cluster info (always 1) |
### Corosync

| Metric | Type | Labels | Description |
|--------|------|--------|-------------|
| `pve_cluster_quorate` | Gauge | | 1 if cluster has quorum |
| `pve_cluster_nodes_total` | Gauge | | Total node count |
| `pve_cluster_expected_votes` | Gauge | | Sum of quorum votes from config |
| `pve_node_online` | Gauge | `name`, `nodeid` | 1 if node is online |
### Cluster Resources

| Metric | Type | Labels | Description |
|--------|------|--------|-------------|
| `pve_up` | Gauge | `id` | 1 if node/VM/CT is online/running |
| `pve_cpu_usage_ratio` | Gauge | `id` | CPU utilization ratio |
| `pve_cpu_usage_limit` | Gauge | `id` | Number of available CPUs |
| `pve_memory_usage_bytes` | Gauge | `id` | Used memory in bytes |
| `pve_memory_size_bytes` | Gauge | `id` | Total memory in bytes |
| `pve_disk_usage_bytes` | Gauge | `id` | Used disk space in bytes |
| `pve_disk_size_bytes` | Gauge | `id` | Total disk space in bytes |
| `pve_uptime_seconds` | Gauge | `id` | Uptime in seconds |
| `pve_network_transmit_bytes_total` | Counter | `id` | Network bytes sent |
| `pve_network_receive_bytes_total` | Counter | `id` | Network bytes received |
| `pve_disk_written_bytes_total` | Counter | `id` | Disk bytes written |
| `pve_disk_read_bytes_total` | Counter | `id` | Disk bytes read |
| `pve_guest_info` | Gauge | `id`, `node`, `name`, `type`, `template`, `tags` | VM/CT info (always 1) |
| `pve_storage_info` | Gauge | `id`, `node`, `storage`, `plugintype`, `content` | Storage info (always 1) |
| `pve_storage_shared` | Gauge | `id` | 1 if storage is shared |
| `pve_ha_state` | Gauge | `id`, `state` | HA service status |
| `pve_lock_state` | Gauge | `id`, `state` | Guest config lock state |
### Version

| Metric | Type | Labels | Description |
|--------|------|--------|-------------|
| `pve_version_info` | Gauge | `release`, `repoid`, `version` | PVE version info (always 1) |
### Backup

| Metric | Type | Labels | Description |
|--------|------|--------|-------------|
| `pve_not_backed_up_total` | Gauge | `id` | 1 if guest has no backup job |
| `pve_not_backed_up_info` | Gauge | `id` | 1 if guest has no backup job |
### Node Config

| Metric | Type | Labels | Description |
|--------|------|--------|-------------|
| `pve_onboot_status` | Gauge | `id`, `node`, `type` | VM/CT onboot config value |
### Replication

| Metric | Type | Labels | Description |
|--------|------|--------|-------------|
| `pve_replication_info` | Gauge | `id`, `type`, `source`, `target`, `guest` | Replication job info (always 1) |
| `pve_replication_duration_seconds` | Gauge | `id` | Last replication duration |
| `pve_replication_last_sync_timestamp_seconds` | Gauge | `id` | Last successful sync time |
| `pve_replication_last_try_timestamp_seconds` | Gauge | `id` | Last sync attempt time |
| `pve_replication_next_sync_timestamp_seconds` | Gauge | `id` | Next scheduled sync time |
| `pve_replication_failed_syncs` | Gauge | `id` | Failed sync count |
### Subscription

| Metric | Type | Labels | Description |
|--------|------|--------|-------------|
| `pve_subscription_info` | Gauge | `id`, `level` | Subscription info (always 1) |
| `pve_subscription_status` | Gauge | `id`, `status` | Subscription status |
| `pve_subscription_next_due_timestamp_seconds` | Gauge | `id` | Next due date as Unix timestamp |
### Scrape Meta

| Metric | Type | Labels | Description |
|--------|------|--------|-------------|
| `pve_scrape_collector_duration_seconds` | Gauge | `collector` | Scrape duration per collector |
| `pve_scrape_collector_success` | Gauge | `collector` | 1 if collector succeeded |
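These metrics plug into standard Prometheus alerting. A sketch of two example rules built only from metrics documented above; the rule names and `for` durations are illustrative:

```yaml
groups:
  - name: pve
    rules:
      # Cluster has lost quorum.
      - alert: PVEClusterNotQuorate
        expr: pve_cluster_quorate == 0
        for: 5m
      # One of the exporter's collectors keeps failing.
      - alert: PVECollectorFailed
        expr: pve_scrape_collector_success == 0
        for: 15m
```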
## TODO: Future Metrics

The following metrics are available from the PVE API but not yet implemented:

Per-node detailed status (`/nodes/{node}/status`):

- Load averages (1m, 5m, 15m)
- Swap usage (total, used, free)
- Root filesystem usage (total, used, available)
- KSM shared memory
- Kernel version info
- Boot mode and secure boot status
- CPU model info (model, sockets, cores, MHz)

Per-VM pressure metrics (`/nodes/{node}/qemu`):

- `pressurecpusome`, `pressurecpufull`
- `pressurememorysome`, `pressurememoryfull`
- `pressureiosome`, `pressureiofull`

HA detailed status (`/cluster/ha/status/current`):

- CRM master node and status
- Per-node LRM status (idle/active) and timestamps
- Per-service HA config (failback, max_restart, max_relocate)

Physical disks (`/nodes/{node}/disks/list`):

- Disk health (SMART status)
- Wearout level
- Size and model info
- OSD mapping

SDN/Network (`/cluster/resources` with `type=sdn`):

- Zone status per node
- Zone type info