Proxmox VE Prometheus exporter

pve-exporter

A Prometheus exporter for Proxmox VE written in Go. Produces a single static binary for easy deployment.

Designed as a drop-in replacement for prometheus-pve-exporter with matching metric names for dashboard compatibility, plus additional corosync cluster metrics.

Installation

CGO_ENABLED=0 go build -o pve-exporter .

Usage

pve-exporter \
  --pve.host=https://node01:8006 \
  --pve.host=https://node02:8006 \
  --pve.token-file=/etc/pve-exporter/apikey \
  --web.listen-address=:9221

The exporter scrapes all cluster data from a single PVE API endpoint. Multiple --pve.host values provide failover: hosts are tried in the order given, and each connection attempt times out after 1 second so that an unreachable host is skipped quickly.
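The failover behavior described above can be sketched in Go as follows. This is an illustration under stated assumptions, not the exporter's actual code: the names newClient, tryHosts, and errUnreachable and the 10-second overall request timeout are hypothetical. Only the in-order retry and the 1-second connect timeout come from the text. The probe uses the PVE API path /api2/json/version and treats any HTTP response as "reachable"; a real client would also send the API token and check the status code.

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"net/http"
	"time"
)

// errUnreachable is a hypothetical sentinel used when a host cannot be contacted.
var errUnreachable = errors.New("host unreachable")

// newClient builds an HTTP client whose dial step (including name resolution)
// is abandoned after connectTimeout.
func newClient(connectTimeout time.Duration) *http.Client {
	dialer := &net.Dialer{Timeout: connectTimeout}
	return &http.Client{
		Transport: &http.Transport{DialContext: dialer.DialContext},
		Timeout:   10 * time.Second, // overall request budget; assumed value
	}
}

// tryHosts calls try for each host in order and returns the first host for
// which try succeeds. If every host fails, the last error is returned.
func tryHosts(hosts []string, try func(baseURL string) error) (string, error) {
	if len(hosts) == 0 {
		return "", errors.New("no hosts configured")
	}
	var lastErr error
	for _, h := range hosts {
		if err := try(h); err != nil {
			lastErr = fmt.Errorf("%s: %w", h, err)
			continue
		}
		return h, nil
	}
	return "", fmt.Errorf("all %d hosts failed, last error: %w", len(hosts), lastErr)
}

func main() {
	client := newClient(1 * time.Second)
	hosts := []string{"https://node01:8006", "https://node02:8006"}

	host, err := tryHosts(hosts, func(baseURL string) error {
		resp, err := client.Get(baseURL + "/api2/json/version")
		if err != nil {
			return fmt.Errorf("%w: %v", errUnreachable, err)
		}
		resp.Body.Close()
		return nil
	})
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println("using", host)
}
```

Keeping the ordering logic in tryHosts separate from the HTTP probe makes the failover rule easy to exercise with a stub function instead of a live cluster.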

Flags

| Flag | Default | Description |
| --- | --- | --- |
| --pve.host | (required) | PVE API base URL (repeatable) |
| --pve.api-token | | API token string (user@realm!tokenid=uuid) |
| --pve.token-file | | Path to file containing API token |
| --pve.tls-insecure | false | Disable TLS certificate verification |
| --pve.max-concurrent | 5 | Max concurrent API requests for per-node fan-out |
| --web.listen-address | :9221 | Address to listen on |
| --web.telemetry-path | /metrics | Path for metrics endpoint |
| --log.level | info | Log level (debug, info, warn, error) |
| --log.format | logfmt | Log format (logfmt, json) |
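
To illustrate what a limit such as --pve.max-concurrent means for the per-node fan-out, here is a minimal Go sketch that uses a buffered channel as a semaphore. It shows one common way to bound concurrency and is not necessarily how the exporter implements it; fanOut and the fetch callback are hypothetical names.

```go
package main

import (
	"fmt"
	"sync"
)

// fanOut calls fetch once per node, with at most maxConcurrent calls in
// flight at any time. The returned errors are in the same order as nodes.
func fanOut(nodes []string, maxConcurrent int, fetch func(node string) error) []error {
	if maxConcurrent < 1 {
		maxConcurrent = 1
	}
	errs := make([]error, len(nodes))
	sem := make(chan struct{}, maxConcurrent)
	var wg sync.WaitGroup

	for i, node := range nodes {
		wg.Add(1)
		sem <- struct{}{} // blocks while maxConcurrent calls are running
		go func(i int, node string) {
			defer wg.Done()
			defer func() { <-sem }() // release the slot when done
			errs[i] = fetch(node)
		}(i, node)
	}
	wg.Wait()
	return errs
}

func main() {
	nodes := []string{"node01", "node02", "node03"}
	errs := fanOut(nodes, 5, func(node string) error {
		// A real fetch would call a per-node API endpoint here.
		fmt.Println("fetching", node)
		return nil
	})
	for i, err := range errs {
		fmt.Printf("%s: err=%v\n", nodes[i], err)
	}
}
```

Each goroutine writes only to its own slot in errs, so no mutex is needed for the results.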

Authentication

Create a PVE API token with at least the PVEAuditor role. Provide it in one of the following ways:

  • --pve.api-token=user@realm!tokenid=uuid (visible in process list)
  • --pve.token-file=/path/to/file (recommended)
  • PVE_API_TOKEN environment variable

--pve.api-token and --pve.token-file are mutually exclusive.
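
The token-selection rules above could look roughly like this in Go. This is a sketch, not the exporter's source: resolveToken and authHeader are hypothetical names, and checking the flags before the PVE_API_TOKEN environment variable is an assumption, since the order of precedence is not stated here. The PVEAPIToken= prefix is the format the Proxmox VE API expects in the Authorization header for token authentication.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"strings"
)

// resolveToken returns the API token from the flag, the token file, or the
// environment. Flags are checked before the environment (assumed order).
func resolveToken(flagToken, tokenFile string, getenv func(string) string) (string, error) {
	if flagToken != "" && tokenFile != "" {
		return "", errors.New("--pve.api-token and --pve.token-file are mutually exclusive")
	}
	switch {
	case flagToken != "":
		return flagToken, nil
	case tokenFile != "":
		b, err := os.ReadFile(tokenFile)
		if err != nil {
			return "", fmt.Errorf("read token file: %w", err)
		}
		tok := strings.TrimSpace(string(b)) // tolerate a trailing newline
		if tok == "" {
			return "", errors.New("token file is empty")
		}
		return tok, nil
	}
	if tok := getenv("PVE_API_TOKEN"); tok != "" {
		return tok, nil
	}
	return "", errors.New("no API token provided")
}

// authHeader formats a token (user@realm!tokenid=uuid) as the value of the
// Authorization header used for PVE API token authentication.
func authHeader(token string) string {
	return "PVEAPIToken=" + token
}

func main() {
	token, err := resolveToken("", "", os.Getenv)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	// Avoid printing the secret itself; report only that a header was built.
	fmt.Println("Authorization header length:", len(authHeader(token)))
}
```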

Metrics

Cluster Status

| Metric | Type | Labels | Description |
| --- | --- | --- | --- |
| pve_node_info | Gauge | id, level, name, nodeid | Node info (always 1) |
| pve_cluster_info | Gauge | id, nodes, quorate, version | Cluster info (always 1) |

Corosync

| Metric | Type | Labels | Description |
| --- | --- | --- | --- |
| pve_cluster_quorate | Gauge | | 1 if cluster has quorum |
| pve_cluster_nodes_total | Gauge | | Total node count |
| pve_cluster_expected_votes | Gauge | | Sum of quorum votes from config |
| pve_node_online | Gauge | name, nodeid | 1 if node is online |

Cluster Resources

| Metric | Type | Labels | Description |
| --- | --- | --- | --- |
| pve_up | Gauge | id | 1 if node/VM/CT is online/running |
| pve_cpu_usage_ratio | Gauge | id | CPU utilization ratio |
| pve_cpu_usage_limit | Gauge | id | Number of available CPUs |
| pve_memory_usage_bytes | Gauge | id | Used memory in bytes |
| pve_memory_size_bytes | Gauge | id | Total memory in bytes |
| pve_disk_usage_bytes | Gauge | id | Used disk space in bytes |
| pve_disk_size_bytes | Gauge | id | Total disk space in bytes |
| pve_uptime_seconds | Gauge | id | Uptime in seconds |
| pve_network_transmit_bytes_total | Counter | id | Network bytes sent |
| pve_network_receive_bytes_total | Counter | id | Network bytes received |
| pve_disk_written_bytes_total | Counter | id | Disk bytes written |
| pve_disk_read_bytes_total | Counter | id | Disk bytes read |
| pve_guest_info | Gauge | id, node, name, type, template, tags | VM/CT info (always 1) |
| pve_storage_info | Gauge | id, node, storage, plugintype, content | Storage info (always 1) |
| pve_storage_shared | Gauge | id | 1 if storage is shared |
| pve_ha_state | Gauge | id, state | HA service status |
| pve_lock_state | Gauge | id, state | Guest config lock state |
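
The *_info metrics above follow the usual Prometheus "info metric" pattern: the value is always 1 and the descriptive data lives in the labels, so value metrics such as pve_up can be joined to it on the id label. The Go sketch below shows how one such sample line could be written in the Prometheus text exposition format, including label-value escaping. infoLine is a hypothetical helper, and the id value qemu/100 and the other label values are made-up examples, not output captured from the exporter.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// labelEscaper escapes backslash, double quote, and newline, which is what
// the Prometheus text exposition format requires inside label values.
var labelEscaper = strings.NewReplacer(`\`, `\\`, `"`, `\"`, "\n", `\n`)

// infoLine renders one info-style sample: metric name, labels sorted by key,
// and a constant value of 1.
func infoLine(name string, labels map[string]string) string {
	keys := make([]string, 0, len(labels))
	for k := range labels {
		keys = append(keys, k)
	}
	sort.Strings(keys) // stable output regardless of map iteration order

	pairs := make([]string, 0, len(keys))
	for _, k := range keys {
		pairs = append(pairs, k+`="`+labelEscaper.Replace(labels[k])+`"`)
	}
	return name + "{" + strings.Join(pairs, ",") + "} 1"
}

func main() {
	// Example label values are illustrative only.
	fmt.Println(infoLine("pve_guest_info", map[string]string{
		"id":   "qemu/100",
		"node": "node01",
		"name": "web",
		"type": "qemu",
	}))
}
```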

Version

| Metric | Type | Labels | Description |
| --- | --- | --- | --- |
| pve_version_info | Gauge | release, repoid, version | PVE version info (always 1) |

Backup

| Metric | Type | Labels | Description |
| --- | --- | --- | --- |
| pve_not_backed_up_total | Gauge | id | 1 if guest has no backup job |
| pve_not_backed_up_info | Gauge | id | 1 if guest has no backup job |

Node Config

| Metric | Type | Labels | Description |
| --- | --- | --- | --- |
| pve_onboot_status | Gauge | id, node, type | VM/CT onboot config value |

Replication

| Metric | Type | Labels | Description |
| --- | --- | --- | --- |
| pve_replication_info | Gauge | id, type, source, target, guest | Replication job info (always 1) |
| pve_replication_duration_seconds | Gauge | id | Last replication duration |
| pve_replication_last_sync_timestamp_seconds | Gauge | id | Last successful sync time |
| pve_replication_last_try_timestamp_seconds | Gauge | id | Last sync attempt time |
| pve_replication_next_sync_timestamp_seconds | Gauge | id | Next scheduled sync time |
| pve_replication_failed_syncs | Gauge | id | Failed sync count |

Subscription

| Metric | Type | Labels | Description |
| --- | --- | --- | --- |
| pve_subscription_info | Gauge | id, level | Subscription info (always 1) |
| pve_subscription_status | Gauge | id, status | Subscription status |
| pve_subscription_next_due_timestamp_seconds | Gauge | id | Next due date as Unix timestamp |

Scrape Meta

| Metric | Type | Labels | Description |
| --- | --- | --- | --- |
| pve_scrape_collector_duration_seconds | Gauge | collector | Scrape duration per collector |
| pve_scrape_collector_success | Gauge | collector | 1 if collector succeeded |
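
As a rough idea of how per-collector scrape metadata like this can be produced, the Go sketch below times a collector function, records whether it returned an error, and renders both samples as text. It uses only the standard library for brevity; runCollector, render, result, and errDemo are hypothetical names, and the actual exporter may be structured differently (for example, on top of the Prometheus Go client library).

```go
package main

import (
	"errors"
	"fmt"
	"strings"
	"time"
)

// errDemo is a hypothetical error used to show a failing collector.
var errDemo = errors.New("demo failure")

// result captures the outcome of a single collector run.
type result struct {
	collector string
	duration  time.Duration
	ok        bool
}

// runCollector times collect and records whether it returned an error.
func runCollector(name string, collect func() error) result {
	start := time.Now()
	err := collect()
	return result{collector: name, duration: time.Since(start), ok: err == nil}
}

// render writes the two scrape-meta samples for each result. %q is adequate
// here because collector names are assumed to be simple ASCII identifiers.
func render(rs []result) string {
	var b strings.Builder
	for _, r := range rs {
		success := 0
		if r.ok {
			success = 1
		}
		fmt.Fprintf(&b, "pve_scrape_collector_duration_seconds{collector=%q} %g\n",
			r.collector, r.duration.Seconds())
		fmt.Fprintf(&b, "pve_scrape_collector_success{collector=%q} %d\n",
			r.collector, success)
	}
	return b.String()
}

func main() {
	results := []result{
		runCollector("version", func() error { return nil }),
		runCollector("replication", func() error { return errDemo }),
	}
	fmt.Print(render(results))
}
```

Reporting success per collector lets a failure in one collector (say, replication) show up in alerts without hiding the metrics that the other collectors still produced.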

TODO: Future Metrics

The following metrics are available from the PVE API but not yet implemented:

Per-node detailed status (/nodes/{node}/status)

  • Load averages (1m, 5m, 15m)
  • Swap usage (total, used, free)
  • Root filesystem usage (total, used, available)
  • KSM shared memory
  • Kernel version info
  • Boot mode and secure boot status
  • CPU model info (model, sockets, cores, MHz)

Per-VM pressure metrics (/nodes/{node}/qemu)

  • pressurecpusome, pressurecpufull
  • pressurememorysome, pressurememoryfull
  • pressureiosome, pressureiofull

HA detailed status (/cluster/ha/status/current)

  • CRM master node and status
  • Per-node LRM status (idle/active) and timestamps
  • Per-service HA config (failback, max_restart, max_relocate)

Physical disks (/nodes/{node}/disks/list)

  • Disk health (SMART status)
  • Wearout level
  • Size and model info
  • OSD mapping

SDN/Network (/cluster/resources type=sdn)

  • Zone status per node
  • Zone type info