# pve-exporter

A Prometheus exporter for Proxmox VE written in Go. Produces a single static binary for easy deployment.

Designed as a drop-in replacement for [prometheus-pve-exporter](https://github.com/prometheus-community/prometheus-pve-exporter) with matching metric names for dashboard compatibility, plus additional corosync cluster metrics.

## Installation

```bash
CGO_ENABLED=0 go build -o pve-exporter .
```

## Usage

```bash
pve-exporter \
  --pve.host=https://node01:8006 \
  --pve.host=https://node02:8006 \
  --pve.token-file=/etc/pve-exporter/apikey \
  --web.listen-address=:9221
```

The exporter scrapes all cluster data from a single PVE API endpoint. Multiple `--pve.host` values provide failover: hosts are tried in order, with a 1-second connect timeout so that an unreachable host is skipped quickly.

### Flags

| Flag | Default | Description |
|------|---------|-------------|
| `--pve.host` | (required) | PVE API base URL (repeatable) |
| `--pve.api-token` | | API token string (`user@realm!tokenid=uuid`) |
| `--pve.token-file` | | Path to file containing API token |
| `--pve.tls-insecure` | `false` | Disable TLS certificate verification |
| `--pve.max-concurrent` | `5` | Max concurrent API requests for per-node fan-out |
| `--web.listen-address` | `:9221` | Address to listen on |
| `--web.telemetry-path` | `/metrics` | Path for metrics endpoint |
| `--log.level` | `info` | Log level (debug, info, warn, error) |
| `--log.format` | `logfmt` | Log format (logfmt, json) |

### Authentication

Create a PVE API token with at least the `PVEAuditor` role. Provide it via one of:

- `--pve.api-token=user@realm!tokenid=uuid` (visible in process list)
- `--pve.token-file=/path/to/file` (recommended)
- `PVE_API_TOKEN` environment variable

`--pve.api-token` and `--pve.token-file` are mutually exclusive.

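
The token sources listed under Authentication can be resolved along the following lines. This is a minimal sketch, not the exporter's actual code: `resolveToken` is a hypothetical helper, and the precedence (flag or file first, then the `PVE_API_TOKEN` environment variable) is an assumption, since the README only states that the two flags are mutually exclusive.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"strings"
)

// resolveToken picks the API token from exactly one source.
// flagToken and tokenFile stand in for --pve.api-token and --pve.token-file;
// envToken stands in for the PVE_API_TOKEN environment variable.
// Assumption: an explicit flag or file wins over the environment variable.
func resolveToken(flagToken, tokenFile, envToken string) (string, error) {
	if flagToken != "" && tokenFile != "" {
		return "", errors.New("--pve.api-token and --pve.token-file are mutually exclusive")
	}
	switch {
	case flagToken != "":
		return flagToken, nil
	case tokenFile != "":
		data, err := os.ReadFile(tokenFile)
		if err != nil {
			return "", fmt.Errorf("reading token file: %w", err)
		}
		// Trim surrounding whitespace such as a trailing newline.
		return strings.TrimSpace(string(data)), nil
	case envToken != "":
		return envToken, nil
	}
	return "", errors.New("no API token provided")
}

func main() {
	token, err := resolveToken("", "", os.Getenv("PVE_API_TOKEN"))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	// Never print the secret itself; report only that one was found.
	fmt.Println("token loaded, length:", len(token))
}
```

Reading the token from a file keeps it out of the process list, which is why the file option is the recommended one above.
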
## Metrics

### Cluster Status

| Metric | Type | Labels | Description |
|--------|------|--------|-------------|
| `pve_node_info` | Gauge | `id`, `level`, `name`, `nodeid` | Node info (always 1) |
| `pve_cluster_info` | Gauge | `id`, `nodes`, `quorate`, `version` | Cluster info (always 1) |

### Corosync

| Metric | Type | Labels | Description |
|--------|------|--------|-------------|
| `pve_cluster_quorate` | Gauge | | 1 if cluster has quorum |
| `pve_cluster_nodes_total` | Gauge | | Total node count |
| `pve_cluster_expected_votes` | Gauge | | Sum of quorum votes from config |
| `pve_node_online` | Gauge | `name`, `nodeid` | 1 if node is online |

### Cluster Resources

| Metric | Type | Labels | Description |
|--------|------|--------|-------------|
| `pve_up` | Gauge | `id` | 1 if node/VM/CT is online/running |
| `pve_cpu_usage_ratio` | Gauge | `id` | CPU utilization ratio |
| `pve_cpu_usage_limit` | Gauge | `id` | Number of available CPUs |
| `pve_memory_usage_bytes` | Gauge | `id` | Used memory in bytes |
| `pve_memory_size_bytes` | Gauge | `id` | Total memory in bytes |
| `pve_disk_usage_bytes` | Gauge | `id` | Used disk space in bytes |
| `pve_disk_size_bytes` | Gauge | `id` | Total disk space in bytes |
| `pve_uptime_seconds` | Gauge | `id` | Uptime in seconds |
| `pve_network_transmit_bytes_total` | Counter | `id` | Network bytes sent |
| `pve_network_receive_bytes_total` | Counter | `id` | Network bytes received |
| `pve_disk_written_bytes_total` | Counter | `id` | Disk bytes written |
| `pve_disk_read_bytes_total` | Counter | `id` | Disk bytes read |
| `pve_guest_info` | Gauge | `id`, `node`, `name`, `type`, `template`, `tags` | VM/CT info (always 1) |
| `pve_storage_info` | Gauge | `id`, `node`, `storage`, `plugintype`, `content` | Storage info (always 1) |
| `pve_storage_shared` | Gauge | `id` | 1 if storage is shared |
| `pve_ha_state` | Gauge | `id`, `state` | HA service status |
| `pve_lock_state` | Gauge | `id`, `state` | Guest config lock state |

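
Most resource metrics carry only the `id` label, while the descriptive labels live on the `*_info` series, so consumers typically join the two on `id`. The sketch below illustrates that join in plain Go by computing a memory usage ratio per guest name from scraped text. Everything in it is illustrative: the sample values are made up, the `id="qemu/100"` format is assumed from the upstream exporter's convention, and `parseLine` is a deliberately naive parser that does not handle commas or escaped quotes inside label values.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// sampleMetrics is made-up scrape output for illustration only.
const sampleMetrics = `
# Made-up sample values.
pve_guest_info{id="qemu/100",node="node01",name="web",type="qemu",template="0",tags=""} 1
pve_memory_usage_bytes{id="qemu/100"} 1073741824
pve_memory_size_bytes{id="qemu/100"} 4294967296
`

// sample is one parsed line of Prometheus text exposition output.
type sample struct {
	name   string
	labels map[string]string
	value  float64
}

// parseLine parses `name{k="v",...} value`. It is naive on purpose:
// label values containing commas or escaped quotes are not supported.
func parseLine(line string) (sample, bool) {
	line = strings.TrimSpace(line)
	if line == "" || strings.HasPrefix(line, "#") {
		return sample{}, false
	}
	sp := strings.LastIndex(line, " ")
	if sp < 0 {
		return sample{}, false
	}
	v, err := strconv.ParseFloat(line[sp+1:], 64)
	if err != nil {
		return sample{}, false
	}
	head := line[:sp]
	s := sample{labels: map[string]string{}, value: v}
	open := strings.Index(head, "{")
	if open < 0 {
		s.name = head
		return s, true
	}
	s.name = head[:open]
	body := strings.TrimSuffix(head[open+1:], "}")
	for _, pair := range strings.Split(body, ",") {
		if k, val, ok := strings.Cut(pair, "="); ok {
			s.labels[k] = strings.Trim(val, `"`)
		}
	}
	return s, true
}

// memoryRatioByName joins pve_guest_info with the memory gauges on `id`
// and returns used/total memory keyed by guest name.
func memoryRatioByName(text string) map[string]float64 {
	names := map[string]string{}
	used := map[string]float64{}
	size := map[string]float64{}
	for _, line := range strings.Split(text, "\n") {
		s, ok := parseLine(line)
		if !ok {
			continue
		}
		id := s.labels["id"]
		switch s.name {
		case "pve_guest_info":
			names[id] = s.labels["name"]
		case "pve_memory_usage_bytes":
			used[id] = s.value
		case "pve_memory_size_bytes":
			size[id] = s.value
		}
	}
	ratios := map[string]float64{}
	for id, name := range names {
		if size[id] > 0 { // skip guests without a usable total
			ratios[name] = used[id] / size[id]
		}
	}
	return ratios
}

func main() {
	for name, ratio := range memoryRatioByName(sampleMetrics) {
		fmt.Printf("%s: %.2f\n", name, ratio) // prints "web: 0.25" for the sample
	}
}
```

In practice this join is usually written as a query in the monitoring system rather than in application code; the Go version only makes the relationship between the `id`-keyed series explicit.
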
### Version

| Metric | Type | Labels | Description |
|--------|------|--------|-------------|
| `pve_version_info` | Gauge | `release`, `repoid`, `version` | PVE version info (always 1) |

### Backup

| Metric | Type | Labels | Description |
|--------|------|--------|-------------|
| `pve_not_backed_up_total` | Gauge | `id` | 1 if guest has no backup job |
| `pve_not_backed_up_info` | Gauge | `id` | 1 if guest has no backup job |

### Node Config

| Metric | Type | Labels | Description |
|--------|------|--------|-------------|
| `pve_onboot_status` | Gauge | `id`, `node`, `type` | VM/CT onboot config value |

### Replication

| Metric | Type | Labels | Description |
|--------|------|--------|-------------|
| `pve_replication_info` | Gauge | `id`, `type`, `source`, `target`, `guest` | Replication job info (always 1) |
| `pve_replication_duration_seconds` | Gauge | `id` | Last replication duration |
| `pve_replication_last_sync_timestamp_seconds` | Gauge | `id` | Last successful sync time |
| `pve_replication_last_try_timestamp_seconds` | Gauge | `id` | Last sync attempt time |
| `pve_replication_next_sync_timestamp_seconds` | Gauge | `id` | Next scheduled sync time |
| `pve_replication_failed_syncs` | Gauge | `id` | Failed sync count |

### Subscription

| Metric | Type | Labels | Description |
|--------|------|--------|-------------|
| `pve_subscription_info` | Gauge | `id`, `level` | Subscription info (always 1) |
| `pve_subscription_status` | Gauge | `id`, `status` | Subscription status |
| `pve_subscription_next_due_timestamp_seconds` | Gauge | `id` | Next due date as Unix timestamp |

### Node Status

| Metric | Type | Labels | Description |
|--------|------|--------|-------------|
| `pve_node_load1` | Gauge | `node` | 1-minute load average |
| `pve_node_load5` | Gauge | `node` | 5-minute load average |
| `pve_node_load15` | Gauge | `node` | 15-minute load average |
| `pve_node_swap_total_bytes` | Gauge | `node` | Total swap in bytes |
| `pve_node_swap_used_bytes` | Gauge | `node` | Used swap in bytes |
| `pve_node_swap_free_bytes` | Gauge | `node` | Free swap in bytes |
| `pve_node_rootfs_total_bytes` | Gauge | `node` | Root filesystem total in bytes |
| `pve_node_rootfs_used_bytes` | Gauge | `node` | Root filesystem used in bytes |
| `pve_node_rootfs_available_bytes` | Gauge | `node` | Root filesystem available in bytes |
| `pve_node_ksm_shared_bytes` | Gauge | `node` | KSM shared memory in bytes |
| `pve_node_boot_mode_info` | Gauge | `node`, `mode`, `secureboot` | Boot mode info (always 1) |

### VM Pressure

| Metric | Type | Labels | Description |
|--------|------|--------|-------------|
| `pve_vm_pressure_cpu_some_ratio` | Gauge | `id`, `node` | CPU pressure (some) |
| `pve_vm_pressure_cpu_full_ratio` | Gauge | `id`, `node` | CPU pressure (full) |
| `pve_vm_pressure_memory_some_ratio` | Gauge | `id`, `node` | Memory pressure (some) |
| `pve_vm_pressure_memory_full_ratio` | Gauge | `id`, `node` | Memory pressure (full) |
| `pve_vm_pressure_io_some_ratio` | Gauge | `id`, `node` | I/O pressure (some) |
| `pve_vm_pressure_io_full_ratio` | Gauge | `id`, `node` | I/O pressure (full) |

### HA Status

| Metric | Type | Labels | Description |
|--------|------|--------|-------------|
| `pve_ha_crm_master` | Gauge | `node` | 1 if node is CRM master, 0 otherwise |
| `pve_ha_node_status` | Gauge | `node`, `status` | Per-node HA status (always 1) |
| `pve_ha_lrm_timestamp_seconds` | Gauge | `node` | Last LRM heartbeat as Unix timestamp |
| `pve_ha_lrm_mode` | Gauge | `node`, `mode` | LRM mode per node (always 1) |
| `pve_ha_service_config` | Gauge | `id`, `type`, `max_restart`, `max_relocate`, `failback` | Service config (always 1) |
| `pve_ha_service_status` | Gauge | `id`, `node`, `state` | Service runtime state (always 1) |

### Physical Disks

| Metric | Type | Labels | Description |
|--------|------|--------|-------------|
| `pve_physical_disk_health` | Gauge | `node`, `devpath`, `model`, `serial`, `type` | 1 if SMART PASSED, 0 otherwise |
| `pve_physical_disk_wearout_remaining_ratio` | Gauge | `node`, `devpath` | Wearout remaining (1.0 = new) |
| `pve_physical_disk_size_bytes` | Gauge | `node`, `devpath` | Disk size in bytes |
| `pve_physical_disk_info` | Gauge | `node`, `devpath`, `model`, `serial`, `type`, `used` | Disk info (always 1) |
| `pve_physical_disk_osd` | Gauge | `node`, `devpath`, `osd` | Disk-to-OSD mapping (always 1) |

### Scrape Meta

| Metric | Type | Labels | Description |
|--------|------|--------|-------------|
| `pve_scrape_collector_duration_seconds` | Gauge | `collector` | Scrape duration per collector |
| `pve_scrape_collector_success` | Gauge | `collector` | 1 if collector succeeded |
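
The two scrape meta metrics can be understood as a timing and success wrapper around each collector. The following is a rough sketch of that pattern using only the standard library. The `collector` and `result` types, the collector names, and the sequential loop are all assumptions made for illustration and do not describe the exporter's real internals.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errDemo is a placeholder failure used by the example collector below.
var errDemo = errors.New("demo failure")

// collector is a hypothetical stand-in for one group of metrics.
type collector struct {
	name    string
	collect func() error
}

// result mirrors the two scrape meta metrics for a single collector.
type result struct {
	name     string
	duration float64 // seconds, as in pve_scrape_collector_duration_seconds
	success  float64 // 1 or 0, as in pve_scrape_collector_success
}

// runCollectors runs each collector in turn, timing it and recording
// whether it returned an error. A failing collector does not stop the rest.
func runCollectors(cs []collector) []result {
	results := make([]result, 0, len(cs))
	for _, c := range cs {
		start := time.Now()
		err := c.collect()
		r := result{name: c.name, duration: time.Since(start).Seconds()}
		if err == nil {
			r.success = 1
		}
		results = append(results, r)
	}
	return results
}

func main() {
	cs := []collector{
		{name: "version", collect: func() error { return nil }},
		{name: "replication", collect: func() error { return errDemo }},
	}
	// Print the results in text exposition style.
	for _, r := range runCollectors(cs) {
		fmt.Printf("pve_scrape_collector_duration_seconds{collector=%q} %g\n", r.name, r.duration)
		fmt.Printf("pve_scrape_collector_success{collector=%q} %g\n", r.name, r.success)
	}
}
```

Reporting success per collector rather than per scrape means one failing API call shows up as a single `0` series, which makes it straightforward to alert on partial scrape failures.
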