The per-node outer goroutine acquired a semaphore slot and held it while
collectNode spawned inner goroutines that needed slots from the same semaphore.
With maxConc=5 and 5 or more nodes, the outer goroutines consumed every slot,
the inner goroutines blocked forever, and Collect() never returned. Each hung
scrape permanently held an HTTP MaxRequestsInFlight slot, until the server
stopped responding.
Remove the redundant outer semaphore acquire (inner goroutines already manage
their own slots) and add a 120s HTTP timeout as defense in depth.
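
A minimal sketch of the fixed concurrency shape, for illustration only. The
names collectAll and endpointsPerNode are hypothetical and do not come from the
exporter; the real collectNode issues API requests where this sketch only
increments a counter. The point is that only the inner goroutines take
semaphore slots, so the outer per-node goroutines cannot starve them:

```go
package main

import (
	"fmt"
	"sync"
)

// collectAll is a hypothetical stand-in for Collect(). It starts one outer
// goroutine per node, and each outer goroutine starts endpointsPerNode inner
// goroutines. Only the inner goroutines acquire a semaphore slot.
func collectAll(nodes []string, endpointsPerNode, maxConc int) int {
	sem := make(chan struct{}, maxConc) // counting semaphore with maxConc slots
	var (
		outer sync.WaitGroup
		mu    sync.Mutex
		done  int
	)
	for _, node := range nodes {
		outer.Add(1)
		go func(node string) {
			defer outer.Done()
			// No semaphore acquire here: holding a slot while waiting on
			// the inner goroutines is what caused the deadlock.
			var inner sync.WaitGroup
			for i := 0; i < endpointsPerNode; i++ {
				inner.Add(1)
				go func() {
					defer inner.Done()
					sem <- struct{}{}        // acquire a slot
					defer func() { <-sem }() // release it when finished
					mu.Lock()
					done++ // placeholder for one API request
					mu.Unlock()
				}()
			}
			inner.Wait()
		}(node)
	}
	outer.Wait()
	return done
}

func main() {
	nodes := []string{"pve1", "pve2", "pve3", "pve4", "pve5", "pve6"}
	fmt.Println(collectAll(nodes, 3, 5)) // prints 18
}
```

With six nodes and maxConc=5 this returns, whereas acquiring a slot in the
outer goroutine as well would leave no slot free for any inner goroutine.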
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Convert HA API service IDs (vm:106, ct:200) to the resource ID format
used by /cluster/resources and the Python exporter (qemu/106, lxc/200).
Rename label from "sid" to "id" so HA metrics can be joined with
pve_ha_state, pve_guest_info, and other id-labeled metrics.
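
The mapping can be sketched as follows. The function name
haServiceToResourceID is hypothetical, and passing unrecognized service IDs
through unchanged is an assumption, not necessarily what the exporter does:

```go
package main

import (
	"fmt"
	"strings"
)

// haServiceToResourceID converts an HA service ID such as "vm:106" or
// "ct:200" into the /cluster/resources form "qemu/106" or "lxc/200".
// Assumption: IDs with an unknown or missing prefix are returned unchanged.
func haServiceToResourceID(sid string) string {
	kind, vmid, ok := strings.Cut(sid, ":")
	if !ok {
		return sid
	}
	switch kind {
	case "vm":
		return "qemu/" + vmid
	case "ct":
		return "lxc/" + vmid
	default:
		return sid
	}
}

func main() {
	fmt.Println(haServiceToResourceID("vm:106")) // prints qemu/106
	fmt.Println(haServiceToResourceID("ct:200")) // prints lxc/200
}
```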
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add metrics tables for node_status, vm_pressure, ha_status, and
physical_disk collectors. Remove the TODO section as all planned
metrics are now implemented (SDN excluded by design).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Nine tasks covering the node_status, vm_pressure, ha_status, and physical_disk
collectors, with a TDD approach, fixtures, and a README update.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Split pve_ha_service_info into _config and _status to avoid stale series
- Handle wearout "N/A" and health "UNKNOWN" edge cases for physical disks
- Clarify node label convention and rootfs available vs free naming
- Note QEMU-only scope for VM pressure (LXC lacks PSI in PVE API)
- Add full node_status/lrm_status examples showing all cluster nodes
- Document mutex-guarded nodes pattern and test fixture requirements
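
One possible way to handle the wearout edge case from the list above, as a
sketch only. The helper name parseWearout is hypothetical, and it assumes the
API field may arrive either as a JSON number or as a string such as "N/A";
when no numeric value is available the caller would skip the metric:

```go
package main

import (
	"encoding/json"
	"fmt"
	"strconv"
	"strings"
)

// parseWearout decodes a raw JSON wearout value. It returns the numeric
// value and true for a JSON number or a numeric string, and false for
// anything else (for example "N/A" or a missing field).
func parseWearout(raw []byte) (float64, bool) {
	var n float64
	if err := json.Unmarshal(raw, &n); err == nil {
		return n, true
	}
	var s string
	if err := json.Unmarshal(raw, &s); err == nil {
		if v, err := strconv.ParseFloat(strings.TrimSpace(s), 64); err == nil {
			return v, true
		}
	}
	return 0, false
}

func main() {
	for _, raw := range []string{`100`, `"97"`, `"N/A"`} {
		v, ok := parseWearout([]byte(raw))
		fmt.Println(raw, v, ok)
	}
}
```

Taking []byte means a json.RawMessage struct field can be passed in directly.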
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Covers node status, VM pressure, HA status, and physical disks
collectors with metric definitions, API structures, and scope
exclusions (SDN, kernel version, CPU model).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Package builds with buildGoModule and CGO_ENABLED=0
- Dev shell provides go_latest, gopls, gotools
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Accept tokens both with and without PVEAPIToken= prefix,
since token files may contain the full Authorization header value.
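
A minimal sketch of the normalization, assuming hypothetical helper names
normalizeToken and authHeader; trimming surrounding whitespace (such as a
trailing newline in a token file) is also an assumption:

```go
package main

import (
	"fmt"
	"strings"
)

const tokenPrefix = "PVEAPIToken="

// normalizeToken strips surrounding whitespace and an optional
// "PVEAPIToken=" prefix, leaving only the bare token.
func normalizeToken(raw string) string {
	return strings.TrimPrefix(strings.TrimSpace(raw), tokenPrefix)
}

// authHeader builds the Authorization header value from either token form.
func authHeader(raw string) string {
	return tokenPrefix + normalizeToken(raw)
}

func main() {
	// Placeholder token values, not real credentials.
	fmt.Println(authHeader("user@pve!exporter=secret\n"))
	fmt.Println(authHeader("PVEAPIToken=user@pve!exporter=secret"))
}
```

Both calls produce the same header value, so the prefix is never doubled.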
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
14 tasks covering: Go module setup, API client, collector framework,
main entry point, and all 8 collectors (version, cluster_status,
corosync, cluster_resources, backup, subscription, node_config,
replication), plus README and integration testing.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Full design for a Go Prometheus exporter for Proxmox VE, replacing
the Python prometheus-pve-exporter with corosync metrics added.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>