3bad7963af
fix: resolve deadlock in node_config collector causing request exhaustion
The per-node outer goroutine acquired a semaphore slot and held it while
collectNode spawned inner goroutines that needed slots from the same
semaphore. With maxConc=5 and 5 or more nodes, the outer goroutines took
every slot, the inner goroutines blocked forever, and Collect() never
returned. Each stuck scrape permanently held one of the HTTP handler's
MaxRequestsInFlight slots, and once they were all leaked the server stopped
responding.
Remove the redundant outer semaphore acquire (the inner goroutines already
manage their own slots) and add a 120s HTTP timeout as defense-in-depth.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-24 11:30:54 +00:00
5e066a5c4b
fix: normalize HA service IDs to match cluster_resources format
Convert HA API service IDs (vm:106, ct:200) to the resource ID format
used by /cluster/resources and the Python exporter (qemu/106, lxc/200).
Rename label from "sid" to "id" so HA metrics can be joined with
pve_ha_state, pve_guest_info, and other id-labeled metrics.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-23 15:12:01 +00:00
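A minimal sketch of the ID conversion described above. The function and map names are hypothetical; the mapping itself (vm to qemu, ct to lxc, colon to slash) follows the commit message. Returning IDs with an unknown prefix or no colon unchanged is an assumption, not something the commit specifies.

```go
package main

import (
	"fmt"
	"strings"
)

// haTypeToResourceType maps HA service-ID prefixes to the resource type
// names used by /cluster/resources.
var haTypeToResourceType = map[string]string{
	"vm": "qemu",
	"ct": "lxc",
}

// normalizeHAServiceID converts an HA service ID such as "vm:106" into the
// /cluster/resources form "qemu/106". Unrecognised input is passed through
// unchanged (assumed behaviour for illustration).
func normalizeHAServiceID(sid string) string {
	prefix, vmid, ok := strings.Cut(sid, ":")
	if !ok {
		return sid
	}
	rtype, known := haTypeToResourceType[prefix]
	if !known {
		return sid
	}
	return rtype + "/" + vmid
}

func main() {
	for _, sid := range []string{"vm:106", "ct:200", "other:1"} {
		fmt.Println(sid, "->", normalizeHAServiceID(sid))
	}
}
```

With the value stored under an `id` label, HA series share a join key with `pve_ha_state` and `pve_guest_info`.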
01dbc7cee4
Strip trailing slash from PVE host URLs
A trailing slash in --pve.host (e.g. https://host:8006/) caused API
requests to fail with status 500 because of double slashes in the path.
2026-03-23 11:34:26 +00:00
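A minimal sketch of the trailing-slash fix, assuming the client builds request URLs by concatenating the host with the Proxmox `/api2/json` base path and an endpoint path; `apiURL` is a hypothetical helper, not the client's actual function.

```go
package main

import (
	"fmt"
	"strings"
)

// apiURL joins a configured --pve.host value with an API endpoint path.
// Trailing slashes are stripped from the host first, so "https://host:8006/"
// and "https://host:8006" both produce a single slash before "api2".
func apiURL(host, path string) string {
	return strings.TrimRight(host, "/") + "/api2/json" + path
}

func main() {
	fmt.Println(apiURL("https://host:8006/", "/version"))
	fmt.Println(apiURL("https://host:8006", "/version"))
}
```

`strings.TrimRight` removes every trailing slash, so a value like `https://host:8006//` is normalized as well.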
a88c696bfd
feat: add physical_disk collector (health, wearout, size, OSD mapping)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-20 15:33:46 +00:00
0afa5b0e19
test: add physical_disk collector test and fixture
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-20 15:33:43 +00:00
6244100886
feat: add ha_status collector (CRM master, node/LRM status, service config)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-20 15:30:42 +00:00
16cfba4587
test: add ha_status collector test and fixtures
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-20 15:30:39 +00:00
d458894b0e
feat: add vm_pressure collector (PSI cpu/memory/io for QEMU VMs)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-20 15:28:22 +00:00
1e4e3af1d5
test: add vm_pressure collector test and fixture
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-20 15:28:17 +00:00
496a46460c
feat: add node_status collector (load, swap, rootfs, ksm, boot mode)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-20 15:23:09 +00:00
2097451d15
test: add node_status collector test and fixture
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-20 15:23:04 +00:00
2bdb508672
fix: normalize API token format in client
Accept tokens both with and without the PVEAPIToken= prefix,
since token files may contain the full Authorization header value.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-20 11:40:07 +00:00
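A minimal sketch of the token normalization, assuming the client sends `Authorization: PVEAPIToken=USER@REALM!TOKENID=SECRET`. The helper name `authHeader` is hypothetical, and trimming surrounding whitespace (e.g. a trailing newline read from a token file) is an added assumption beyond what the commit states.

```go
package main

import (
	"fmt"
	"strings"
)

const tokenPrefix = "PVEAPIToken="

// authHeader returns the Authorization header value for a PVE API token.
// The input may be bare ("user@realm!id=secret") or already a full header
// value ("PVEAPIToken=user@realm!id=secret"). Surrounding whitespace is
// trimmed first (assumed, to tolerate token files ending in a newline).
func authHeader(token string) string {
	t := strings.TrimSpace(token)
	t = strings.TrimPrefix(t, tokenPrefix) // no-op if the prefix is absent
	return tokenPrefix + t
}

func main() {
	fmt.Println(authHeader("root@pam!monitor=abc123"))
	fmt.Println(authHeader("PVEAPIToken=root@pam!monitor=abc123\n"))
}
```

Stripping the prefix and then always re-adding it gives one code path for both input forms and never produces a doubled prefix.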
3bafb67aa0
feat: add replication collector (6 replication metrics)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-20 11:36:54 +00:00
b59abd59d3
feat: add node_config collector (pve_onboot_status)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-20 11:36:17 +00:00
7708a64408
feat: add subscription collector (info, status, next_due_timestamp)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-20 11:35:36 +00:00
5e61f224c4
feat: add backup collector (pve_not_backed_up_total, pve_not_backed_up_info)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-20 11:33:33 +00:00
a62264edf8
feat: add cluster_resources collector (16 metrics)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-20 11:33:03 +00:00
2a51e00fe1
feat: add corosync collector (quorate, nodes_total, expected_votes, node_online)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-20 11:32:00 +00:00
63494d0fcb
feat: add cluster_status collector (pve_node_info, pve_cluster_info)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-20 11:31:31 +00:00
c8ae97d777
feat: add version collector (pve_version_info)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-20 11:31:06 +00:00
af71e7d729
feat: add collector framework with registry and parallel scrape orchestration
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-20 11:26:55 +00:00
210e22e030
feat: add PVE API client with multi-host failover
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-20 11:25:48 +00:00