9231 commits

Author SHA1 Message Date
Demi Marie Obenour
021f450cdb virtio-devices: proper bounds checks
Callers of get_host_address_range() rely on it returning a pointer to at
least size bytes of memory.  mem.get_host_address() is an overrideable
method of a safe trait, so it is better for safe code to not rely on its
correctness for safety.  Instead, use mem.get_slice(), which returns a
VolatileSlice whose invariants guarantee that it points to a sufficient
amount of memory.  If mem.check_range() succeeds but mem.get_slice()
returns a slice that is too small, this means that there is either a
logic error or a situation the code cannot support yet, so panic.

Signed-off-by: Demi Marie Obenour <demiobenour@gmail.com>
2025-11-22 10:24:13 +00:00
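A minimal sketch of the idea in 021f450cdb, assuming a toy memory type: `ToyMemory` and `host_range` are hypothetical names, not the vm-memory or Cloud Hypervisor API. The caller receives a bounds-checked slice instead of trusting a raw address to cover `len` bytes.

```rust
/// Hypothetical stand-in for guest memory; not the vm-memory API.
struct ToyMemory {
    bytes: Vec<u8>,
}

impl ToyMemory {
    /// Returns a bounds-checked view of `len` bytes at `offset`, or None
    /// if the range overflows or falls outside the backing buffer.
    fn get_slice(&self, offset: usize, len: usize) -> Option<&[u8]> {
        let end = offset.checked_add(len)?;
        self.bytes.get(offset..end)
    }
}

/// Returns a view that is guaranteed to cover `len` bytes.
/// A view shorter than requested would be a logic error, so panic.
fn host_range(mem: &ToyMemory, offset: usize, len: usize) -> Option<&[u8]> {
    let slice = mem.get_slice(offset, len)?;
    assert!(slice.len() >= len, "slice shorter than requested");
    Some(slice)
}
```

Because the length travels with the slice, safe callers no longer depend on an overridable trait method returning a correct pointer.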
Demi Marie Obenour
0e21b56aea pci: do not check for page-aligned size and offset before calling mmap()
The kernel will validate that the size is page-aligned.  The file offset
is always zero, so the kernel will also validate that the offset is
page-aligned.

Signed-off-by: Demi Marie Obenour <demiobenour@gmail.com>
2025-11-22 10:24:13 +00:00
Demi Marie Obenour
8be28f8438 misc: Work around vfio_dma_map being unsound
This API passes a u64 to a kernel API that treats the u64 as a userspace
address.  Therefore, it should be marked unsafe, but it currently is not
[1].  Wrap the call in an unsafe block to document that invariants must
be upheld to avoid undefined behavior.  This causes a compiler warning,
so suppress the warning with #[allow(unused_unsafe)].

[1]: https://github.com/rust-vmm/vfio/issues/100

Signed-off-by: Demi Marie Obenour <demiobenour@gmail.com>
2025-11-22 10:24:13 +00:00
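A minimal sketch of the workaround described in 8be28f8438. `upstream_dma_map` is a hypothetical stand-in for the upstream function that is declared safe today; the real vfio signature is not reproduced here.

```rust
/// Hypothetical stand-in for an upstream function that is declared safe
/// although it hands `user_addr` to the kernel as a userspace address.
fn upstream_dma_map(iova: u64, size: u64, user_addr: u64) -> Result<(), String> {
    if size == 0 {
        return Err("empty mapping".to_string());
    }
    let _ = (iova, user_addr); // a real implementation would issue an ioctl
    Ok(())
}

// The callee is safe today, so the compiler reports the unsafe block as
// unused; the attribute silences that until upstream marks it unsafe.
#[allow(unused_unsafe)]
fn map_region(iova: u64, region: &mut [u8]) -> Result<(), String> {
    let user_addr = region.as_mut_ptr() as usize as u64;
    // SAFETY: `region` is a live, exclusively borrowed buffer of
    // `region.len()` bytes, so the address range is valid.
    unsafe { upstream_dma_map(iova, region.len() as u64, user_addr) }
}
```

The unsafe block documents the invariant at the call site, so nothing needs to change here once the upstream function becomes unsafe.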
Demi Marie Obenour
12c7cc5e4f pci: Remove dma_map() and dma_unmap()
These APIs had no users, were not documented, and were unsound.

Signed-off-by: Demi Marie Obenour <demiobenour@gmail.com>
2025-11-22 10:24:13 +00:00
Demi Marie Obenour
8f6a6a85e0 virtio-devices: mark Vdpa::dma_map as unsafe
I believe that its only caller used it safely, but it is still better to
mark the code as unsafe.  Also add additional validity checks.

Signed-off-by: Demi Marie Obenour <demiobenour@gmail.com>
2025-11-22 10:24:13 +00:00
Demi Marie Obenour
06b76972e2 pci: move operation out of loop
No functional change intended.

Signed-off-by: Demi Marie Obenour <demiobenour@gmail.com>
2025-11-22 10:24:13 +00:00
Demi Marie Obenour
199d2d05d8 hypervisor: tdx: do not use u64 to represent pointers
Also drop support for building the TDX code for 32-bit targets.  All
CPUs with TDX support are 64-bit so supporting 32-bit targets is not
needed.

Signed-off-by: Demi Marie Obenour <demiobenour@gmail.com>
2025-11-22 10:24:13 +00:00
Demi Marie Obenour
42522a88c0 misc: do not use u64 to represent host pointers
To ensure that struct sizes are the same on 32-bit and 64-bit, various
kernel APIs use __u64 (Rust u64) to represent userspace pointers.
Userspace is expected to cast pointers to __u64 before passing them to
the kernel, and cast kernel-provided __u64 to a pointer before using
them.  However, various safe APIs in Cloud Hypervisor took
caller-provided u64 values and passed them to syscalls that interpret
them as userspace addresses.  Therefore, passing bad u64 values would
cause memory disclosure or corruption.

Fix the bug by using usize and pointer types as appropriate.  To make
soundness of the code easier to reason about, the PCI code gains a new
MmapRegion abstraction that ensures the validity of pointers.  The rest
of the code already has an MmapRegion abstraction it can use.  To avoid
having to reason about whether something is keeping the MmapRegion
alive, reference counting is added.  MmapRegion cannot hold references
to other objects, so the reference counting cannot introduce cycles.

Signed-off-by: Demi Marie Obenour <demiobenour@gmail.com>
2025-11-22 10:24:13 +00:00
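A minimal sketch of the pattern described in 42522a88c0, under stated assumptions: `ToyRegion`, `Mapping` and `KernelMapArgs` are hypothetical, and a boxed buffer stands in for a real mmap() region. The pointer stays a pointer inside the program and becomes a u64 only when the kernel argument struct is filled in, while a reference count keeps the region alive.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for a mapped region; the real type wraps mmap().
struct ToyRegion {
    buf: Box<[u8]>,
}

impl ToyRegion {
    fn new(len: usize) -> Arc<Self> {
        Arc::new(Self { buf: vec![0u8; len].into_boxed_slice() })
    }

    fn as_ptr(&self) -> *const u8 {
        self.buf.as_ptr()
    }

    fn len(&self) -> usize {
        self.buf.len()
    }
}

/// Shaped like a kernel ABI struct that carries a userspace address as u64.
#[repr(C)]
struct KernelMapArgs {
    userspace_addr: u64,
    size: u64,
}

/// Holding an Arc keeps the region alive for as long as the mapping exists.
struct Mapping {
    region: Arc<ToyRegion>,
}

impl Mapping {
    /// The pointer-to-u64 cast happens only at the kernel boundary.
    fn kernel_args(&self) -> KernelMapArgs {
        KernelMapArgs {
            userspace_addr: self.region.as_ptr() as usize as u64,
            size: self.region.len() as u64,
        }
    }
}
```

Since the toy region holds no references to other objects, the reference counting cannot form a cycle, which matches the reasoning in the commit message.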
Demi Marie Obenour
fdc19ad85e misc: Mark memory region APIs as unsafe
To ensure that struct sizes are the same on 32-bit and 64-bit, various
kernel APIs use __u64 (Rust u64) to represent userspace pointers.
Userspace is expected to cast pointers to __u64 before passing them to
the kernel, and cast kernel-provided __u64 to a pointer before using
them.  However, various safe APIs in Cloud Hypervisor took
caller-provided u64 values and passed them to syscalls that treat them
as userspace addresses.  Therefore, passing bad u64 values would cause
memory disclosure or corruption.  The memory region APIs are one example
of this, so mark them as unsafe.

Signed-off-by: Demi Marie Obenour <demiobenour@gmail.com>
2025-11-22 10:24:13 +00:00
Demi Marie Obenour
00f0b9e42c vmm: Fix clippy lints on RISC-V
These caused CI failures in #7129.

No functional change.

Signed-off-by: Demi Marie Obenour <demiobenour@gmail.com>
2025-11-22 10:24:13 +00:00
dependabot[bot]
0ff8d1cb28 build: Bump actions/checkout from 5 to 6
Bumps [actions/checkout](https://github.com/actions/checkout) from 5 to 6.
- [Release notes](https://github.com/actions/checkout/releases)
- [Changelog](https://github.com/actions/checkout/blob/main/CHANGELOG.md)
- [Commits](https://github.com/actions/checkout/compare/v5...v6)

---
updated-dependencies:
- dependency-name: actions/checkout
  dependency-version: '6'
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
2025-11-22 00:05:53 +00:00
Thomas Prescher
b6032bc492 arch: fix extended topology enumeration subleafs
When booting a Linux guest in SMP configuration on Sapphire Rapids
and Granite Rapids, the following kernel warning can be observed:

[Firmware Bug]: CPUID leaf 0x1f subleaf 1 APIC ID mismatch 1 != 0
[Firmware Bug]: CPUID leaf 0x1f subleaf 2 APIC ID mismatch 1 != 0

The reason is that we announce the presence of the extended topology
leaf, but fail to announce the x2apic ID in EDX for each subleaf.

Signed-off-by: Thomas Prescher <thomas.prescher@cyberus-technology.de>
On-behalf-of: SAP thomas.prescher@sap.com
2025-11-21 17:17:27 +00:00
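A minimal sketch of the fix described in b6032bc492, assuming a hypothetical `CpuidEntry` type; the field names are illustrative and not the hypervisor crate's API. Every subleaf of leaf 0x1f reports the x2APIC ID of the current logical processor in EDX.

```rust
/// Hypothetical CPUID entry; field names are illustrative only.
#[derive(Debug, Clone, Default, PartialEq)]
struct CpuidEntry {
    function: u32,
    index: u32,
    eax: u32,
    ebx: u32,
    ecx: u32,
    edx: u32,
}

/// Writes the vCPU's x2APIC ID into EDX of every leaf 0x1f subleaf and
/// leaves all other leaves untouched.
fn set_x2apic_id(entries: &mut [CpuidEntry], x2apic_id: u32) {
    for entry in entries.iter_mut().filter(|e| e.function == 0x1f) {
        entry.edx = x2apic_id;
    }
}
```

Called once per vCPU with that vCPU's ID, this keeps the subleafs consistent with the APIC ID the guest kernel compares them against.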
Stefan Nürnberger
95b8c6afdd seccomp: allow sendto for vfio_user devices
As of Rust 1.90, writes to unix sockets use the sendto syscall. This
affects the vcpu threads when vfio_user devices are accessed.

Signed-off-by: Stefan Nürnberger <stefan.nuernberger@cyberus-technology.de>
2025-11-21 17:06:19 +00:00
Philipp Schuster
f02745a7ed vmm: unrelated small code improvements
Unfortunately, there is no lint for that.

Signed-off-by: Philipp Schuster <philipp.schuster@cyberus-technology.de>
On-behalf-of: SAP philipp.schuster@sap.com
2025-11-21 09:32:11 +00:00
Philipp Schuster
fed010fcd1 misc: clippy: add manual_string_new
Signed-off-by: Philipp Schuster <philipp.schuster@cyberus-technology.de>
On-behalf-of: SAP philipp.schuster@sap.com
2025-11-21 09:32:11 +00:00
Philipp Schuster
b4c62bf159 misc: clippy: add semicolon_if_nothing_returned
Signed-off-by: Philipp Schuster <philipp.schuster@cyberus-technology.de>
On-behalf-of: SAP philipp.schuster@sap.com
2025-11-21 09:32:11 +00:00
Philipp Schuster
ea4f07d3bf misc: clippy: add uninlined_format_args
Signed-off-by: Philipp Schuster <philipp.schuster@cyberus-technology.de>
On-behalf-of: SAP philipp.schuster@sap.com
2025-11-21 09:32:11 +00:00
Philipp Schuster
7cb73e9e56 misc: clippy: add unnecessary_semicolon
Signed-off-by: Philipp Schuster <philipp.schuster@cyberus-technology.de>
On-behalf-of: SAP philipp.schuster@sap.com
2025-11-21 09:32:11 +00:00
Philipp Schuster
06390342a6 misc: clippy: add default clippy lint groups
This is the first commit in a series of commits to improve the Code
Quality in Cloud Hypervisor in a sustainable way. These are the
default rules from `clippy::all` but written here to be more explicit.
`clippy::all` refers to all "default sensible" lints, not all
existing lints.

Signed-off-by: Philipp Schuster <philipp.schuster@cyberus-technology.de>
On-behalf-of: SAP philipp.schuster@sap.com
2025-11-21 09:32:11 +00:00
Philipp Schuster
11d17fbf79 docs: update CONTRIBUTING.md
Signed-off-by: Philipp Schuster <philipp.schuster@cyberus-technology.de>
On-behalf-of: SAP philipp.schuster@sap.com
2025-11-20 21:15:03 +00:00
Philipp Schuster
e160a17131 net_util: unrelated code improvement
Signed-off-by: Philipp Schuster <philipp.schuster@cyberus-technology.de>
On-behalf-of: SAP philipp.schuster@sap.com
2025-11-20 21:15:03 +00:00
Philipp Schuster
7364fbdc8e tests: move VM test into a test module
Signed-off-by: Philipp Schuster <philipp.schuster@cyberus-technology.de>
On-behalf-of: SAP philipp.schuster@sap.com
2025-11-20 21:15:03 +00:00
Philipp Schuster
35b91f76af tests: prevent broken terminal after running cargo test -p vmm
Signed-off-by: Philipp Schuster <philipp.schuster@cyberus-technology.de>
On-behalf-of: SAP philipp.schuster@sap.com
2025-11-20 21:15:03 +00:00
Philipp Schuster
d1680b9ff9 tests: streamline module names to unit_tests
This better aligns with the rest of the code and makes it clearer
that these tests can run "as is" in a normal hosted environment
without the special test environment.

Signed-off-by: Philipp Schuster <philipp.schuster@cyberus-technology.de>
On-behalf-of: SAP philipp.schuster@sap.com
2025-11-20 21:15:03 +00:00
Philipp Schuster
c990f1bdaa tests: enable cargo test --workspace + #[cfg(devcli_testenv)]
TL;DR: Massive quality of life improvement for devs

Cloud Hypervisor uses the Cargo test framework for multiple tests:

- normal unit tests
- unit tests requiring special environment (the Tap device tests)
- integration tests requiring a special environment

This prevented the execution of `cargo test --workspace`, which results
in a very poor developer experience. Although
`./scripts/run_unit_tests.sh` exists, there are valid reasons why devs
cannot or even don't want to use it.

By adding a new `chv_testenv` rustc config, we can conditionally only
activate tests when the `./scripts/` magic runs them. This improves
the general developer experience by a lot.

Signed-off-by: Philipp Schuster <philipp.schuster@cyberus-technology.de>
On-behalf-of: SAP philipp.schuster@sap.com
2025-11-20 21:15:03 +00:00
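A minimal sketch of the gating described in c990f1bdaa. It uses the cfg name from the commit title, `devcli_testenv` (the message body calls it `chv_testenv`); the function and test names are hypothetical. Passing the flag through something like `RUSTFLAGS="--cfg devcli_testenv"` is one standard way to set a custom cfg and is an assumption about how the scripts do it.

```rust
/// Logic that a plain `cargo test --workspace` can always exercise.
fn parse_mac_octet(s: &str) -> Option<u8> {
    u8::from_str_radix(s, 16).ok()
}

#[cfg(test)]
#[allow(unexpected_cfgs)] // the custom cfg is not declared to rustc here
mod unit_tests {
    use super::*;

    // Runs everywhere.
    #[test]
    fn parses_hex_octet() {
        assert_eq!(parse_mac_octet("ff"), Some(255));
    }

    // Compiled only when the custom cfg is set, so a plain
    // `cargo test --workspace` never sees this test.
    #[cfg(devcli_testenv)]
    #[test]
    fn needs_tap_device() {
        assert!(std::path::Path::new("/dev/net/tun").exists());
    }
}
```

Without the flag the gated test is not compiled at all, so it cannot fail on a machine that lacks the special environment.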
Bo Chen
e3e9e1c84c tests: Add retries for artifact downloads
This change makes integration tests more resilient to transient download
failures. This will reduce churn in our CI workflow, specifically for
the Merge Queue.

Examples:
https://github.com/cloud-hypervisor/cloud-hypervisor/actions/runs/19545345066/job/55962570896
https://github.com/cloud-hypervisor/cloud-hypervisor/actions/runs/19545345122/job/55962570736
https://github.com/cloud-hypervisor/cloud-hypervisor/actions/runs/19545345034/job/55962570724

Signed-off-by: Bo Chen <bchen@crusoe.ai>
2025-11-20 19:34:57 +00:00
Changyuan Lyu
be495ec64a arch: x86_64: fix cpuid leaf 0x1 EBX bits 23-16
Commit 5ec47d4883 was intended to patch ebx bits 23-16 in cpuid leaf
0x1, but it did not work as expected because, in Rust, operator << has
higher precedence than & [1]. The later commit b6667f948e fixed the
operator precedence clippy warning, but did not fix the actual issue. As
a result, the current code does not change ebx:

```
cpu_ebx |= ((dies_per_package as u32) * (cores_per_die as u32) * (threads_per_core as u32))
    & (0xff << 16);
```

Since the total number of logical processors is generally less than
65536, the right hand side of the expression is 0 in most cases.

[1] https://doc.rust-lang.org/reference/expressions.html#expression-precedence

Fixes: 5ec47d4883 ("arch: x86_64: enable HTT flag")
Signed-off-by: Changyuan Lyu <changyuanl@google.com>
2025-11-18 08:27:59 +00:00
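A minimal sketch of the problem described in be495ec64a. `buggy_patch` mirrors the quoted expression; `fixed_patch` is one possible correction and an assumption, not necessarily the committed change.

```rust
/// Bits 23-16 of EBX in CPUID leaf 0x1 hold the logical processor count.
const LOGICAL_CPUS_MASK: u32 = 0xff << 16;

/// Mirrors the quoted code: the count is masked before it is shifted, so
/// any count below 65536 contributes nothing.
fn buggy_patch(ebx: u32, logical_cpus: u32) -> u32 {
    ebx | (logical_cpus & LOGICAL_CPUS_MASK)
}

/// One possible fix (an assumption): clear the field, then shift the low
/// eight bits of the count into place.
fn fixed_patch(ebx: u32, logical_cpus: u32) -> u32 {
    (ebx & !LOGICAL_CPUS_MASK) | ((logical_cpus & 0xff) << 16)
}
```

For 8 logical processors and an initial EBX of 0x800, `buggy_patch` returns 0x800 unchanged, while `fixed_patch` returns 0x0008_0800.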
dependabot[bot]
8ee26286ac build: Bump crate-ci/typos from 1.39.0 to 1.39.2
Bumps [crate-ci/typos](https://github.com/crate-ci/typos) from 1.39.0 to 1.39.2.
- [Release notes](https://github.com/crate-ci/typos/releases)
- [Changelog](https://github.com/crate-ci/typos/blob/master/CHANGELOG.md)
- [Commits](https://github.com/crate-ci/typos/compare/v1.39.0...v1.39.2)

---
updated-dependencies:
- dependency-name: crate-ci/typos
  dependency-version: 1.39.2
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2025-11-18 00:27:28 +00:00
Philipp Schuster
935332bc42 misc: unrelated misc code improvements
Signed-off-by: Philipp Schuster <philipp.schuster@cyberus-technology.de>
On-behalf-of: SAP philipp.schuster@sap.com
2025-11-17 16:59:49 +00:00
Philipp Schuster
e4fd066d82 misc: improve developer experience of cargo clippy
A major improvement to the developer experience of clippy in
Cloud Hypervisor.

1. Make `cargo clippy` just work with the same lints we use in CI
2. Simplify adding new lints

Signed-off-by: Philipp Schuster <philipp.schuster@cyberus-technology.de>
On-behalf-of: SAP philipp.schuster@sap.com
2025-11-17 16:59:49 +00:00
Philipp Schuster
a7fa3a0c86 vm-migration: better naming + unittests
Signed-off-by: Philipp Schuster <philipp.schuster@cyberus-technology.de>
On-behalf-of: SAP philipp.schuster@sap.com
2025-11-17 14:34:54 +00:00
Julian Stecklina
b6c266c880 vmm: avoid creating large temporary vector during migration
... by just passing the iterator along. For large VMs this bitmap is
gigantic. A 12TB VM has 384MB of dirty bitmap.

With all these optimizations from the previous commits in place, we
see quite the improvement when it comes to scanning the dirty bitmap.

For a bitmap with 1% bits (randomly) set, dirty_log() takes:

Original code: 2166ms (100.0%)
New code:       382ms ( 17.6%)

on my system. The sparser the dirty bitmap the faster. Scanning an
empty bitmap is 100x faster. For a 5% populated bitmap we are still 3x
faster.

If someone wants to play with this, there is a benchmark harness here:
https://github.com/blitz/chv-bitmap-bench

On-behalf-of: SAP julian.stecklina@sap.com
Signed-off-by: Julian Stecklina <julian.stecklina@cyberus-technology.de>
2025-11-17 14:34:54 +00:00
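The bitmap size quoted in b6c266c880 can be checked with a short calculation. This sketch assumes 4 KiB pages, one dirty bit per page, and binary units (12 TB read as 12 TiB, 384 MB as 384 MiB); `dirty_bitmap_bytes` is a hypothetical helper.

```rust
/// Assumption: 4 KiB pages with one dirty bit per page.
const PAGE_SIZE: u64 = 4096;

/// Size in bytes of a dirty bitmap covering `guest_ram_bytes` of memory.
fn dirty_bitmap_bytes(guest_ram_bytes: u64) -> u64 {
    let pages = guest_ram_bytes.div_ceil(PAGE_SIZE);
    pages.div_ceil(8) // eight page bits per byte
}
```

For 12 TiB this gives 12 * 2^40 / 2^12 / 8 = 12 * 2^25 bytes, which is 384 MiB.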
Julian Stecklina
fc99e299c3 virtio-devices: avoid creating a temporary vector
... by passing the slice along instead.

On-behalf-of: SAP julian.stecklina@sap.com
Signed-off-by: Julian Stecklina <julian.stecklina@cyberus-technology.de>
2025-11-17 14:34:54 +00:00
Julian Stecklina
3d5f9a3a98 virtio-devices: mark a possible improvement
This would be a good opportunity to optimize another pointless vector
away, but I don't have a good way to test this at the moment. But
maybe someone else gives it a shot.

On-behalf-of: SAP julian.stecklina@sap.com
Signed-off-by: Julian Stecklina <julian.stecklina@cyberus-technology.de>
2025-11-17 14:34:54 +00:00
Julian Stecklina
ad9034ed1d vm-migration: optimize dirty bitmap scanning
Adding itertools as a dependency significantly improves the iteration
code that follows.

With this change, we don't need a copy of the vector. Just something
that can be coerced into an iterator. We also use the bit position
iterator to make the code somewhat clearer. The new code is much
faster, because it will not iterate over every bit, just each 1 bit in
the input.

The next commit will complete this optimization and have some concrete
numbers.

On-behalf-of: SAP julian.stecklina@sap.com
Signed-off-by: Julian Stecklina <julian.stecklina@cyberus-technology.de>
2025-11-17 14:34:54 +00:00
Julian Stecklina
96f4e33897 vm-migration: add helper to iterate over bitmaps
Instead of using ad-hoc code, just write an extension to the Iterator
trait that we can easily unit test.

Co-authored-by: Philipp Schuster <philipp.schuster@cyberus-technology.de>
On-behalf-of: SAP julian.stecklina@sap.com
On-behalf-of: SAP philipp.schuster@sap.com
Signed-off-by: Julian Stecklina <julian.stecklina@cyberus-technology.de>
Signed-off-by: Philipp Schuster <philipp.schuster@cyberus-technology.de>
2025-11-17 14:34:54 +00:00
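A minimal sketch of the scanning idea behind 96f4e33897 and ad9034ed1d, written as a free function rather than an Iterator extension for brevity. The name `bit_positions` and the u64 word size are assumptions. It visits only the set bits by repeatedly clearing the lowest one.

```rust
/// Yields the absolute position of every set bit in a sequence of 64-bit
/// bitmap words, lowest position first.
fn bit_positions(words: impl IntoIterator<Item = u64>) -> impl Iterator<Item = u64> {
    words.into_iter().enumerate().flat_map(|(i, mut word)| {
        std::iter::from_fn(move || {
            if word == 0 {
                return None; // no set bits left in this word
            }
            let bit = word.trailing_zeros() as u64;
            word &= word - 1; // clear the lowest set bit
            Some(i as u64 * 64 + bit)
        })
    })
}
```

An all-zero word costs a single comparison, which is consistent with sparse bitmaps scanning much faster than dense ones.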
Rob Bradford
6147c4c8b7 tests: Disable nextest failure on fw_cfg tests on MSHV
Due to the fw_cfg test being disabled on MSHV, no tests are runnable,
which results in an error in nextest. Reduce that error to a warning
using --no-tests=warn.

Signed-off-by: Rob Bradford <rbradford@rivosinc.com>
2025-11-17 10:22:34 +00:00
Rob Bradford
f9076cccfa tests: Add test timeout (10 minutes) to nextest configuration
Signed-off-by: Rob Bradford <rbradford@rivosinc.com>
2025-11-17 10:22:34 +00:00
Rob Bradford
9f046f02a2 tests: Disable test_snapshot_restore_with_fd()
This is now failing on x86-64 as well after the update of the Rust
version.

Signed-off-by: Rob Bradford <rbradford@rivosinc.com>
2025-11-17 10:22:34 +00:00
Rob Bradford
3734a13cbf tests: Use cargo nextest for integration tests
This alternative test runner supports retries and also reports how long
each test takes to run.

Signed-off-by: Rob Bradford <rbradford@rivosinc.com>
2025-11-17 10:22:34 +00:00
Rob Bradford
2707b0f72a build: Add cargo nextest to container
cargo nextest is an improved test runner that allows retries as well as
reporting the times for the test runs.

Signed-off-by: Rob Bradford <rbradford@rivosinc.com>
2025-11-17 10:22:34 +00:00
Rob Bradford
176023156e test_infra: Make guest ID generation multiprocess safe
When using nextest for running tests each test is run in its own process
so the old solution of using a static variable for the guest ID (used to
determine the network segment) no longer works.

Instead use a text file on the filesystem protected with an exclusive
lock. The test process will read from it and then write back the next ID
that can be used. It wraps around at the limit of u8 and skips ID 0.

This function intentionally panics rather than propagate errors as it
should only be called for testing purposes and there the panic handler
will give a useful backtrace and cleanup.

Signed-off-by: Rob Bradford <rbradford@rivosinc.com>
2025-11-17 10:22:34 +00:00
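A minimal sketch of the wrap-around rule described in 176023156e. It covers only the ID arithmetic and leaves out the locked file that makes the scheme multiprocess safe; `next_guest_id` is a hypothetical name.

```rust
/// Returns the ID that follows `current`, wrapping at the limit of u8
/// and skipping 0.
fn next_guest_id(current: u8) -> u8 {
    // 255 wraps to 0, which `max(1)` then bumps to 1.
    current.wrapping_add(1).max(1)
}
```

In the scheme the commit describes, this step would run while the exclusive lock on the ID file is held, between reading the current value and writing back the next one.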
Rob Bradford
5051feb0bd build: Bump MSRV to 1.89.0
This is required to support exclusive locking on files which is needed
for safe test ID generation when using nextest (since it runs each test
as a separate process.)

Signed-off-by: Rob Bradford <rbradford@rivosinc.com>
2025-11-17 10:22:34 +00:00
Matt Moriarity
ec57aade15 seccomp: allow sendto for vsock thread
As of Rust 1.90, writes to unix socket streams use send_with_flags
instead of write, so they result in a sendto syscall instead of write.

Signed-off-by: Matt Moriarity <matt@mattmoriarity.com>
2025-11-13 18:47:01 +00:00
Philipp Schuster
02da2f2d36 misc: gitlint: python code improvements
Signed-off-by: Philipp Schuster <philipp.schuster@cyberus-technology.de>
On-behalf-of: SAP philipp.schuster@sap.com
2025-11-13 19:16:45 +00:00
Philipp Schuster
62345cd6fd misc: gitlint: allow more prefixes disabling 72 width limit
Suggested-by Alyssa Ross [0].

[0]: https://github.com/cloud-hypervisor/cloud-hypervisor/pull/7471#discussion_r2519894440

Signed-off-by: Philipp Schuster <philipp.schuster@cyberus-technology.de>
On-behalf-of: SAP philipp.schuster@sap.com
2025-11-13 19:16:45 +00:00
Philipp Schuster
f826d92601 misc: gitlint: allow well-known commit tags to exceed line limit
To get that list, I've used

```
git log | grep --fixed-strings -- "-by:" | head -n 100000 | sort | less
```

on the Linux kernel's git repository.

Signed-off-by: Philipp Schuster <philipp.schuster@cyberus-technology.de>
On-behalf-of: SAP philipp.schuster@sap.com
2025-11-13 19:16:45 +00:00
Rob Bradford
063aa4b7d5 arch: riscv64: Expose host extension set to guest via FDT
The set of extensions supported by a RISC-V system needs to be exposed
to the guest - currently that is a fixed, minimal set of extensions.
These extensions are not sufficient to boot Ubuntu 25.10 which now has a
minimum requirement of RVA23S64 (which is a minimum set of extensions
that make sense for server use cases.)

The easiest way to convey the extensions that the guest should use is to
copy those that the host kernel understands (and thus includes in the
/proc/cpuinfo data).

However since nested virtualisation is not currently possible - exclude
the "H" (Hypervisor) extension from the list of short (single letter)
extensions.

Signed-off-by: Rob Bradford <rbradford@rivosinc.com>
2025-11-13 14:34:38 +00:00
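A minimal sketch of the filtering described in 063aa4b7d5. It assumes the host reports an ASCII ISA string of the form `rv64<single letters>_<multi-letter extensions>`; `guest_isa_string` is a hypothetical helper, not the committed code.

```rust
/// Drops the single-letter "h" (Hypervisor) extension from an ISA string
/// such as "rv64imafdch_zicsr_zifencei". Multi-letter extensions after
/// the first underscore are copied unchanged.
/// Assumption: the string is ASCII and starts with a 4-byte base such as
/// "rv64".
fn guest_isa_string(host_isa: &str) -> String {
    let (base, multi) = match host_isa.split_once('_') {
        Some((base, multi)) => (base, Some(multi)),
        None => (host_isa, None),
    };
    // Split the "rv64" prefix from the single-letter extensions.
    let (prefix, letters) = base.split_at(base.len().min(4));
    let kept: String = letters.chars().filter(|&c| c != 'h').collect();
    match multi {
        Some(multi) => format!("{prefix}{kept}_{multi}"),
        None => format!("{prefix}{kept}"),
    }
}
```

Only the single-letter section is filtered, so a multi-letter name that happens to contain an "h" is left intact.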
Rob Bradford
d586c844de tests: Update test_iommu_segments check
After updating the Linux kernel to 6.16.9 the second segment (segment=1)
is now under the 2nd IOMMU group (which is a more logical setup) and as
such the added device which is on that segment is in that second IOMMU
group.

The same check is made in test_vdpa_block so also test there.

Signed-off-by: Rob Bradford <rbradford@rivosinc.com>
2025-11-13 14:29:26 +00:00
Rob Bradford
b3b51bd3a2 tests: aarch64: Fix test_virtio_iommu for kernel IOMMU groups change
The numbers for the IOMMU groups have shifted after the update to Linux
kernel 6.16.9.

Signed-off-by: Rob Bradford <rbradford@rivosinc.com>
2025-11-13 14:29:26 +00:00