Callers of get_host_address_range() rely on it returning a pointer to at
least size bytes of memory. mem.get_host_address() is an overrideable
method of a safe trait, so it is better for safe code to not rely on its
correctness for safety. Instead, use mem.get_slice(), which returns a
VolatileSlice whose invariants guarantee that it points to a sufficient
amount of memory. If mem.check_range() succeeds but mem.get_slice()
returns a slice that is too small, this means that there is either a
logic error or a situation the code cannot support yet, so panic.
Signed-off-by: Demi Marie Obenour <demiobenour@gmail.com>
The kernel will validate that the size is page-aligned. The file offset
is always zero, so the kernel will also validate that the offset is
page-aligned.
Signed-off-by: Demi Marie Obenour <demiobenour@gmail.com>
This API passes a u64 to a kernel API that treats the u64 as a userspace
address. Therefore, it should be marked unsafe, but it currently is not
[1]. Wrap the call in an unsafe block to document that invariants must
be upheld to avoid undefined behavior. This causes a compiler warning,
so suppress the warning with #[allow(unused_unsafe)].
[1]: https://github.com/rust-vmm/vfio/issues/100
Signed-off-by: Demi Marie Obenour <demiobenour@gmail.com>
I believe that its only caller used it safely, but it is still better to
mark the code as unsafe. Also add additional validity checks.
Signed-off-by: Demi Marie Obenour <demiobenour@gmail.com>
Also drop support for building the TDX code for 32-bit targets. All
CPUs with TDX support are 64-bit so supporting 32-bit targets is not
needed.
Signed-off-by: Demi Marie Obenour <demiobenour@gmail.com>
To ensure that struct sizes are the same on 32-bit and 64-bit, various
kernel APIs use __u64 (Rust u64) to represent userspace pointers.
Userspace is expected to cast pointers to __u64 before passing them to
the kernel, and cast kernel-provided __u64 to a pointer before using
them. However, various safe APIs in Cloud Hypervisor took
caller-provided u64 values and passed them to syscalls that interpret
them as userspace addresses. Therefore, passing bad u64 values would
cause memory disclosure or corruption.
Fix the bug by using usize and pointer types as appropriate. To make
soundness of the code easier to reason about, the PCI code gains a new
MmapRegion abstraction that ensures the validity of pointers. The rest
of the code already has an MmapRegion abstraction it can use. To avoid
having to reason about whether something is keeping the MmapRegion
alive, reference counting is added. MmapRegion cannot hold references
to other objects, so the reference counting cannot introduce cycles.
Signed-off-by: Demi Marie Obenour <demiobenour@gmail.com>
To ensure that struct sizes are the same on 32-bit and 64-bit, various
kernel APIs use __u64 (Rust u64) to represent userspace pointers.
Userspace is expected to cast pointers to __u64 before passing them to
the kernel, and cast kernel-provided __u64 to a pointer before using
them. However, various safe APIs in Cloud Hypervisor took
caller-provided u64 values and passed them to syscalls that treat them
as userspace addresses. Therefore, passing bad u64 values would cause
memory disclosure or corruption. The memory region APIs are one example
of this, so mark them as unsafe.
Signed-off-by: Demi Marie Obenour <demiobenour@gmail.com>
When booting a Linux guest in SMP configuration, on sapphire rapids
and granite rapids the following kernel warning can be observed:
[Firmware Bug]: CPUID leaf 0x1f subleaf 1 APIC ID mismatch 1 != 0
[Firmware Bug]: CPUID leaf 0x1f subleaf 2 APIC ID mismatch 1 != 0
The reason is that we announce the presence of the extended topology
leaf, but fail to announce the x2apic ID in EDX for each subleaf.
Signed-off-by: Thomas Prescher <thomas.prescher@cyberus-technology.de>
On-behalf-of: SAP thomas.prescher@sap.com
as of rust 1.90, writes to unix sockets use the sendto syscall. This
affects the vcpu threads when vfio_user devices are accessed.
Signed-off-by: Stefan Nürnberger <stefan.nuernberger@cyberus-technology.de>
Unfortunately, there is no lint for that.
Signed-off-by: Philipp Schuster <philipp.schuster@cyberus-technology.de>
On-behalf-of: SAP philipp.schuster@sap.com
This is the first commit in a series of commits to improve the Code
Quality in Cloud Hypervisor in a sustainable way. These are the
default rules from `clippy::all` but written here to be more explicit.
`clippy::all` refers to all "default sensible" lints, not all
existing lints.
Signed-off-by: Philipp Schuster <philipp.schuster@cyberus-technology.de>
On-behalf-of: SAP philipp.schuster@sap.com
This better aligns with the rest of the code and makes it clearer
that these tests can run "as is" in a normal hosted environments
without the special test environment.
Signed-off-by: Philipp Schuster <philipp.schuster@cyberus-technology.de>
On-behalf-of: SAP philipp.schuster@sap.com
TL;DR: Massive quality of life improvement for devs
Cloud Hypervisor uses the Cargo test framework for multiple tests:
- normal unit tests
- unit tests requiring special environment (the Tap device tests)
- integration tests requiring a special environment
This prevented the execution of `cargo test --workspace`, which results
in a very poor developer experience. Although
`./scripts/run_unit_tests.sh` exists, there are valid reasons why devs
cannot or even don't want to use it.
By adding a new `chv_testenv` rustc config, we can conditionally only
activate tests when the `./scripts/` magic runs them. This improves
the general developer experience by a lot.
Signed-off-by: Philipp Schuster <philipp.schuster@cyberus-technology.de>
On-behalf-of: SAP philipp.schuster@sap.com
Commit 5ec47d4883 was intended to patch ebx bits 23-16 in cpuid leaf
0x1, but it was not working as expected, as in rust, operator << has a
stronger precedence than & [1]. Later commit b6667f948e fixed the
operator precedence clippy warning, but did not fix the actual issue. As
a result, the current code is not changing ebx,
```
cpu_ebx |= ((dies_per_package as u32) * (cores_per_die as u32) * (threads_per_core as u32))
& (0xff << 16);
```
Since the total number of logical processors is generally less than
65536, the right hand side of the expression is 0 in most cases.
[1] https://doc.rust-lang.org/reference/expressions.html#expression-precedence
Fixes: 5ec47d4883 ("arch: x86_64: enable HTT flag")
Signed-off-by: Changyuan Lyu <changyuanl@google.com>
A major improvement to the developer experience of clippy in
Cloud Hypervisor.
1. Make `cargo clippy` just work with the same lints we use in CI
2. Simplify adding new lints
Signed-off-by: Philipp Schuster <philipp.schuster@cyberus-technology.de>
On-behalf-of: SAP philipp.schuster@sap.com
... by just passing the iterator along. For large VMs this bitmap is
gigantic. A 12TB VM has 384MB of dirty bitmap.
With all these optimizations from the previous commits in place, we
see quite the improvement when it comes to scanning the dirty bitmap.
For a bitmap with 1% bits (randomly) set, dirty_log() takes:
Original code: 2166ms (100.0%)
New code: 382ms ( 17.6%)
on my system. The sparser the dirty bitmap the faster. Scanning an
empty bitmap is 100x faster. For a 5% populated bitmap we are still 3x
faster.
If someone wants to play with this, there is a benchmark harness here:
https://github.com/blitz/chv-bitmap-bench
On-behalf-of: SAP julian.stecklina@sap.com
Signed-off-by: Julian Stecklina <julian.stecklina@cyberus-technology.de>
... by passing the slice along instead.
On-behalf-of: SAP julian.stecklina@sap.com
Signed-off-by: Julian Stecklina <julian.stecklina@cyberus-technology.de>
This would be a good opportunity to optimize another pointless vector
away, but I don't have a good way to test this at the moment. But
maybe someone else gives it a shot.
On-behalf-of: SAP julian.stecklina@sap.com
Signed-off-by: Julian Stecklina <julian.stecklina@cyberus-technology.de>
Adding itertools as dependency improves the iteration code in the
following significantly.
With this change, we don't need a copy of the vector. Just something
that can be coerced into an iterator. We also use the bit position
iterator to make the code somewhat clearer. The new code is much
faster, because it will not iterate over every bit, just each 1 bit in
the input.
The next commit will complete this optimization and have some concrete
numbers.
On-behalf-of: SAP julian.stecklina@sap.com
Signed-off-by: Julian Stecklina <julian.stecklina@cyberus-technology.de>
Instead of using ad-hoc code, just write an extension to the Iterator
trait that we can easily unit test.
Co-authored-by: Philipp Schuster <philipp.schuster@cyberus-technology.de>
On-behalf-of: SAP julian.stecklina@sap.com
On-behalf-of: SAP philipp.schuster@sap.com
Signed-off-by: Julian Stecklina <julian.stecklina@cyberus-technology.de>
Signed-off-by: Philipp Schuster <philipp.schuster@cyberus-technology.de>
Due to the fw_cfg test being disabled on MSHV this results in no tests
being runnable which results in an error in nextest. Reduce that error
to a warning use --no-tests=warn
Signed-off-by: Rob Bradford <rbradford@rivosinc.com>
cargo nextest is an improved test runner that allows retries as well a
reporting the times for the test runs.
Signed-off-by: Rob Bradford <rbradford@rivosinc.com>
When using nextest for running tests each test is run in its own process
so the old solution of using a static variable for the guest ID (used to
determine the network segment) no longer works.
Instead use a text file on the filesystem protected with an exclusive
lock. The test process will read from it and then write back the next ID
that can be used. It wraps around at the limit of u8 and skips ID 0.
This function intentionally panics rather than propagate errors as it
should only be called for testing purposes and there the panic handler
will give a useful backtrace and cleanup.
Signed-off-by: Rob Bradford <rbradford@rivosinc.com>
This is required to support exclusive locking on files which is needed
for safe test ID generation when using nextest (since it runs each test
as a separate process.)
Signed-off-by: Rob Bradford <rbradford@rivosinc.com>
as of rust 1.90, writes to unix socket streams use send_with_flags
instead of write, so it uses a sendto syscall instead of write.
Signed-off-by: Matt Moriarity <matt@mattmoriarity.com>
To get that list, I've used
```
git log | grep --fixed-strings -- "-by:" | head -n 100000 | sort | less
```
on the Linux kernel's git repository.
Signed-off-by: Philipp Schuster <philipp.schuster@cyberus-technology.de>
On-behalf-of: SAP philipp.schuster@sap.com
The set of extensions supported by a RISC-V system needs to be exposed
to the guest - currently that is a fixed, minimal set of extensions.
These extensions are not sufficient to boot Ubuntu 25.10 which now has a
mininimum requirement of RVA23S64 (which is a minimum set of extensions
that make sense for server use cases.)
The easiest way to convey the extensions that the guest should use is to
copy those that the host kernel understands (and thus includes in the
/proc/cpuinfo) data.
However since nested virtualisation is not currently possible - exclude
the "H" (Hypervisor) extension from the list of short (single letter)
extensions.
Signed-off-by: Rob Bradford <rbradford@rivosinc.com>
After updating the Linux kernel to 6.19.6 the second segment (segment=1)
is now under the 2nd IOMMU group (which it a more logical setup) and as
such the added device which is on that segment is in that second IOMMU
group.
The same check is made in test_vdpa_block so also test there.
Signed-off-by: Rob Bradford <rbradford@rivosinc.com>