This is the first commit in a series of commits to improve the Code
Quality in Cloud Hypervisor in a sustainable way. These are the
default rules from `clippy::all` but written here to be more explicit.
`clippy::all` refers to all "default sensible" lints, not all
existing lints.
Signed-off-by: Philipp Schuster <philipp.schuster@cyberus-technology.de>
On-behalf-of: SAP philipp.schuster@sap.com
This better aligns with the rest of the code and makes it clearer
that these tests can run "as is" in a normal hosted environments
without the special test environment.
Signed-off-by: Philipp Schuster <philipp.schuster@cyberus-technology.de>
On-behalf-of: SAP philipp.schuster@sap.com
TL;DR: Massive quality of life improvement for devs
Cloud Hypervisor uses the Cargo test framework for multiple tests:
- normal unit tests
- unit tests requiring special environment (the Tap device tests)
- integration tests requiring a special environment
This prevented the execution of `cargo test --workspace`, which results
in a very poor developer experience. Although
`./scripts/run_unit_tests.sh` exists, there are valid reasons why devs
cannot or even don't want to use it.
By adding a new `chv_testenv` rustc config, we can conditionally only
activate tests when the `./scripts/` magic runs them. This improves
the general developer experience by a lot.
Signed-off-by: Philipp Schuster <philipp.schuster@cyberus-technology.de>
On-behalf-of: SAP philipp.schuster@sap.com
Commit 5ec47d4883 was intended to patch ebx bits 23-16 in cpuid leaf
0x1, but it was not working as expected, as in rust, operator << has a
stronger precedence than & [1]. Later commit b6667f948e fixed the
operator precedence clippy warning, but did not fix the actual issue. As
a result, the current code is not changing ebx,
```
cpu_ebx |= ((dies_per_package as u32) * (cores_per_die as u32) * (threads_per_core as u32))
& (0xff << 16);
```
Since the total number of logical processors is generally less than
65536, the right hand side of the expression is 0 in most cases.
[1] https://doc.rust-lang.org/reference/expressions.html#expression-precedence
Fixes: 5ec47d4883 ("arch: x86_64: enable HTT flag")
Signed-off-by: Changyuan Lyu <changyuanl@google.com>
A major improvement to the developer experience of clippy in
Cloud Hypervisor.
1. Make `cargo clippy` just work with the same lints we use in CI
2. Simplify adding new lints
Signed-off-by: Philipp Schuster <philipp.schuster@cyberus-technology.de>
On-behalf-of: SAP philipp.schuster@sap.com
... by just passing the iterator along. For large VMs this bitmap is
gigantic. A 12TB VM has 384MB of dirty bitmap.
With all these optimizations from the previous commits in place, we
see quite the improvement when it comes to scanning the dirty bitmap.
For a bitmap with 1% bits (randomly) set, dirty_log() takes:
Original code: 2166ms (100.0%)
New code: 382ms ( 17.6%)
on my system. The sparser the dirty bitmap the faster. Scanning an
empty bitmap is 100x faster. For a 5% populated bitmap we are still 3x
faster.
If someone wants to play with this, there is a benchmark harness here:
https://github.com/blitz/chv-bitmap-bench
On-behalf-of: SAP julian.stecklina@sap.com
Signed-off-by: Julian Stecklina <julian.stecklina@cyberus-technology.de>
... by passing the slice along instead.
On-behalf-of: SAP julian.stecklina@sap.com
Signed-off-by: Julian Stecklina <julian.stecklina@cyberus-technology.de>
This would be a good opportunity to optimize another pointless vector
away, but I don't have a good way to test this at the moment. But
maybe someone else gives it a shot.
On-behalf-of: SAP julian.stecklina@sap.com
Signed-off-by: Julian Stecklina <julian.stecklina@cyberus-technology.de>
Adding itertools as dependency improves the iteration code in the
following significantly.
With this change, we don't need a copy of the vector. Just something
that can be coerced into an iterator. We also use the bit position
iterator to make the code somewhat clearer. The new code is much
faster, because it will not iterate over every bit, just each 1 bit in
the input.
The next commit will complete this optimization and have some concrete
numbers.
On-behalf-of: SAP julian.stecklina@sap.com
Signed-off-by: Julian Stecklina <julian.stecklina@cyberus-technology.de>
Instead of using ad-hoc code, just write an extension to the Iterator
trait that we can easily unit test.
Co-authored-by: Philipp Schuster <philipp.schuster@cyberus-technology.de>
On-behalf-of: SAP julian.stecklina@sap.com
On-behalf-of: SAP philipp.schuster@sap.com
Signed-off-by: Julian Stecklina <julian.stecklina@cyberus-technology.de>
Signed-off-by: Philipp Schuster <philipp.schuster@cyberus-technology.de>
Due to the fw_cfg test being disabled on MSHV this results in no tests
being runnable which results in an error in nextest. Reduce that error
to a warning use --no-tests=warn
Signed-off-by: Rob Bradford <rbradford@rivosinc.com>
cargo nextest is an improved test runner that allows retries as well a
reporting the times for the test runs.
Signed-off-by: Rob Bradford <rbradford@rivosinc.com>
When using nextest for running tests each test is run in its own process
so the old solution of using a static variable for the guest ID (used to
determine the network segment) no longer works.
Instead use a text file on the filesystem protected with an exclusive
lock. The test process will read from it and then write back the next ID
that can be used. It wraps around at the limit of u8 and skips ID 0.
This function intentionally panics rather than propagate errors as it
should only be called for testing purposes and there the panic handler
will give a useful backtrace and cleanup.
Signed-off-by: Rob Bradford <rbradford@rivosinc.com>
This is required to support exclusive locking on files which is needed
for safe test ID generation when using nextest (since it runs each test
as a separate process.)
Signed-off-by: Rob Bradford <rbradford@rivosinc.com>
as of rust 1.90, writes to unix socket streams use send_with_flags
instead of write, so it uses a sendto syscall instead of write.
Signed-off-by: Matt Moriarity <matt@mattmoriarity.com>
To get that list, I've used
```
git log | grep --fixed-strings -- "-by:" | head -n 100000 | sort | less
```
on the Linux kernel's git repository.
Signed-off-by: Philipp Schuster <philipp.schuster@cyberus-technology.de>
On-behalf-of: SAP philipp.schuster@sap.com
The set of extensions supported by a RISC-V system needs to be exposed
to the guest - currently that is a fixed, minimal set of extensions.
These extensions are not sufficient to boot Ubuntu 25.10 which now has a
mininimum requirement of RVA23S64 (which is a minimum set of extensions
that make sense for server use cases.)
The easiest way to convey the extensions that the guest should use is to
copy those that the host kernel understands (and thus includes in the
/proc/cpuinfo) data.
However since nested virtualisation is not currently possible - exclude
the "H" (Hypervisor) extension from the list of short (single letter)
extensions.
Signed-off-by: Rob Bradford <rbradford@rivosinc.com>
After updating the Linux kernel to 6.19.6 the second segment (segment=1)
is now under the 2nd IOMMU group (which it a more logical setup) and as
such the added device which is on that segment is in that second IOMMU
group.
The same check is made in test_vdpa_block so also test there.
Signed-off-by: Rob Bradford <rbradford@rivosinc.com>
Resignal every 10ms the thread if it has not acknowledged the signal via
setting the atomic when the vCPU thread was acknowledged. Further, avoid
an infinite loop by generating an error if it takes more than 1000ms to
interrupt the thread.
The retry helps mitigate a race condition where the signal is received
between checking the pause atomic and entering KVM_RUN ioctl when
pausing. Hitting this race condition would leave the
wait_untial_signal_acknowledged() method spinning indefinitely.
The timeout error avoids the VMM process being blocked indefinitely.
See: #7427
Signed-off-by: Rob Bradford <rbradford@rivosinc.com>
Dump kernel logs after running the tests in the MSHV workflow to help
debug failures.
In addition to getting the kernel logs using `dmesg` also use AzCli to
retrieve the serial console logs. If the VM is hung or panicked, the
workflow would be unable to SSH into it and execute `dmesg`. In this
case the serial console logs would be helpful.
Signed-off-by: Anirudh Rayabharam <anrayabh@microsoft.com>
Replace the use of `guestmount and `guestunmount` with standard tools
(losetup, mount) to modify cloud disk image. This eliminates
dependency on guestmount which requires /dev/kvm and is not available
in MSHV root partition.
Signed-off-by: Aastha Rawat <aastharawat@microsoft.com>
Refactor MSHV hypervisor selection logic to enable aarch64 support
instead of exiting. Use `--features mshv` to conditionally build &
test when hypervisor is mshv replacing hard exit for aarch64 on mshv.
Signed-off-by: Aastha Rawat <aastharawat@microsoft.com>
There are several copy-pasted code fragments in impl QcowFile. All of
them add L2 entry to the cache and one of them (in file_offset_write())
does also allocating new L2 entry if necessary.
Fold all these code fragments (except of one in l2_table() which does
error handling in special way) into cache_l2_cluster() method without
changing the logic.
This will make the code more compact and clean.
Signed-off-by: Eugene Korenevsky <ekorenevsky@aliyun.com>
With the current syntax docker gives 'docker: invalid reference format'
error. Also during parsing /xxx:/yyy in process_volumes_args
with \" inside variable i.e arr_vols=("${arg_vols//#/ }")
gives wrong output.
Example:
scripts/dev_cli.sh tests --integration --volumes /mshv:/mshv
Error: The volume /mshv /mshv does not exist.
Signed-off-by: Muminul Islam <muislam@microsoft.com>
The API documentation tells users to expect a message when Cloud
Hypervisor is launched, this message was removed in commit 13724db
This change updates the documentation to reflect how the program
actually functions, which is to say no message.
Signed-off-by: Ariel Chenet <achenet@fastmail.com>
Consuming `&Arc<T>` as argument is almost always an antipattern as it
hides whether the callee is going to take over (shared) ownership
(by .clone()) or not. Instead, it is better to consume `&dyn T` or
`Arc<dyn T>` to be more explicit. This commit cleans up the code.
The change is very mechanic and was very easy to implement across the
code base.
Signed-off-by: Philipp Schuster <philipp.schuster@cyberus-technology.de>
On-behalf-of: SAP philipp.schuster@sap.com