Commit graph

900 commits

Author SHA1 Message Date
Songqian Li
bd17c84d3c virtio-devices: move userspace mapping to vm-device
Move UserspaceMapping to vm-device to avoid redefinition since
UserspaceMapping is used by both `virtio-devices` and `device`
crate.

Signed-off-by: Songqian Li <sionli@tencent.com>
2025-08-14 22:14:34 +00:00
Songqian Li
cd2c43b489 misc: Fix beta clippy errors
Fix clippy error: "error: manual implementation of `.is_multiple_of()
`" from rustc 1.90.0-beta.1 (788da80fc 2025-08-04).

Signed-off-by: Songqian Li <sionli@tencent.com>
2025-08-07 16:53:59 +00:00
Oliver Anderson
8c136041cb build: Use workspace dependencies
Many of the workspace members in the Cloud-hypervisor workspace share
common dependencies. Making these workspace dependencies reduces
duplication and improves maintainability.

Signed-off-by: Oliver Anderson <oliver.anderson@cyberus-technology.de>
On-behalf-of: SAP oliver.anderson@sap.com
2025-07-28 20:19:27 +00:00
dependabot[bot]
b0bf889d58 build: Bump serde_with from 3.9.0 to 3.14.0
Bumps [serde_with](https://github.com/jonasbb/serde_with) from 3.9.0 to 3.14.0.
- [Release notes](https://github.com/jonasbb/serde_with/releases)
- [Commits](https://github.com/jonasbb/serde_with/compare/v3.9.0...v3.14.0)

---
updated-dependencies:
- dependency-name: serde_with
  dependency-version: 3.14.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2025-07-26 00:03:54 +00:00
Songqian Li
e32fa593e5 build: clean up unused dependencies
Signed-off-by: Songqian Li <sionli@tencent.com>
2025-07-15 07:16:36 +00:00
Alyssa Ross
ec8fceb4a6 virtio-devices: stop corrupting vsock commands
The read_exact() call was introduced in 82ac114b8 ("virtio-devices:
vsock: handle short read in muxer") to solve a crash when a connection
disconnected without sending any data, but it introduced a problem of
its own: because the socket is non-blocking, read_exact() may read
some data, then return ErrorKind::WouldBlock.  In that case, the data
it read will be discarded.  So for example if it read "CONNECT ",
and then nothing else was available to read yet, "CONNECT " would be
discarded, and so the next time this function was called, when epoll
triggered again for the socket, only the following data would end up
in command.buf, causing an error due to just a port number being an
invalid command.

Contrary to that commit message, this code was actually designed to
handle short reads just fine — in the case of a short read, it stores
the data it has read in command, and returns
Error::UnixRead(ErrorKind::WouldBlock), which is ignored by the
caller, and the function gets called again when there is more data to
read, building up command potentially over the course of several
reads.  The only thing it didn't handle correctly, as far as I can
tell, was a 0-byte read, which happens when a client disconnects from
the socket without writing anything.  All that's needed to fix this is
to avoid an invalid subtraction in that case, so this change reverts
82ac114b8, fixing the issue with partial commands being discarded, and
instead handles the 0-byte read by using slice::get, and treating an
empty command as an incomplete command, which of course it is.

Fixes: 82ac114b8 ("virtio-devices: vsock: handle short read in muxer")
Signed-off-by: Alyssa Ross <hi@alyssa.is>
2025-07-14 18:07:07 +00:00
Alyssa Ross
01aed9733c build: add missing dependency features
This makes it possible to run cargo test just for the virtio-devices
crate (as long as either KVM or MSHV is specified).

Signed-off-by: Alyssa Ross <hi@alyssa.is>
2025-07-14 18:06:54 +00:00
Muminul Islam
b268e88ba3 virtio-devices: remove unnecessary parentheses
Cargo fuzz build report an warning:

warning: unnecessary parentheses around closure body
--> virtio-devices/src/iommu.rs:578:41
|
578 |.retain(|&x, _| (x < req.virt_start || x > req.virt_end));
|                                         ^
|
= note: `#[warn(unused_parens)]` on by default
help: remove these parentheses
|
578 -.retain(|&x, _| (x < req.virt_start || x > req.virt_end));
578 +.retain(|&x, _| x < req.virt_start || x > req.virt_end);
|

warning: `virtio-devices` (lib) generated 1 warning
(run `cargo fix --lib -p virtio-devices` to apply 1 suggestion)

Signed-off-by: Muminul Islam <muislam@microsoft.com>
2025-07-11 22:02:15 +00:00
Jinank Jain
190d90196f build: Bump vfio and all the dependent crates to latest version
Recently vfio crates have moved to crates.io, thus we should start
consuming the crate from crates.io instead git url.

This results in better versioning instead of tracking some git commit
sha.

Signed-off-by: Jinank Jain <jinankjain@microsoft.com>
2025-07-07 03:05:38 +00:00
Philipp Schuster
1433763d40 misc: virtio-devices: manual adjustment of special case
Signed-off-by: Philipp Schuster <philipp.schuster@cyberus-technology.de>
On-behalf-of: SAP philipp.schuster@sap.com
2025-06-13 19:55:54 +00:00
Philipp Schuster
8e2973fe7c misc: virtio-devices: streamline error Display::fmt()
The changes were mostly automatically applied using the Python
script mentioned in the first commit of this series.

Signed-off-by: Philipp Schuster <philipp.schuster@cyberus-technology.de>
On-behalf-of: SAP philipp.schuster@sap.com
2025-06-13 19:55:54 +00:00
Gauthier Jolly
3d78662498 block: virtio-blk: report IO errors to the guest
Instead of exiting on IO errors, report the errors to the guest with
VIRTIO_BLK_S_IOERR. For example, the guest kernel will log something
similar to this if the nbd behind /dev/vdc is unexpectedly disconnected:

[  166.033957] I/O error, dev vdc, sector 264 op 0x1:(WRITE) flags 0x9800 phys_seg 1 prio class 2
[  166.035083] Aborting journal on device vdc-8.
[  166.037307] Buffer I/O error on dev vdc, logical block 9, lost sync page write
[  166.038471] JBD2: I/O error when updating journal superblock for vdc-8.
[...]
[  174.234470] EXT4-fs (vdc): I/O error while writing superblock

In case the rootfs is not located on the affected block device, this
will not crash the guest.

Fixes: #6995

Signed-off-by: Gauthier Jolly <contact@gjolly.fr>
2025-06-09 16:48:07 +00:00
Jinank Jain
2bc8d51a60 misc: Fix missing lifetime syntax clippy warning
This was caught by the nightly compiler during cargo fuzz build.

error: lifetime flowing from input to output with different syntax can be confusing
   --> /home/runner/work/cloud-hypervisor/cloud-hypervisor/hypervisor/src/arch/x86/emulator/mod.rs:493:26
    |
493 |     pub fn new(platform: &mut dyn PlatformEmulator<CpuState = T>) -> Emulator<T> {
    |                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^     ----------- the lifetime gets resolved as `'_`
    |                          |
    |                          this lifetime flows to the output
    |
    = note: `-D mismatched-lifetime-syntaxes` implied by `-D warnings`
    = help: to override `-D warnings` add `#[allow(mismatched_lifetime_syntaxes)]`
help: one option is to remove the lifetime for references and use the anonymous lifetime for paths
    |
493 |     pub fn new(platform: &mut dyn PlatformEmulator<CpuState = T>) -> Emulator<'_, T> {

Signed-off-by: Jinank Jain <jinankjain@microsoft.com>
2025-06-09 11:19:11 +00:00
Philipp Schuster
20296e909a misc: streamline thiserror cargo dep
As almost every sub crate depends on thiserror, lets upgrade it to a
workspace dependency.

Signed-off-by: Philipp Schuster <philipp.schuster@cyberus-technology.de>
On-behalf-of: SAP philipp.schuster@sap.com
2025-05-28 17:24:34 +00:00
Philipp Schuster
28e0a95450 misc: virtio-devices: streamline #[source] and Error
This streamlines the code base to follow best practices for
error handling in Rust: Each error struct implements
std::error::Error (most due via thiserror::Error derive macro)
and sets its source accordingly.

This allows future work that nicely prints the error chains,
for example.

So far, the convention is that each error prints its
sub error as part of its Display::fmt() impl.

Signed-off-by: Philipp Schuster <philipp.schuster@cyberus-technology.de>
On-behalf-of: SAP philipp.schuster@sap.com
2025-05-21 09:09:30 +00:00
Philipp Schuster
a615c809eb misc: vsock: streamline #[source] and Error
This streamlines the code base to follow best practices for
error handling in Rust: Each error struct implements
std::error::Error (most due via thiserror::Error derive macro)
and sets its source accordingly.

This allows future work that nicely prints the error chains,
for example.

So far, the convention is that each error prints its
sub error as part of its Display::fmt() impl.

Signed-off-by: Philipp Schuster <philipp.schuster@cyberus-technology.de>
On-behalf-of: SAP philipp.schuster@sap.com
2025-05-21 09:09:30 +00:00
Philipp Schuster
a212343908 misc: arch/riscv64: streamline #[source] and Error
This streamlines the code base to follow best practices for
error handling in Rust: Each error struct implements
std::error::Error (most due via thiserror::Error derive macro)
and sets its source accordingly.

This allows future work that nicely prints the error chains,
for example.

So far, the convention is that each error prints its
sub error as part of its Display::fmt() impl.

Signed-off-by: Philipp Schuster <philipp.schuster@cyberus-technology.de>
On-behalf-of: SAP philipp.schuster@sap.com
2025-05-21 09:09:30 +00:00
Gregory Anders
dce82a34d0 net_util: add support for IPv6 addresses on tap interfaces
Allow tap interfaces to be configured with an IPv6 address. The change
is fairly straightforward: we need to update the API types and CLI
parsing to accept either an IPv6 or IPv4 and then match on the IP
address type when the tap device is configured.

For IPv6 addresses, the netmask (prefix) must be provided at the same
time as the address itself (in the SIOCSIFADDR ioctl). They cannot be
configured separately. So we remove the separate "set_netmask" function
and convert "set_ip_addr" to also accept a netmask. For IPv4 addresses,
the IP address and netmask were already always set together, so this
should have no functional impact for users of IPv4 addresses.

Signed-off-by: Gregory Anders <ganders@cloudflare.com>
2025-05-20 16:41:04 +00:00
Philipp Schuster
78b0f68b21 vmm: Error for MemoryManagerError
Signed-off-by: Philipp Schuster <philipp.schuster@cyberus-technology.de>

On-behalf-of: SAP <philipp.schuster@sap.com>
2025-05-16 11:42:01 +00:00
Philipp Schuster
b2993fb2fa vmm: Error for vsock::unix::Error
Signed-off-by: Philipp Schuster <philipp.schuster@cyberus-technology.de>
On-behalf-of: SAP philipp.schuster@sap.com
2025-05-16 11:42:01 +00:00
Philipp Schuster
05968f5c2c block: introduce advisory locks for disk image files
# What

This commit introduces file-based advisory locking for the files backing
up the block devices by using the fcntl() syscall with OFD locks. The
per-open-file-descriptor (OFD) locks are more robust than traditional
POSIX locks (F_SETLK) as they are not tied to process IDs and avoid
common issues in multithreaded or multi-fd scenarios [1]. Therefore,
we don't use `std::fs::File::try_lock()`, which is backed by F_SETLKW.

The locking mechanism is aware of the `readonly` property and allows
`n` readers or `1` writer (exclusive mode).

As the locks are advisory, multiple cloud-hypervisor processes can
prevent themselves from writing to the same file. However, this is not
a system-wide file-system level locking mechanism preventing to open()
a file.

The introduced new locking mechanism does not cover vhost-user devices.

# Why

To prevent misconfiguration and improve safety, it is good practice to
protect disk image files with a locking mechanism. Experience and common
best practices suggest that advisory locks are preferable over mandatory
locks due to better compatibility and fewer pitfalls (in fs space).

The introduced functionality is aligned with the approach taken by
QEMU [0], and is also recommended in [1].

# Implementation Details

We need to ensure that not only normal operation keeps working but also
state save/resume and live-migration. Especially for live migration,
it is crucial that the sender VMM releases the locks when the VM stops
so the receiver VMM can acquire them right after that.

Therefore, the locking and releasing happen directly on the block
device struct. The device manager knows all block devices and can
forward requests to these types.

Last but not least, this commit uses on explicit lock acquiring
but implicit lock releasing (FD close). It only explicitly releases
the locks where this integrates more smoothly into the existing
code.

# Testing

I tested
- normal operation
- state save/resume,
- device hot plugging,
- and live-migration
with read/shared and write/exclusive locks.

One can use the `fcntl-tool` to test if locks are actually acquired
or released [2].

# Links

[0] 825b96dbce/util/osdep.c (L266)
[1] https://apenwarr.ca/log/20101213
[2] https://crates.io/crates/fcntl-tool

Signed-off-by: Philipp Schuster <philipp.schuster@cyberus-technology.de>
On-behalf-of: SAP philipp.schuster@sap.com
2025-05-16 08:07:32 +00:00
Bo Chen
aaf86ef209 pci: Reprogram device BAR when its MSE bit is set
The Memory Space Enable (MSE) bit from the COMMAND register in the
PCI configuration space controls whether a PCI device responds to memory
space accesses, e.g. read and write cycles to the device MMIO regions
defined by its BARs. The MSE bit is used by the device drivers to ensure
the correctness of BAR reprogramming. A common workflow is, the driver
first clears the MSE bit, then writes new values to the BAR registers,
and finally set the MSE bit to finish the BAR reprogramming.

This patch changes how we handle BAR reprogramming for all PCI
devices (e.g. virtio-pci, vfio, vfio-user, etc.), so that we follow the
same convention, e.g. moving PCI BARs when its MSE bit is set.

Note that some device drivers (such as edk2) only clear and set MSE once
while reprogramming multiple BARs of a single device. To support such
behavior, this patch adds support for multiple pending BAR reprogramming.

See: https://github.com/cloud-hypervisor/cloud-hypervisor/issues/7027#issuecomment-2853642959

Signed-off-by: Bo Chen <bchen@crusoe.ai>
2025-05-15 17:35:44 +00:00
Bo Chen
cb52cf91df pci: Keep detect_bar_reprogramming internal to PciConfiguration
A BAR reprogramming of a PCI device will only happen when the (guest)
kernel write to its PCI config space, e.g. the detection of bar
reprogramming (`detect_bar_repgraomming()`) can be embedded to the PCI
config space write (`write_config_register()`). It simplifies APIs
exposed by the `struct PciConfiguration` and `trait PciDevice`. It also
prepares for easier handling of pending bar reprogramming when the MSE
bit of the COMMAND register is not enabled at the time of changing BAR
registers.

See: https://github.com/cloud-hypervisor/cloud-hypervisor/issues/7027#issuecomment-2853642959

Signed-off-by: Bo Chen <bchen@crusoe.ai>
2025-05-15 15:04:01 +00:00
Jinank Jain
ea4693a091 misc: Fix clippy error from beta compiler
Rust has a new way of constructing other error and clippy complains if
we are still using the older way to construct error message. Thus,
migrate to the new approach suggested by the clippy.

Warning from beta compiler:

error: this can be `std::io::Error::other(_)`
--> block/src/vhdx/mod.rs:142:17
 |
 | /                 std::io::Error::new(
 | |                     std::io::ErrorKind::Other,
 | |                     format!("Failed to update VHDx header: {e}"),
 | |                 )
 | |_________________^
 |
 = help: for further information visit
https://rust-lang.github.io/rust-clippy/master/index.html#io_other_error
help: use `std::io::Error::other`

                 std::io::Error::other(
                     format!("Failed to update VHDx header: {e}"),

Signed-off-by: Jinank Jain <jinankjain@microsoft.com>
2025-04-03 13:11:49 +00:00
Jinank Jain
3698b8e74c build: Centralize serde_json crate to workspace
`serde_json` crate is referenced by multiple components, centralize it
to workspace to better manage this crate.

Signed-off-by: Jinank Jain <jinankjain@microsoft.com>
2025-04-02 06:20:54 +00:00
Ruoqing He
4de422ad69 misc: Fix clippy - manually reimplementing div_ceil
Reported by 1.86.0-beta.1 (f0cb41030 2025-02-17).

Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>
2025-03-01 01:02:17 +00:00
Ruoqing He
c441bb2968 misc: Fix clippy - doc list item overindented
Reported by 1.86.0-beta.1 (f0cb41030 2025-02-17).

Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>
2025-03-01 01:02:17 +00:00
Nikolay Edigaryev
27fda753e1 virtio-devices: iommu: allow limiting maximum address width in bits
Currently, Cloud Hypervisor does not set a VIRTIO_IOMMU_F_INPUT_RANGE
feature bit for the VirtIO IOMMU device, which, according to spec[1],
means that the guest may use the whole 64-bit address space is for
IOMMU purposes:

>If the feature is not offered, virtual mappings span over the whole
>64-bit address space (start = 0, end = 0xffffffff ffffffff)

As far as I am aware, there are currently no host platforms on
the market capable of addressing the whole 64-bit address space.

For example, I am currently working with a host platform that reports
39-bit address space for IOMMU purposes:

>DMAR: Host address width 39

When running a VFIO pass-through guest on such a platform, NVIDIA
driver in guest gets DMA mapping failures when working with large data,
and this results in Cloud Hypervisor exiting with the following error:

>cloud-hypervisor: 1501.220535s: <__iommu>
>ERROR:virtio-devices/src/thread_helper.rs:53 -- Error running worker:
>HandleEvent(Failed to process request queue : ExternalMapping(Custom
>{ kind: Other, error: "failed to map memory for VFIO container, iova
>0x7fff00000000, gpa 0x24ce25000, size 0x1000: IommuDmaMap(Error(22))"
>}))

Passing "--platform iommu_address_width=39" to Cloud Hypervisor built
with this change fixes this.

[1]: https://docs.oasis-open.org/virtio/virtio/v1.3/csd01/
virtio-v1.3-csd01.html#x1-5420006

Signed-off-by: Nikolay Edigaryev <edigaryev@gmail.com>
2025-01-14 21:31:47 +00:00
Wei Liu
0cb2c86ff4 fuzz: introduce a virtio vsock fuzzer
Signed-off-by: Wei Liu <liuwe@microsoft.com>
2025-01-14 00:26:01 +00:00
Wei Liu
d359c8cdce virtio-devices: vsock: allow fuzzer to use TestBackend
Instead of reinventing this mock infrastructure in the upcoming fuzzer,
reuse the one that is already available.

However this change makes Clippy complain that TestBackend and
TestContext don't implement Default. This is just test code, we can
suppress Clippy in this case.

Signed-off-by: Wei Liu <liuwe@microsoft.com>
2025-01-14 00:26:01 +00:00
Rob Bradford
2fc4de6c65 virtio-devices: iommu: Use hex formatting in log messages
This means that the the addresses are more easily readable.

Signed-off-by: Rob Bradford <rbradford@rivosinc.com>
2025-01-13 21:46:23 +00:00
Rob Bradford
03eeb36b74 virtio-devices: iommu: Search full range for GVA conversion
Remove an erroneous optimisation that used the page size mask to reduce
the range to iterate through on the set of mappings. This doesn't work
as the virtio-iommu ranges are larger than a single page. This may have
worked in the past when the mappings were limited to a single page.

Signed-off-by: Rob Bradford <rbradford@rivosinc.com>
2025-01-13 21:46:23 +00:00
Wei Liu
a1af4238ae virtio-devices: make ioeventfds() return an iterator
MSHV's SEV-SNP implementation calls ioeventfds whenever there is an
event.

This change removes the need frequent allocation and deallocation of a
vector, while at the same time makes sure other call sites are
unaffected.

Signed-off-by: Wei Liu <liuwe@microsoft.com>
2025-01-09 21:28:46 +00:00
Wei Liu
d2e798944a virtio-devices: rename two variables
They are used. No need to start their names with an underscore.

Signed-off-by: Wei Liu <liuwe@microsoft.com>
2025-01-09 21:28:46 +00:00
Wei Liu
d99f294281 pci: rename as_any to as_any_mut
That trait function returns a mutable reference. Rename it to follow
Rust's convention.

Signed-off-by: Wei Liu <liuwe@microsoft.com>
2025-01-09 21:28:46 +00:00
Wei Liu
778c05d678 virtio-devices: use C ABI-qualification for packed structures
Signed-off-by: Wei Liu <liuwe@microsoft.com>
2025-01-09 13:51:42 +00:00
Rob Bradford
2624f17ffe virtio-devices: Automatically fix operator precedence clippy warning
Signed-off-by: Rob Bradford <rbradford@rivosinc.com>
2025-01-07 17:44:41 +00:00
Rob Bradford
eeae63b459 build: Bump thiserror version
Signed-off-by: Rob Bradford <rbradford@rivosinc.com>
2025-01-06 17:39:45 +00:00
Wei Liu
32482f6634 block: make available VIRTIO_BLK_F_SEG_MAX
This allows the guest to put in more than one segment per request. It
can improve the throughput of the system.

Introduce a new check to make sure the queue size configured by the user
is large enough to hold at least one segment.

Signed-off-by: Wei Liu <liuwe@microsoft.com>
2025-01-01 18:50:39 +00:00
dependabot[bot]
0c2f2d3ec1 build: Bump anyhow from 1.0.87 to 1.0.94
Bumps [anyhow](https://github.com/dtolnay/anyhow) from 1.0.87 to 1.0.94.
- [Release notes](https://github.com/dtolnay/anyhow/releases)
- [Commits](https://github.com/dtolnay/anyhow/compare/1.0.87...1.0.94)

---
updated-dependencies:
- dependency-name: anyhow
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-12-05 00:30:01 +00:00
dependabot[bot]
30cf1eed5e build: Bump libc from 0.2.158 to 0.2.167
Bumps [libc](https://github.com/rust-lang/libc) from 0.2.158 to 0.2.167.
- [Release notes](https://github.com/rust-lang/libc/releases)
- [Changelog](https://github.com/rust-lang/libc/blob/0.2.167/CHANGELOG.md)
- [Commits](https://github.com/rust-lang/libc/compare/0.2.158...0.2.167)

---
updated-dependencies:
- dependency-name: libc
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-12-03 01:15:36 +00:00
Ruoqing He
ab7b294688 misc: Replace map_or on false with is_some_and
Replace `map_or()` on false condition with `is_some_and` to provide
better readability, as suggestted by v1.84.0-beta.1 `cargo clippy`.

Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>
2024-11-29 12:44:33 +00:00
Rob Bradford
df1d6eaaee virtio-devices: Enable VIRTIO_RING_F_INDIRECT_DESC
This improves sequential write performance using fio (2888MiB/s ->
3293MiB/s)

VM config: cloud-hypervisor --disk path=~/workloads/jammy.raw,direct=on path=~/workloads/big-disk.img,direct=on --cpus boot=1 --memory size=2G,shared=on --serial tty --console off --seccomp log --kernel ~/workloads/hypervisor-fw

Host: fio --filename=big-disk.img --direct=1 --rw=write --bs=256k --ioengine=libaio --iodepth=64 --runtime=120 --numjobs=1 --time_based --group_reporting --name=throughput-test-job --eta-newline=1

VM:  fio --filename=/dev/vdb --direct=1 --rw=write --bs=256k --ioengine=libaio --iodepth=64 --runtime=120 --numjobs=1 --time_based --group_reporting --name=throughput-test-job --eta-newline=1

Baseline (file on filesystem on host used as backing store for block
device):

throughput-test-job: (groupid=0, jobs=1): err= 0: pid=10169: Tue Nov  5 09:31:55 2024
  write: IOPS=13.5k, BW=3385MiB/s (3549MB/s)(397GiB/120008msec); 0 zone resets
    slat (usec): min=4, max=10222, avg=20.25, stdev=29.01
    clat (usec): min=984, max=45599, avg=4706.01, stdev=2278.11
     lat (usec): min=1002, max=45610, avg=4726.27, stdev=2278.77
    clat percentiles (usec):
     |  1.00th=[ 3195],  5.00th=[ 3228], 10.00th=[ 3261], 20.00th=[ 3261],
     | 30.00th=[ 3261], 40.00th=[ 3261], 50.00th=[ 3294], 60.00th=[ 3916],
     | 70.00th=[ 5014], 80.00th=[ 7308], 90.00th=[ 7635], 95.00th=[ 7898],
     | 99.00th=[ 8586], 99.50th=[ 8979], 99.90th=[36439], 99.95th=[36963],
     | 99.99th=[43779]
   bw (  MiB/s): min= 1934, max= 4821, per=100.00%, avg=3391.67, stdev=1266.42, samples=239
   iops        : min= 7738, max=19286, avg=13566.67, stdev=5065.65, samples=239
  lat (usec)   : 1000=0.01%
  lat (msec)   : 2=0.03%, 4=61.10%, 10=38.62%, 20=0.11%, 50=0.15%
  cpu          : usr=17.13%, sys=14.38%, ctx=1352501, majf=0, minf=11
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
     issued rwts: total=0,1624829,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=64

Run status group 0 (all jobs):
  WRITE: bw=3385MiB/s (3549MB/s), 3385MiB/s-3385MiB/s (3549MB/s-3549MB/s), io=397GiB (426GB), run=120008-120008msec

Disk stats (read/write):
    dm-2: ios=129/1624787, sectors=1872/831364040, merge=0/0, ticks=185/6960387, in_queue=6960572, util=100.00%, aggrios=130/1626025, aggsectors=1880/831915888, aggrmerge=0/0, aggrticks=194/6967818, aggrin_queue=6968012, aggrutil=99.97%
    dm-0: ios=130/1626025, sectors=1880/831915888, merge=0/0, ticks=194/6967818, in_queue=6968012, util=99.97%, aggrios=130/1606095, aggsectors=1880/831915888, aggrmerge=0/19930, aggrticks=204/6634488, aggrin_queue=6635288, aggrutil=58.59%
  nvme0n1: ios=130/1606095, sectors=1880/831915888, merge=0/19930, ticks=204/6634488, in_queue=6635288, util=58.59%

On block device in VM:

throughput-test-job: (groupid=0, jobs=1): err= 0: pid=667: Tue Nov  5 09:53:19 2024
  write: IOPS=13.2k, BW=3293MiB/s (3453MB/s)(386GiB/120008msec); 0 zone resets
    slat (usec): min=4, max=3518, avg=27.77, stdev=35.32
    clat (usec): min=723, max=44252, avg=4829.82, stdev=2222.41
     lat (usec): min=735, max=44270, avg=4857.85, stdev=2223.45
    clat percentiles (usec):
     |  1.00th=[ 3097],  5.00th=[ 3195], 10.00th=[ 3195], 20.00th=[ 3228],
     | 30.00th=[ 3261], 40.00th=[ 3294], 50.00th=[ 3621], 60.00th=[ 4555],
     | 70.00th=[ 5997], 80.00th=[ 7242], 90.00th=[ 7570], 95.00th=[ 7898],
     | 99.00th=[ 8586], 99.50th=[ 8848], 99.90th=[36439], 99.95th=[36963],
     | 99.99th=[40633]
   bw (  MiB/s): min= 1914, max= 4857, per=100.00%, avg=3299.46, stdev=1180.81, samples=239
   iops        : min= 7658, max=19430, avg=13197.77, stdev=4723.22, samples=239
  lat (usec)   : 750=0.01%, 1000=0.01%
  lat (msec)   : 2=0.01%, 4=52.79%, 10=46.95%, 20=0.10%, 50=0.14%
  cpu          : usr=25.95%, sys=16.71%, ctx=1111821, majf=0, minf=10
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
     issued rwts: total=0,1580693,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=64

Run status group 0 (all jobs):
  WRITE: bw=3293MiB/s (3453MB/s), 3293MiB/s-3293MiB/s (3453MB/s-3453MB/s), io=386GiB (414GB), run=120008-120008msec

Disk stats (read/write):
  vdb: ios=60/1953213, merge=0/0, ticks=14/8229134, in_queue=8229149, util=100.00%

Prior to change:

throughput-test-job: (groupid=0, jobs=1): err= 0: pid=667: Tue Nov  5 09:37:45 2024
  write: IOPS=11.6k, BW=2888MiB/s (3028MB/s)(338GiB/120008msec); 0 zone resets
    slat (usec): min=3, max=3200, avg=18.48, stdev=24.54
    clat (usec): min=1237, max=46575, avg=5521.41, stdev=2641.99
     lat (usec): min=1249, max=46591, avg=5540.06, stdev=2643.54
    clat percentiles (usec):
     |  1.00th=[ 2999],  5.00th=[ 3163], 10.00th=[ 3195], 20.00th=[ 3261],
     | 30.00th=[ 3294], 40.00th=[ 3359], 50.00th=[ 6063], 60.00th=[ 7111],
     | 70.00th=[ 7373], 80.00th=[ 7570], 90.00th=[ 7832], 95.00th=[ 8094],
     | 99.00th=[ 8717], 99.50th=[ 9241], 99.90th=[36963], 99.95th=[37487],
     | 99.99th=[41157]
   bw (  MiB/s): min= 1936, max= 4826, per=100.00%, avg=2892.43, stdev=1202.99, samples=239
   iops        : min= 7746, max=19306, avg=11569.68, stdev=4811.98, samples=239
  lat (msec)   : 2=0.01%, 4=46.26%, 10=53.38%, 20=0.09%, 50=0.26%
  cpu          : usr=14.20%, sys=8.59%, ctx=1246257, majf=0, minf=12
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
     issued rwts: total=0,1386102,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=64

Run status group 0 (all jobs):
  WRITE: bw=2888MiB/s (3028MB/s), 2888MiB/s-2888MiB/s (3028MB/s-3028MB/s), io=338GiB (363GB), run=120008-120008msec

Signed-off-by: Rob Bradford <rbradford@rivosinc.com>
2024-11-05 15:44:41 +00:00
Ruoqing He
0aab960bf1 misc: Elide needless lifetimes
As clippy of rust-toolchain version 1.83.0-beta.1 suggests, elide
needless lifetimes to `'_`.

Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>
2024-10-18 17:46:39 +00:00
Ruoqing He
297236a7c0 misc: Eliminate use of assert!((...).is_ok())
Asserting on .is_ok()/.is_err() leads to hard to debug failures (as if
the test fails, it will only say "assertion failed: false". We replace
these with `.unwrap()`, which also prints the exact error variant that
was unexpectedly encountered (we can to this these days thanks to
efforts to implement Display and Debug for our error types). If the
assert!((...).is_ok()) was followed by an .unwrap() anyway, we just drop
the assert.

Inspired by and quoted from @roypat.

Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>
2024-10-03 12:03:49 +00:00
Ruoqing He
61e57e1cb1 misc: Further improve imports styling
By introducing `imports_granularity="Module"` format strategy,
effectively groups imports from the same module into one line or block,
improving maintainability and readability.

Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>
2024-09-29 16:13:48 +00:00
Rob Bradford
88a9f79944 misc: Adapt consistent import style formatting
Historically the Cloud Hypervisor coding style has been to ensure that
all imports are ordered and placed in a single group. Unfortunately
cargo fmt has no support for ensuring that all imports are in a single
group so if whitespace lines were added as part of the import statements
then they would only be odered correctly in the group.

By adopting "group_imports="StdExternalCrate" we can enforce a style
where imports are placed in at most three groups for std, external
crates and the crate itself. Choosing a style enforceable by the tooling
reduces the reviewer burden.

Signed-off-by: Rob Bradford <rbradford@rivosinc.com>
2024-09-29 13:08:12 +01:00
Ruoqing He
5a70d7ec69 build: Centralize rust-vmm crates to workspace
Modify `Cargo.toml` in each member crate to follow the dependencies
specified in root `Cargo.toml` file.

Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>
2024-09-27 15:58:21 +00:00
Rob Bradford
d90fa96bb7 build: Bulk update vm-memory and related dependencies
Signed-off-by: Rob Bradford <rbradford@rivosinc.com>
2024-09-26 12:31:25 +00:00
Alyssa Ross
287887c99c vmm: fix console IO safety
Rebooting a VM fails with the following error when debug assertions
are enabled:

	fatal runtime error: IO Safety violation: owned file descriptor already closed

This happens because FromRawFd::from_raw_fd is used on RawFds stored
in ConsoleInfo every time a VM begins to boot, so the second
time (after a reboot, or if the first attempt to boot via the API
failed), the fd will be closed.  Until this assertion is hit, the code
is operating on either closed file descriptors, or new file
descriptors for something completely different.  If debug assertions
are disabled, it will just continue doing this with unpredictable
results.

To fix this, and prevent the problem reocurring, ownership of the
console file descriptors needs to be properly tracked, using Rust's
type system, so this commit refactors the console code to do that.
The file descriptors are now passed around with reference counts, so
they won't be closed prematurely.  The obvious way to do this would be
to just have each member of ConsoleInfo be an Arc<File>, but we need
to accomodate that serial console file descriptors can also be
sockets.  We can't just store an OwnedFd and convert it when it's
used, because we only get a reference from the Arc, so we need to
store the descriptors as their concrete types in an enum.  Since this
basically duplicates the ConsoleOutputMode enum from the config, the
ConsoleOutputMode enum is now not used past constructing the
ConsoleInfo.

So that ownership can be represented consistently, the debug console's
tty mode now uses its own stdout descriptor.

I'm still using .try_clone().unwrap() (i.e. dup()) to clone file
descriptors for Endpoint::FilePair and Endpoint::TtyPair, because I
assume there's a reason for them not just to hold a single file
descriptor.

I've also retained the existing behaviour of having serial manager
ignore the tty file descriptor passed to it (which is stdout), and
instead using stdin.  It looks a lot weirder now, because it has to
explicitly indicate it's ignoring the fd with an underscore binding.

Fixes: 52eebaf6 ("vmm: refactor DeviceManager to use console_info")
Signed-off-by: Alyssa Ross <hi@alyssa.is>
2024-09-25 22:34:43 +00:00