The BufWriter must be flushed explicitly to handle errors
properly. Without explicit flush, errors during the implicit
drop flush are ignored.
This is the same issue fixed for write_pointer_table
in commit 85556951a.
Signed-off-by: Anatol Belski <anbelski@linux.microsoft.com>
Add set_cluster_refcount_track_freed() helper to consolidate the
common pattern of setting a cluster refcount and tracking freed
refblocks. This reduces code duplication and improves readability.
Signed-off-by: Anatol Belski <anbelski@linux.microsoft.com>
Freed clusters correctly have refcount=0. Remove the assertion that
expected no clusters with zero refcount, as it was validating the
buggy behavior.
Signed-off-by: Anatol Belski <anbelski@linux.microsoft.com>
When converting a compressed cluster to standard during write
operations, the old compressed cluster's refcount was never
decremented, causing leak warnings by `qemu-img check ..`
`Leaked cluster X refcount=N reference=M`
Additionally, compressed data can span multiple physical clusters,
not just one. The compressed cluster address and size are encoded
in the L2 entry, and the data may cross cluster boundaries.
The proper handling is implemented as follows:
- Extract compressed cluster address and size before overwriting
L2 entry
- Identify all clusters occupied by the compressed data
- Decrement refcount for each cluster in the range
Signed-off-by: Anatol Belski <anbelski@linux.microsoft.com>
When a refcount block is evicted from cache and replaced with a new
one, the old refcount block cluster was added to unref_clusters but
its refcount was never decremented to 0 on disk. This left the cluster
with refcount=1 while no metadata referenced it, causing errors in
qemu-img check
`Leaked cluster X refcount=1 reference=0`
This fix recursively calls set_cluster_refcount(freed_cluster, 0) to
properly decrement the freed refcount block's refcount on disk. The
recursion handles cascading replacements where freeing one refcount
block may trigger the replacement of another.
Signed-off-by: Anatol Belski <anbelski@linux.microsoft.com>
The OFLAG_COPIED bit (bit 63) indicates a cluster's refcount is exactly
1 and doesn't need copy-on-write. This bit must be set in L1 entries
when their referenced L2 clusters have refcount=1.
Previously, L1 entries were always written as raw addresses without the
OFLAG_COPIED bit, violating the QCOW2 specification and causing qemu-img
check to report errors like
`ERROR OFLAG_COPIED L2 cluster: l1_index=X .... refcount=1`
The implementation queries each L2 cluster's refcount in sync_caches()
and sets OFLAG_COPIED appropriately when writing the L1 table. This
ensures QCOW2 images are specification compliant and maintain correct
COW semantics to avoid data corruption.
Signed-off-by: Anatol Belski <anbelski@linux.microsoft.com>
Previously the code relies on the implicit flush when BufWriter is
dropped. That's not safe.
Per BufWriter's document:
```
It is critical to call flush before BufWriter<W> is dropped. Though
dropping will attempt to flush the contents of the buffer, any errors
that happen in the process of dropping will be ignored. Calling flush
ensures that the buffer is empty and thus dropping will not even attempt
file operations.
```
Signed-off-by: Wei Liu <liuwe@microsoft.com>
This helps to uncover expensive and needless clones in the code base.
For example, I prevented extensive clones in the snapshot path where
(nested) BTreeMap's have been cloned over and over again. Further,
the lint helps devs to much better reason about the ownership of
parameters.
All of these changes have been done manually with the necessary
caution. A few structs that are cheap to clone are now `copy` so that
this lint won't trigger for them.
I didn't enable the lint so far as it is a massive rabbit hole and
needs much more fixes. Nevertheless, it is very useful.
Signed-off-by: Philipp Schuster <philipp.schuster@cyberus-technology.de>
On-behalf-of: SAP philipp.schuster@sap.com
This removes cognitive load when reading if statements.
All changes were applied by clippy via `--fix`.
Signed-off-by: Philipp Schuster <philipp.schuster@cyberus-technology.de>
On-behalf-of: SAP philipp.schuster@sap.com
This commit is part of a series of similar commits.
Signed-off-by: Philipp Schuster <philipp.schuster@cyberus-technology.de>
On-behalf-of: SAP philipp.schuster@sap.com
Add support of reading and writing compressed clusters.
Support zlib and zstd compressions.
L2 cache: store entire L2 entries, not only standard cluster addresses.
Read path. Offsets of compressed clusters cannot be determined,
therefore replace QcowFile.file_offset_read() with QcowFile.file_read().
This method reads the cluster, decompresses it if necessary and returns
the data to the caller.
Write path. QcowFile.file_offset_write(): since writing to compressed
clusters is not generally possible, allocate a new standard
(non-compressed) cluster if compressed L2 entry is encountered; then
decompress compressed cluster into new cluster; then return offset
inside new cluster to the caller. Processing of standard clusters is
not changed.
Signed-off-by: Eugene Korenevsky <ekorenevsky@aliyun.com>
The granularity has significant implications in typical cloud
deployments with network storage. The Linux kernel will sync advisory
locks to network file systems, but these backends may have different
policies and handle locks differently. For example, Netapp speaks a NFS
API but will treat advisory OFD locks for the whole file as mandatory
locks, whereas byte-range locks for the whole file will remain
advisory [0].
As it is a valid use case to prevent multiple CHV instances from
accessing the same disk but disk management software (e.g., Cinder in
OpenStack) should be able to snapshot disks while VMs are running, we
need special control over the lock granularity. Therefore, it is a valid
use case to lock the whole byte range of a disk image without
technically locking the whole file - to get the best of both worlds.
This also brings CHVs behavior in line with QEMU [1].
Whole-file locks remain a valid use case and could be supported later.
This patch only provides the necessary groundwork; making it
configurable is out of scope for now.
[0] https://kb.netapp.com/on-prem/ontap/da/NAS/NAS-KBs/How_is_Mandatory_Locking_supported_for_NFSv4_on_ONTAP_9
[1] <qemu>/util/osdep.c::qemu_lock_fcntl()
Signed-off-by: Philipp Schuster <philipp.schuster@cyberus-technology.de>
On-behalf-of: SAP philipp.schuster@sap.com
This should be guaranteed by GuestMemory and GuestMemoryRegion, but
those traits are currently safe, so add checks to guard against
incorrect implementations of them.
Signed-off-by: Demi Marie Obenour <demiobenour@gmail.com>
This better aligns with the rest of the code and makes it clearer
that these tests can run "as is" in a normal hosted environments
without the special test environment.
Signed-off-by: Philipp Schuster <philipp.schuster@cyberus-technology.de>
On-behalf-of: SAP philipp.schuster@sap.com
A major improvement to the developer experience of clippy in
Cloud Hypervisor.
1. Make `cargo clippy` just work with the same lints we use in CI
2. Simplify adding new lints
Signed-off-by: Philipp Schuster <philipp.schuster@cyberus-technology.de>
On-behalf-of: SAP philipp.schuster@sap.com
There are several copy-pasted code fragments in impl QcowFile. All of
them add L2 entry to the cache and one of them (in file_offset_write())
does also allocating new L2 entry if necessary.
Fold all these code fragments (except of one in l2_table() which does
error handling in special way) into cache_l2_cluster() method without
changing the logic.
This will make the code more compact and clean.
Signed-off-by: Eugene Korenevsky <ekorenevsky@aliyun.com>
Consuming `&Arc<T>` as argument is almost always an antipattern as it
hides whether the callee is going to take over (shared) ownership
(by .clone()) or not. Instead, it is better to consume `&dyn T` or
`Arc<dyn T>` to be more explicit. This commit cleans up the code.
The change is very mechanic and was very easy to implement across the
code base.
Signed-off-by: Philipp Schuster <philipp.schuster@cyberus-technology.de>
On-behalf-of: SAP philipp.schuster@sap.com
Fix clippy warning `uninlined_format_args` reported by rustc rustc
1.89.0 (29483883e 2025-08-04).
```console
warning: variables can be used directly in the `format!` string
--> block/src/lib.rs:649:17
|
649 | info!("{} failed to create io_uring instance: {}", error_msg, e);
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
= help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#uninlined_format_args
= note: `#[warn(clippy::uninlined_format_args)]` on by default
help: change this to
|
649 - info!("{} failed to create io_uring instance: {}", error_msg, e);
649 + info!("{error_msg} failed to create io_uring instance: {e}");
|
```
Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>
This bumps the MSRV to 1.88 (also, Rust edition 2024 is mandatory).
Signed-off-by: Philipp Schuster <philipp.schuster@cyberus-technology.de>
On-behalf-of: SAP philipp.schuster@sap.com
This upgrades the Cargo workspace to Rust edition 2024 to keep the
code base clean and up to date.
The commit only contains the adjustments to the Cargo.toml files and
basic compiler error fixes. Also, this commit includes new SAFETY
comments as discussed in [1]. The changes were not automatically
fixed by `cargo fix --edition` but needed manual adjustments.
Apart from that, all formatting and clippy adjustments follow in
subsequent commits.
*
As only exception, workspace member net_gen sticks to edition 2021
for now as discussed in [0].
[0] https://github.com/cloud-hypervisor/cloud-hypervisor/pull/7295#discussion_r2310851041
[1] https://github.com/cloud-hypervisor/cloud-hypervisor/pull/7256#issuecomment-3271888674
Signed-off-by: Philipp Schuster <philipp.schuster@cyberus-technology.de>
On-behalf-of: SAP philipp.schuster@sap.com
Implement the batch submission function for raw disk, default it is
enabled. After parsing the requests this method is
called for better IO latency and bandwidth.
Signed-off-by: Bo Chen <bchen@crusoe.ai>
Signed-off-by: Muminul Islam <muislam@microsoft.com>
Cache and batch IO requests after parsing all
items in the queue, improving performance—especially
for small block sizes—by reducing per-request overhead.
Introduced two methods in the AsyncIo trait for batch
submission, with implementation in the raw disk backend.
This method should be called during/after parsing all block IO requests
in the available queue. If the batch submission is not enabled, by
default it does the old way of submitting requests.
Signed-off-by: Bo Chen <bchen@crusoe.ai>
Signed-off-by: Muminul Islam <muislam@microsoft.com>
Instead of returning boolean return an struct of completion status
so that it can be cached for batch submission.
Signed-off-by: Bo Chen <bchen@crusoe.ai>
Signed-off-by: Muminul Islam <muislam@microsoft.com>
This patch changes the read-only check using acked features bit, which
will help to check more features.
Signed-off-by: Songqian Li <sionli@tencent.com>
Many of the workspace members in the Cloud-hypervisor workspace share
common dependencies. Making these workspace dependencies reduces
duplication and improves maintainability.
Signed-off-by: Oliver Anderson <oliver.anderson@cyberus-technology.de>
On-behalf-of: SAP oliver.anderson@sap.com
It shouldn't be necessary to lock the file for the adaptor. This removes
two layers of indirection for QcowDiskSync and VhdxDiskSync.
Signed-off-by: Wei Liu <liuwe@microsoft.com>
Recently vfio crates have moved to crates.io, thus we should start
consuming the crate from crates.io instead git url.
This results in better versioning instead of tracking some git commit
sha.
Signed-off-by: Jinank Jain <jinankjain@microsoft.com>
That feature was dropped when consolidating the UUID dependency because
somehow building the whole project worked. The CI system was happy.
However, building the block crate alone is broken. The vhdx code uses
Uuid::new_v4, which requires `v4` to be enabled.
Add the feature back.
Signed-off-by: Wei Liu <liuwe@microsoft.com>
Cargo fuzz build report an error:
error: lifetime flowing from input to output with different syntax can be confusing
--> /home/runner/work/cloud-hypervisor/cloud-hypervisor/block/src/lib.rs:747:13
|
747 | fn file(&mut self) -> MutexGuard<F>;
| ^^^^^^^^^ ------------- the lifetime gets resolved as `'_`
| |
| this lifetime flows to the output
error: lifetime flowing from input to output with different syntax can be confusing
--> /home/runner/work/cloud-hypervisor/cloud-hypervisor/block/src/async_io.rs:68:11
|
68 | fn fd(&mut self) -> BorrowedDiskFd;
| ^^^^^^^^^ -------------- the lifetime gets resolved as `'_`
| |
| this lifetime flows to the output
Signed-off-by: Yi Wang <foxywang@tencent.com>
The changes were mostly automatically applied using the Python
script mentioned in the first commit of this series.
Signed-off-by: Philipp Schuster <philipp.schuster@cyberus-technology.de>
On-behalf-of: SAP philipp.schuster@sap.com
This was caught by the nightly compiler during cargo fuzz build.
error: lifetime flowing from input to output with different syntax can be confusing
--> /home/runner/work/cloud-hypervisor/cloud-hypervisor/hypervisor/src/arch/x86/emulator/mod.rs:493:26
|
493 | pub fn new(platform: &mut dyn PlatformEmulator<CpuState = T>) -> Emulator<T> {
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ ----------- the lifetime gets resolved as `'_`
| |
| this lifetime flows to the output
|
= note: `-D mismatched-lifetime-syntaxes` implied by `-D warnings`
= help: to override `-D warnings` add `#[allow(mismatched_lifetime_syntaxes)]`
help: one option is to remove the lifetime for references and use the anonymous lifetime for paths
|
493 | pub fn new(platform: &mut dyn PlatformEmulator<CpuState = T>) -> Emulator<'_, T> {
Signed-off-by: Jinank Jain <jinankjain@microsoft.com>
As almost every sub crate depends on thiserror, lets upgrade it to a
workspace dependency.
Signed-off-by: Philipp Schuster <philipp.schuster@cyberus-technology.de>
On-behalf-of: SAP philipp.schuster@sap.com
This streamlines the Error implementation in the Cloud Hypervisor code
base to match the remaining parts so that everything follows the agreed
conventions. These are leftovers missed in the previous commits.
Signed-off-by: Philipp Schuster <philipp.schuster@cyberus-technology.de>
On-behalf-of: SAP philipp.schuster@sap.com
This streamlines the code base to follow best practices for
error handling in Rust: Each error struct implements
std::error::Error (most due via thiserror::Error derive macro)
and sets its source accordingly.
This allows future work that nicely prints the error chains,
for example.
So far, the convention is that each error prints its
sub error as part of its Display::fmt() impl.
Signed-off-by: Philipp Schuster <philipp.schuster@cyberus-technology.de>
On-behalf-of: SAP philipp.schuster@sap.com