Freed clusters correctly have refcount=0. Remove the assertion that
expected no clusters with zero refcount, as it was validating the
buggy behavior.
Signed-off-by: Anatol Belski <anbelski@linux.microsoft.com>
When converting a compressed cluster to standard during write
operations, the old compressed cluster's refcount was never
decremented, causing leak warnings by `qemu-img check ..`
`Leaked cluster X refcount=N reference=M`
Additionally, compressed data can span multiple physical clusters,
not just one. The compressed cluster address and size are encoded
in the L2 entry, and the data may cross cluster boundaries.
The proper handling is implemented as follows:
- Extract compressed cluster address and size before overwriting
L2 entry
- Identify all clusters occupied by the compressed data
- Decrement refcount for each cluster in the range
Signed-off-by: Anatol Belski <anbelski@linux.microsoft.com>
When a refcount block is evicted from cache and replaced with a new
one, the old refcount block cluster was added to unref_clusters but
its refcount was never decremented to 0 on disk. This left the cluster
with refcount=1 while no metadata referenced it, causing errors in
qemu-img check
`Leaked cluster X refcount=1 reference=0`
This fix recursively calls set_cluster_refcount(freed_cluster, 0) to
properly decrement the freed refcount block's refcount on disk. The
recursion handles cascading replacements where freeing one refcount
block may trigger the replacement of another.
Signed-off-by: Anatol Belski <anbelski@linux.microsoft.com>
"Signed-off-by" is the only variant that is accepted. So we
should remove the inconsistency to prevent:
- user forgets this at all
- CI complains
- user adds "Signed-Off-By"
- CI still complains because of the wrong format
Signed-off-by: Philipp Schuster <philipp.schuster@cyberus-technology.de>
On-behalf-of: SAP philipp.schuster@sap.com
We enable AMX tile state components via the hypervisor (as introduced
in the previous commit) instead of doing this inline in the body of
`CpuManager::new`.
Signed-off-by: Oliver Anderson <oliver.anderson@cyberus-technology.de>
On-behalf-of: SAP oliver.anderson@sap.com
Currently when the user configures AMX the corresponding state
components get dynamically enabled directly inside the function body of
vmm::cpu::CpuManager::new.
With our ongoing work on CPU templates/profiles, there will (likely) be
one more binary crate for producing CPU profiles that also needs to do
this (without creating a CpuManager) and it may also be the case that
we will need to call this function prior to `CpuManager::new` during
live migrations.
We thus add a method for enabling the AMX tile state components on the
hypervisor trait that may be called wherever necessary. We argue that
this is beneficial for code clarity independently of the upcoming CPU
templates/profiles PR that we are working on.
The astute reader will notice that the logic introduced here is not 100%
the same as what is done inside the vmm::cpu::Cpumanager::new method. We
claim that our approach is more in-line with the official documentation.
Signed-off-by: Oliver Anderson <oliver.anderson@cyberus-technology.de>
On-behalf-of: SAP oliver.anderson@sap.com
This patch updates the documentation to reflect the newly added
nested CPU feature option in the CLI.
Signed-off-by: Muminul Islam <muislam@microsoft.com>
Add an option in the CLI to enable nested support.
Adding an option --cpu nested=on|off to the CPU
argument to enable nested support for Microsoft
Hypervisor. By default it is enabled for both KVM
and MSHV.
Signed-off-by: Muminul Islam <muislam@microsoft.com>
Create HypervisorVmConfig early and pass the
struct to VM creation API in the vmm crate. Getting
rid of multiple conditional parameter.
Signed-off-by: Muminul Islam <muislam@microsoft.com>
The OFLAG_COPIED bit (bit 63) indicates a cluster's refcount is exactly
1 and doesn't need copy-on-write. This bit must be set in L1 entries
when their referenced L2 clusters have refcount=1.
Previously, L1 entries were always written as raw addresses without the
OFLAG_COPIED bit, violating the QCOW2 specification and causing qemu-img
check to report errors like
`ERROR OFLAG_COPIED L2 cluster: l1_index=X .... refcount=1`
The implementation queries each L2 cluster's refcount in sync_caches()
and sets OFLAG_COPIED appropriately when writing the L1 table. This
ensures QCOW2 images are specification compliant and maintain correct
COW semantics to avoid data corruption.
Signed-off-by: Anatol Belski <anbelski@linux.microsoft.com>
Previously the code relies on the implicit flush when BufWriter is
dropped. That's not safe.
Per BufWriter's document:
```
It is critical to call flush before BufWriter<W> is dropped. Though
dropping will attempt to flush the contents of the buffer, any errors
that happen in the process of dropping will be ignored. Calling flush
ensures that the buffer is empty and thus dropping will not even attempt
file operations.
```
Signed-off-by: Wei Liu <liuwe@microsoft.com>
The MSRs are constant at boot time so rather than creating a vector in
the boot_msr_entries() method instead reaturn a reference to static MSR
array data.
Signed-off-by: Rob Bradford <rbradford@rivosinc.com>
Reflect the latest migration protocol as mermaid diagrams in the
(code) documentation.
Signed-off-by: Philipp Schuster <philipp.schuster@cyberus-technology.de>
On-behalf-of: SAP philipp.schuster@sap.com
Previously, state that we accumulated during the migration process in
the receiver was kept in `mut Option` variables or HashMaps. The
problem is that it is unclear in the code when these variables can be
safely used. It's also difficult to add new state, such as the state
we need to handle the upcoming feature for performing the migration
using multiple connections.
To solve this, I've refactored the code to use the state machine
pattern. Each state carries the data it needs to. Any state that
didn't arrive yet (memory_files, memory_manager) cannot be accessed
until we are in the proper state.
Some benefits that fall out of this:
- We now respond to all requests, even invalid ones, at least with an
error message.
- Any error handling a request will result in an error message being
sent.
- There is only a single place where responses are sent and thus it's
very hard to mess this up in the code.
- The main protocol state machine fits on a screen.
I would argue that especially the error cases are now much more
consistent. There is still a lot to be done. There is still state
transfer via self.vm and similar. In an ideal world, this would also
be carried by the state machine. I also want to see better handling of
payloads, which are still handled all over the place, but this change
is already too big. :)
Co-authored-by: Philipp Schuster <philipp.schuster@cyberus-technology.de>
On-behalf-of: SAP julian.stecklina@sap.com
On-behalf-of: SAP philipp.schuster@sap.com
Signed-off-by: Julian Stecklina <julian.stecklina@cyberus-technology.de>
Signed-off-by: Philipp Schuster <philipp.schuster@cyberus-technology.de>
... and nuke some Option<> while I was there. Given that HashMap has a
usable default and we end up passing an empty HashMap anyway, just get
rid of the Option.
On-behalf-of: SAP julian.stecklina@sap.com
Signed-off-by: Julian Stecklina <julian.stecklina@cyberus-technology.de>
This will be useful later when we rebuild the live migration code as a
state machine.
On-behalf-of: SAP julian.stecklina@sap.com
Signed-off-by: Julian Stecklina <julian.stecklina@cyberus-technology.de>
During the lifecycle of a confidential VM, the guest may revoke
previously shared pages via an attribute-intercept VM exit.
When this happens, the host-side cache must be updated so that any
subsequent access by the VMM triggers a fresh request for the guest
to re-share the page.
Signed-off-by: Muminul Islam <muislam@microsoft.com>
`impl AsRef<Path>` is the most idiomatic way to consume paths
in Rust. I removed the mixture.
Signed-off-by: Philipp Schuster <philipp.schuster@cyberus-technology.de>
On-behalf-of: SAP philipp.schuster@sap.com
This is a follow-up of [0].
# Advantages
- This saves dozens of unneeded clone()s across the whole code base
- Makes it much easier to reason about how parameters are used
(often we passed owned Arc/Rc versions without actually needing
ownership)
# Exceptions
For certain code paths, the alternatives would require awkward or overly
complex code, and in some cases the functions are the logical owners of
the values they take. In those cases, I've added
#[allow(clippy::needless_pass_by_value)].
This does not mean that one should not improve this in the future.
[0] 6a86c157af
Signed-off-by: Philipp Schuster <philipp.schuster@cyberus-technology.de>
On-behalf-of: SAP philipp.schuster@sap.com
Fix the condition to correctly identify a new PTY connection.
Fixes: 287887c99 (vmm: fix console IO safety)
Signed-off-by: Praveen K Paladugu <prapal@linux.microsoft.com>
Previously, `into_raw_fd()` transferred fd ownership to epoll while
keeping separate clones in `reader` and `writer`, causing leaks when
the stream was closed. Now, `reader` owns the fd and epoll borrows it.
When the FD is closed in the kernel, reader will be reset to None.
Signed-off-by: Praveen K Paladugu <prapal@linux.microsoft.com>
The Intel Granite Rapids processors include more AMX related features
that are advertised in leaf 0x7 subleaf 0x1. If the VM is not
configured to support AMX (the default) then these feature bits need
to be masked out.
Furthermore Tile information and TMUL information
in leaves 0x1d and 0x1e respectively are also purely related to AMX
and should also be zeroed whenever AMX support is disabled.
Signed-off-by: Oliver Anderson <oliver.anderson@cyberus-technology.de>
On-behalf-of: SAP <oliver.anderson@sap.com>
When enabling amx feature, we should call arch_prctl to request
permission to use tile data for guest. The permission should be
requested before the first vcpu is created, so we need to call
arch_prctl in vmm thread. This patch adds the arch_prctl syscall for
vmm_thread_rules.
Fixes: #7516
Signed-off-by: Songqian Li <sionli@tencent.com>
This helps to uncover expensive and needless clones in the code base.
For example, I prevented extensive clones in the snapshot path where
(nested) BTreeMap's have been cloned over and over again. Further,
the lint helps devs to much better reason about the ownership of
parameters.
All of these changes have been done manually with the necessary
caution. A few structs that are cheap to clone are now `copy` so that
this lint won't trigger for them.
I didn't enable the lint so far as it is a massive rabbit hole and
needs much more fixes. Nevertheless, it is very useful.
Signed-off-by: Philipp Schuster <philipp.schuster@cyberus-technology.de>
On-behalf-of: SAP philipp.schuster@sap.com
This removes cognitive load when reading if statements.
All changes were applied by clippy via `--fix`.
Signed-off-by: Philipp Schuster <philipp.schuster@cyberus-technology.de>
On-behalf-of: SAP philipp.schuster@sap.com
We've encountered issues that Cloud Hypervisor corrupts disk images
after use. Those issues may not be immediately obvious until the
corrupted images are used again.
Run consistency checks over the disk images in the test cases to catch
issues as early as possible.
Signed-off-by: Wei Liu <liuwe@microsoft.com>
It is a common use case to run a subset of tests locally to verify
certain functionalities.
The default behaviour for nextest is to error out if no tests are run.
That causes the test scripts to return a non-zero value (failure). Pass
`--no-tests=pass` to nextest to match what `cargo test` does if no tests
are run.
Signed-off-by: Wei Liu <liuwe@microsoft.com>
This is continuing the work from [0]. Some places in CHV
"misuse" Result::map_err for side-effects. inspect_err is
a better suited alternative for exactly that use-case.
Unfortunately, there is no clippy lint for this yet.
[0] f02745a7ed
Signed-off-by: Philipp Schuster <philipp.schuster@cyberus-technology.de>
On-behalf-of: SAP philipp.schuster@sap.com
This commit is part of a series of similar commits.
Signed-off-by: Philipp Schuster <philipp.schuster@cyberus-technology.de>
On-behalf-of: SAP philipp.schuster@sap.com
This commit is part of a series of similar commits.
Signed-off-by: Philipp Schuster <philipp.schuster@cyberus-technology.de>
On-behalf-of: SAP philipp.schuster@sap.com
This commit is part of a series of similar commits.
Signed-off-by: Philipp Schuster <philipp.schuster@cyberus-technology.de>
On-behalf-of: SAP philipp.schuster@sap.com
This commit is part of a series of similar commits.
Signed-off-by: Philipp Schuster <philipp.schuster@cyberus-technology.de>
On-behalf-of: SAP philipp.schuster@sap.com
This commit is part of a series of similar commits.
Signed-off-by: Philipp Schuster <philipp.schuster@cyberus-technology.de>
On-behalf-of: SAP philipp.schuster@sap.com
This commit is part of a series of similar commits.
Signed-off-by: Philipp Schuster <philipp.schuster@cyberus-technology.de>
On-behalf-of: SAP philipp.schuster@sap.com