Commit graph

686 commits

Author SHA1 Message Date
Saravanan D
dc0c306dd9 vmm: Add ACPI Generic Initiator support
Support ACPI Generic Initiator Affinity to associate
PCI devices with NUMA proximity domains

Add GenericInitiatorAffinity struct

Add from_pci_bdf() to encode PCI Segment:Bus:Device.Function

Add from_acpi_device() for ACPI device handles (future use)

Generate SRAT Type 5 entries for nodes with device_id

Improve create_slit_table() to check distance symmetry when
forward distance is missing

Track device ID to BDF mappings in DeviceManager

Includes comprehensive unit tests

Signed-off-by: Saravanan D <saravanand@crusoe.ai>
2026-02-12 22:54:54 +00:00
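To illustrate the Segment:Bus:Device.Function encoding that dc0c306dd9 refers to, here is a minimal sketch. It assumes the conventional PCI packing (8-bit bus, 5-bit device, 3-bit function). The `PciBdf` type and `encode_bdf` function are hypothetical; they do not reproduce the commit's from_pci_bdf() or the exact byte layout of the SRAT entry.

```rust
/// Hypothetical stand-in for a PCI address; not the type used in the commit.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct PciBdf {
    segment: u16,
    bus: u8,
    device: u8,   // valid range: 0..=31 (5 bits)
    function: u8, // valid range: 0..=7 (3 bits)
}

/// Pack bus/device/function into the conventional 16-bit BDF value:
/// bits 15:8 = bus, bits 7:3 = device, bits 2:0 = function.
/// Returns None if device or function does not fit its bit field.
fn encode_bdf(bdf: PciBdf) -> Option<u16> {
    if bdf.device > 0x1f || bdf.function > 0x07 {
        return None;
    }
    Some((u16::from(bdf.bus) << 8) | (u16::from(bdf.device) << 3) | u16::from(bdf.function))
}

fn main() {
    let bdf = PciBdf { segment: 0, bus: 0x01, device: 0x02, function: 0x3 };
    if let Some(value) = encode_bdf(bdf) {
        println!("segment {:04x}, bdf {:04x}", bdf.segment, value);
    }
}
```

The segment is kept separate from the 16-bit BDF value in this sketch, since the commit message names it as its own component.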
Zhibin Li
28686bba46 vmm: fix rsdp_addr assertion for TDX
TDX builds its own ACPI tables in `create_acpi_tables_tdx` so it will
return None in the standard `create_acpi_tables` function and the
assertion for `rsdp_addr` will fail.

Signed-off-by: Zhibin Li <banlu.lzb@antgroup.com>
2026-01-29 16:23:22 +00:00
Muminul Islam
c9cd82b52b vmm: fix CVM boot failure on MSHV
Recent changes related to arm64 support in MSHV exposed
inconsistencies in the VM initialization and CVM boot paths.
The VM creation flow currently diverges across multiple scenarios,
including regular MSHV, CVM, and arm64, with each path performing
guest initialization steps in a different order.
Certain platform-specific requirements further constrain the ordering
of operations, such as the timing of address space creation,
IGVM loading, interrupt controller setup, and payload loading. For the
CVM case, address space creation must be done after IGVM loading and
PSP measurement. For regular and arm64 guests, this memory initialization
must be done early. For MSHV, vm.init() and sev_snp.init() are called in
a different order, selected by a mix of run-time and build-time checks.

Additionally, while the KVM initialization path differs slightly
from MSHV, it shares common logic that is currently split across
separate conditional and build-time code paths, contributing to
fragmentation of the overall flow.

This change restructures the VM creation and initialization sequence
to better align shared logic, enforce scenario-specific ordering
constraints, and ensure consistent and correct behavior across all
supported configurations. In doing so, it restores proper CVM boot
behavior and improves the maintainability of the initialization code.

Signed-off-by: Muminul Islam <muislam@microsoft.com>
2026-01-21 19:40:17 +00:00
Anatol Belski
3657db7843 vmm: mshv: Set PROCESSORS_PER_SOCKET property for CPU topologies
On MSHV, exposing multithreaded CPU topologies requires setting the
PROCESSORS_PER_SOCKET partition property so that CPUID.0xB reports
correct logical processor counts and topology levels to the guest.

This property must be set after all vCPUs are configured, as the
hypervisor uses the complete vCPU layout to derive and report CPU
topology information.

Signed-off-by: Anatol Belski <anbelski@linux.microsoft.com>
2026-01-05 21:41:28 +00:00
Thomas Prescher
37d71fa038 vmm: disk resize infrastructure
Add basic infrastructure so resize events are
propagated to the underlying disk implementation.

On-behalf-of: SAP thomas.prescher@sap.com
Signed-off-by: Thomas Prescher <thomas.prescher@cyberus-technology.de>
2025-12-17 13:54:52 +00:00
Philipp Schuster
6bda6541be vmm: cleanup &Mutex parameters
In [0] we refactored some Arc<Mutex<T>> parameters to &Mutex<T> to
satisfy clippy's needless_pass_by_value lint. Nevertheless, this is also
not so idiomatic, so as a follow-up, we put the responsibility to lock
objects to the caller side (only where this is not strictly needed by
the callee).

While on it, I also tried to pass vm_config directly into
pre_create_console_devices() which would clean up some code, but then
we have interleaving mutable and immutable borrows of the Vmm, which
are denied by the borrow checker.

Signed-off-by: Philipp Schuster <philipp.schuster@cyberus-technology.de>
On-behalf-of: SAP philipp.schuster@sap.com
2025-12-10 11:06:29 +00:00
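The idea behind 6bda6541be, moving the responsibility for locking to the caller, can be sketched as follows. The `Config` struct and both functions are hypothetical and only illustrate the pattern; they are not taken from the vmm crate.

```rust
use std::sync::{Arc, Mutex};

/// Hypothetical configuration object, used only for this illustration.
#[derive(Default)]
struct Config {
    console_enabled: bool,
}

/// Earlier shape: the callee receives the mutex and locks it internally.
fn enable_console_locking(config: &Mutex<Config>) {
    config.lock().unwrap().console_enabled = true;
}

/// Follow-up shape: the caller locks, the callee only sees `&mut Config`.
fn enable_console(config: &mut Config) {
    config.console_enabled = true;
}

fn main() {
    let shared = Arc::new(Mutex::new(Config::default()));

    // `&Arc<Mutex<Config>>` deref-coerces to `&Mutex<Config>`.
    enable_console_locking(&shared);

    {
        // The caller decides how long the lock is held.
        let mut guard = shared.lock().unwrap();
        enable_console(&mut guard);
    }

    println!("console enabled: {}", shared.lock().unwrap().console_enabled);
}
```

With the second shape the callee no longer needs to know that the value is shared at all, which also makes it usable with plain, unshared values.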
Muminul Islam
deaf660a52 vmm: simplify VM creation API
Create HypervisorVmConfig early and pass the
struct to the VM creation API in the vmm crate. This gets
rid of multiple conditional parameters.

Signed-off-by: Muminul Islam <muislam@microsoft.com>
2025-12-03 09:56:58 +00:00
Julian Stecklina
1861bc49e7 vmm: simplify receiving memory fds
... and nuke some Option<> while I was there. Given that HashMap has a
usable default and we end up passing an empty HashMap anyway, just get
rid of the Option.

On-behalf-of: SAP julian.stecklina@sap.com
Signed-off-by: Julian Stecklina <julian.stecklina@cyberus-technology.de>
2025-11-28 08:41:47 +00:00
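A minimal sketch of the simplification described in 1861bc49e7: an empty HashMap already expresses "nothing received", so wrapping it in an Option adds a second way of saying the same thing. The function names and the key/value types below are assumptions for illustration only.

```rust
use std::collections::HashMap;

/// Earlier shape (hypothetical): `None` and an empty map mean the same thing.
fn count_fds_optional(fds: Option<&HashMap<u32, i32>>) -> usize {
    fds.map_or(0, |m| m.len())
}

/// Simplified shape: the caller passes a (possibly empty) map directly.
fn count_fds(fds: &HashMap<u32, i32>) -> usize {
    fds.len()
}

fn main() {
    // `HashMap::default()` is an empty map, so no Option is needed
    // to represent "no memory fds were received".
    let mut fds: HashMap<u32, i32> = HashMap::default();
    println!("{}", count_fds(&fds));

    fds.insert(0, 42);
    println!("{} {}", count_fds(&fds), count_fds_optional(Some(&fds)));
}
```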
Philipp Schuster
c53781bf5f misc: clippy: add needless_pass_by_value
This is a follow-up of [0].

# Advantages

- This saves dozens of unneeded clone()s across the whole code base
- Makes it much easier to reason about how parameters are used
  (often we passed owned Arc/Rc versions without actually needing
  ownership)

# Exceptions

For certain code paths, the alternatives would require awkward or overly
complex code, and in some cases the functions are the logical owners of
the values they take. In those cases, I've added
#[allow(clippy::needless_pass_by_value)].

This does not mean that one should not improve this in the future.

[0] 6a86c157af

Signed-off-by: Philipp Schuster <philipp.schuster@cyberus-technology.de>
On-behalf-of: SAP philipp.schuster@sap.com
2025-11-27 17:11:14 +00:00
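As a rough illustration of what the needless_pass_by_value lint in c53781bf5f targets, consider the sketch below. `Snapshot` and the three functions are hypothetical names, not code from the repository.

```rust
use std::sync::Arc;

/// Hypothetical type, used only for this illustration.
struct Snapshot {
    id: String,
}

/// Flagged by the lint: takes ownership of the Arc but only reads from it,
/// which forces callers to clone.
fn id_len_owned(snapshot: Arc<Snapshot>) -> usize {
    snapshot.id.len()
}

/// Preferred: borrow what is only read.
fn id_len(snapshot: &Snapshot) -> usize {
    snapshot.id.len()
}

/// Exception: the function is the logical owner of the value, so the
/// lint is silenced explicitly.
#[allow(clippy::needless_pass_by_value)]
fn archive(snapshot: Arc<Snapshot>) -> usize {
    snapshot.id.len()
}

fn main() {
    let snapshot = Arc::new(Snapshot { id: "snap-0".to_string() });

    // The owned variant needs an extra clone to keep `snapshot` usable.
    let a = id_len_owned(Arc::clone(&snapshot));
    // The borrowing variant does not.
    let b = id_len(&snapshot);
    let c = archive(snapshot);
    println!("{a} {b} {c}");
}
```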
Philipp Schuster
6a86c157af misc: clippy: add needless_pass_by_value (partially)
This helps to uncover expensive and needless clones in the code base.
For example, I prevented extensive clones in the snapshot path where
(nested) BTreeMap's have been cloned over and over again. Further,
the lint helps developers reason much better about the ownership of
parameters.

All of these changes have been done manually with the necessary
caution. A few structs that are cheap to clone are now `Copy` so that
this lint won't trigger for them.

I didn't enable the lint so far as it is a massive rabbit hole and
needs many more fixes. Nevertheless, it is very useful.

Signed-off-by: Philipp Schuster <philipp.schuster@cyberus-technology.de>
On-behalf-of: SAP philipp.schuster@sap.com
2025-11-25 16:05:46 +00:00
Philipp Schuster
a0b72dce22 misc: clippy: add redundant_else
Signed-off-by: Philipp Schuster <philipp.schuster@cyberus-technology.de>
On-behalf-of: SAP philipp.schuster@sap.com
2025-11-25 16:05:46 +00:00
Philipp Schuster
d2b19bb969 misc: clippy: add map_unwrap_or
Signed-off-by: Philipp Schuster <philipp.schuster@cyberus-technology.de>
On-behalf-of: SAP philipp.schuster@sap.com
2025-11-25 16:05:46 +00:00
Philipp Schuster
67fc9d990e misc: vmm: drop extern crate, use modern rust
This commit is part of a series of similar commits.

Signed-off-by: Philipp Schuster <philipp.schuster@cyberus-technology.de>
On-behalf-of: SAP philipp.schuster@sap.com
2025-11-24 22:36:46 +00:00
Demi Marie Obenour
969a3b57a3 misc: tdx: make tdx_init_memory_region() unsafe
It takes a pointer to a userspace address that it accesses, so it should
be marked unsafe.  This was missed earlier.

Signed-off-by: Demi Marie Obenour <demiobenour@gmail.com>
2025-11-22 10:24:13 +00:00
Demi Marie Obenour
199d2d05d8 hypervisor: tdx: do not use u64 to represent pointers
Also drop support for building the TDX code for 32-bit targets.  All
CPUs with TDX support are 64-bit so supporting 32-bit targets is not
needed.

Signed-off-by: Demi Marie Obenour <demiobenour@gmail.com>
2025-11-22 10:24:13 +00:00
Demi Marie Obenour
42522a88c0 misc: do not use u64 to represent host pointers
To ensure that struct sizes are the same on 32-bit and 64-bit, various
kernel APIs use __u64 (Rust u64) to represent userspace pointers.
Userspace is expected to cast pointers to __u64 before passing them to
the kernel, and cast kernel-provided __u64 to a pointer before using
them.  However, various safe APIs in Cloud Hypervisor took
caller-provided u64 values and passed them to syscalls that interpret
them as userspace addresses.  Therefore, passing bad u64 values would
cause memory disclosure or corruption.

Fix the bug by using usize and pointer types as appropriate.  To make
soundness of the code easier to reason about, the PCI code gains a new
MmapRegion abstraction that ensures the validity of pointers.  The rest
of the code already has an MmapRegion abstraction it can use.  To avoid
having to reason about whether something is keeping the MmapRegion
alive, reference counting is added.  MmapRegion cannot hold references
to other objects, so the reference counting cannot introduce cycles.

Signed-off-by: Demi Marie Obenour <demiobenour@gmail.com>
2025-11-22 10:24:13 +00:00
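The soundness problem described in 42522a88c0 and the two remedies (an unsafe function with a documented contract, or a type that guarantees pointer validity) can be sketched as below. `Region` is a hypothetical, Vec-backed stand-in and is not the MmapRegion abstraction from the commit.

```rust
// Unsound pattern (shown only as a comment): a *safe* function that trusts
// an arbitrary integer as an address.
//
//     fn first_byte(addr: u64) -> u8 { unsafe { *(addr as *const u8) } }

/// Remedy 1: keep the raw pointer, but make the function `unsafe` so the
/// caller has to uphold the contract.
///
/// # Safety
/// `ptr` must be valid for reads of `len` bytes.
unsafe fn sum_bytes_raw(ptr: *const u8, len: usize) -> u64 {
    let bytes = unsafe { std::slice::from_raw_parts(ptr, len) };
    bytes.iter().map(|&b| u64::from(b)).sum()
}

/// Remedy 2: a type that owns the memory, so pointer and length are always
/// valid together. Hypothetical stand-in for a mapped region.
struct Region {
    data: Vec<u8>,
}

impl Region {
    fn sum(&self) -> u64 {
        // SAFETY: pointer and length come from the same live Vec.
        unsafe { sum_bytes_raw(self.data.as_ptr(), self.data.len()) }
    }
}

fn main() {
    let region = Region { data: vec![1, 2, 3] };
    println!("{}", region.sum());
}
```

Callers of `Region::sum` cannot pass a bad address, because they never handle an address at all.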
Demi Marie Obenour
fdc19ad85e misc: Mark memory region APIs as unsafe
To ensure that struct sizes are the same on 32-bit and 64-bit, various
kernel APIs use __u64 (Rust u64) to represent userspace pointers.
Userspace is expected to cast pointers to __u64 before passing them to
the kernel, and cast kernel-provided __u64 to a pointer before using
them.  However, various safe APIs in Cloud Hypervisor took
caller-provided u64 values and passed them to syscalls that treat them
as userspace addresses.  Therefore, passing bad u64 values would cause
memory disclosure or corruption.  The memory region APIs are one example
of this, so mark them as unsafe.

Signed-off-by: Demi Marie Obenour <demiobenour@gmail.com>
2025-11-22 10:24:13 +00:00
Philipp Schuster
b4c62bf159 misc: clippy: add semicolon_if_nothing_returned
Signed-off-by: Philipp Schuster <philipp.schuster@cyberus-technology.de>
On-behalf-of: SAP philipp.schuster@sap.com
2025-11-21 09:32:11 +00:00
Philipp Schuster
ea4f07d3bf misc: clippy: add uninlined_format_args
Signed-off-by: Philipp Schuster <philipp.schuster@cyberus-technology.de>
On-behalf-of: SAP philipp.schuster@sap.com
2025-11-21 09:32:11 +00:00
Philipp Schuster
7cb73e9e56 misc: clippy: add unnecessary_semicolon
Signed-off-by: Philipp Schuster <philipp.schuster@cyberus-technology.de>
On-behalf-of: SAP philipp.schuster@sap.com
2025-11-21 09:32:11 +00:00
Philipp Schuster
7364fbdc8e tests: move VM test into a test module
Signed-off-by: Philipp Schuster <philipp.schuster@cyberus-technology.de>
On-behalf-of: SAP philipp.schuster@sap.com
2025-11-20 21:15:03 +00:00
Philipp Schuster
d1680b9ff9 tests: streamline module names to unit_tests
This better aligns with the rest of the code and makes it clearer
that these tests can run "as is" in a normal hosted environment
without the special test environment.

Signed-off-by: Philipp Schuster <philipp.schuster@cyberus-technology.de>
On-behalf-of: SAP philipp.schuster@sap.com
2025-11-20 21:15:03 +00:00
Philipp Schuster
7536a95424 misc: cleanup &Arc<dyn T> -> &dyn T
Consuming `&Arc<T>` as argument is almost always an antipattern as it
hides whether the callee is going to take over (shared) ownership
(by .clone()) or not. Instead, it is better to consume `&dyn T` or
`Arc<dyn T>` to be more explicit. This commit cleans up the code.

The change is very mechanical and was very easy to implement across the
code base.

Signed-off-by: Philipp Schuster <philipp.schuster@cyberus-technology.de>
On-behalf-of: SAP philipp.schuster@sap.com
2025-10-28 17:37:49 +00:00
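A small sketch of the distinction drawn in 7536a95424. The `Device` trait, `Serial`, and `Registry` are hypothetical names chosen for illustration.

```rust
use std::sync::Arc;

/// Hypothetical trait and implementation, used only for this illustration.
trait Device {
    fn name(&self) -> String;
}

struct Serial;

impl Device for Serial {
    fn name(&self) -> String {
        "serial".to_string()
    }
}

/// Ambiguous: the signature does not say whether the Arc will be cloned.
fn describe_arc(dev: &Arc<dyn Device>) -> String {
    dev.name()
}

/// Explicit borrow: the callee cannot take shared ownership.
fn describe(dev: &dyn Device) -> String {
    dev.name()
}

/// Explicit shared ownership: the callee keeps the Arc.
struct Registry {
    devices: Vec<Arc<dyn Device>>,
}

impl Registry {
    fn register(&mut self, dev: Arc<dyn Device>) {
        self.devices.push(dev);
    }
}

fn main() {
    let dev: Arc<dyn Device> = Arc::new(Serial);
    println!("{} {}", describe_arc(&dev), describe(&*dev));

    let mut registry = Registry { devices: Vec::new() };
    registry.register(Arc::clone(&dev));
    println!("{}", registry.devices.len());
}
```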
Rob Bradford
cb5aaca809 hypervisor, vmm: Remove inner Mutex protecting VcpuFd
This was added in 7be69edf51 to deal with
changes to the KVM bindings that made run() and set_immediate_exit()
take &mut self. Instead adopt a Box<> value in Vcpu allowing the removal
of this internal Mutex.

Signed-off-by: Rob Bradford <rbradford@rivosinc.com>
2025-10-24 13:13:12 +00:00
Ruoqing He
f2dfa7f6e0 misc: Use variables directly in format! string
Fix clippy warning `uninlined_format_args` reported by rustc
1.89.0 (29483883e 2025-08-04).

```console
warning: variables can be used directly in the `format!` string
   --> block/src/lib.rs:649:17
    |
649 |                 info!("{} failed to create io_uring instance: {}", error_msg, e);
    |                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    |
    = help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#uninlined_format_args
    = note: `#[warn(clippy::uninlined_format_args)]` on by default
help: change this to
    |
649 -                 info!("{} failed to create io_uring instance: {}", error_msg, e);
649 +                 info!("{error_msg} failed to create io_uring instance: {e}");
    |
```

Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>
2025-09-24 02:28:12 +00:00
Muminul Islam
1e8996f94f vmm: hypervisor: simplify VM creation API
MSHV customers don't want everything set to its default during
partition creation. For example, nested support and some
synthetic features could be controlled from the CLI through
the platform argument. The create_vm API was getting messy
after adding more flags. This patch introduces a common data
struct to be passed from the vmm crate to the hypervisor crate
during partition creation.

Signed-off-by: Muminul Islam <muislam@microsoft.com>
2025-09-17 16:40:10 +00:00
Philipp Schuster
1179a1a1c9 vmm: refactor alignment
Context [0].

[0] https://github.com/cloud-hypervisor/cloud-hypervisor/pull/7256#discussion_r2298538384

Signed-off-by: Philipp Schuster <philipp.schuster@cyberus-technology.de>
On-behalf-of: SAP philipp.schuster@sap.com
2025-09-10 18:35:38 +00:00
Philipp Schuster
c995b72384 build: treewide: clippy: collapse nested ifs, use let chains
This bumps the MSRV to 1.88 (also, Rust edition 2024 is mandatory).

Signed-off-by: Philipp Schuster <philipp.schuster@cyberus-technology.de>
On-behalf-of: SAP philipp.schuster@sap.com
2025-09-10 18:35:38 +00:00
Philipp Schuster
363273111a build: treewide: fmt for edition 2024
`cargo +nightly fmt`

Signed-off-by: Philipp Schuster <philipp.schuster@cyberus-technology.de>
On-behalf-of: SAP philipp.schuster@sap.com
2025-09-10 18:35:38 +00:00
Peter Oskolkov
05d222f0eb vmm: raise the (v)CPU limit on kvm/x86_64
Raise the max number of supported (v)CPUs on kvm x86_64 hosts
to 8192 (the max allowed value of CONFIG_NR_CPUS in the Linux kernel).

Other platforms keep their existing CPU limits pending further
development and testing.

The change has been tested on Intel and AMD hosts.

Signed-off-by: Barret Rhoden <brho@google.com>
Signed-off-by: Neel Natu <neelnatu@google.com>
Signed-off-by: Ofir Weisse <oweisse@google.com>
Signed-off-by: Peter Oskolkov <posk@google.com>
2025-09-08 22:54:31 +00:00
Shubham Chakrawar
2d9e243163 misc: Remove SGX support from Cloud Hypervisor
This commit removes the SGX support from cloud hypervisor. SGX support
was deprecated in May as part of #7090.

Signed-off-by: Shubham Chakrawar <schakrawar@crusoe.ai>
2025-09-05 18:08:36 +00:00
Philipp Schuster
2c6426460e vmm: harmonize bootpath across architectures
On aarch64 and RISC-V, calling load_firmware() through load_kernel()
provides no benefit and only duplicates checks already performed in
load_payload(). load_payload() now directly invokes load_firmware() or
load_kernel(), removing unnecessary indirection and redundancy.

Signed-off-by: Philipp Schuster <philipp.schuster@cyberus-technology.de>
On-behalf-of: SAP philipp.schuster@sap.com
2025-08-22 16:41:29 +00:00
Bo Chen
1a63b4b2ff vmm: Consolidate 'load_firmware/kernel' for aarch64 and riscv
Both functions are defined separately for the two architectures with
minor differences.

* `load_firmware()`: call `arch::uefi::load_uefi`, which is available on
both architectures;
* `load_kernel()`: manually align to `arch::layout::KERNEL_START` 2MB
for both architectures (e.g. no-op for `aarch64`).

Signed-off-by: Bo Chen <bchen@crusoe.ai>
2025-08-21 15:32:05 +00:00
Philipp Schuster
dd8687aebb vmm: add enum PayloadConfigError validation to improve error reporting
Currently, the following scenarios are supported by Cloud Hypervisor to
bootstrap a VM:

1. provide firmware
2. provide kernel
3. provide kernel + cmdline
4. provide kernel + initrd
5. provide kernel + cmdline + initrd

As the difference between `--firmware` and `--kernel` is not very clear
currently, especially as both use/support a Xen PVH entry, adding this
helps to identify the cause of misconfiguration.

Signed-off-by: Philipp Schuster <philipp.schuster@cyberus-technology.de>
On-behalf-of: SAP philipp.schuster@sap.com
2025-08-15 17:08:37 +00:00
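The five supported boot scenarios listed in dd8687aebb could be checked along the following lines. This is a sketch under assumptions: the struct fields, the error variants, and the exact rejection rules are hypothetical and are not taken from the commit.

```rust
/// Hypothetical payload description; field names are assumptions.
#[derive(Debug, Default)]
struct PayloadConfig {
    firmware: Option<String>,
    kernel: Option<String>,
    cmdline: Option<String>,
    initrd: Option<String>,
}

/// Hypothetical error variants for illustration.
#[derive(Debug, PartialEq, Eq)]
enum PayloadConfigError {
    /// Neither firmware nor kernel was given.
    MissingBootItem,
    /// Firmware was combined with kernel, cmdline, or initrd.
    FirmwareWithOtherPayload,
}

fn validate(p: &PayloadConfig) -> Result<(), PayloadConfigError> {
    let has_extras = p.cmdline.is_some() || p.initrd.is_some();
    match (p.firmware.is_some(), p.kernel.is_some()) {
        // Scenario 1: firmware only.
        (true, false) if !has_extras => Ok(()),
        (true, _) => Err(PayloadConfigError::FirmwareWithOtherPayload),
        // Scenarios 2 to 5: kernel with optional cmdline and/or initrd.
        (false, true) => Ok(()),
        (false, false) => Err(PayloadConfigError::MissingBootItem),
    }
}

fn main() {
    let kernel_boot = PayloadConfig {
        kernel: Some("vmlinux".to_string()),
        cmdline: Some("console=ttyS0".to_string()),
        ..Default::default()
    };
    println!("{:?}", validate(&kernel_boot));
    println!("{:?}", validate(&PayloadConfig::default()));
}
```

Returning a dedicated variant per misconfiguration is what lets the error message name the actual cause instead of failing later in the boot path.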
Peter Oskolkov
6e0403a959 misc: make topology a 4-tuple of u16s
This is the second patch in a series intended to let Cloud Hypervisor
support more than 255 vCPUs in guest VMs; the first patch/commit is
https://github.com/cloud-hypervisor/cloud-hypervisor/pull/7231

At the moment, CPU topology in Cloud Hypervisor is using
u8 for components, and somewhat inconsistently:

- struct CpuTopology in vmm/src/vm_config.rs uses four components
  (threads_per_core, cores_per_die, dies_per_package, packages);

- when passed around as a tuple, it is a 3-tuple of u8, with
  some inconsistency:

  - in get_x2apic_id in arch/src/x86_64/mod.rs the three u8
    are assumed to be (correctly)
    threads_per_core, cores_per_die, and dies_per_package, but

  - in get_vcpu_topology() in vmm/src/cpu.rs the three-tuple is
    threads_per_core, cores_per_die, and packages (dies_per_package
    is assumed to always be one? not clear).

So for consistency, a 4-tuple is always passed around.

In addition, the type of the tuple components is changed from u8 to
u16, as on x86_64 subcomponents can consume up to 16 bits.

Again, config constraints have not been changed, so this patch
is mostly a no-op.

Signed-off-by: Barret Rhoden <brho@google.com>
Signed-off-by: Neel Natu <neelnatu@google.com>
Signed-off-by: Ofir Weisse <oweisse@google.com>
Signed-off-by: Peter Oskolkov <posk@google.com>
2025-08-13 07:31:22 +00:00
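To make the 4-tuple from 6e0403a959 concrete, the sketch below computes the vCPU count a topology implies. The `Topology` alias and `total_vcpus` function are hypothetical; only the component order follows the commit message.

```rust
/// (threads_per_core, cores_per_die, dies_per_package, packages),
/// in the order named in the commit message. The alias is hypothetical.
type Topology = (u16, u16, u16, u16);

/// Number of vCPUs implied by a topology. The product of four u16 values
/// always fits into a u64, so no overflow check is needed here.
fn total_vcpus(topology: Topology) -> u64 {
    let (threads_per_core, cores_per_die, dies_per_package, packages) = topology;
    u64::from(threads_per_core)
        * u64::from(cores_per_die)
        * u64::from(dies_per_package)
        * u64::from(packages)
}

fn main() {
    // 2 threads x 64 cores x 1 die x 4 packages: the total no longer fits
    // into a u8, which is the motivation for the wider types.
    let topology: Topology = (2, 64, 1, 4);
    println!("{}", total_vcpus(topology));
}
```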
Peter Oskolkov
aa8e9cd91a misc: Change cpu ID type from u8 to u32
This is the first change to Cloud Hypervisor in a series of changes
intended to increase the max number of supported vCPUs in guest VMs,
which is currently limited to 255 (254 on x86_64).

No user-visible/behavior changes are expected as a result of
applying this patch, as the type of boot_cpus and related
fields in config structs remains u8 for now, and all configuration
validations remain the same.

Signed-off-by: Barret Rhoden <brho@google.com>
Signed-off-by: Neel Natu <neelnatu@google.com>
Signed-off-by: Ofir Weisse <oweisse@google.com>
Signed-off-by: Peter Oskolkov <posk@google.com>
2025-08-11 20:31:50 +00:00
Alex Orozco
a70c1b38e7 devices: Add fw_cfg cli options
This allows us to enable/disable the fw_cfg device via the CLI.

We can also now upload files into the guest VM using fw_cfg_items
via the CLI.

Signed-off-by: Alex Orozco <alexorozco@google.com>
2025-08-11 17:29:51 +00:00
Alex Orozco
1f51e4525b devices: Add acpi tables to fw_cfg
The ACPI tables are created in the same place as for the regular
boot flow, except that here we add them to the fw_cfg device to be
measured by the firmware, which then puts the ACPI tables into
memory.

Signed-off-by: Alex Orozco <alexorozco@google.com>
2025-08-11 17:29:51 +00:00
Alex Orozco
f0b69d56d0 devices: Add e820/memory_map to fw_cfg device
We build the memory map in the fw_cfg device based on the memory size.

Signed-off-by: Alex Orozco <alexorozco@google.com>
2025-08-11 17:29:51 +00:00
Alex Orozco
623fadfa9d devices: Add kernel cmdline, kernel, and initramfs to fw_cfg device
The kernel and initramfs are passed to the fw_cfg device as
file references. The cmdline is passed directly.

Signed-off-by: Alex Orozco <alexorozco@google.com>
2025-08-11 17:29:51 +00:00
Alex Orozco
777b7ee11e devices: Add fw_cfg device
Here we add the fw_cfg device as a legacy device to the device manager.
It is guarded behind a fw_cfg flag in vmm at creation of the
DeviceManager. In this cl we implement the fw_cfg device with one
function (signature).

Signed-off-by: Alex Orozco <alexorozco@google.com>
2025-08-11 17:29:51 +00:00
Ruoqing He
17195e1a46 vmm: Enable firmware boot for riscv64
Implement firmware boot (UEFI boot) for riscv64 architecture.

Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>
2025-08-11 10:10:27 +00:00
Ruoqing He
0df4b1ac4f vmm: Define riscv64 UEFI Error
Error::UefiLoad is required for load_firmware to propagate errors
encountered, define it for riscv64.

Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>
2025-08-11 10:10:27 +00:00
Muminul Islam
8e010f1aa3 vmm: don't configure system if rsdp is not available
For a CVM guest, rsdp is set to None. Unwrapping it
makes the VMM crash. Don't call configure system if the
rsdp address is None.

Signed-off-by: Muminul Islam <muislam@microsoft.com>
2025-07-29 15:55:06 +00:00
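The fix in 8e010f1aa3 amounts to handling the missing RSDP address instead of unwrapping it. A minimal sketch, with a hypothetical `GuestAddress` stand-in and hypothetical function names:

```rust
/// Hypothetical stand-in for a guest physical address type.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct GuestAddress(u64);

/// Hypothetical stand-in for the system configuration step.
fn configure_system(rsdp_addr: GuestAddress) -> String {
    format!("configured with RSDP at {:#x}", rsdp_addr.0)
}

/// Skip configuration when no RSDP address is available, instead of
/// calling `rsdp_addr.unwrap()`, which would panic in that case.
fn maybe_configure_system(rsdp_addr: Option<GuestAddress>) -> Option<String> {
    rsdp_addr.map(configure_system)
}

fn main() {
    println!("{:?}", maybe_configure_system(Some(GuestAddress(0xe_0000))));
    println!("{:?}", maybe_configure_system(None));
}
```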
Philipp Schuster
d7edd9d51f misc: vmm: streamline error Display::fmt()
The changes were mostly automatically applied using the following
Python script:

```python
import os, re

for root, _, files in os.walk("."):
    for f in files:
        if not f.endswith(".rs"):
            continue
        p = os.path.join(root, f)
        with open(p, "r", encoding="utf-8") as file:
            lines = file.readlines()
        changed = False
        for i in range(len(lines) - 1):
            if re.search(r'#\[error\(".*: \{0[^}]*\}"\)\]', lines[i]) and "#[source]" in lines[i + 1].strip():
                lines[i] = re.sub(r': \{0[^}]*\}"\)\]', '")]', lines[i])
                changed = True
        if changed:
            with open(p, "w", encoding="utf-8") as file:
                file.writelines(lines)
            print("Fixed:", p)
```

Signed-off-by: Philipp Schuster <philipp.schuster@cyberus-technology.de>
On-behalf-of: SAP philipp.schuster@sap.com

2025-06-13 19:55:54 +00:00
Philipp Schuster
fff62d9302 misc: vmm: streamline #[source] and Error
This streamlines the code base to follow best practices for
error handling in Rust: Each error struct implements
std::error::Error (mostly via the thiserror::Error derive macro)
and sets its source accordingly.

This allows future work that nicely prints the error chains,
for example.

So far, the convention is that each error prints its
sub error as part of its Display::fmt() impl.

Signed-off-by: Philipp Schuster <philipp.schuster@cyberus-technology.de>
On-behalf-of: SAP philipp.schuster@sap.com
2025-05-21 09:09:30 +00:00
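What "sets its source accordingly" in fff62d9302 enables can be sketched with the standard library alone. The error types below are hypothetical and implement std::error::Error by hand rather than through thiserror. Unlike the convention described in the commit, their Display output deliberately omits the sub-error, so that walking the chain does not print it twice.

```rust
use std::error::Error;
use std::fmt;

/// Hypothetical leaf error.
#[derive(Debug)]
struct PermissionDenied;

impl fmt::Display for PermissionDenied {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "permission denied")
    }
}

impl Error for PermissionDenied {}

/// Hypothetical wrapping error that exposes its cause via `source()`.
#[derive(Debug)]
struct DiskOpenError {
    source: PermissionDenied,
}

impl fmt::Display for DiskOpenError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "cannot open disk image")
    }
}

impl Error for DiskOpenError {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        Some(&self.source)
    }
}

/// Collect the messages of an error and all of its sources, outermost first.
fn error_chain(err: &dyn Error) -> Vec<String> {
    let mut messages = vec![err.to_string()];
    let mut current = err.source();
    while let Some(cause) = current {
        messages.push(cause.to_string());
        current = cause.source();
    }
    messages
}

fn main() {
    let err = DiskOpenError { source: PermissionDenied };
    println!("{}", error_chain(&err).join(": "));
}
```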
Philipp Schuster
a212343908 misc: arch/riscv64: streamline #[source] and Error
This streamlines the code base to follow best practices for
error handling in Rust: Each error struct implements
std::error::Error (mostly via the thiserror::Error derive macro)
and sets its source accordingly.

This allows future work that nicely prints the error chains,
for example.

So far, the convention is that each error prints its
sub error as part of its Display::fmt() impl.

Signed-off-by: Philipp Schuster <philipp.schuster@cyberus-technology.de>
On-behalf-of: SAP philipp.schuster@sap.com
2025-05-21 09:09:30 +00:00
Philipp Schuster
67793ca375 vmm: improve disk locking error message
This adds guidance on how to resolve the issue.

Signed-off-by: Philipp Schuster <philipp.schuster@cyberus-technology.de>
On-behalf-of: SAP philipp.schuster@sap.com
2025-05-19 15:13:01 +01:00
Philipp Schuster
05968f5c2c block: introduce advisory locks for disk image files
# What

This commit introduces file-based advisory locking for the files backing
the block devices by using the fcntl() syscall with OFD locks. These
open file description (OFD) locks are more robust than traditional
POSIX locks (F_SETLK) as they are not tied to process IDs and avoid
common issues in multithreaded or multi-fd scenarios [1]. Therefore,
we don't use `std::fs::File::try_lock()`, which is backed by F_SETLKW.

The locking mechanism is aware of the `readonly` property and allows
`n` readers or `1` writer (exclusive mode).

As the locks are advisory, multiple cloud-hypervisor processes can
prevent each other from writing to the same file. However, this is not
a system-wide file-system level locking mechanism that prevents a file
from being open()ed.

The newly introduced locking mechanism does not cover vhost-user devices.

# Why

To prevent misconfiguration and improve safety, it is good practice to
protect disk image files with a locking mechanism. Experience and common
best practices suggest that advisory locks are preferable over mandatory
locks due to better compatibility and fewer pitfalls (in fs space).

The introduced functionality is aligned with the approach taken by
QEMU [0], and is also recommended in [1].

# Implementation Details

We need to ensure that not only normal operation keeps working but also
state save/resume and live-migration. Especially for live migration,
it is crucial that the sender VMM releases the locks when the VM stops
so the receiver VMM can acquire them right after that.

Therefore, the locking and releasing happen directly on the block
device struct. The device manager knows all block devices and can
forward requests to these types.

Last but not least, this commit relies on explicit lock acquisition
but implicit lock release (FD close). It only explicitly releases
the locks where this integrates more smoothly into the existing
code.

# Testing

I tested
- normal operation,
- state save/resume,
- device hot plugging, and
- live-migration

with read/shared and write/exclusive locks.

One can use the `fcntl-tool` to test if locks are actually acquired
or released [2].

# Links

[0] 825b96dbce/util/osdep.c (L266)
[1] https://apenwarr.ca/log/20101213
[2] https://crates.io/crates/fcntl-tool

Signed-off-by: Philipp Schuster <philipp.schuster@cyberus-technology.de>
On-behalf-of: SAP philipp.schuster@sap.com
2025-05-16 08:07:32 +00:00
Jinank Jain
034aa514d7 vmm: Unify address space allocation
It seems like address allocation has been spread across different files
and different locations for x86 vs ARM. This makes it hard to follow the
code. Thus, unify it in a single location which satisfies all the
requirements.

Signed-off-by: Jinank Jain <jinankjain@microsoft.com>
2025-05-09 16:06:12 +00:00