fix: close TOCTOU race in UNLINK handling that crashes vhci_hcd

The spawn_blocking task checked is_cancelled() before removing the URB
from in_flight, allowing the UNLINK handler to find the entry, cancel
the token, and send RET_UNLINK(status=0) in the gap — while the task
proceeded to also send RET_SUBMIT. Both responses for the same URB is
a fatal protocol violation: if the kernel receives RET_UNLINK first it
gives back the URB, then RET_SUBMIT can't find it and disconnects.

Fix by removing from in_flight before checking is_cancelled(), making
the UNLINK handler and completion path mutually exclusive on the map.

Also downgrade "not found in-flight" to trace — it's a normal race for
isochronous transfers, not an error.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This commit is contained in:
Davíð Steinn Geirsson 2026-03-25 12:35:38 +00:00
parent c56dced930
commit 80a9f35e39

View file

@ -675,13 +675,26 @@ pub async fn handle_urb_loop<T: AsyncReadExt + AsyncWriteExt + Unpin + Send + 's
}
};
// Remove from in-flight BEFORE checking cancellation to close
// a TOCTOU race with the UNLINK handler. The UNLINK handler
// only sends RET_UNLINK(status=0) when it finds the entry in
// in_flight; by removing first we guarantee either:
// (a) UNLINK already found + cancelled us → is_cancelled()
// returns true here, we suppress our RET_SUBMIT (kernel
// will use the RET_UNLINK to give back the URB), or
// (b) UNLINK won't find us → it sends RET_UNLINK(-ENOENT),
// and our RET_SUBMIT completes the URB normally (the
// kernel handles both correctly).
// Without this ordering, the UNLINK handler can cancel + send
// RET_UNLINK(status=0) between our is_cancelled() check and
// the remove, causing both RET_UNLINK and RET_SUBMIT to be
// sent — a fatal protocol violation that crashes vhci_hcd.
in_flight_ref.lock().unwrap_or_else(|e| e.into_inner()).remove(&seqnum);
if cancel.is_cancelled() {
in_flight_ref.lock().unwrap_or_else(|e| e.into_inner()).remove(&seqnum);
return;
}
in_flight_ref.lock().unwrap_or_else(|e| e.into_inner()).remove(&seqnum);
match res.to_bytes() {
Ok(bytes) => {
if tx.blocking_send(bytes).is_err() {
@ -735,7 +748,7 @@ pub async fn handle_urb_loop<T: AsyncReadExt + AsyncWriteExt + Unpin + Send + 's
let status = if cancelled {
0u32
} else {
warn!("UNLINK: seqnum={unlink_seqnum} not found in-flight");
trace!("UNLINK: seqnum={unlink_seqnum} not found in-flight (already completed)");
(-2i32) as u32
};