diff --git a/docs/superpowers/specs/2026-03-25-fuzz-corpus-seeding-design.md b/docs/superpowers/specs/2026-03-25-fuzz-corpus-seeding-design.md
new file mode 100644
index 0000000..e1e929c
--- /dev/null
+++ b/docs/superpowers/specs/2026-03-25-fuzz-corpus-seeding-design.md
@@ -0,0 +1,96 @@
+# Fuzz Corpus Seeding Design
+
+## Goal
+
+Seed the fuzzing corpus with valid USB/IP traffic to help the fuzzer explore interesting code paths faster. Currently the corpora contain only fuzzer-discovered inputs, mostly small, partially valid byte sequences. Starting from realistic, well-formed inputs lets mutation-based fuzzing reach deeper program states sooner.
+
+## Approach
+
+A standalone Rust binary (`gen_corpus`) in `lib/fuzz/` programmatically constructs valid USB/IP packets using the library's own serialization code and writes them as seed files to the corpus directories.
+
+## Seed Categories
+
+### 1. `fuzz_parse_command`: Negotiation Phase
+
+Two seeds covering the only valid client commands:
+
+| Seed | Description | Size |
+|------|-------------|------|
+| `seed-devlist` | OP_REQ_DEVLIST: version 0x0111, command 0x8005, status 0 | 8 bytes |
+| `seed-import` | OP_REQ_IMPORT: version 0x0111, command 0x8003, status 0, busid "1-1\0..." | 40 bytes |
+
+### 2. `fuzz_urb_hid` / `fuzz_urb_cdc` / `fuzz_urb_uac`: URB Phase
+
+Each seed is a concatenated sequence of CMD_SUBMIT (and optionally CMD_UNLINK) packets. Version field is 0x0000 for URB-phase commands. All multi-byte USB/IP header fields are big-endian; the 8-byte setup packets below are listed in USB wire order, which is little-endian (for example, a wLength of 64 appears as `40 00`).
+
+#### Shared enumeration sequence (all three targets)
+
+A single seed containing these CMD_SUBMITs back to back:
+
+1. GET_DESCRIPTOR(Device, 64): control IN, ep 0, setup `80 06 00 01 00 00 40 00`
+2. GET_DESCRIPTOR(Device, 18): control IN, ep 0, setup `80 06 00 01 00 00 12 00`
+3. GET_DESCRIPTOR(Config, 9): control IN, ep 0, setup `80 06 00 02 00 00 09 00`
+4. GET_DESCRIPTOR(Config, 255): control IN, ep 0, setup `80 06 00 02 00 00 ff 00`
+5. SET_CONFIGURATION(1): control OUT, ep 0, setup `00 09 01 00 00 00 00 00`
+
+#### HID-specific seeds
+
+- GET_DESCRIPTOR(HID Report) on ep 0: setup `81 06 00 22 00 00 80 00` (128 bytes)
+- SET_IDLE: setup `21 0a 00 00 00 00 00 00`
+- GET_REPORT: setup `a1 01 00 01 00 00 08 00` (8 bytes, the standard keyboard report size)
+- Interrupt IN on ep 0x81 (the HID keyboard's interrupt endpoint), 8-byte transfer
+
+#### CDC-specific seeds
+
+- SET_LINE_CODING: class-specific OUT with 7-byte payload (baud, stop bits, parity, data bits)
+- SET_CONTROL_LINE_STATE: setup `21 22 03 00 00 00 00 00` (DTR + RTS active)
+- Bulk OUT data transfer on the CDC data endpoint
+
+#### UAC-specific seeds
+
+- SET_CUR / GET_CUR for sample rate control
+- Isochronous IN/OUT transfers with valid ISO packet descriptors (`number_of_packets > 0`, each descriptor 16 bytes with offset + length within transfer buffer bounds)
+
+#### Edge-case seeds (all URB targets)
+
+- CMD_UNLINK referencing a previous seqnum
+- Zero-length control transfer
+- Interrupt/bulk transfers on non-zero endpoints
+
+### 3. `fuzz_handle_client`: Full Connection
+
+Concatenation of negotiation + URB phases:
+
+| Seed | Description |
+|------|-------------|
+| `seed-devlist-only` | OP_REQ_DEVLIST alone (early-exit path) |
+| `seed-import-enumerate` | OP_REQ_IMPORT + shared enumeration sequence |
+| `seed-import-hid-full` | OP_REQ_IMPORT + enumeration + HID-specific requests |
+
+This target uses a HID keyboard, so only HID-specific seeds are needed.
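The byte layouts in the seed tables can be sketched as follows. This is a minimal, hand-rolled illustration, not the generator itself, which is meant to go through the library's own serializers. The function names, the `DEVID` constant, and the value written to `number_of_packets` for non-ISO transfers are assumptions made for this example; the 48-byte CMD_SUBMIT header layout follows the USB/IP protocol as commonly documented.

```rust
// Hand-rolled encoders for illustration only; every name here is hypothetical.

const USBIP_VERSION: u16 = 0x0111;
const OP_REQ_DEVLIST: u16 = 0x8005;
const OP_REQ_IMPORT: u16 = 0x8003;
const CMD_SUBMIT: u32 = 0x0000_0001;
const DEVID: u32 = (1 << 16) | 1; // assumption: bus 1, device 1

/// 8-byte negotiation header: version, command, status (all big-endian).
fn op_header(command: u16) -> Vec<u8> {
    let mut out = Vec::with_capacity(8);
    out.extend_from_slice(&USBIP_VERSION.to_be_bytes());
    out.extend_from_slice(&command.to_be_bytes());
    out.extend_from_slice(&0u32.to_be_bytes()); // status 0
    out
}

fn op_req_devlist() -> Vec<u8> {
    op_header(OP_REQ_DEVLIST)
}

/// OP_REQ_IMPORT: header followed by a 32-byte NUL-padded busid.
fn op_req_import(busid: &str) -> Vec<u8> {
    assert!(busid.len() < 32, "busid needs room for a NUL terminator");
    let mut field = [0u8; 32];
    field[..busid.len()].copy_from_slice(busid.as_bytes());
    let mut out = op_header(OP_REQ_IMPORT);
    out.extend_from_slice(&field);
    out
}

/// CMD_SUBMIT for a control transfer on endpoint 0.
/// Direction and length are derived from the setup packet itself.
fn cmd_submit_control(seqnum: u32, setup: [u8; 8], out_data: &[u8]) -> Vec<u8> {
    let dir_in = setup[0] & 0x80 != 0; // bmRequestType bit 7
    let w_length = u32::from(u16::from_le_bytes([setup[6], setup[7]]));
    if !dir_in {
        assert_eq!(out_data.len() as u32, w_length, "OUT payload must match wLength");
    }
    let words: [u32; 10] = [
        CMD_SUBMIT,
        seqnum,
        DEVID,
        u32::from(dir_in), // direction: 0 = OUT, 1 = IN
        0,                 // ep
        0,                 // transfer_flags
        w_length,          // transfer_buffer_length
        0,                 // start_frame
        0,                 // number_of_packets (non-ISO; this value is an assumption)
        0,                 // interval
    ];
    let mut out = Vec::with_capacity(48 + out_data.len());
    for w in words {
        out.extend_from_slice(&w.to_be_bytes());
    }
    out.extend_from_slice(&setup); // setup bytes stay in USB (little-endian) order
    if !dir_in {
        out.extend_from_slice(out_data);
    }
    out
}

/// The five-request shared enumeration sequence, concatenated into one seed.
fn enumeration_seed() -> Vec<u8> {
    let setups: [[u8; 8]; 5] = [
        [0x80, 0x06, 0x00, 0x01, 0x00, 0x00, 0x40, 0x00],
        [0x80, 0x06, 0x00, 0x01, 0x00, 0x00, 0x12, 0x00],
        [0x80, 0x06, 0x00, 0x02, 0x00, 0x00, 0x09, 0x00],
        [0x80, 0x06, 0x00, 0x02, 0x00, 0x00, 0xff, 0x00],
        [0x00, 0x09, 0x01, 0x00, 0x00, 0x00, 0x00, 0x00],
    ];
    let mut seed = Vec::new();
    for (i, setup) in setups.iter().enumerate() {
        seed.extend(cmd_submit_control(i as u32 + 1, *setup, &[]));
    }
    seed
}
```

Under these assumptions, a `seed-import-enumerate` file would be `op_req_import("1-1")` followed by `enumeration_seed()`, matching the "negotiation + URB phases" concatenation described for `fuzz_handle_client`.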
+
+## Binary Structure
+
+- **Location:** `lib/fuzz/gen_corpus.rs` as a `[[bin]]` target in `lib/fuzz/Cargo.toml`
+- **Dependencies:** Only `usbip-rs` (the lib crate), with no new external dependencies
+- **Output:** Writes raw binary files to `lib/fuzz/corpus//seed-`
+- **Naming:** `seed-*` prefix distinguishes hand-crafted seeds from fuzzer-discovered ones
+- **Idempotent:** Running twice overwrites the same files, no duplication
+- **Logging:** Prints each generated file path and size
+
+The binary uses `build_cmd_submit_bytes()` and the protocol types from the library for serialization.
+
+## Nix Integration
+
+A new Nix app `gen-fuzz-corpus` alongside the existing `fuzz-usbip` and `fuzz-clean-usbip`:
+
+```bash
+nix run .#gen-fuzz-corpus
+```
+
+## Scope Boundaries
+
+- Does **not** delete existing corpus entries; it only adds `seed-*` files
+- Does **not** capture live USB traffic; all seeds are constructed programmatically
+- Does **not** generate invalid/malformed seeds; mutation is the fuzzer's job
+- Does **not** change existing fuzz targets or validation logic
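The output behaviour listed under Binary Structure (fixed `seed-*` names, overwrite on rerun, one log line per file) could be sketched as below. `write_seed` and its parameters are hypothetical names, and the per-target directory layout is an assumption for illustration rather than the spec's confirmed path scheme.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Writes one seed file and logs its path and size.
/// Assumed layout: <corpus_root>/<target>/seed-<name>.
fn write_seed(corpus_root: &Path, target: &str, name: &str, bytes: &[u8]) -> io::Result<PathBuf> {
    let dir = corpus_root.join(target);
    fs::create_dir_all(&dir)?; // succeeds if the directory already exists
    let path = dir.join(format!("seed-{}", name));
    fs::write(&path, bytes)?; // creates or truncates, so reruns overwrite in place
    println!("{} ({} bytes)", path.display(), bytes.len());
    Ok(path)
}
```

Because `fs::write` replaces the file's contents and the name depends only on the seed's label, running the generator twice leaves exactly one file per seed, and files without the `seed-` prefix are never touched.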