docs: add fuzz corpus seeding design spec

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Davíð Steinn Geirsson 2026-03-25 23:57:14 +00:00
parent dc58a36c9b
commit 22be4dd064

# Fuzz Corpus Seeding Design
## Goal
Seed the fuzzing corpora with valid USB/IP traffic so the fuzzer reaches interesting code paths faster. The corpora currently contain only fuzzer-discovered inputs: mostly small, partially valid byte sequences. Starting from realistic, well-formed inputs lets mutation-based fuzzing reach deeper program states sooner.
## Approach
Add a standalone Rust binary (`gen_corpus`) in `lib/fuzz/` that builds valid USB/IP packets with the library's own serialization code and writes them as seed files into the corpus directories.
## Seed Categories
### 1. `fuzz_parse_command` — Negotiation Phase
Two seeds covering the only valid client commands:
| Seed | Description | Size |
|------|-------------|------|
| `seed-devlist` | OP_REQ_DEVLIST: version 0x0111, command 0x8005, status 0 | 8 bytes |
| `seed-import` | OP_REQ_IMPORT: version 0x0111, command 0x8003, status 0, busid "1-1" NUL-padded to 32 bytes | 40 bytes |
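
A minimal sketch of the two negotiation seeds, built from raw bytes rather than the library's serializers (whose signatures this spec does not give). The helper names `op_header`, `seed_devlist`, and `seed_import` are hypothetical; the layout follows the table: a big-endian 8-byte header, plus a NUL-padded 32-byte busid for OP_REQ_IMPORT.

```rust
/// USB/IP protocol version carried in negotiation-phase packets.
const USBIP_VERSION: u16 = 0x0111;
const OP_REQ_DEVLIST: u16 = 0x8005;
const OP_REQ_IMPORT: u16 = 0x8003;

/// Hypothetical helper: 8-byte operation header (version, command, status), big-endian.
fn op_header(command: u16) -> Vec<u8> {
    let mut buf = Vec::with_capacity(8);
    buf.extend_from_slice(&USBIP_VERSION.to_be_bytes());
    buf.extend_from_slice(&command.to_be_bytes());
    buf.extend_from_slice(&0u32.to_be_bytes()); // status = 0
    buf
}

/// `seed-devlist`: OP_REQ_DEVLIST is just the header.
fn seed_devlist() -> Vec<u8> {
    op_header(OP_REQ_DEVLIST)
}

/// `seed-import`: header followed by a 32-byte busid field, NUL-padded.
fn seed_import(busid: &str) -> Vec<u8> {
    assert!(busid.len() < 32, "busid must leave room for a NUL terminator");
    let mut buf = op_header(OP_REQ_IMPORT);
    let mut field = [0u8; 32];
    field[..busid.len()].copy_from_slice(busid.as_bytes());
    buf.extend_from_slice(&field);
    buf
}
```

With `busid = "1-1"` this yields the 40-byte seed from the table; the remaining 29 busid bytes stay zero.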
### 2. `fuzz_urb_hid` / `fuzz_urb_cdc` / `fuzz_urb_uac` — URB Phase
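Each seed is a concatenated sequence of CMD_SUBMIT (and optionally CMD_UNLINK) packets. URB-phase headers start with a 32-bit command code (CMD_SUBMIT = 0x00000001, CMD_UNLINK = 0x00000002), so the two bytes in the position of the negotiation-phase version field are 0x0000. All multi-byte fields in the USB/IP headers are big-endian; the 8-byte setup packets listed below are the exception, since USB/IP carries them verbatim in USB's little-endian byte order.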
#### Shared enumeration sequence (all three targets)
A single seed containing these CMD_SUBMITs back to back:
1. GET_DESCRIPTOR(Device, 64) — control IN, ep 0, setup `80 06 00 01 00 00 40 00`
2. GET_DESCRIPTOR(Device, 18) — control IN, ep 0, setup `80 06 00 01 00 00 12 00`
3. GET_DESCRIPTOR(Config, 9) — control IN, ep 0, setup `80 06 00 02 00 00 09 00`
4. GET_DESCRIPTOR(Config, 255) — control IN, ep 0, setup `80 06 00 02 00 00 ff 00`
5. SET_CONFIGURATION(1) — control OUT, ep 0, setup `00 09 01 00 00 00 00 00`
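
The enumeration seed can be sketched as five 48-byte CMD_SUBMIT headers laid end to end. This assumes the standard USB/IP CMD_SUBMIT layout (20-byte basic header, then transfer_flags, transfer_buffer_length, start_frame, number_of_packets, interval, and the 8-byte setup packet). `cmd_submit`, `enumeration_seed`, and the `DEVID` value are hypothetical stand-ins for the library's `build_cmd_submit_bytes()`.

```rust
const USBIP_CMD_SUBMIT: u32 = 0x0000_0001;
const DIR_OUT: u32 = 0;
const DIR_IN: u32 = 1;
/// Hypothetical devid (busnum << 16 | devnum); the real value depends on the server.
const DEVID: u32 = 0x0001_0002;

/// Builds one CMD_SUBMIT: a 48-byte header, then any OUT payload.
fn cmd_submit(
    seqnum: u32,
    devid: u32,
    direction: u32,
    ep: u32,
    transfer_len: u32,
    setup: [u8; 8],
    out_data: &[u8],
) -> Vec<u8> {
    let mut buf = Vec::with_capacity(48 + out_data.len());
    // Basic header: command, seqnum, devid, direction, ep (big-endian)
    for field in [USBIP_CMD_SUBMIT, seqnum, devid, direction, ep] {
        buf.extend_from_slice(&field.to_be_bytes());
    }
    // transfer_flags, transfer_buffer_length, start_frame,
    // number_of_packets (0 for non-ISO, an assumption), interval
    for field in [0, transfer_len, 0, 0, 0] {
        buf.extend_from_slice(&field.to_be_bytes());
    }
    buf.extend_from_slice(&setup); // copied as-is: already in USB little-endian order
    buf.extend_from_slice(out_data);
    buf
}

fn enumeration_seed() -> Vec<u8> {
    // (direction, transfer_buffer_length, setup packet), in request order
    let steps: [(u32, u32, [u8; 8]); 5] = [
        (DIR_IN, 64, [0x80, 0x06, 0x00, 0x01, 0x00, 0x00, 0x40, 0x00]), // GET_DESCRIPTOR(Device, 64)
        (DIR_IN, 18, [0x80, 0x06, 0x00, 0x01, 0x00, 0x00, 0x12, 0x00]), // GET_DESCRIPTOR(Device, 18)
        (DIR_IN, 9, [0x80, 0x06, 0x00, 0x02, 0x00, 0x00, 0x09, 0x00]), // GET_DESCRIPTOR(Config, 9)
        (DIR_IN, 255, [0x80, 0x06, 0x00, 0x02, 0x00, 0x00, 0xff, 0x00]), // GET_DESCRIPTOR(Config, 255)
        (DIR_OUT, 0, [0x00, 0x09, 0x01, 0x00, 0x00, 0x00, 0x00, 0x00]), // SET_CONFIGURATION(1)
    ];
    let mut seed = Vec::new();
    for (i, (dir, len, setup)) in steps.iter().enumerate() {
        // seqnums start at 1; ep 0 is the default control pipe; no OUT payload
        seed.extend(cmd_submit(i as u32 + 1, DEVID, *dir, 0, *len, *setup, &[]));
    }
    seed
}
```

Every step is either an IN transfer or a zero-length OUT, so no data-stage bytes follow any header and the seed is exactly 5 × 48 = 240 bytes.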
#### HID-specific seeds
- GET_DESCRIPTOR(HID Report) on ep 0: setup `81 06 00 22 00 00 80 00` (128 bytes)
- SET_IDLE: setup `21 0a 00 00 00 00 00 00`
- GET_REPORT: setup `a1 01 00 01 00 00 08 00` (8 bytes — standard keyboard report size)
- Interrupt IN on endpoint 0x81 (the HID keyboard's interrupt endpoint; in the USB/IP header this is ep 1 with direction IN), 8-byte transfer
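
The HID setup packets decode field by field as follows. `setup_packet` and `split_endpoint` are hypothetical helpers; the sketch assumes interface 0 (wIndex 0), matching the hex strings above, and shows how endpoint address 0x81 maps onto the USB/IP header's separate `ep` and `direction` fields.

```rust
/// Hypothetical helper: 8-byte USB setup packet. Unlike the USB/IP header,
/// wValue, wIndex and wLength are little-endian, as the USB spec requires.
fn setup_packet(bm_request_type: u8, b_request: u8, w_value: u16, w_index: u16, w_length: u16) -> [u8; 8] {
    let v = w_value.to_le_bytes();
    let i = w_index.to_le_bytes();
    let l = w_length.to_le_bytes();
    [bm_request_type, b_request, v[0], v[1], i[0], i[1], l[0], l[1]]
}

fn hid_setups() -> [[u8; 8]; 3] {
    [
        // GET_DESCRIPTOR: IN | standard | interface; wValue high byte 0x22 = Report descriptor
        setup_packet(0x81, 0x06, 0x2200, 0, 128),
        // SET_IDLE: OUT | class | interface; duration 0 (indefinite), report ID 0
        setup_packet(0x21, 0x0a, 0x0000, 0, 0),
        // GET_REPORT: IN | class | interface; wValue high byte 1 = Input report, ID 0
        setup_packet(0xa1, 0x01, 0x0100, 0, 8),
    ]
}

/// USB/IP splits an endpoint address: `ep` holds only the endpoint number,
/// `direction` holds the IN bit (1 = IN, 0 = OUT).
fn split_endpoint(address: u8) -> (u32, u32) {
    let ep = u32::from(address & 0x0f);
    let direction = u32::from(address & 0x80 != 0);
    (ep, direction)
}
```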
#### CDC-specific seeds
- SET_LINE_CODING: class-specific control OUT with a 7-byte payload (baud rate, stop bits, parity, data bits)
- SET_CONTROL_LINE_STATE: setup `21 22 03 00 00 00 00 00` (DTR + RTS active)
- Bulk OUT data transfer on the CDC data endpoint
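
For the CDC seeds, the SET_LINE_CODING data stage is 7 bytes that would follow the CMD_SUBMIT header as an OUT payload (transfer_buffer_length 7). A sketch with hypothetical helper names; the communication interface number is a parameter because this spec does not pin it down, and 115200 8N1 is only an example line setting.

```rust
/// Hypothetical helper: SET_LINE_CODING setup packet (bRequest 0x20).
/// bmRequestType 0x21 = host-to-device | class | interface; wLength = 7.
fn set_line_coding_setup(interface: u16) -> [u8; 8] {
    let i = interface.to_le_bytes();
    [0x21, 0x20, 0x00, 0x00, i[0], i[1], 0x07, 0x00]
}

/// The 7-byte line coding payload: dwDTERate (u32, little-endian like all USB
/// fields), bCharFormat (0 = 1 stop bit), bParityType (0 = none), bDataBits.
fn line_coding(baud: u32, stop_bits: u8, parity: u8, data_bits: u8) -> [u8; 7] {
    let b = baud.to_le_bytes();
    [b[0], b[1], b[2], b[3], stop_bits, parity, data_bits]
}

const DTR: u16 = 1 << 0;
const RTS: u16 = 1 << 1;

/// SET_CONTROL_LINE_STATE (bRequest 0x22): line bitmap in wValue, no data stage.
fn set_control_line_state_setup(lines: u16, interface: u16) -> [u8; 8] {
    let v = lines.to_le_bytes();
    let i = interface.to_le_bytes();
    [0x21, 0x22, v[0], v[1], i[0], i[1], 0x00, 0x00]
}
```

With interface 0, `set_control_line_state_setup(DTR | RTS, 0)` reproduces the `21 22 03 00 00 00 00 00` packet listed above.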
#### UAC-specific seeds
- SET_CUR / GET_CUR for sample rate control
- Isochronous IN/OUT transfers with valid ISO packet descriptors (`number_of_packets > 0`; each descriptor is 16 bytes of offset, length, actual_length and status, with offset + length inside the transfer buffer)
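
A sketch of building ISO packet descriptors and checking the bounds rule from the bullet above. Both helpers are hypothetical; the descriptor layout (offset, length, actual_length, status, each a big-endian u32) is the standard USB/IP one, and the 192-byte packet size in the example assumes 48 kHz, 16-bit stereo audio with 1 ms frames.

```rust
/// Hypothetical helper: `count` descriptors of `packet_len` bytes each,
/// laid out back to back in the transfer buffer.
fn iso_descriptors(packet_len: u32, count: u32) -> Vec<u8> {
    let mut buf = Vec::with_capacity(16 * count as usize);
    for i in 0..count {
        let offset = i * packet_len;
        // offset, length, actual_length (0 on submit), status (0)
        for field in [offset, packet_len, 0, 0] {
            buf.extend_from_slice(&field.to_be_bytes());
        }
    }
    buf
}

/// True if every descriptor's offset + length fits in transfer_buffer_length.
fn descriptors_in_bounds(desc: &[u8], transfer_buffer_length: u32) -> bool {
    desc.len() % 16 == 0
        && desc.chunks_exact(16).all(|d| {
            let offset = u32::from_be_bytes([d[0], d[1], d[2], d[3]]);
            let length = u32::from_be_bytes([d[4], d[5], d[6], d[7]]);
            offset
                .checked_add(length)
                .map_or(false, |end| end <= transfer_buffer_length)
        })
}
```

For 8 packets of 192 bytes, the submit would carry `transfer_buffer_length = 1536` and `number_of_packets = 8`, followed by these 128 descriptor bytes.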
#### Edge-case seeds (all URB targets)
- CMD_UNLINK referencing a previous seqnum
- Zero-length control transfer
- Interrupt/bulk transfers on non-zero endpoints
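
A CMD_UNLINK seed pairs a fresh seqnum of its own with the seqnum of an earlier CMD_SUBMIT. A sketch of the 48-byte packet with a hypothetical `cmd_unlink` helper; direction and ep are set to 0 here (an assumption, since the unlink identifies its target by seqnum alone).

```rust
const USBIP_CMD_UNLINK: u32 = 0x0000_0002;

/// Hypothetical helper: basic header (command, seqnum, devid, direction, ep),
/// then unlink_seqnum, then 24 bytes of zero padding. All fields big-endian.
fn cmd_unlink(seqnum: u32, devid: u32, unlink_seqnum: u32) -> Vec<u8> {
    let mut buf = Vec::with_capacity(48);
    for field in [USBIP_CMD_UNLINK, seqnum, devid, 0, 0, unlink_seqnum] {
        buf.extend_from_slice(&field.to_be_bytes());
    }
    buf.resize(48, 0); // pad to the fixed 48-byte header size
    buf
}
```

For example, a seed could place `cmd_unlink(2, devid, 1)` right after the CMD_SUBMIT with seqnum 1.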
### 3. `fuzz_handle_client` — Full Connection
Concatenation of negotiation + URB phases:
| Seed | Description |
|------|-------------|
| `seed-devlist-only` | OP_REQ_DEVLIST alone (early-exit path) |
| `seed-import-enumerate` | OP_REQ_IMPORT + shared enumeration sequence |
| `seed-import-hid-full` | OP_REQ_IMPORT + enumeration + HID-specific requests |
This target uses a HID keyboard, so only HID-specific seeds are needed.
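
Since these seeds are plain concatenations, one way to sanity-check a generated `fuzz_handle_client` seed is to walk its framing: the 40-byte import request, then each URB-phase header plus whatever payload it announces. `count_urbs` is a hypothetical test helper, not part of the library, and it assumes non-ISO submits use `number_of_packets = 0`.

```rust
/// Hypothetical framing check: OP_REQ_IMPORT (40 bytes) followed by whole
/// CMD_SUBMIT / CMD_UNLINK packets. Returns the number of URB-phase packets,
/// or None if the bytes do not split cleanly into packets.
fn count_urbs(seed: &[u8]) -> Option<usize> {
    // version 0x0111, command OP_REQ_IMPORT 0x8003
    if seed.len() < 40 || seed[0..4] != [0x01u8, 0x11, 0x80, 0x03] {
        return None;
    }
    let mut rest = &seed[40..];
    let mut count = 0;
    while !rest.is_empty() {
        if rest.len() < 48 {
            return None; // truncated header
        }
        // Big-endian u32 at byte offset `o` of the current header.
        let word = |o: usize| u32::from_be_bytes([rest[o], rest[o + 1], rest[o + 2], rest[o + 3]]);
        let payload = match word(0) {
            // CMD_SUBMIT: OUT (direction 0) carries transfer_buffer_length
            // data bytes, then 16 bytes per ISO packet descriptor.
            1 => {
                let data = if word(12) == 0 { word(24) as usize } else { 0 };
                data + 16 * word(32) as usize
            }
            2 => 0, // CMD_UNLINK: header only
            _ => return None,
        };
        if rest.len() < 48 + payload {
            return None; // payload shorter than announced
        }
        rest = &rest[48 + payload..];
        count += 1;
    }
    Some(count)
}
```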
## Binary Structure
- **Location:** `lib/fuzz/gen_corpus.rs` as a `[[bin]]` target in `lib/fuzz/Cargo.toml`
- **Dependencies:** Only `usbip-rs` (the lib crate) — no new external dependencies
- **Output:** Writes raw binary files to `lib/fuzz/corpus/<target>/seed-<name>`
- **Naming:** `seed-*` prefix distinguishes hand-crafted seeds from fuzzer-discovered ones
- **Idempotent:** Running twice overwrites the same files rather than creating duplicates
- **Logging:** Prints each generated file path and size
The binary uses the library's `build_cmd_submit_bytes()` and protocol types for serialization.
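
The output rules above map onto a small file-writing helper. A minimal sketch, assuming a hypothetical `write_seed` function: `fs::write` truncates an existing file, so re-running the generator replaces each `seed-*` file in place, which gives the idempotence described above without touching fuzzer-discovered entries.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Writes `bytes` to `<corpus_root>/<target>/seed-<name>`, creating the
/// target directory if needed, and logs the path and size.
fn write_seed(corpus_root: &Path, target: &str, name: &str, bytes: &[u8]) -> io::Result<PathBuf> {
    let dir = corpus_root.join(target);
    fs::create_dir_all(&dir)?;
    let path = dir.join(format!("seed-{}", name));
    fs::write(&path, bytes)?; // creates or truncates: same name, same file
    println!("{} ({} bytes)", path.display(), bytes.len());
    Ok(path)
}
```

In the binary this would be called once per seed, e.g. with `corpus_root = lib/fuzz/corpus` and `target = "fuzz_parse_command"`.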
## Nix Integration
A new Nix app `gen-fuzz-corpus`, alongside the existing `fuzz-usbip` and `fuzz-clean-usbip` apps:
```bash
nix run .#gen-fuzz-corpus
```
## Scope Boundaries
- Does **not** delete existing corpus entries — only adds `seed-*` files
- Does **not** capture live USB traffic — all seeds constructed programmatically
- Does **not** generate invalid/malformed seeds — mutation is the fuzzer's job
- Does **not** change existing fuzz targets or validation logic