If dxvk's opencoded fmulz gets partially constant folded,
it leaves this mess behind.
It's important to do this before the more general fmul+b2f patterns added
in the next commit, because they change the signed zero behavior in a way
that can't be optimized back.
Foz-DB Navi48:
Totals from 36 (0.03% of 114655) affected shaders:
Instrs: 16513 -> 15706 (-4.89%)
CodeSize: 99756 -> 95760 (-4.01%)
Latency: 45165 -> 44151 (-2.25%)
InvThroughput: 8344 -> 7886 (-5.49%)
VClause: 395 -> 401 (+1.52%)
Copies: 639 -> 634 (-0.78%)
PreSGPRs: 1158 -> 1154 (-0.35%)
PreVGPRs: 1227 -> 1225 (-0.16%)
VALU: 11310 -> 10769 (-4.78%)
SALU: 813 -> 809 (-0.49%)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40399>
This means we respect the pattern order better because
simple replacements like bcsel(False, a, b) -> b no longer
insert movs that can block more specialized patterns.
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40399>
Dual source blending when one of the sources is not written to leaves
those values undefined, but the other should still be valid.
By omitting unwritten outputs, we ended up not writing anything at all
for the case that OUT1 is written to but OUT0 is undefined.
Fixes new CTS tests: dEQP-VK.pipeline.*.blend.dual_source.undefined_output.first*
Cc: mesa-stable
Signed-off-by: Iván Briano <ivan.briano@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40357>
It's valid to import memory with DEVICE_ADDRESS_CAPTURE_REPLAY_BIT
and we should allocate iova from the end of VMA heap, same as with ordinary
memory allocations. This is important for replaying such memory, when it
is being replayed without being imported.
Fixes replay errors with RenderDoc.
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40378>
When there is no trace pointer, there is usually a another tracepoint
being emitted (see STATE_BASE_ADDRESS,
3DSTATE_BINDING_TABLE_POOL_ALLOC emission).
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40503>
Saves about 2k text size.
Before:
text data bss dec hex filename
24817485 456164 27080 25300729 1820ef9 ./lib64/libvulkan_intel.so
After:
text data bss dec hex filename
24815381 456164 27080 25298625 18206c1 ./lib64/libvulkan_intel.so
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40230>
calc_blockdep always returned MAX_BLOCKDEP without checking if the
previous op writes to a buffer the current op reads from. This let
the NPU start reading before the previous write was done.
Add overlap check between previous OFM and current IFM so we set
blockdep to 0 when they share the same buffer.
Update ethos-imx93-fails.txt to remove the tests that now pass.
Signed-off-by: Anders Roxell <anders.roxell@linaro.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39594>
Replace the two functions simplified_elementwise_add_sub_scale and
eltwise_emit_ofm_scaling with a single advanced_elementwise_add_sub_scale
that follows the ethos-u-vela naming. Remove the large block of
commented out Vela Python code.
No functional change.
Signed-off-by: Anders Roxell <anders.roxell@linaro.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39594>
The upscale field was a bool which happened to work since true maps
to 1 which is NEAREST in the hardware. Change from bool to an enum
ethosu_upscale_mode so the intent is clear and we dont rely on the
bool-to-int mapping.
Also add a check in operation_supported so RESIZE only accepts 2x
upscaling since thats what the NPU can do with IFM_UPSCALE. Other
sizes fall back to CPU.
Keep the original zero_points from tensors in RESIZE and STRIDED_SLICE
instead of forcing them to 0 since the requantization needs them.
Fixes the RESIZE_NEAREST_NEIGHBOR operations in EfficientDet-Lite
models that use BiFPN with 2x nearest neighbor upsampling.
Signed-off-by: Anders Roxell <anders.roxell@linaro.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39594>
fill_weights subtracted a single zero_point from all weights which
did not handle models with per-channel zero_points. Use the
per-channel zero_point for each output channel when available.
Also decouple the zero_points copy from the scales copy in the lower
pass so they are handled independently.
Suggested-by: Tomeu Vizoso <tomeu@tomeuvizoso.net>
Signed-off-by: Anders Roxell <anders.roxell@linaro.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39594>
For those models with coefficients that have different quantization
parameters for each channel.
The NPU can handle per-channel scales as can be seen in
fill_scale_and_biases(), which already iterates per output channel.
Activation tensors (input/output) don't have per-channel quantization.
- Add scales/zero_points arrays to ethosu_kernel struct
- Copy per-channel scales from weight tensor in lower pass
- Use per-channel scale when computing conv_scale in coefs
- Allow per-channel quantization in operation_supported check
Signed-off-by: Anders Roxell <anders.roxell@linaro.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39594>
The old code would assert when a model has multiple scales but only
one zero_point. This is common for symmetric quantization where all
channels share the same zero_point (typically 0).
Handle this by replicating the single zero_point for all channels
instead of crashing.
Fixes MoveNet models using per-channel quantization.
Signed-off-by: Anders Roxell <anders.roxell@linaro.org>
Reviewed-by: Tomeu Vizoso <tomeu@tomeuvizoso.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39594>
Purely a visual change, but aligns with DDK disassembly.
For example:
- FMA.f32 r1, ^r1, u1, ^r4
+ FMA.f32 r1, r1^, u1, r4^
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Acked-by: Lorenzo Rossi <lorenzo.rossi@collabora.com>
Acked-by: Eric R. Smith <eric.smith@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40199>
A lot of masks and shifts were hard-coded in the disassembler. This
commit tries to move them to shared logic.
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Acked-by: Lorenzo Rossi <lorenzo.rossi@collabora.com>
Acked-by: Eric R. Smith <eric.smith@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40199>
Rather than opcode/opcode2 hardcoded, treat the opcode as a list of
one or more subcodes.
This implies modifying the disassembler to hold an arbitrary depth dict
of dicts and recursively build the switch statements used to look up
each level.
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Acked-by: Lorenzo Rossi <lorenzo.rossi@collabora.com>
Acked-by: Eric R. Smith <eric.smith@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40199>
This splits the printing logic from the iteration logic, making it
easier to reason about either.
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Acked-by: Lorenzo Rossi <lorenzo.rossi@collabora.com>
Acked-by: Eric R. Smith <eric.smith@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40199>
Opcode2 was a bit all over the place, so utilize the new opcode modifier
to gather opcode2 information in a single place.
This cleans up the implicit va_mods "left", "descriptor_type" and
"memory_width".
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Acked-by: Lorenzo Rossi <lorenzo.rossi@collabora.com>
Acked-by: Eric R. Smith <eric.smith@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40199>
Rather than having the opcode as an attribute and the offset/mask being
implicit, make all of this information explicit in the xml.
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Acked-by: Lorenzo Rossi <lorenzo.rossi@collabora.com>
Acked-by: Eric R. Smith <eric.smith@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40199>
These instructions were not generated as they do not exist.
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Acked-by: Lorenzo Rossi <lorenzo.rossi@collabora.com>
Acked-by: Eric R. Smith <eric.smith@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40199>
This is already the default value, so there's no point in overriding it
to itself.
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40489>