Now that all larger workgroup sizes are lowered to 256,
the regalloc hang cannot mess up the compute queues anymore.
Still don't allow compute queues on GFX6 though,
those have never been enabled ever since RadeonSI started using
the compute queue in a1378639ab - let's keep it that way.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39288>
Now that all larger workgroup sizes are lowered to 256,
the regalloc hang cannot mess up the compute queues anymore.
Still don't allow compute queues on GFX6 though,
they are prone to hangs. Needs further investigation.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39288>
Even though radeonsi may not use compute queues, other processes
might run compute jobs in the background, so radeonsi must make
sure not to use larger than 256 sized workgroups on GPUs that
are affected by the regalloc hang.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39288>
Even though radeonsi may not use compute queues, other processes
might run compute jobs in the background, so radeonsi must make
sure not to use larger than 256 sized workgroups on GPUs that
are affected by the regalloc hang.
Unfortunately that means that for now RadeonSI won't be able to
support ARB_compute_variable_group_size on these GPUs.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39288>
So that we can put the coarse_pixel_dispatch value available to NIR
lowering.
LNL internal fossildb changes:
Totals from 40 (0.01% of 490838) affected shaders:
Instrs: 33321 -> 33311 (-0.03%); split: -0.04%, +0.01%
Cycle count: 780136 -> 779936 (-0.03%); split: -0.03%, +0.00%
Max live registers: 5292 -> 5298 (+0.11%)
Non SSA regs after NIR: 26638 -> 26464 (-0.65%)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38996>
We don't need to take ETNA_DIRTY_SHADER into consideration for pure
updates of the constant states. When the shader is dirty constants
and code will be uploaded together and the update path will be skipped.
The uniform cache in the context has been removed in ee1ed59458
("etnaviv: prep for UBOs"), so the comment referencing this cache
is confusing and can go as well.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Christian Gmeiner <cgmeiner@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39422>
Constant buffers may be changed without the shader changing.
Check the correct dirty bits when marking constant buffers
as read during the draw to ensure proper synchronization.
Fixes: a40a6e551e ("etnaviv: draw: only mark resources as read/written when the state changed")
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Christian Gmeiner <cgmeiner@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39422>
Jobs have been timing out, requiring 2-6 additional minutes beyond the
previous 30-minute limit. Increase the overall timeout from 30 to 45
minutes, with an additional 5-minute buffer for GitLab (50m total) to
provide margin for variance.
Signed-off-by: Christian Gmeiner <cgmeiner@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39432>
Currently panfrost has some bug at least on Midgard T-860, which causes
an assertion failure with these 16 bpc displayable framebuffer configs:
".../drivers/panfrost/pan_fb_preload.c:341: pan_preload_get_blend_shaders:
Assertion `b->work_reg_count <= 4' failed."
See discussion in:
https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38588
Disable these rgb16 configs on panfrost by default for now,
as suggested by Eric Smith.
Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
Suggested-by: Eric R. Smith <eric.smith@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38588>
Needed, so dEQP can deal with 16 bpc unorm surface formats, and doesn't
generate wrong reference images with its rasterizer, leading to false
positive failures in image comparison driver vs. reference rasterizer.
As proposed by Valentine Burley, thanks!
Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
Suggested-by: Valentine Burley <valentine.burley@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38588>
Use the EGL_EXT_config_select_group extension to put these 16 bpc
unorm formats into a lower priority config select group 1, so they
don't get preferably chosen by default by eglChooseConfig(), but must
be explicitely requested by client applications which really need the
high color precision of these 64 bpp formats and are happy to pay the
potential performance impact.
Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
Reviewed-by: Adam Jackson <ajax@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38588>
Use the EGL_EXT_config_select_group extension to put these 16 bpc
unorm formats into a lower priority config select group 1, so they
don't get preferably chosen by default by eglChooseConfig(), but must
be explicitely requested by client applications which really need the
high color precision of these 64 bpp formats and are happy to pay the
potential performance impact.
Tested to work with the GBM backend directly on a VT by running kmscube
as drm master on a AMD Polaris gpu. drm_info reports proper formats for
the DRM framebuffers.
Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
Reviewed-by: Adam Jackson <ajax@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38588>
This allows clients to send high color precision wl_buffers
to servers which support the format.
Use the EGL_EXT_config_select_group extension to put these 16 bpc
unorm formats into a lower priority config select group 1, so they
don't get preferably chosen by default by eglChooseConfig(), but must
be explicitely requested by client applications which really need the
high color precision of these 64 bpp formats and are happy to pay the
potential performance impact.
Successfully tested with KDE Kwin 6.4, and the GNOME Mutter 50
development branch, which has been enhanced to support these
formats, also for direct scanout, and with Weston 13, which is
also able to directly scan out to a 16 bpc framebuffer on suitable
AMD gpu's.
Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
Reviewed-by: Adam Jackson <ajax@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38588>
These are useful for displaying very high color precision images with
more than 10 bpc color depth, and also more precision than what fp16
can do on a standard dynamic range (SDR) display, where fp16 for values
in the unorm 0.0 - 1.0 range is about equivalent to at most ~11 bpc
linear color depth. This is especially useful for and aimed at scientific
applications, e.g., neuroscience and other bio-medical research cases.
At least current generation AMD gpu's released during the last 10 years
and supported by amdgpu-kms + atomic modesetting do allow for scanout of
such 16 bpc framebuffers and of up to 12 bpc output to suitable HDMI or
DisplayPort high precision displays.
We gate the format behind a new driconf option 'allow_rgb16_configs',
which defaults to true, but allows to disable the formats if any issues
should arise.
Most regular applications won't need the high display precision of
these new 16 bpc 64 bpp formats which have higher memory and bandwidth
requirements, and therefore a potential undesired performance impact
for regular apps. Followup per-platform enablement commits will use
the EGL_EXT_config_select_group extension to put these 16 bpc unorm
formats into a lower priority config select group 1, so they don't get
preferably chosen by default by eglChooseConfig(), but must be explicitely
requested by client applications which really need the high color
precision of these 64 bpp formats and are happy to pay the potential
performance impact. Thanks to Adam Jackson for pointing me to the
EGL_EXT_config_select_group extension.
If the format would be put into the default config select group 0, a
simple EGL eglChooseConfig() call would end up choosing these formats,
which is not what such regular apps would want.
Tested to not cause any change on native X11/EGL and X11/GLX, which only
supports at most 30 bpc / 32 bpp formats.
Followup commits will enable these formats for the EGL/Wayland backend,
and on the EGL/DRM backend.
Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
Reviewed-by: Adam Jackson <ajax@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38588>
Detects 16 bpc unorm formats. Used by following RGB[A]16
UNORM display enablement commits.
Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
Reviewed-by: Adam Jackson <ajax@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38588>
The index buffer unrolling logic was based on asahi's implementation in
libagx/geometry.cl.
Signed-off-by: Olivia Lee <olivia.lee@collabora.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38547>
Mali hardware handles a bunch of OOB conditions by using addresses with
the top bit set. When the top bit is set, any load/store from a shader
will treat the address as OOB and read zero and discard writes. We can
use this to implement ro_sink_address_poly.
Signed-off-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38547>
To count primitives generated when primitive restart is enabled, we need
to dispatch a precomp shader with dimensions that depend on the indirect
draw count. For this, we need some kind of precomp dispatch mechanism
where the dimensions are determined at execution time. The standard
shape for this would be an "indirect precomp" dispatch which reads
dimensions from memory, but in this case that would be a little awkward
because the draw count is min(*draw_count_buffer, max_draw_count). So to
implement that with a typical indirect dispatch mechanism, we would need
to read the buffer, compare it with the max draw count, and then write
it back to scratch memory, just to read it again. To simplify this, I
went for a precomp dispatch mechanism where it just reading the
dimensions from the JOB_SIZE registers set by the caller, which can use
whatever CS code it wants to calculate them.
Signed-off-by: Olivia Lee <olivia.lee@collabora.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38547>
Primitive restart requires scanning the index buffer to determine how
many primitives are present, and will be handled in a later commit.
Signed-off-by: Olivia Lee <olivia.lee@collabora.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38547>
static_assert required a message argument until C23. Adding it fixes
the debian-clang CI jobs.
Signed-off-by: Olivia Lee <olivia.lee@collabora.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38547>
Use v3dv_job_apply_barrier_state to consume pending barriers when
executing secondary command buffers. This ensures we only serialize
against relevant stages, addressing FIXME.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39278>
PAL always set WD_SWITCH_ON_EOP for pre gfx10 when primitive
restart is enabled to prevent gpu hang.
It only happens when specific index stream with primitive
restart. Since we don't know what's the exact problem,
just follow PAL to disable 4x primitive rate when primitive
restart is enabled.
GFX10+ does not use this function.
Cc: mesa-stable
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39292>
PAL always set WD_SWITCH_ON_EOP for pre gfx10 when primitve
restart is enabled to prevent gpu hang.
It only happens when specific index stream with primitive
restart. Since we don't know what's the exact problem,
just follow PAL to disable 4x primitive rate when primitive
restart is enabled.
Cc: mesa-stable
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/14629
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39292>