Commit Graph

7861 Commits

Author SHA1 Message Date
Lionel Landwerlin
08f3950d6b anv: stop using old entrypoint/struct/enum names for 1.3
v2: More replacements

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com> (v1)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15920>
2022-04-13 21:13:56 +00:00
Lionel Landwerlin
e11bedb9f5 intel/fs: add a note on possible optimization of root node address
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15910>
2022-04-13 11:24:49 +00:00
Lionel Landwerlin
9c0805ef91 intel/fs: fix metadata preserve on trace_ray intrinsic
c78be5da30 ("intel/fs: lower ray query intrinsics") introduced a
helper function using nir_(push|pop)_if which invalidated dominance &
block_index for the replacement of nir_intrinsic_rt_trace_ray.

We can still keep dominance/block_index metadata for the lowering of
nir_intrinsic_rt_execute_callable though.

This change uses 2 different lowering function with correct metadata
preservation.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: c78be5da30 ("intel/fs: lower ray query intrinsics")
Reviewed-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15910>
2022-04-13 11:24:49 +00:00
Jason Ekstrand
69b5424ea4 intel/nir: Lower 8 and 16-bit bitwise unops
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15829>
2022-04-12 23:19:38 +00:00
Jason Ekstrand
a482877c70 intel/fs: Implement 16-bit [ui]mul_high
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15829>
2022-04-12 23:19:38 +00:00
Mykhailo Skorokhodov
9c7e750ffe intel/fs: Enable b2f(inot(a)) and b2i(inot(a)) optimization for Gfx12+
The commit enables the optimization for Intel Gfx12+ graphics.

Tigerlake
```
total instructions in shared programs: 1289326 -> 1289015 (-0.02%)
instructions in affected programs: 37841 -> 37530 (-0.82%)
helped: 78
HURT: 9
helped stats (abs) min: 1 max: 26 x̄: 4.69 x̃: 3
helped stats (rel) min: 0.10% max: 12.50% x̄: 2.07% x̃: 1.21%
HURT stats (abs)   min: 1 max: 18 x̄: 6.11 x̃: 4
HURT stats (rel)   min: 0.16% max: 1.95% x̄: 0.94% x̃: 0.61%
95% mean confidence interval for instructions value: -4.95 -2.20
95% mean confidence interval for instructions %-change: -2.34% -1.18%
Instructions are helped.

total cycles in shared programs: 105606388 -> 105606442 (<.01%)
cycles in affected programs: 620119 -> 620173 (<.01%)
helped: 49
HURT: 28
helped stats (abs) min: 2 max: 3618 x̄: 228.63 x̃: 12
helped stats (rel) min: 0.02% max: 23.31% x̄: 4.60% x̃: 1.11%
HURT stats (abs)   min: 1 max: 2142 x̄: 402.04 x̃: 29
HURT stats (rel)   min: 0.01% max: 36.42% x̄: 5.01% x̃: 0.46%
95% mean confidence interval for cycles value: -151.80 153.20
95% mean confidence interval for cycles %-change: -3.00% 0.79%
Inconclusive result (value mean confidence interval includes 0).
```

Related-to: 7725d60938
Signed-off-by: Mykhailo Skorokhodov <mykhailo.skorokhodov@globallogic.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14017>
2022-04-12 10:55:05 +00:00
Marcin Ślusarz
65600a34c2 anv: initialize 3DMESH_1D.ExtendedParameter0 when ExtendedParameter0Present
When IndirectParameterEnable==true it's not actually used by the hardware,
but if it's not initialized and INTEL_DEBUG=bat is set, then Valgrind complains.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15850>
2022-04-12 09:10:31 +00:00
Marcin Ślusarz
f844ce66c8 anv: fix push constant lowering for task/mesh
Fixes: a6031cd9bd ("anv: fix push constant lowering with bindless shaders")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15850>
2022-04-12 09:10:31 +00:00
Francisco Jerez
e858da39e5 intel/perf: Fix OA report accumulation on Gfx12+.
The intel_perf_query path used for performance queries on GL was
passing a bogus "end" pointer to intel_perf_query_result_accumulate(),
causing it to accumulate garbage values.  This was causing the values
of many performance counters to be corrupted.

The "end" pointer was incorrect because the current code was assuming
that different OA reports were located TOTAL_QUERY_DATA_SIZE bytes
apart, which is a hard-coded preprocessor define.  However recent
(Gfx12+) hardware generations use a variable query size determined by
the query layout.  Use the size derived from it instead, and remove
the stale define.

Fixes: 3c51325025 ("intel/perf: switch query code to use query layout")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15783>
2022-04-12 00:11:47 +00:00
Kenneth Graunke
b05ac36f01 intel/genxml: Add SAMPLER_MODE bits for enabling Small PL on Icelake
This enables a lower power mode in the sampler hardware in certain
common scenarios.  On Tigerlake, SAMPLER_MODE is not programmable by
userspace but the kernel already sets this bit for us.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15628>
2022-04-11 19:17:07 +00:00
Kenneth Graunke
e3defe7ae7 intel/genxml: Delete SAMPLER_MODE register definition on Gfx12+
While this register still exists, it's no longer a per-context register.
Instead, on Gfx12+, SAMPLER_MODE exists per dual-subslice and is
accessed as a "multicast" register, where you write control which
version is accessed by the "steering control register".

At any rate, userspace cannot write it any longer, and so there's not
much point to it existing in our genxml (which was missing most of the
fields anyway).

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15628>
2022-04-11 19:17:07 +00:00
Kenneth Graunke
8092704705 intel/genxml: Add new "Low Quality Filter" field on Gfx12+.
This allows the sampler to perform faster filtering of 8-bit UNORM
textures by filtering them at a different precision.  The filtering
is intended to still be OpenGL and DirectX spec compliant.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15628>
2022-04-11 19:17:07 +00:00
Kenneth Graunke
9a70385e2b intel/genxml: Add SAMPLER_STATE::Allow Low Quality LOD Calculation field
This allows the hardware to perform a faster LOD calculation in many
simple cases.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15628>
2022-04-11 19:17:07 +00:00
Vitalii.Lomaka
1407a4db69 intel/batch-decoder: Fix uninitialized scalar variables
CID: 1498516
CID: 1498560

Signed-off-by: Vitalii Lomaka <vitalii.lomaka@globallogic.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15685>
2022-04-08 18:35:34 +00:00
Benjamin Cheng
0666b7fecc anv: drop from_wsi bit from anv_image
It was originally introduced in ca791f5c but it was never actually set
anywhere. It doesn't serve any purpose other than some sanity checking
so let's clean it up for now.

Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15799>
2022-04-07 18:46:50 +00:00
Ian Romanick
b5fa43952a intel/fs: Better handle constant sources of FS_OPCODE_PACK_HALF_2x16_SPLIT
I noticed that a *LOT* of fragment shaders in Shadow of the Tomb Raider,
for instance, end up with a sequence of NIR like:

    vec1 32 ssa_2 = load_const (0x00000000 = 0.000000)
    ...
    vec1 32 ssa_191 = pack_half_2x16_split ssa_188, ssa_2
    vec1 32 ssa_192 = pack_half_2x16_split ssa_189, ssa_2
    vec1 32 ssa_193 = pack_half_2x16_split ssa_190, ssa_2

This results in an assembly sequence like:

    mov(8)          g28<1>UD        0x00000000UD
    mov(8)          g21<2>HF        g28<8,8,1>F
    shl(8)          g21<1>UD        g21<8,8,1>UD    0x00000010UD
    mov(8)          g21<2>HF        g25<8,8,1>F
    mov(8)          g19<2>HF        g28<8,8,1>F
    shl(8)          g19<1>UD        g19<8,8,1>UD    0x00000010UD
    mov(8)          g19<2>HF        g23<8,8,1>F
    mov(8)          g20<2>HF        g28<8,8,1>F
    shl(8)          g20<1>UD        g20<8,8,1>UD    0x00000010UD
    mov(8)          g20<2>HF        g24<8,8,1>F

After this commit, the generated assembly is:

    mov(8)          g21<1>UD        0x00000000UD
    mov(8)          g21<2>HF        g23<8,8,1>F
    mov(8)          g19<1>UD        0x00000000UD
    mov(8)          g19<2>HF        g17<8,8,1>F
    mov(8)          g20<1>UD        0x00000000UD
    mov(8)          g20<2>HF        g18<8,8,1>F

Tiger Lake, Ice Lake, Skylake, and Haswell had similar results. (Ice Lake shown)
total instructions in shared programs: 20119086 -> 20119034 (<.01%)
instructions in affected programs: 9056 -> 9004 (-0.57%)
helped: 8
HURT: 0
helped stats (abs) min: 2 max: 16 x̄: 6.50 x̃: 4
helped stats (rel) min: 0.29% max: 1.75% x̄: 1.00% x̃: 0.98%
95% mean confidence interval for instructions value: -11.01 -1.99
95% mean confidence interval for instructions %-change: -1.56% -0.44%
Instructions are helped.

total cycles in shared programs: 861019414 -> 861021044 (<.01%)
cycles in affected programs: 279862 -> 281492 (0.58%)
helped: 4
HURT: 2
helped stats (abs) min: 6 max: 936 x̄: 239.00 x̃: 7
helped stats (rel) min: 0.03% max: 8.13% x̄: 2.09% x̃: 0.09%
HURT stats (abs)   min: 18 max: 2568 x̄: 1293.00 x̃: 1293
HURT stats (rel)   min: 0.36% max: 1.14% x̄: 0.75% x̃: 0.75%
95% mean confidence interval for cycles value: -972.56 1515.89
95% mean confidence interval for cycles %-change: -4.77% 2.49%
Inconclusive result (value mean confidence interval includes 0).

Broadwell
total instructions in shared programs: 17812327 -> 17812263 (<.01%)
instructions in affected programs: 9867 -> 9803 (-0.65%)
helped: 8
HURT: 0
helped stats (abs) min: 2 max: 28 x̄: 8.00 x̃: 4
helped stats (rel) min: 0.32% max: 1.80% x̄: 1.00% x̃: 0.95%
95% mean confidence interval for instructions value: -15.46 -0.54
95% mean confidence interval for instructions %-change: -1.54% -0.47%
Instructions are helped.

total cycles in shared programs: 904768620 -> 904773291 (<.01%)
cycles in affected programs: 454799 -> 459470 (1.03%)
helped: 4
HURT: 4
helped stats (abs) min: 36 max: 586 x̄: 344.50 x̃: 378
helped stats (rel) min: 0.47% max: 4.04% x̄: 2.01% x̃: 1.77%
HURT stats (abs)   min: 1 max: 5572 x̄: 1512.25 x̃: 238
HURT stats (rel)   min: <.01% max: 2.77% x̄: 1.46% x̃: 1.53%
95% mean confidence interval for cycles value: -1122.40 2290.15
95% mean confidence interval for cycles %-change: -2.26% 1.71%
Inconclusive result (value mean confidence interval includes 0).

total spills in shared programs: 18581 -> 18579 (-0.01%)
spills in affected programs: 323 -> 321 (-0.62%)
helped: 1
HURT: 0

total fills in shared programs: 24985 -> 24981 (-0.02%)
fills in affected programs: 1348 -> 1344 (-0.30%)
helped: 1
HURT: 0

Tiger Lake, Ice Lake, and Skylake had similar results. (Ice Lake shown)
Instructions in all programs: 143585431 -> 143513657 (-0.0%)
Instructions helped: 14403

Cycles in all programs: 8439312778 -> 8439371578 (+0.0%)
Cycles helped: 10570
Cycles hurt: 3290

Gained: 146
Lost: 74

All of the lost and gained fossil-db shaders are SIMD32 fragment
shaders.  14,247 of the affected shaders are from Shadow of the Tomb
Raider.  154 are from Batman Arkham Origins, and the remaining two are
from Octopath Traveler.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15089>
2022-04-07 18:26:23 +00:00
Ian Romanick
c08302670b intel/compiler: Fix sample_d messages on DG2
DG2 can only do sample_d and sample_d_c on 1D and 2D surfaces.  The
maximum number of gradient components and coordinate components should
be 2.  In spite of this limitation, the Bspec lists a mysterious R
component before the min_lod, so the maximum coordinate components is 3.

Fixes the following Vulkan CTS failures on DG2:

    dEQP-VK.glsl.texture_functions.texturegradclamp.isampler1d_fragment
    dEQP-VK.glsl.texture_functions.texturegradclamp.isampler2d_fragment
    dEQP-VK.glsl.texture_functions.texturegradclamp.sampler1d_fixed_fragment
    dEQP-VK.glsl.texture_functions.texturegradclamp.sampler1d_float_fragment
    dEQP-VK.glsl.texture_functions.texturegradclamp.sampler2d_fixed_fragment
    dEQP-VK.glsl.texture_functions.texturegradclamp.sampler2d_float_fragment
    dEQP-VK.glsl.texture_functions.texturegradclamp.usampler1d_fragment
    dEQP-VK.glsl.texture_functions.texturegradclamp.usampler2d_fragment

The Fixes: tag below is a bit misleading. This commit fixes some test
cases similar to ones fixed by the Fixes: commit.  I just want to make
sure this commit gets applied everywhere that commit was also applied.

Fixes: 635ed58e52 ("intel/compiler: Lower txd for 3D samplers on XeHP.")
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15781>
2022-04-07 17:09:28 +00:00
Jason Ekstrand
13fc698cef anv/formats: Relax usage checks if EXTENDED_USAGE_BIT is set
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14153>
2022-04-07 15:56:33 +00:00
Lionel Landwerlin
b5031bd6f7 intel/nir: don't report progress on rayqueries if no queries
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: c78be5da30 ("intel/fs: lower ray query intrinsics")
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15769>
2022-04-07 08:24:19 +00:00
Lionel Landwerlin
56ef501e3a blorp: disable depth bounds
Otherwise the driver setting interacts with it.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 939ddccb7a ("anv: Add support for depth bounds testing.")
Fixes: 1df871f8ff ("iris: Add support for depth bounds testing.")
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15763>
2022-04-06 19:00:50 +00:00
Lionel Landwerlin
3069337144 anv: remove unused 3DSTATE_DEPTH_BOUNDS fields
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15763>
2022-04-06 19:00:50 +00:00
Lionel Landwerlin
88f77aa811 anv: disable preemption on 3DPRIMITIVE on gfx12
To workaround a push constant corruption issue.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/5963
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/5662
Cc: mesa-stable
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15753>
2022-04-06 12:51:15 +00:00
Vadym Shovkoplias
04a6693871 anv: fix EXT_depth_clip_control
This fixes arb_clip_control-clip-control and depth_clamp piglit
tests on zink.

Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/6186
Signed-off-by: Vadym Shovkoplias <vadym.shovkoplias@globallogic.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15561>
2022-04-06 13:26:52 +03:00
Jason Ekstrand
29b8097408 anv: Enable VK_EXT_debug_utils
It's implemented in common code as long as you use vk_command_buffer.

Acked-by: Emma Anholt <emma@anholt.net>
Acked-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15560>
2022-04-06 01:18:23 +00:00
Mike Blumenkrantz
6fd344ff98 anv: expose VK_EXT_image_2d_view_of_3d
sampling only available on gen9+

Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15754>
2022-04-05 20:30:31 +00:00
Omar Akkila
4208895175 ci: bump VK-GL-CTS to 1.3.1.1
Signed-off-by: Omar Akkila <omar.akkila@collabora.com>
Acked-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15668>
2022-04-04 23:04:33 +00:00
Jason Ekstrand
94ce812497 anv: Advertise two more formats
These both require swizzling so border colors won't work.  However,
they're conveniently in the list of formats for which custom border
colors require you to specify a format in the sampler.  That list
constists of:

    - VK_FORMAT_B4G4R4A4_UNORM_PACK16
    - VK_FORMAT_B5G6R5_UNORM_PACK16
    - VK_FORMAT_B5G5R5A1_UNORM_PACK16

Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/6226
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15624>
2022-04-04 21:42:23 +00:00
Jason Ekstrand
e32b9e5c3f anv: Generalize border color swizzles
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15624>
2022-04-04 21:42:23 +00:00
Jason Ekstrand
54509d27d9 anv: Disallow blending on swizzled formats
Fixes: c20f78dc5d ("anv: Support swizzled formats.")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15624>
2022-04-04 21:42:23 +00:00
Jason Ekstrand
257a20f40d intel/isl: Add a helper for swizzling color values
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15624>
2022-04-04 21:42:23 +00:00
Ian Romanick
7fd1955412 nir: intel/compiler: Lower TXD on array surfaces on DG2+
DG2 can only do sample_d and sample_d_c on 1D and 2D surfaces.  Cube
maps and 3D surfaces were already handled, but 1D array and 2D array
surfaces were not.

Fixes the following Vulkan CTS failures on DG2:

    dEQP-VK.glsl.texture_functions.texturegradclamp.isampler1darray_fragment
    dEQP-VK.glsl.texture_functions.texturegradclamp.isampler2darray_fragment
    dEQP-VK.glsl.texture_functions.texturegradclamp.sampler1darray_fixed_fragment
    dEQP-VK.glsl.texture_functions.texturegradclamp.sampler1darray_float_fragment
    dEQP-VK.glsl.texture_functions.texturegradclamp.sampler2darray_fixed_fragment
    dEQP-VK.glsl.texture_functions.texturegradclamp.sampler2darray_float_fragment
    dEQP-VK.glsl.texture_functions.texturegradclamp.usampler1darray_fragment
    dEQP-VK.glsl.texture_functions.texturegradclamp.usampler2darray_fragment

The Fixes: tag below is a bit misleading. This commit adds another
lowering, similar to the one in the Fixes: commit, that probably should
have been added at the same time.  I just want to make sure this commit
gets applied everywhere that commit was also applied.

Fixes: 635ed58e52 ("intel/compiler: Lower txd for 3D samplers on XeHP.")
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15681>
2022-03-31 12:59:18 -07:00
Rohan Garg
d876abeaa8 anv: Drop dead code in anv_UpdateDescriptorSets
Signed-off-by: Rohan Garg <rohan.garg@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15666>
2022-03-30 15:19:47 +02:00
Lionel Landwerlin
684a4ea30c intel/clc: fix missing pointer write
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 346a7f14fb ("intel/compiler: Add code for compiling CL-style SPIR-V kernels")
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15611>
2022-03-30 07:56:25 +00:00
Mike Blumenkrantz
65ec846f77 intel/isl: fix 2d view of 3d textures
according to KHR_gl_texture_3D_image:

    If <target> is EGL_GL_TEXTURE_3D_KHR, <buffer> must be the name of a
    complete, nonzero, GL_TEXTURE_3D (or equivalent in GL extensions) target
    texture object, cast
    into the type EGLClientBuffer.  <attr_list> should specify the mipmap
    level (EGL_GL_TEXTURE_LEVEL_KHR) and z-offset (EGL_GL_TEXTURE_ZOFFSET_KHR)
    which will be used as the EGLImage source; the specified mipmap level must
    be part of <buffer>, and the specified z-offset must be smaller than the
    depth of the specified mipmap level.

thus a 2d view of a 3d surface is not only legal, it's part of the spec and
must be supported when available

cc: mesa-stable

Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15584>
2022-03-29 21:44:51 +00:00
Kenneth Graunke
8831cb38aa anv: Stop updating STATE_BASE_ADDRESS on XeHP
Now that we're using 3DSTATE_BINDING_TABLE_POOL_ALLOC to set the base
address for the binding table pool separately from surface states, we
don't actually need to update surface state base address anymore.

Instead, we can just set STATE_BASE_ADDRESS once at context creation,
and never bother updating it again, saving some heavyweight flushes
and freeing us from the need for address offsetting trickery.

This patch was originally written by Jason Ekstrand, with fixes from
Lionel Landwerlin, but was targeting Icelake.  Doing it there requires
additional changes (15:5 -> 18:8 binding table pointer formats) which
also involve some trade-offs, whereas the XeHP change is purely a win,
so we'll do it here first.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15616>
2022-03-29 20:45:59 +00:00
Kenneth Graunke
1967fd3b10 intel/compiler: Call inst->resize_sources before setting the sources
You should probably resize the sources array before accessing entries
that might be out of bounds.  inst->resize_sources() always allocates
enough space for at least 3 sources, so this is really only an issue
when there are 4+ sources.

Fixes: a920979d4f ("intel/fs: Use split sends for surface writes on gen9+")
Fixes: 4f86a70599 ("intel/fs: Lower DW untyped r/w messages to LSC when available")
Fixes: d372abe397 ("intel/fs: Add surface OWORD BLOCK opcodes")
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15632>
2022-03-29 13:06:17 -07:00
Kenneth Graunke
9bc97e4fc1 intel/decoder: Fix decoder handling of binding table pool alloc on XeHP
3DSTATE_BINDING_TABLE_POOL_ALLOC no longer has a "Binding Table Pool
Enable" bit.  It is always enabled.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15625>
2022-03-29 02:35:54 -07:00
Georg Lehmann
922916bf64 nir: Move lower_usub_sat64 to nir_lower_int64_options.
Signed-off-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15421>
2022-03-28 20:02:52 +00:00
Kenneth Graunke
823745dc27 intel/compiler: Use nir_opt_uniform_atomics()
In general, an atomic intrinsic may perform separate atomics for every
enabled SIMD channel, as each channel may operate on different memory.

However, an extremely common case is for all channels to access the same
memory location.  In this case, we can simply perform a reduction/scan
across the subgroup, and perform one atomic for the whole subgroup,
rather than one per channel.  For example, if an intrinsic says to take
the minimum value of the existing memory and the value in each channel,
we can do a thread-local minimum of all enabled channels, then do a
single atomic to take the minimum of that and the existing memory.

Our hardware doesn't optimize the case where multiple channels ask for
atomics on the same memory location; it assumes the compiler will do so.

nir_opt_uniform_atomics() uses divergence analysis to detect this case,
adds the necessary subgroup operations, and moves the atomic inside a
conditional that disables all but a single invocation.  It even detects
cases where the shader code already performs this kind of optimization,
and avoids doing it a second time.

This may not be the optimal solution for us.  In the backend, we could
detect this case and emit send(1) instructions with NoMask, rather than
generating if...send(16)...endif, and a lot of unnecessary ALU ops.  But
it's simple to do, reuses the same path as ACO, and still provides most
of the benefit by cutting up to 16x atomics down to a single atomic,
which is more merciful to the memory bus.

Improves performance of Shadow of the Tomb Raider by 5.5% on XeHP.

Improves performance of a customer-internal benchmark on XeHP at
3840x2160 and low settings by approximately 30%.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15484>
2022-03-26 00:28:19 +00:00
Kenneth Graunke
49ef23f4a6 intel/compiler: Convert to LCSSA and use divergence analysis.
We'll use this more shortly.  For now, enable it to separately in case
anything bisects to this.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15484>
2022-03-26 00:28:19 +00:00
Kenneth Graunke
b3942beecf intel/compiler: Set divergence analysis options
Although we don't use divergence analysis yet, we've had several
work-in-progress series that make use of it.  We may as well set
our options so that those series can assume they're in place.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15484>
2022-03-26 00:28:19 +00:00
Kenneth Graunke
6fa66ac228 intel/compiler: Implement nir_intrinsic_last_invocation
We haven't exposed this intrinsic as it doesn't directly correspond to
anything in SPIR-V.  However, it's used internally by some NIR passes,
namely nir_opt_uniform_atomics().

We reuse most of the infrastructure in brw_find_live_channel, but with
LZD/ADD instead of FBL.  A new SHADER_OPCODE_FIND_LAST_LIVE_CHANNEL is
like SHADER_OPCODE_FIND_LIVE_CHANNEL but from the other side.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15484>
2022-03-26 00:28:19 +00:00
Caio Oliveira
c32d386ce2 intel/compiler: Inline TUE map computation into TUE Input lowering
Refactor since the TUE compute function is simpler now and the
comments make sense being near the lowering.

Reviewed-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15022>
2022-03-25 23:29:19 +00:00
Caio Oliveira
c36ae42e4c intel/compiler: Use nir_var_mem_task_payload
Instead of reusing the in/out slot mechanism, use a separated NIR
variable mode.  This will make easier later to implement staging the
output in shared memory (and storing all at the end to the URB).

Note to get 64-bit type support we currently rely on the
brw_nir_lower_mem_access_bit_sizes() pass.

Reviewed-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15022>
2022-03-25 23:29:19 +00:00
Boris Brezillon
49c8b93288 anv: Stop using VK_OUTARRAY_MAKE()
We're trying to replace VK_OUTARRAY_MAKE() by VK_OUTARRAY_MAKE_TYPED()
so people don't get tempted to use it and make things incompatible with
MSVC (which doesn't support typeof()).

Suggested-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15522>
2022-03-25 11:00:03 +00:00
Caio Oliveira
f82731d0d7 intel/fs: Fix IsHelperInvocation for the case no discard/demote are used
Use emit_predicate_on_sample_mask() helper that does check where to
get the correct mask depending on whether discard/demote was used or
not.

Fixes: 45f5db5a84 ("intel/fs: Implement "demote to helper invocation"")
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15400>
2022-03-25 08:20:27 +00:00
Caio Oliveira
bb311c22df intel/fs: Initialize the sample mask in flags register when using demote
Without this change, a check for "is helper invocation" could read
uninitialized values.

Fixes: 45f5db5a84 ("intel/fs: Implement "demote to helper invocation"")
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15400>
2022-03-25 08:20:27 +00:00
Lionel Landwerlin
8cdd5647c6 anv: don't store sample location sample count
This information should match the current pipeline sample count.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15310>
2022-03-24 10:49:07 +00:00
Lionel Landwerlin
6f5f817c0f anv: fix dynamic sample locations on Gen7/7.5
3DSTATE_MULTISAMPLE should be baked into the pipeline if not dynamic.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 27ee40f4c9 ("anv: Add support for sample locations")
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15310>
2022-03-24 10:49:07 +00:00
Lionel Landwerlin
8ad78671b3 anv: use local dynamic pointer more
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15310>
2022-03-24 10:49:07 +00:00