Intel has different Z interpolation float point rounding
than other mesa gpus
For example gl_Position.z = 0.0 will be interpolated to
gl_FragCoord.z = 0.5 for all gpus
gl_FragCoord = -0.00000001 will be interpolated to
gl_FragCoord.z = 0.4999999702 for Intel
and rounded to gl_FragCoord.z = 0.5 for other gpus
Games with LEQUAL depth func will fail depth test on Intel
and will pass it on other gpus in such case
This workaround lowers translated depth range
and several gl_FragCoord.z coords with extra small difference
will be translated to the same UINT16\UINT24\UINT32
value of an integer depth buffer
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/7199
Signed-off-by: Illia Polishchuk <illia.a.polishchuk@globallogic.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18412>
During the lower_regioning() optimization, required_exec_type() is
returning BRW_REGISTER_TYPE_UQ type when processing
SHADER_OPCODE_SHUFFLE instructions of type BRW_REGISTER_TYPE_DF but
MTL has float64 support but lacks int64 support causing shader
compilation to fail.
To fix that we could make required_exec_type() return
BRW_REGISTER_TYPE_DF in such case but SHADER_OPCODE_SHUFFLE virtual
instruction runs in the integer pipeline(inferred_exec_pipe()).
So here replacing the has_64bit check by has_64bit_int, this will
properly handle older and newer cases making this function return
BRW_REGISTER_TYPE_UD.
Then lower_exec_type() will take care to generate 2 32bits operations
to accomplish the same.
While at it also dropping the 'devinfo->verx10 == 70' check as
GFX7_FEATURES fall into the same category as MTL, has float64 but no
int64 support.
Fixes at least this crucible tests:
func.uniform-subgroup.exclusive.fadd64.q0
func.uniform-subgroup.exclusive.fmin64.q0
func.uniform-subgroup.exclusive.fmax64.q0
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18577>
Move subgroup_id, that's only used by CS for verx10 < 125, as part of
the payload too -- even though is not, strictly speaking.
Note the thread execution of Task/Mesh is similar enough, so we make
their common struct inherit from cs_thread_payload.
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18176>
When loading a TCS or GS input, we generate some code to read the URB
handle for a particular input control point (ICP handle), which often
involves indirect addressing due to a non-constant vertex.
For example:
mov(8) vgrf148+0.0:UW, 76543210V
shl(8) vgrf149:UD, vgrf148+0.0:UW, 2u
shl(8) vgrf150:UD, vgrf145:UD, 5u
add(8) vgrf151:UD, vgrf150:UD, vgrf149:UD
mov_indirect(8) vgrf147:UD, g2:UD, vgrf151:UD, 96u
Unfortunately, the first load with 76543210V is considered a partial
write because the 8 channels of 16-bit UW data doesn't fill an entire
register, and we can't allocate VGRFs at sub-register granularity.
This causes none of the above math to be CSE'd, even though the first
two instructions are common to *all* input loads, and the rest may be
reused sometimes as well.
To work around this, we stop emitting 76543210V to a temporary, and
instead use nir_system_values[SYSTEM_VALUE_SUBGROUP_INVOCATION], which
already contains this value, and is unconditionally set up for us.
With all input loads using the same register for the sequence, our
CSE pass is able to eliminate the rest of the common math.
shader-db results on Tigerlake:
total instructions in shared programs: 20748243 -> 20744844 (-0.02%)
instructions in affected programs: 73410 -> 70011 (-4.63%)
helped: 242 / HURT: 21
helped stats (abs) min: 1 max: 37 x̄: 14.17 x̃: 15
helped stats (rel) min: 0.17% max: 19.58% x̄: 6.13% x̃: 6.32%
HURT stats (abs) min: 1 max: 4 x̄: 1.38 x̃: 1
HURT stats (rel) min: 0.18% max: 1.31% x̄: 0.58% x̃: 0.58%
95% mean confidence interval for instructions value: -13.73 -12.12
95% mean confidence interval for instructions %-change: -6.00% -5.19%
Instructions are helped.
total cycles in shared programs: 785828951 -> 785788480 (<.01%)
cycles in affected programs: 597593 -> 557122 (-6.77%)
helped: 227 / HURT: 13
helped stats (abs) min: 6 max: 624 x̄: 182.19 x̃: 185
helped stats (rel) min: 0.24% max: 18.22% x̄: 7.85% x̃: 7.80%
HURT stats (abs) min: 2 max: 153 x̄: 68.08 x̃: 36
HURT stats (rel) min: 0.03% max: 7.79% x̄: 2.97% x̃: 1.25%
95% mean confidence interval for cycles value: -182.55 -154.71
95% mean confidence interval for cycles %-change: -7.84% -6.69%
Cycles are helped.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18455>
We found a perf regression with 9027c5df4c ("anv: remove the
LOCAL_MEM allocation bit") which seems to be that we over subscribe
local memory, leading i915 to swap things in/out too much.
This change avoid putting buffers in local memory if they are not
allocated from a DEVICE_LOCAL heap.
Maybe we can revisit this later if i915 is better able to deal with
more buffers in local memory.
v2: Remove implicit_css from anv_bo when not in lmem (Ivan)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 9027c5df4c ("anv: remove the LOCAL_MEM allocation bit")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/7188
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18395>