Setting the NIR options takes care of iris thanks to the common st/mesa
linking code, and updating brw_nir_link_shaders should handle anv.
The main effort here is updating remap_tess_levels, which needs to
handle vector stores, writemasking, and swizzling. Unfortunately,
we also need to continue handling the existing single-component
access because it's used for TES inputs, which we don't vectorize.
We could try to vectorize TES inputs too, but they're all pushed
anyway, so it wouldn't buy us much other than deleting this code.
Also, we do have opt_combine_stores, but not one for loads.
One limitation of using nir_vectorize_tess_levels is that it works
on variables, and so isn't able to combine outer/inner writes that
happen to live in the same vec4 slot (for triangle domains). That
said, it's still better than before.
For writes, we allow the intrinsics to supply up to the full size
of the variable (vec4 for outer, vec2 for inner) even if the domain
only requires a subset of those components (i.e. triangles needs 3).
shader-db results on Icelake:
total instructions in shared programs: 19605070 -> 19602284 (-0.01%)
instructions in affected programs: 65338 -> 62552 (-4.26%)
helped: 271 / HURT: 0
helped stats (abs) min: 6 max: 24 x̄: 10.28 x̃: 12
helped stats (rel) min: 1.30% max: 18.18% x̄: 5.80% x̃: 7.59%
95% mean confidence interval for instructions value: -10.71 -9.85
95% mean confidence interval for instructions %-change: -6.17% -5.43%
Instructions are helped.
total cycles in shared programs: 851854659 -> 851820320 (<.01%)
cycles in affected programs: 618749 -> 584410 (-5.55%)
helped: 271 / HURT: 0
helped stats (abs) min: 69 max: 540 x̄: 126.71 x̃: 108
helped stats (rel) min: 2.57% max: 37.97% x̄: 6.17% x̃: 5.06%
95% mean confidence interval for cycles value: -135.89 -117.54
95% mean confidence interval for cycles %-change: -6.72% -5.63%
Cycles are helped.
total sends in shared programs: 1025285 -> 1024355 (-0.09%)
sends in affected programs: 6454 -> 5524 (-14.41%)
helped: 271 / HURT: 0
helped stats (abs) min: 2 max: 8 x̄: 3.43 x̃: 4
helped stats (rel) min: 5.71% max: 25.00% x̄: 14.98% x̃: 17.39%
95% mean confidence interval for sends value: -3.57 -3.29
95% mean confidence interval for sends %-change: -15.42% -14.54%
Sends are helped.
According to Felix DeGrood, this results in a 10% improvement in
the draw call time for certain draw calls from Strange Brigade.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17944>
VS, TCS, TES, and GS threads must end with a URB write message with the
EOT (end of thread) bit set. For VS and TES, we shadow output variables
with temporaries and perform all stores at the end of the shader, giving
us an existing message to do the EOT.
In tessellation control shaders, we don't defer output stores until the
end of the thread like we do for vertex or evaluation shaders. We just
process store_output and store_per_vertex_output intrinsics where they
occur, which may be in control flow. So we can't guarantee that there's
a URB write being at the end of the shader.
Traditionally, we've just emitted a separate URB write to finish TCS
threads, doing a writemasked write to an single patch header DWord.
On Broadwell, we need to set a "TR DS Cache Disable" bit, so this is
a convenient spot to do so. But on other platforms, there's no such
field, and this write is purely wasteful.
Insetad of emitting a separate write, we can just look for an existing
URB write at the end of the program and tag that with EOT, if possible.
We already had code to do this for geometry shaders, so just lift it
into a helper function and reuse it.
No changes in shader-db.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17944>
It's very unfortunate that we have the RT scratch being conflated with
the usual scratch. In our implementation those are 2 different buffers.
The usual scratch access are done through the scratch surface state
(delivered through thread payload), while RT scratch (which outlives
thread dispatch with shader calls) is its own buffer.
So checking the NIR scratch size makes no sense as we can have normal
scratch accesses completely unrelated to RT scratch accesses.
This change switches the validation by looking at whether the scratch
base pointer intrinsic is being used (which is what we use/abuse to
implement RT scratch).
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: c78be5da30 ("intel/fs: lower ray query intrinsics")
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17396>
MTL has different CS prefetch sizes for each CS type.
So here replacing the cs_prefetch_size in intel_device_info struct
by a function that takes as argument the i915 engine class.
Fixes:
- func.cmd-buffer.small-secondaries.q0
- dEQP-VK.multiview.secondary_cmd_buffer.*
- Several other VK CTS tests that uses secondary_cmd_buffer
v2:
- renamed to intel_device_info_get_engine_prefetch() (Jordan)
v3:
- renamed to intel_device_info_calc_engine_prefetch()
- store each engine class prefetch in intel_device_info
BSpec: 45718
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18597>
Applications may use out-of-range values, driver is responsible for
clamping to implementation-dependent sample location coordinate
range.
Without clamp we hit assert when packing 3DSTATE_SAMPLE_PATTERN if
application attempts to use bigger value than 0.9375.
util_bitpack_ufixed: Assertion `min <= v && v <= max' failed.
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18696>
If the pipeline does not use libraries and the shaders are all found in
the cache, we end up with empty groups and crash at pipeline emit time.
Fixes a bunch of tests under
dEQP-VK.pipeline.monolithic.shader_module_identifier.\*.ray_tracing\*
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18582>
It's already called in brw_postprocess_nir and calling it the second time
actually breaks shading rate.
Initially, when I added this call here in 9acb30c8c4, I was testing it
on an internal tree, which didn't have brw_nir_lower_shading_rate_output call
in brw_postprocess_nir.
Fixes: 9acb30c8c4 ("intel/compiler: implement primitive shading rate for mesh")
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18702>