Same as the L1 case, but this one deals with 64bit entry addresses and
pte addresses.
Consecutive L3/L2 writes are much rarer than L1 writes since they
require some pretty big buffers, but we can still those cases in the
wild. I just don't think any change will be noticeable though.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25512>
TR-TT is a hardware feature supported by both i915.ko and xe.ko, which
means we can now finally have Sparse Resources on i915.ko and we also
have 2 options for xe.ko (and whatever is the best should be the
default).
In this patch we use batch commands to write the page tables and
forever keep them in device memory. We maintain a mirror of both the
L3 and and L2 tables because that helps us never having to read the
tables that are in device memory.
We still have some things to improve, but with this commit, workloads
that didn't work at all due to the lack of sparse resources should
at least run.
This is still all disabled by default in i915.ko, you can turn it on
by exporting ANV_SPARSE=1 before launching the applications. For
xe.ko, switch the default with ANV_SPARSE_USE_TRTT=1.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25512>
We're currently updating the hitT value in the traversal result with
the hitT value from reportIntersection(), but this is not correct.
First the hitT value of reportIntersection() should update the
gl_RayTmaxEXT value (maps to brw_nir_rt_mem_ray_defs::t_far).
Second the hitT determined by traversal should only be updated if the
reportIntersection() hitT value has updated the gl_RayTmaxEXT and that
the new gl_RayTmaxEXT is smaller than the determined hitT value from
traversal.
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Fixes: 303378e1dd ("intel/rt: Add lowering for combined intersection/any-hit shaders")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25146>
Patch fxes ESO shadow pass ground corruption on Arc A750. In the colour
pass where the rendering corruption first appears, the depth resource
was used as a "PS - Texture". Immediately afterwards there's a Barrier
where it goes from
VK_IMAGE_LAYOUT_DEPTH_STENCIL_READ_ONLY_OPTIMAL =>
VK_IMAGE_LAYOUT_DEPTH_STENCIL_ATTACHMENT_OPTIMAL
immediately following that there's a Clear from vkCmdBeginRendering
which appears to be a HiZ clear. Things work when using AUX_USAGE_HIZ
but AUX_USAGE_HIZ_CCS_WT (XXX: and AUX_USAGE_HIZ_CCS?) doesn't work.
current thinking is this is related to 14015264727 where we had to add
HDC and DC flushes to CCS and MCS fast clears. Maybe HiZ clears with
CCS also have similar problems? The docs don't appear to indicate that
but the docs were also wrong for color clears until recently...
Cc: mesa-stable
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/9277
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/9444
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22717>
Some fields in schedule_node will need to be reset each time they are
used. The `cand_generation` needs to be back to zero, and both
`unblocked_time` and `parent_count` need to be back to their initial
values, which were pre-calculated.
Rename the initial data fields and add new ones for the temporary data.
Note the helper function is `per node` to allow it "tag along" with an
existing loops.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25841>
Pull run() and schedule_instructions() for fs, and pull a very
simplified version of those into a run() for vec4. Because of the
previous patches the duplication is small.
Since we are touching these, change run() implementations to use the
cfg from the existing reference to the visitor/shader instead of taking
one as argument.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25841>
Commit 50c29e1ffa ("anv: simplify buffer address+size loads from descriptor buffer")
is making use of AuxiliarySurfaceBaseAddress field to store buffer
lenght as it was not used but a LNL workaround will make use of it
so we need to bring back this non optimized version of
build_load_render_surface_state_address().
There is some conflicts so a simple revert do not works.
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26152>
The current name doesn't cover all the tex related instructions and
in all usages, we already have a switch statement to dispatch
per instruction type, so is more natural to list the instructions we
care there.
In fs::is_send_from_grf() we can simply ignore them since the
instructions are either lowered directly to SEND (Gfx7+) or use
MRF (Gfx6-).
With this change, the fs_inst::size_read() generated code gets
simplified (the "tex" entries get added to the switch jump table
in gcc) and the default case loses the conditional handling tex.
This reduces shader compilation time, as illustrated by replaying
fossils (tested on my TGL laptop):
```
// Rise of the Tomb Raider (N=13)
Difference at 95.0% confidence
-1.32231 +/- 0.0170138
-4.37605% +/- 0.0563054%
(Student's t, pooled s = 0.0210159)
// Cyberpunk 2077 (N=7)
Difference at 95.0% confidence
-3.64 +/- 0.114993
-2.95188% +/- 0.0932544%
(Student's t, pooled s = 0.09873)
```
Suggested-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25721>
With the removal of DRM_IOCTL_XE_MMIO xe_gem_read_render_timestamp()
was always returning false but with DRM_XE_DEVICE_QUERY_ENGINE_CYCLES
it can be re implemented making use of
xe_gem_read_correlate_cpu_gpu_timestamp().
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24591>