For temporary lookups, just allocate out of the NULL ralloc context,
so we don't have to edit the linked list of ralloc children to add it
and then immediately remove it again.
When uploading a new shader, allocate the keybox off the shader, so
if we delete the shader the keybox also goes away. Less manual cleanup.
All of the tables are static const, so they only need to be validated
once. As noted in the previous commit, the compiler should be able to
eliminate all of this code when the assertions would pass. Even with
the help of the previous commit, this does not always occur.
-Og: -95.688 +/- 3.91935 (-24.9562% +/- 1.0222%) N=5
-O1: No difference proven at 95.0% confidence. N=5
-O2: -1.962 +/- 0.85001 (-0.860013% +/- 0.372589%) N=5
Reviewed-by: Eric Anholt <eric@anholt.net>
I was pretty liberal with these assertions when I wrote this code
because I had assumed that GCC would unroll the loops, inline the look ups
of static const arrays with now constant indices, and then elmininate
all the actuall assertions. It seems none of this happens even at -O3.
Adding the pragmas helps encourage loop unrolling at some optimization
levels. I tested by running shader-db with NIR_VALIDATE=false on a Core
i7 Haswell desktop system.
-Og: No difference proven at 95.0% confidence. N=5
-O1: -48.304 +/- 1.221 (-16.3343% +/- 0.412888%) N=5
-O2: -49.94 +/- 1.23521 (-17.9634% +/- 0.444303%) N=5
v2: Add a _Pragma to an inner loop that was accidentally dropped during
a rebase.
Reviewed-by: Eric Anholt <eric@anholt.net>
Varyings are similar to already handled cases. And "glsl_zero_init"
name of the workaround already looks like it should include varyings.
The issue was observed in GiMark subtest from GpuTest.
Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Just a bit of cleanup. lower_tex can do this lowering for us, which
should also eliminate some special cases (one less thing to fix if we
ever need texturing in tess/geom/etc, perhaps?)
Closes#2133
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
We fetch the info with the new intrinsic and lower with ALU ops for txl
instructions, which seemingly correspond to "TEXGRD" instructions (what
we call textureLod).
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
We can stuff this information in as parametrized system values, like we
currently do texture size and SSBO addresses.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
This loads in the <min_lod, max_lod, lod_bias> settings for a given
sampler, which is necessary for lowering clamps/biases on certain
Midgard chips.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Generating a source code with a fixed size leads to issues with plattform dependent types.
We either hard code 4 or 8 bytes there, and both are wrong on the other plattform.
So this patch solves this issue by generating eg sizeof(GLsizeiptr), which is valid both
on 32 and on 64 bit plattforms.
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
This enables UBWC for everything except 3D textures.
It breaks many image_to_image copies but those aren't important and it can
be worked around later (image_to_image copy needs to be done in two steps,
decode from the source format and then encode to the destination format).
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Eric Anholt <eric@anholt.net>
Fix build error after llvm-10.0 commit 1dfede3122ee ("Move
CodeGenFileType enum to Support/CodeGen.h").
../src/gallium/drivers/swr/rasterizer/jitter/JitManager.cpp: In member function ‘void JitManager::DumpAsm(llvm::Function*, const char*)’:
../src/gallium/drivers/swr/rasterizer/jitter/JitManager.cpp:428:45: error: ‘CGFT_AssemblyFile’ is not a member of ‘llvm::TargetMachine’
*pMPasses, filestream, nullptr, TargetMachine::CGFT_AssemblyFile);
^
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Jan Zielinski <jan.zielinski@intel.com>
Do this by repeating processing of loops until no progress is made.
Totals from affected shaders:
SGPRS: 162576 -> 162576 (0.00 %)
VGPRS: 145228 -> 145228 (0.00 %)
Spilled SGPRs: 668 -> 668 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 0 -> 0 (0.00 %) dwords per thread
Code Size: 15778640 -> 15771336 (-0.05 %) bytes
LDS: 146 -> 146 (0.00 %) blocks
Max Waves: 6087 -> 6087 (0.00 %)
v2: use block_kind_loop_header/block_kind_loop_exit to repeat at the end
of loops instead of at each continue
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
When GPU is idle and suspends, the currently selected countables
will all reset to the first one. So periodically restore the selected
countables.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Port from the envytools tree, but converted to use the .c tables for
describing the perfcounter groups/countables, rather than using rnndec
to get this at runtime from the register xml.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Currently this are getting blocked by the kernel.. these counters don't
seem to be the most useful ones, and to use them we'd have to somehow
probe the kernel by submitting cmdstream to write the selector regs and
see if that triggers a GPU fault. So let's just skip them.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
This should eventually be useful for VK_KHR_performance_query as well.
And in the more near term, for fdperf.
Attempt to not break android build is best-effort and untested.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
When we had one gen supporting performance counters, it made sense to
have these builder macros in the .c file with the table. But time has
come to de-duplicate.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
This passes the piglit CL builtin-ulong-clz-1.0.generated.cl
test.
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
This adds the option to lower 64-bit ufind_msb opcodes.
v2: use split_x/y removes component loops (Jason)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This was apparently missed in 67b32190f3, which added support
for ARB_shading_language_include to #line, including the 'path'
field for the location.
Fixes crashes in CTS with all drivers as they attempt to access
an uninitialized path string during parsing.
Fixes: 67b32190f3 ("glsl: add ARB_shading_language_include support to #line")
Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/2132
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Jose Maria Casanova <jmcasanova@igalia.com>
This happens when mesa is built with only swrast. The default
driver being kmsro and the default driconf file being v3d,
it's NULL and then strdup crashes.
This fixes a crash with piglit spec/egl_mesa_query_driver/conformance.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Michel Dänzer <mdaenzer@redhat.com>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Enough trial and error ... just think even *more* Midgard about where
this field might be!
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
also make 8 and 16 compoments invalid. We will enable that later again
when we actually support it.
v2: fix validation of nir_intrinsic_instr::num_components
correct validation of instr->num_components
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This reverts commit 52c7df1643. The pass,
while clearly useful for some shaders, has at least three bugs that I
was able to find fairly quickly:
1. It doesn't work for type-converting MOVs because f > 0 is not the
same as f2i(f) > 0
2. CSEL is a 3src instruction and only supports one source type; it
doesn't take this into account and tries to create instructions
which do a F compare and a D select. This is especially nasty to
debug because you don't see that in the dumped assembly because we
don't properly assert that types are the same in codegen.
3. While you can handle 2, in theory, by reinterpreting types, you
can't do that in the presence of source modifiers. This pass
doesn't even attempt to detect that.
Those are just the ones I found with the one almost trival shader I was
debugging. There very likely may be more and. Best thing to do for now
is just shut it off until someone has the time to figure out how to do
this properly and write tests to ensure it's correct.
Fixes: 3cb085e6d61a "i965/fs: Merge CMP and SEL into CSEL on Gen8+"
Reviewed-by: Brian Paul <brianp@vmware.com>