Almost a trivial change, it boils down to renaming a few identifiers
so their names still make sense for opaque types other than sampler.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Fix the linker to deal with intrinsic functions which are undefined
all the way down to the driver back-end, and introduce intrinsic
definition helpers in the built-in generator.
We still need to figure out what kind of interface we want for drivers
to communicate to the GLSL front-end which of the supported intrinsics
should use a default GLSL implementation and which should use a
hardware-specific override. As there's no default GLSL implementation
for atomic ops, this seems like something we can worry about later on.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
v2: Define local helper function to generate ir_call nodes in the
builtin generator.
And use it to forbid comparisons of opaque operands. According to the
GL 4.2 specification:
> Except for array indexing, structure member selection, and
> parentheses, opaque variables are not allowed to be operands in
> expressions.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
v2: Fix GLSL version in which the type became available. Add
contains_atomic() convenience method. Split off atomic counter
comparison error checking to a separate patch that will handle all
opaque types. Include new ir_variable fields for atomic types.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This patch implements the common support code required for the
ARB_shader_atomic_counters extension. It defines the necessary data
structures for tracking atomic counter buffer objects (from now on
"ABOs") associated with some specific context or shader program, it
implements support for binding buffers to an ABO binding point and
querying the existing atomic counters and buffers declared by GLSL
shaders.
v2: Fix extension checks. Drop unused MAX_ATOMIC_BUFFERS constant.
Acked-by: Paul Berry <stereotype441@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Add XML file for the dispatch code generator, update the
dispatch_sanity test and add stub definition for the new entry point.
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
These ralloc contexts belong to a specific object and are being
deallocated manually from the class destructor. Now that we've hooked
up destructors to ralloc there's no reason for them to be children of
any other context, and doing so might to lead to double frees under
some circumstances. The class destructor has all the responsibility
of freeing class memory resources now.
This patch makes sure that class destructors are called as they should
be when a C++ object allocated by ralloc is released.
Based on a previous patch by Kenneth Graunke, but it doesn't exhibit
the ~0.8% performance regression in shader compilation times because
we now use the HAS_TRIVIAL_DESTRUCTOR() macro to detect the typical
case where the indirect function call can be avoided because the
object's destructor doesn't need to do anything.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Only implemented on GCC and Clang for now. Other compilers use a
dummy implementation that always returns false, which should be a safe
[but slightly inefficient] assumption in all cases.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This will let us use strcasecmp() from anywhere inside Mesa without
having to worry about the fact that it doesn't exist in MSVC.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
The layer coming from GS needs to be clamped (not sure if that's actually
the correct error behavior but we need something) as the number can be higher
than the amount of layers in the fb. However, this code was using the layer
calculation from the scene, and this was actually calculated in
lp_scene_begin_rasterization() hence too late (so setup was using the value
from the _previous_ scene or just zero if it was the first scene).
Since the value is used in both rasterization and setup, move calculation up
to lp_scene_begin_binning() though it's a bit more inconvenient to calculate
there. (Theoretically could move _all_ code which was in
lp_scene_begin_rasterization() to there, because ever since we got rid of
swizzled render/depth buffers our "map" functions preparing the fb data for
render don't actually change the data in there at all, but it feels like
it would be a hack.)
v2: improve comments
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Before we were only checking the st->vertex_array_out_of_memory flag
after updating array state. But if there's two consecutive glDrawArrays
calls and the first one is skipped because of OOM, the second one should
be skipped too.
Cc: 9.2 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Orbital Explorer was generating a 4000 instruction geometry shader, which
was taking 275 trips through dead code elimination and register
coalescing, each of which updated live variables to get its work done, and
invalidated those live variables afterwards.
By using bitfields instead of bools (reducing the working set size by a
factor of 8) in live variables analysis, it drops from 88% of the profile
to 57%, and reduces overall runtime from I-got-bored-and-killed-it (Paul
says 3+ minutes) to 10.5 seconds.
Compare to f179f419d1 on the FS side.
Reviewed-by: Paul Berry <stereotype441@gmail.com>
This prevents unnecessary (and wrong) register allocation in the
scheduler for preloaded values in fixed registers.
Fixes interpolation-mixed.shader_test on rv770
(and probably on all other pre-evergreen chips).
Signed-off-by: Vadim Girlin <vadimgirlin@gmail.com>
Tested-by: Andreas Boll <andreas.boll.dev@gmail.com>
I noticed this in a shader in Unigine Heaven that was spilling. While it
doesn't really reduce register pressure, it shaves a few instructions
anyway (7955 -> 7882).
v2: Fix turning "0 >> x" into "x" instead of "0" (caught by Erik
Faye-Lund).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
While ir_builder is slightly less efficient, we're only increasing the
work when there's actual optimization being done, and it's way more
readable code.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Matt and I had each screwed up these common required patterns recently, in
ways that wouldn't have been noticed for a long time if not for code
review. Just enforce it in the caller so that we don't rely on code
review catching these bugs.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
There is nothing in the OpenGL specification which prevents the user from
calling glGenQueries to generate a new query object while another object is
active. Neither is there anything in the Mesa implementation which prevents
this. So remove the INVALID_OPERATION errors in this case.
Similarly, it is explicitly allowed by the OpenGL specification to delete an
active query, so remove the assertion for that case, replacing it with the
necesssary state updates to end the query, (clear the bindpt pointer and call
into the driver's EndQuery hook).
CC: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
Tested-by: Brian Paul <brianp@vmware.com>
The normal drawing path does this, and it's necessary on Ivybridge,
so let's try it on Sandybridge too. It's not explicitly documented
as necessary, but might help with hangs.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Tested-by: Xinkai Chen <yeled.nova@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Cc: "9.2" <mesa-stable@lists.freedesktop.org>
From the documentation:
"[DevIVB] 3DSTATE_DEPTH_BUFFER must always be programmed along with the
other Depth/Stencil state commands(i.e. 3DSTATE_CLEAR_PARAMS,
3DSTATE_STENCIL_BUFFER, or 3DSTATE_HIER_DEPTH_BUFFER)."
We normally do this, but BLORP was failing to do so in the case where it
disables depth.
Not observed to fix anything yet.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Tested-by: Xinkai Chen <yeled.nova@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Cc: "9.2" <mesa-stable@lists.freedesktop.org>
For some reason, we put the flush in the caller, rather than just before
emitting the packet. This is more than a cosmetic problem: BLORP calls
gen6_emit_3dstate_multisample() directly, and so it missed the flush.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Tested-by: Xinkai Chen <yeled.nova@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Cc: "9.2" <mesa-stable@lists.freedesktop.org>
From the comments above intel_emit_post_sync_nonzero_flush:
"[DevSNB-C+{W/A}] Before any depth stall flush (including those
produced by non-pipelined state commands), software needs to first
send a PIPE_CONTROL with no bits set except Post-Sync Operation != 0."
This suggests that every non-pipelined (0x79xx) command needs a
post-sync non-zero flush before it.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Tested-by: Xinkai Chen <yeled.nova@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Cc: "9.2" <mesa-stable@lists.freedesktop.org>
Otherwise the gen6 w/a in the kernel won't kick in and the write will
land nowhere.
Inspired by a patch Ken pointed me at which had the same issue (but
isn't yet merged and also for a gen7+ feature). An audit of the entire
driver didn't reveal any other case than the one in in the write_reg
helper used by the gen6 queryobj code.
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Tested-by: Xinkai Chen <yeled.nova@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Cc: "9.2" <mesa-stable@lists.freedesktop.org>
The main purpose of this patch is to increase readability of
the array code by introducing is_unsized_array() to glsl_types.
Some redundent is_array() checks are also removed, and small number
of other related clean ups.
The introduction of is_unsized_array() should also make the
ARB_arrays_of_arrays code simpler and more readable when it arrives.
V2: Also replace code that checks for unsized arrays directly with the
length variable
Signed-off-by: Timothy Arceri <t_arceri@yahoo.com.au>
v3 (Paul Berry <stereotype441@gmail.com>): clean up formatting.
Separate whitespace cleanups to their own patch.
Reviewed-by: Paul Berry <stereotype441@gmail.com>
When a geometry shader is present, the fragment shader gl_PrimitiveID
input acts like an ordinary varying, receiving data from the gs
gl_PrimitiveID output. When there's no geometry shader, we have to
ask the fixed function SF hardware to provide the primitive ID to the
fragment shader instead.
Previously, the SF setup code would handle this situation by
recognizing that the FS gl_PrimitiveID input didn't match to any VS
output; since normally an FS input with no corresponding VS output
leads to undefined data, the SF setup code used to just arbitrarily
assign it to receive data from attribute 0.
This patch changes the SF setup code so that instead of arbitrarily
using attribute 0, it assigns the unmatched FS input to receive
gl_PrimitiveID. In the case where the FS input really is
gl_PrimitiveID, this produces the intended result. In all other
cases, no harm is done since GL specifies that the behaviour is
undefined.
Fixes piglit test primitive-id-no-gs.
v2: If an attribute is already being overridden with point
coordinates, don't try to also override it with gl_PrimitiveID. This
is necessary to avoid regressing piglit tests such as
shaders/glsl-fs-pointcoord.
Reviewed-by: Eric Anholt <eric@anholt.net>
v2: Actually implement interop between the gallium
state tracker and the VDPAU backend.
v3: Make it also available in non legacy contexts,
fix video buffer sharing.
v4: deny interop if we don't have the same screen object
v5: rebased on upstream changes
v6: implemented VDPAUGetSurfaceivNV, improved error handling,
unregister all surfaces in VDPAUFiniNV
v7: squash merge with Mareks changes
Signed-off-by: Christian König <christian.koenig@amd.com>
ir_txf expects an ivec* coordinate, and may be larger than ivec2;
shuffle things around so that this will work.
V2: Fix style nits, use ir_builder
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
It turns out that nonzero offsets with gsampler2DRect don't work -- they
just return garbage. Work around this by folding the offset into the
coord.
Done as an IR pass rather than yet another hack in the visitors because
it's clear what's going on this way. Can possibly reuse this to replace
the existing txf coord+offset hacks.
V2: Use ir_builder
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>