This is a little bit like the mprotect-based fencing I've experimented
with, but it's simple and low overhead. The downside is that only catches
writes, not reads.
It didn't catch any bad writes on a current piglit run, but may be useful
in the future.
Move scratch_size out of ilo_state_shader_kernel_info and
ilo_state_compute_interface_info. A scratch space is shared by all
kernels/interfaces. Update builder to emit relocs for scratch bos.
Location has never been able to be a negative value because it has
always been validated in the parser.
Also the linker doesn't check for negatives like the comment claims.
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
virgl/vtest is a swrast driver that allows the
virgl acceleration to be tested without having
a virtual machine.
The backend has a unix socket server that
this connects to.
This is run by setting
LIBGL_ALWAYS_SOFTWARE=y
GALLIUM_DRIVER=virpipe
In this mode all renderering is sent over
a socket to the remote renderer, and the
results are readback and copies to the screen
using drisw. This works well enough to develop
new features and to help debug.
Signed-off-by: Dave Airlie <airlied@redhat.com>
virgl is the 3D acceleration backend for the
virtio-gpu shipping with qemu.
The 3D acceleration is designed around gallium
and TGSI as the virtualisation layer. The backend
renderer translates the virgl interface into
OpenGL currently.
This is the initial import of the driver to mesa.
The kernel driver portions are lined up for drm-next.
Currently this driver supports up to GL3.3 and some
misc extensions if the host driver exposes it. It is
planned to iterate the virgl API to new GL levels
as mesa host drivers gain features.
v2: fix resource tracking across flushes to avoid
->bind hack in mapping.
consolidate mapping and waiting code for transfers.
use u_range for dirt tracking.
handle larger shaders in protocol.
include virtgpu_drm.h in mesa for now.
add translation layer for gallium tgsi to virgl tgsi.
Signed-off-by: Dave Airlie <airlied@redhat.com>
This is used to detect error in virgl if we overflow the shader
dumping buffers.
v2: return a bool.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This adds support to the parser to accept hex values as floats,
and then adds support to the dumper to allow the user to select
to dump float as 32-bit hex numbers.
This is required to get accurate values for virgl use of TGSI.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Instead of just stomping on the mode, it now validates asserts that the
previously set mode is correct and only changes it if needed. We need to
do this because, in geometry shaders, there are some builtins that can be
either an input or an output depending on context. We can get that
information from the SPIR-V source but we can't throw it away.
On ultra high resolution modes, the preemptive flush flag can be
set midway through command submission, a condition that cannot be
recovered from a flush-retry, causing rendering artifacts.
This patch prevents a preemtive_flush until a draw has been
emitted.
Signed-off-by: Sinclair Yeh <syeh@vmware.com>
Reviewed-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
The svga device doesn't directly support quads, quad strips or polygons
so we have to convert those types to indexed triangle lists. But we
can sometimes avoid that if we're drawing flat/constant-colored prims
and we don't have to worry about provoking vertex.
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Provoking vertex comes into play when doing flat shading. But if we know
that all fragments in a primitive are the same color, the provoking vertex
doesn't matter. Check for that case and use whichever provoking vertex
convention is supported by the device.
This avoids generating an index buffer to do the PV conversion.
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Examine the fragment shader to try to detect TGSI shaders which use
"MOV OUT[0], CONST[i]" to write a constant value for the fragment color.
In this case, all fragments will have the same color (unless blending is
enabled).
This is a common case for OpenGL code such as: glColor(), glBegin(),
glVertex(), ..., glEnd() when lighting/fog/etc are disabled. In this
case, the Mesa/gallium state tracker actually generates a simple
"MOV OUT[0], CONST[i]" fragment shader.
This will be used by the next commit to avoid provoking vertex conversion
(creating/rewriting an index buffer) when drawing flat-shaded primitives.
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
The st_RasterPos() function goes to great pains to implement the
rasterpos transformation. It basically uses gallium's draw module to
execute the vertex shader to draw a point, then capture that point's
attributes.
But glRasterPos isn't typically used with a vertex shader so we can
usually use the old/fixed-function implementation which is a lot simpler
and faster.
This can add up for legacy apps that make a lot of calls to glRasterPos.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
We'll remove it from the tnl module next. By lifting this code into core
Mesa we can use it from the gallium state tracker.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Instead of calling memcpy() 'n' times, we can do it all at once since
the source and dest regions are all contiguous.
Reviewed-by: Matt Turner <mattst88@gmail.com>
The section for UVD 2 and older was not updated
when HEVC support was added. Reported by Kano
on irc.
v2: integrate the UVD2 and older checks into the
main switch statement.
v3: handle encode checking as well. Encode is
already checked in the top case statement, so
drop encode checks in the lower case statement.
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: mesa-stable@lists.freedesktop.org
v2: externalize pred_ctrl_align16 from brw_disasm.c instead of adding
a copy on brw_vec4.c, as suggested by Matt Turner
Reviewed-by: Matt Turner <mattst88@gmail.com>
The complete way to do this would be parse INTEL_DEBUG and
print the output if DEBUG_VS (or a new one) is present
(see intel_debug.c).
But that seems like an overkill for the unit tests, that
after all, the most common use case is being run when
calling make check.
v2: use the same idea for the fs counterpart too, as suggested by
Matt Turner
Reviewed-by: Matt Turner <mattst88@gmail.com>
This include the same tests coming from test_fs_cmod_propagation, (non
vector glsl types included) plus some new with vec4 types, inspired on
the regressions found while the optimization was a work in progress.
Additionally, the check of number of instructions after the
optimization was changed from EXPECT_EQ to ASSERT_EQ. This was done to
avoid a crash on failing tests that expected no optimization, as after
checking the number of instructions, there were some checks related to
this last instruction opcode/conditional mod.
v2: update tests after Matt Turner's review of the optimization pass
v3: tweaks on the tests (mostly on the comments), after Matt Turner's
review
Reviewed-by: Matt Turner <mattst88@gmail.com>
vec4 port of fs_cmod_propagation.
Shader-db results (no vec4 grepping):
total instructions in shared programs: 6240413 -> 6235841 (-0.07%)
instructions in affected programs: 401933 -> 397361 (-1.14%)
total loops in shared programs: 1979 -> 1979 (0.00%)
helped: 2265
HURT: 0
v2: remove extra space and combine two if blocks, as suggested by
Matt Turner
v3: add condition check to bail out if current inst and inst being
scanned has different writemask, as pointed by Matt Turner
v3: updated shader-db numbers
v4: remove block from foreach_inst_in_block_*_starting_from after
commit 801f151917
Reviewed-by: Matt Turner <mattst88@gmail.com>
vec4_live_variables tracks now each flag channel independently, so
vec4_dead_code_eliminate can update the writemask of null registers,
based on which component are alive at the moment. This would allow
vec4_cmod_propagation to optimize out several movs involving null
registers.
v2: added support to track each flag channel independently at vec4
live_variables, as v1 assumed that it was already doing it, as
pointed by Francisco Jerez
v3: general cleaningn after Matt Turner's review
Reviewed-by: Matt Turner <mattst88@gmail.com>
Gen8+ lifted the register region restriction that an instruction whose
destination spans two registers must have sources that also span two
registers.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
The AND and SHR produce a scalar value that we had been replicating
across $dispatch_width channels. The immediate MOV produces only four
useful channels of data.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Not a functional difference, but register is loaded with a signed
immediate (V) and added to a signed type (D) producing a signed result
(D).
Also change the type of g0 to allow for compaction.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
We implement textureQueryLevels (which takes no arguments, save the
sampler) using the resinfo message (which takes an argument of LOD).
Without initializing it, we'd generate a MOV from the null register to
load the LOD argument.
Essentially the same logic applies to texture. A vertex shader cannot
compute derivatives and so cannot produce an LOD, so TXL with an LOD of
0.0 is used.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
An untyped surface read is volatile because it might be affected by a
write.
In the ES31-CTS.compute_shader.resources-max test, two back to back
read/modify/writes of an SSBO variable looked something like this:
r1 = untyped_surface_read(ssbo_float)
r2 = r1 + 1
untyped_surface_write(ssbo_float, r2)
r3 = untyped_surface_read(ssbo_float)
r4 = r3 + 1
untyped_surface_write(ssbo_float, r4)
And after CSE, we had:
r1 = untyped_surface_read(ssbo_float)
r2 = r1 + 1
untyped_surface_write(ssbo_float, r2)
r4 = r1 + 1
untyped_surface_write(ssbo_float, r4)
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>