Commit Graph

2557 Commits

Author SHA1 Message Date
Jason Ekstrand 3d2b157e23 i965/fs: Use UW types when using V immediates
Gen 10 has a strange hardware bug involving V immediates with W types.
It appears that a mov(8) g2<1>W 0x76543210V will actually result in g2
getting the value {3, 2, 1, 0, 3, 2, 1, 0}.  In particular, the bottom
four nibbles are repeated instead of the top four being taken.  (A mov
of 0x00003210V yields the same result.)  This bug does not appear in any
hardware documentation as far as we can tell and the simulator does not
implement the bug either.

Commit 6132992cdb was mostly a no-op
except that it changed the type of the subgroup invocation from UW to W
and caused us to tickle this bug with basically every compute shader
that uses any sort of invocation ID (which is most of them).  This is
also potentially an issue for geometry shader input pulls and SampleID
setup.  The easy solution is just to change the few places where we use
a vector integer immediate with a W type to use a UW type.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Cc: mesa-stable@lists.freedesktop.org
Fixes: 6132992cdb
2018-01-11 14:31:38 -08:00
Matt Turner c0ef14f5b1 Revert "Revert "i965/fs: Use align1 mode on ternary instructions on Gen10+""
This reverts commit 2d04572038.

Acked-by: Scott D Phillips <scott.d.phillips@intel.com>
2018-01-11 10:11:59 -08:00
Matt Turner 01ebfbb67a i965/fs: Add/use functions to convert to 3src_align1 vstride/hstride
Some cases weren't handled, such as stride 4 which is needed for 64-bit
operations. Presumably fixes the assertion failure mentioned in commit
2d04572038 (Revert "i965/fs: Use align1 mode on ternary instructions
on Gen10+") but who can really say since the commit neglected to list
any of them!

Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>
2018-01-11 10:11:59 -08:00
Alex Smith 4fd85617c1 anv: Make sure state on primary is correct after CmdExecuteCommands
After executing a secondary command buffer, we need to update certain
state on the primary command buffer to reflect changes by the secondary.
Otherwise subsequent commands may not have the correct state set.

This fixes various issues (rendering errors, GPU hangs) seen after
executing secondary command buffers in some cases.

v2 (Jason Ekstrand):
 - Reset to invalid values instead of pulling from the secondary
 - Change the comment to be more descriptive

Signed-off-by: Alex Smith <asmith@feralinteractive.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Cc: mesa-stable@lists.freedesktop.org
2018-01-11 18:11:08 +00:00
Andres Gomez a1901d092c anv: Import mako templates only during execution of anv_extensions
anv_extensions usage from anv_icd was bringing the unwanted dependency
of mako templates for the latter. We don't want that since it will
force the dependency even for distributable tarballs which was not
needed until now.

Jason suggested this approach.

v2: Patch simplification (Jason).

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104551
Fixes: 0ab04ba979 ("anv: Use python to generate ICD json files")
Cc: Jason Ekstrand <jason.ekstrand@intel.com>
Cc: Emil Velikov <emil.velikov@collabora.com>
Signed-off-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-01-11 14:44:03 +02:00
Samuel Iglesias Gonsálvez c0816389c2 anv: fix maxDescriptorSet* limits
"The maxDescriptorSet* limit is n times the corresponding
maxPerStageDescriptor* limit, where n is the number of shader stages
supported by the VkPhysicalDevice. If all shader stages are supported,
n = 6 (vertex, tessellation control, tessellation evaluation,
geometry, fragment, compute)."

Fixes:

dEQP-VK.api.info.device.properties

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-01-11 07:00:42 +01:00
Iago Toral Quiroga 4317c848b9 i965/nir: add a helper to lower gl_PatchVerticesIn to a uniform
v2: do not try to handle it as a system value directly for the SPIR-V
    path. In GL we rather handle it as a uniform like we do for the
    GLSL path (Jason).

v3:
  - Remove the uniform variable, it is alwats -1 now (Jason)
  - Also do the lowering for the TessEval stage (Jason)

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-01-10 08:21:02 +01:00
Kenneth Graunke 28c2d0d80b genxml: Add missing INSTDONE_1 bits on Gen7.5+.
This will make aubinator_error_decode decode them properly.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2018-01-09 10:13:53 -08:00
Kenneth Graunke 8eadc2fb8f intel: Apply Geminilake "Barrier Mode" workaround.
Apparently, Geminilake requires you to whack a chicken bit to select
either compute or tessellation mode for barriers.  The recommendation
is to switch between them at PIPELINE_SELECT time.

We may not need to do this all the time, but I don't know that it hurts
either.  PIPELINE_SELECT is already a pretty giant stall.

This appears to fix hangs in tessellation control shaders with barriers
on Geminilake.  Note that this requires a corresponding kernel change,

    drm/i915: Whitelist SLICE_COMMON_ECO_CHICKEN1 on Geminilake.

in order for the register write to actually happen.  Without an updated
kernel, this register write will be noop'd and the fix will not work.

Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
2018-01-09 10:13:33 -08:00
Scott D Phillips 42f421cbbf aubinator: add support for aubinating memtrace aubs
Memtrace aubs are similar to classic aubs, with the major
difference being how command submission is serialized (as register
writes instead of a high-level submit message). Some internal
tools generate or consume only memtrace aubs.

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2018-01-08 21:11:11 -08:00
Scott D Phillips 8cdf5bd292 aubinator: extract aubinator_init() out of the header handler function
A later patch will use the aubinator_init() function from the
memtrace aub header handler.

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2018-01-08 21:11:11 -08:00
Scott D Phillips 4f0a2ff4c1 aubinator: honor --color option when printing the header
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2018-01-08 21:11:11 -08:00
Alex Smith 0d8b9c529c anv: Allow PMA optimization to be enabled in secondary command buffers
This was never enabled in secondary buffers because hiz_enabled was
never set to true for those.

If the app provides a framebuffer in the inheritance info when beginning
a secondary buffer, we can determine if HiZ is enabled and therefore
allow the PMA optimization to be enabled within the command buffer.

This improves performance by ~13% on an internal benchmark on Skylake.

v2: Use anv_cmd_buffer_get_depth_stencil_view().

Signed-off-by: Alex Smith <asmith@feralinteractive.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-01-08 09:31:17 +00:00
Alex Smith 12f4e00b69 anv: Take write mask into account in has_color_buffer_write_enabled
If we have a color attachment, but its writes are masked, this would
have still returned true. This is inconsistent with how HasWriteableRT
in 3DSTATE_PS_BLEND is set, which does take the mask into account.

This could lead to PixelShaderHasUAV not being set in 3DSTATE_PS_EXTRA
if the fragment shader does use UAVs, meaning the fragment shader may
not be invoked because HasWriteableRT is false. Specifically, this was
seen to occur when the shader also enables early fragment tests: the
fragment shader was not invoked despite passing depth/stencil.

Fix by taking the color write mask into account in this function. This
is consistent with how things are done on i965.

Signed-off-by: Alex Smith <asmith@feralinteractive.com>
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2018-01-05 15:36:22 +00:00
Alex Smith 00a81e9909 anv: Add missing unlock in anv_scratch_pool_alloc
Fixes hangs seen due to the lock not being released here.

Signed-off-by: Alex Smith <asmith@feralinteractive.com>
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-01-04 14:54:02 +00:00
Kenneth Graunke 74e1d6e20c i965: Drop support for the legacy SNORM -> Float equation.
Older OpenGL defines two equations for converting from signed-normalized
to floating point data.  These are:

    f = (2c + 1)/(2^b - 1)                (equation 2.2)
    f = max{c/2^(b-1) - 1), -1.0}         (equation 2.3)

Both OpenGL 4.2+ and OpenGL ES 3.0+ mandate that equation 2.3 is to be
used in all scenarios, and remove equation 2.2.  DirectX uses equation
2.3 as well.  Intel hardware only supports equation 2.3, so Gen7.5+
systems that use the vertex fetcher hardware to do the conversions
always get formula 2.3.

This can make a big difference for 10-10-10-2 formats - the 2-bit value
can represent 0 with equation 2.3, and cannot with equation 2.2.

Ivybridge and older were using equation 2.2 for OpenGL, and 2.3 for ES.
Now that Ivybridge supports OpenGL 4.2, this is wrong - we need to use
the new rules, at least in core profile.  That would leave Gen4-6 doing
something different than all other hardware, which seems...lame.

With context version promotion, applications that requested a pre-4.2
context may get promoted to 4.2, and thus get the new rules.  Zero cases
have been reported of this being a problem.  However, we've received a
report that following the old rules breaks expectations.  SuperTuxKart
apparently renders the cars red when following equation 2.2, and works
correctly when following equation 2.3:

https://github.com/supertuxkart/stk-code/issues/2885#issuecomment-353858405

So, this patch deletes the legacy equation 2.2 support entirely, making
all hardware and APIs consistently use the new equation 2.3 rules.

If we ever find an application that truly requires the old formula, then
we'd likely want that application to work on modern hardware, too.  We'd
likely restore this support as a driconf option.  Until then, drop it.

This commit will regress Piglit's draw-vertices-2101010 test on
pre-Haswell without the corresponding Piglit patch to accept either
formula (commit 35daaa1695ea01eb85bc02f9be9b6ebd1a7113a1):

    draw-vertices-2101010: Accept either SNORM conversion formula.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Chris Forbes <chrisforbes@google.com>
2018-01-02 16:51:42 -08:00
Kenneth Graunke a1afef8de0 i965: Combine {VS,FS}_OPCODE_GET_BUFFER_SIZE opcodes.
These are the same, we don't need a separate opcode enum per backend.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-12-30 20:30:34 -08:00
Jason Ekstrand 967d238c69 anv/device: Mark all state buffers as needing capture
Previously, we were flagging the instruction state buffer for capture
but not surface state or dynamic state.  We want those captured too.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-12-28 10:39:04 -08:00
Jason Ekstrand 69fa3fb77f intel/aubinator: Gracefully handle dynamic state not being available
Some older versions of the Vulkan driver didn't properly tag dynamic
state as needing to be captured.  Also, this prevents crashes when
looking at dumps on older kernels.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-12-28 10:39:04 -08:00
Jason Ekstrand a92d52c3c1 intel/aubinator: Free section data last
We were walking the sections, printing the batches, and then freeing
them in one pass.  If the batch happens to reference any earlier
sections (which it almost certainly will since it's at the end), we will
access freed memory.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-12-28 10:39:04 -08:00
Anuj Phogat 2d04572038 Revert "i965/fs: Use align1 mode on ternary instructions on Gen10+"
This reverts commit 9cd60fce9c.
Above commit caused 2000+ piglit tests to assert fail. Disabling
the align1 mode on gen10 for now to avoid failures.

Cc: Matt Turner <mattst88@gmail.com>
Cc: Rafael Antognolli <rafael.antognolli@intel.com>
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Tested-by: Rafael Antognolli <rafael.antognolli@intel.com>
2017-12-22 16:40:40 -08:00
Francisco Jerez b3e3cb9901 intel/fs: Initialize fs_visitor::grf_used on construction.
This should shut up some Valgrind errors during pre-regalloc
scheduling.  The errors were harmless since they could only have led
to the estimation of the bank conflict penalty of an instruction
pre-regalloc, which is inaccurate at that point of the program
compilation, but no less accurate than the intended "return 0"
fall-back path.  The scheduling pass is normally re-run after regalloc
with a well-defined grf_used value and accurate bank conflict
information.

Fixes: acf98ff933 "intel/fs: Teach instruction scheduler about GRF bank conflict cycles."
Reported-by: Eero Tamminen <eero.t.tamminen@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2017-12-21 15:20:17 -08:00
Francisco Jerez 1aa79d5ed5 intel/fs/bank_conflicts: Use posix_memalign() instead of overaligned new to obtain vector storage.
The weight_vector_type constructor was inadvertently assuming C++17
semantics of the new operator applied on a type with alignment
requirement greater than the largest fundamental alignment.
Unfortunately on earlier C++ dialects the implementation was allowed
to raise an allocation failure when the alignment requirement of the
allocated type was unsupported, in an implementation-defined fashion.
It's expected that a C++ implementation recent enough to implement
P0035R4 would have honored allocation requests for such over-aligned
types even if the C++17 dialect wasn't active, which is likely the
reason why this problem wasn't caught by our CI system.

A more elegant fix would involve wrapping the __SSE2__ block in a
'__cpp_aligned_new >= 201606' preprocessor conditional and continue
taking advantage of the language feature, but that would yield lower
compile-time performance on old compilers not implementing it
(e.g. GCC versions older than 7.0).

Fixes: af2c320190 "intel/fs: Implement GRF bank conflict mitigation pass."
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104226
Reported-by: Józef Kucia <joseph.kucia@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2017-12-21 15:19:59 -08:00
Samuel Iglesias Gonsálvez a31f0c4a36 anv: disallow VK_REMAINING_ARRAY_LAYERS in vkCmdClearAttachments()
Vulkan spec doesn't specify that VK_REMAINING_ARRAY_LAYERS is allowed
in the passed VkClearRect struct.

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-12-20 06:55:41 +01:00
Rafael Antognolli 85789831b4 intel/compiler/gen10: Disable push constants.
We still have gpu hangs on Cannonlake when using push constants, so
disable them for now until we have a proper fix for these hangs.

v2: Add warning message when creating context too.

Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Cc: Ben Widawsky <ben@bwidawsk.net>
Cc: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
2017-12-19 12:32:24 -08:00
Bas Nieuwenhuizen 6d9849d63e anv: Remove unused variable.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-12-17 14:53:46 +01:00
Kenneth Graunke 02720f8d24 isl: Don't require VALIGN_2 for R32G32B32_FLOAT on Haswell.
According to the RENDER_SURFACE_STATE internal documentation, the
R32G32B32_FLOAT restriction is marked "IVB" only.  We choose to apply
it to Ivybridge and Baytrail, but not Haswell.

Apparently fixes KHR-GL46.texture_size_promotion.functional on Haswell.

Changes these tests from crashing to skipping on Haswell:
- KHR-GL46.direct_state_access.textures_storage_multisample_2d_rgb32f
- KHR-GL46.direct_state_access.textures_storage_multisample_3d_rgb32f

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-12-15 14:00:09 -08:00
Jason Ekstrand 4b8c9ea46b intel/tools: Convert aubinator over to the common framework
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-12-14 13:27:24 -08:00
Jason Ekstrand 35f9c27be3 intel/batch-decoder: Decode registers
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-12-14 13:27:22 -08:00
Jason Ekstrand 81e4ecbc19 intel/batch-decoder: Decode dynamic state
Unfortunately, in aubinator and aubinator_error_decode we don't always
know how many of a given state we have, so we must guess.  One day,
we'll come up with a way to annotate the batch to solve this problem.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-12-14 13:27:20 -08:00
Jason Ekstrand 4ac2ee9001 intel/batch-decoder: Decode constants, binding tables, and samplers
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-12-14 13:27:18 -08:00
Jason Ekstrand d374423eab intel/tools: Switch aubinator_error_decode over to the gen_print_batch
The shared framework can now do everything that aubinator_error_decode
ever did and more.  It's time to make the switch.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-12-14 13:27:16 -08:00
Jason Ekstrand c86671c438 intel/batch-decoder: Decode graphics shaders
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-12-14 13:27:15 -08:00
Jason Ekstrand d4081fb778 intel/batch-decoder: Decode vertex and index buffers
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-12-14 13:27:13 -08:00
Jason Ekstrand e27ec208ed intel/batch-decoder: Decode MEDIA_INTERFACE_DESCRIPTOR_LOAD
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-12-14 13:27:12 -08:00
Jason Ekstrand be20043d00 intel/tools: Add the start of a generic batch decoder
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-12-14 13:27:10 -08:00
Jason Ekstrand 4cb96fbd91 intel/decoder: Expose the raw field value in the iterator
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-12-14 13:27:09 -08:00
Jason Ekstrand 79269e8f4b intel/disasm: Take a devinfo in gen_disasm_create
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-12-14 13:27:06 -08:00
Jason Ekstrand a7ae72032f intel/decoder: Take a bit offset in gen_print_group
Previously, if a group was nested in another group such that it didn't
start on a dword boundary, we would decode it as if it started at the
start of its first dword.  This changes things to work even more in
terms of bits so that we can properly decode these structs.  This
affects MOCS, attribute swizzles, and several other things.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-12-14 13:27:04 -08:00
Jason Ekstrand dca8f466ee intel/decoder: Stop rounding down to the nearest dword
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-12-14 13:27:03 -08:00
Jason Ekstrand f264640693 intel/decoder: Convert the iterator to work entirely in bits
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-12-14 13:27:01 -08:00
Jason Ekstrand ada705b671 intel/decoder: Drop gen_field_decode helper
It's unused

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-12-14 13:26:44 -08:00
Francisco Jerez acab52f520 intel/fs/bank_conflicts: Don't touch Gen7 MRF hack registers.
Fixes: af2c320190 "intel/fs: Implement GRF bank conflict mitigation pass."
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104199
Reported-by: Darius Spitznagel <d.spitznagel@goodbytez.de>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-12-12 12:05:45 -08:00
Samuel Iglesias Gonsálvez ba4bb0838b anv: fix bug when using component qualifier in FS outputs
We can write to the same output but in different components, like
in this example:

layout(location = 0, component = 0) out ivec2 dEQP_FragColor_0;
layout(location = 0, component = 2) out ivec2 dEQP_FragColor_1;

Therefore, they are not two different outputs but only one.

Fixes:

dEQP-VK.glsl.440.linkage.varying.component.frag_out.*

v3:
- Remove FRAG_RESULT_MAX.
- Add const and use sizeof (Ian).
- Do three-pass to set properly the locations of fragment
  outputs when having arrays (Jason).

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-12-12 07:24:55 +01:00
Jason Ekstrand 4c7af87fb9 anv: Enable UBO pushing
Push constants on Intel hardware are significantly more performant than
pull constants.  Since most Vulkan applications don't actively use push
constants on Vulkan or at least don't use it heavily, we're pulling way
more than we should be.  By enabling pushing chunks of UBOs we can get
rid of a lot of those pulls.

On my SKL GT4e, this improves the performance of Dota 2 and Talos by
around 2.5% and improves Aztec Ruins by around 2%.

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2017-12-08 15:43:26 -08:00
Jason Ekstrand f1ce0b905a i965/fs: Handle !supports_pull_constants and push UBOs properly
In Vulkan, we don't support classic pull constants and everything the
client asks us to push, we push.  However, for pushed UBOs, we still
want to fall back to conventional pulls if we run out of space.
2017-12-08 15:43:25 -08:00
Jason Ekstrand 8d34077182 anv/device: Increase the UBO alignment requirement to 32
Push constants work in terms of 32-byte chunks so if we want to be able
to push UBOs, every thing needs to be 32-byte aligned.  Currently, we
only require 16-byte which is too small.

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2017-12-08 15:43:25 -08:00
Jason Ekstrand 2f9eb045f3 anv/cmd_buffer: Add support for pushing UBO ranges
In order to do this we have to modify push constant set up to handle
ranges.  We also have to tweak the way we handle dirty bits a bit so
that we re-push whenever a descriptor set changes.

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2017-12-08 15:43:25 -08:00
Jason Ekstrand 0c879b62b0 anv/cmd_buffer: Add some stage asserts
There are several places where we look up opcodes in an array of stages.
Assert that the we don't end up going out-of-bounds.

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2017-12-08 15:43:25 -08:00
Jason Ekstrand 1968cd07a2 anv/cmd_buffer: Add some helpers for working with descriptor sets
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2017-12-08 15:43:25 -08:00