AlexIndustrial/mesa

Author	SHA1	Message	Date
Jason Ekstrand	3d2b157e23	i965/fs: Use UW types when using V immediates Gen 10 has a strange hardware bug involving V immediates with W types. It appears that a mov(8) g2<1>W 0x76543210V will actually result in g2 getting the value {3, 2, 1, 0, 3, 2, 1, 0}. In particular, the bottom four nibbles are repeated instead of the top four being taken. (A mov of 0x00003210V yields the same result.) This bug does not appear in any hardware documentation as far as we can tell and the simulator does not implement the bug either. Commit `6132992cdb` was mostly a no-op except that it changed the type of the subgroup invocation from UW to W and caused us to tickle this bug with basically every compute shader that uses any sort of invocation ID (which is most of them). This is also potentially an issue for geometry shader input pulls and SampleID setup. The easy solution is just to change the few places where we use a vector integer immediate with a W type to use a UW type. Reviewed-by: Matt Turner <mattst88@gmail.com> Cc: mesa-stable@lists.freedesktop.org Fixes: `6132992cdb`	2018-01-11 14:31:38 -08:00
Matt Turner	c0ef14f5b1	Revert "Revert "i965/fs: Use align1 mode on ternary instructions on Gen10+"" This reverts commit `2d04572038`. Acked-by: Scott D Phillips <scott.d.phillips@intel.com>	2018-01-11 10:11:59 -08:00
Matt Turner	01ebfbb67a	i965/fs: Add/use functions to convert to 3src_align1 vstride/hstride Some cases weren't handled, such as stride 4 which is needed for 64-bit operations. Presumably fixes the assertion failure mentioned in commit `2d04572038` (Revert "i965/fs: Use align1 mode on ternary instructions on Gen10+") but who can really say since the commit neglected to list any of them! Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>	2018-01-11 10:11:59 -08:00
Alex Smith	4fd85617c1	anv: Make sure state on primary is correct after CmdExecuteCommands After executing a secondary command buffer, we need to update certain state on the primary command buffer to reflect changes by the secondary. Otherwise subsequent commands may not have the correct state set. This fixes various issues (rendering errors, GPU hangs) seen after executing secondary command buffers in some cases. v2 (Jason Ekstrand): - Reset to invalid values instead of pulling from the secondary - Change the comment to be more descriptive Signed-off-by: Alex Smith <asmith@feralinteractive.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Cc: mesa-stable@lists.freedesktop.org	2018-01-11 18:11:08 +00:00
Andres Gomez	a1901d092c	anv: Import mako templates only during execution of anv_extensions anv_extensions usage from anv_icd was bringing the unwanted dependency of mako templates for the latter. We don't want that since it will force the dependency even for distributable tarballs which was not needed until now. Jason suggested this approach. v2: Patch simplification (Jason). Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104551 Fixes: `0ab04ba979` ("anv: Use python to generate ICD json files") Cc: Jason Ekstrand <jason.ekstrand@intel.com> Cc: Emil Velikov <emil.velikov@collabora.com> Signed-off-by: Andres Gomez <agomez@igalia.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-01-11 14:44:03 +02:00
Samuel Iglesias Gonsálvez	c0816389c2	anv: fix maxDescriptorSet* limits "The maxDescriptorSet* limit is n times the corresponding maxPerStageDescriptor* limit, where n is the number of shader stages supported by the VkPhysicalDevice. If all shader stages are supported, n = 6 (vertex, tessellation control, tessellation evaluation, geometry, fragment, compute)." Fixes: dEQP-VK.api.info.device.properties Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-01-11 07:00:42 +01:00
Iago Toral Quiroga	4317c848b9	i965/nir: add a helper to lower gl_PatchVerticesIn to a uniform v2: do not try to handle it as a system value directly for the SPIR-V path. In GL we rather handle it as a uniform like we do for the GLSL path (Jason). v3: - Remove the uniform variable, it is always -1 now (Jason) - Also do the lowering for the TessEval stage (Jason) Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-01-10 08:21:02 +01:00
Kenneth Graunke	28c2d0d80b	genxml: Add missing INSTDONE_1 bits on Gen7.5+. This will make aubinator_error_decode decode them properly. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2018-01-09 10:13:53 -08:00
Kenneth Graunke	8eadc2fb8f	intel: Apply Geminilake "Barrier Mode" workaround. Apparently, Geminilake requires you to whack a chicken bit to select either compute or tessellation mode for barriers. The recommendation is to switch between them at PIPELINE_SELECT time. We may not need to do this all the time, but I don't know that it hurts either. PIPELINE_SELECT is already a pretty giant stall. This appears to fix hangs in tessellation control shaders with barriers on Geminilake. Note that this requires a corresponding kernel change, drm/i915: Whitelist SLICE_COMMON_ECO_CHICKEN1 on Geminilake. in order for the register write to actually happen. Without an updated kernel, this register write will be noop'd and the fix will not work. Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>	2018-01-09 10:13:33 -08:00
Scott D Phillips	42f421cbbf	aubinator: add support for aubinating memtrace aubs Memtrace aubs are similar to classic aubs, with the major difference being how command submission is serialized (as register writes instead of a high-level submit message). Some internal tools generate or consume only memtrace aubs. Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>	2018-01-08 21:11:11 -08:00
Scott D Phillips	8cdf5bd292	aubinator: extract aubinator_init() out of the header handler function A later patch will use the aubinator_init() function from the memtrace aub header handler. Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>	2018-01-08 21:11:11 -08:00
Scott D Phillips	4f0a2ff4c1	aubinator: honor --color option when printing the header Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>	2018-01-08 21:11:11 -08:00
Alex Smith	0d8b9c529c	anv: Allow PMA optimization to be enabled in secondary command buffers This was never enabled in secondary buffers because hiz_enabled was never set to true for those. If the app provides a framebuffer in the inheritance info when beginning a secondary buffer, we can determine if HiZ is enabled and therefore allow the PMA optimization to be enabled within the command buffer. This improves performance by ~13% on an internal benchmark on Skylake. v2: Use anv_cmd_buffer_get_depth_stencil_view(). Signed-off-by: Alex Smith <asmith@feralinteractive.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-01-08 09:31:17 +00:00
Alex Smith	12f4e00b69	anv: Take write mask into account in has_color_buffer_write_enabled If we have a color attachment, but its writes are masked, this would have still returned true. This is inconsistent with how HasWriteableRT in 3DSTATE_PS_BLEND is set, which does take the mask into account. This could lead to PixelShaderHasUAV not being set in 3DSTATE_PS_EXTRA if the fragment shader does use UAVs, meaning the fragment shader may not be invoked because HasWriteableRT is false. Specifically, this was seen to occur when the shader also enables early fragment tests: the fragment shader was not invoked despite passing depth/stencil. Fix by taking the color write mask into account in this function. This is consistent with how things are done on i965. Signed-off-by: Alex Smith <asmith@feralinteractive.com> Cc: mesa-stable@lists.freedesktop.org Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2018-01-05 15:36:22 +00:00
Alex Smith	00a81e9909	anv: Add missing unlock in anv_scratch_pool_alloc Fixes hangs seen due to the lock not being released here. Signed-off-by: Alex Smith <asmith@feralinteractive.com> Cc: mesa-stable@lists.freedesktop.org Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-01-04 14:54:02 +00:00
Kenneth Graunke	74e1d6e20c	i965: Drop support for the legacy SNORM -> Float equation. Older OpenGL defines two equations for converting from signed-normalized to floating point data. These are: f = (2c + 1)/(2^b - 1) (equation 2.2) f = max{c/(2^(b-1) - 1), -1.0} (equation 2.3) Both OpenGL 4.2+ and OpenGL ES 3.0+ mandate that equation 2.3 is to be used in all scenarios, and remove equation 2.2. DirectX uses equation 2.3 as well. Intel hardware only supports equation 2.3, so Gen7.5+ systems that use the vertex fetcher hardware to do the conversions always get formula 2.3. This can make a big difference for 10-10-10-2 formats - the 2-bit value can represent 0 with equation 2.3, and cannot with equation 2.2. Ivybridge and older were using equation 2.2 for OpenGL, and 2.3 for ES. Now that Ivybridge supports OpenGL 4.2, this is wrong - we need to use the new rules, at least in core profile. That would leave Gen4-6 doing something different than all other hardware, which seems...lame. With context version promotion, applications that requested a pre-4.2 context may get promoted to 4.2, and thus get the new rules. Zero cases have been reported of this being a problem. However, we've received a report that following the old rules breaks expectations. SuperTuxKart apparently renders the cars red when following equation 2.2, and works correctly when following equation 2.3: https://github.com/supertuxkart/stk-code/issues/2885#issuecomment-353858405 So, this patch deletes the legacy equation 2.2 support entirely, making all hardware and APIs consistently use the new equation 2.3 rules. If we ever find an application that truly requires the old formula, then we'd likely want that application to work on modern hardware, too. We'd likely restore this support as a driconf option. Until then, drop it. 
This commit will regress Piglit's draw-vertices-2101010 test on pre-Haswell without the corresponding Piglit patch to accept either formula (commit 35daaa1695ea01eb85bc02f9be9b6ebd1a7113a1): draw-vertices-2101010: Accept either SNORM conversion formula. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Chris Forbes <chrisforbes@google.com>	2018-01-02 16:51:42 -08:00
Kenneth Graunke	a1afef8de0	i965: Combine {VS,FS}_OPCODE_GET_BUFFER_SIZE opcodes. These are the same, we don't need a separate opcode enum per backend. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2017-12-30 20:30:34 -08:00
Jason Ekstrand	967d238c69	anv/device: Mark all state buffers as needing capture Previously, we were flagging the instruction state buffer for capture but not surface state or dynamic state. We want those captured too. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2017-12-28 10:39:04 -08:00
Jason Ekstrand	69fa3fb77f	intel/aubinator: Gracefully handle dynamic state not being available Some older versions of the Vulkan driver didn't properly tag dynamic state as needing to be captured. Also, this prevents crashes when looking at dumps on older kernels. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2017-12-28 10:39:04 -08:00
Jason Ekstrand	a92d52c3c1	intel/aubinator: Free section data last We were walking the sections, printing the batches, and then freeing them in one pass. If the batch happens to reference any earlier sections (which it almost certainly will since it's at the end), we will access freed memory. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2017-12-28 10:39:04 -08:00
Anuj Phogat	2d04572038	Revert "i965/fs: Use align1 mode on ternary instructions on Gen10+" This reverts commit `9cd60fce9c`. Above commit caused 2000+ piglit tests to assert fail. Disabling the align1 mode on gen10 for now to avoid failures. Cc: Matt Turner <mattst88@gmail.com> Cc: Rafael Antognolli <rafael.antognolli@intel.com> Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com> Tested-by: Rafael Antognolli <rafael.antognolli@intel.com>	2017-12-22 16:40:40 -08:00
Francisco Jerez	b3e3cb9901	intel/fs: Initialize fs_visitor::grf_used on construction. This should shut up some Valgrind errors during pre-regalloc scheduling. The errors were harmless since they could only have led to the estimation of the bank conflict penalty of an instruction pre-regalloc, which is inaccurate at that point of the program compilation, but no less accurate than the intended "return 0" fall-back path. The scheduling pass is normally re-run after regalloc with a well-defined grf_used value and accurate bank conflict information. Fixes: `acf98ff933` "intel/fs: Teach instruction scheduler about GRF bank conflict cycles." Reported-by: Eero Tamminen <eero.t.tamminen@intel.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2017-12-21 15:20:17 -08:00
Francisco Jerez	1aa79d5ed5	intel/fs/bank_conflicts: Use posix_memalign() instead of overaligned new to obtain vector storage. The weight_vector_type constructor was inadvertently assuming C++17 semantics of the new operator applied on a type with alignment requirement greater than the largest fundamental alignment. Unfortunately on earlier C++ dialects the implementation was allowed to raise an allocation failure when the alignment requirement of the allocated type was unsupported, in an implementation-defined fashion. It's expected that a C++ implementation recent enough to implement P0035R4 would have honored allocation requests for such over-aligned types even if the C++17 dialect wasn't active, which is likely the reason why this problem wasn't caught by our CI system. A more elegant fix would involve wrapping the __SSE2__ block in a '__cpp_aligned_new >= 201606' preprocessor conditional and continue taking advantage of the language feature, but that would yield lower compile-time performance on old compilers not implementing it (e.g. GCC versions older than 7.0). Fixes: `af2c320190` "intel/fs: Implement GRF bank conflict mitigation pass." Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104226 Reported-by: Józef Kucia <joseph.kucia@gmail.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2017-12-21 15:19:59 -08:00
Samuel Iglesias Gonsálvez	a31f0c4a36	anv: disallow VK_REMAINING_ARRAY_LAYERS in vkCmdClearAttachments() Vulkan spec doesn't specify that VK_REMAINING_ARRAY_LAYERS is allowed in the passed VkClearRect struct. Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2017-12-20 06:55:41 +01:00
Rafael Antognolli	85789831b4	intel/compiler/gen10: Disable push constants. We still have gpu hangs on Cannonlake when using push constants, so disable them for now until we have a proper fix for these hangs. v2: Add warning message when creating context too. Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com> Cc: Ben Widawsky <ben@bwidawsk.net> Cc: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Ben Widawsky <ben@bwidawsk.net>	2017-12-19 12:32:24 -08:00
Bas Nieuwenhuizen	6d9849d63e	anv: Remove unused variable. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2017-12-17 14:53:46 +01:00
Kenneth Graunke	02720f8d24	isl: Don't require VALIGN_2 for R32G32B32_FLOAT on Haswell. According to the RENDER_SURFACE_STATE internal documentation, the R32G32B32_FLOAT restriction is marked "IVB" only. We choose to apply it to Ivybridge and Baytrail, but not Haswell. Apparently fixes KHR-GL46.texture_size_promotion.functional on Haswell. Changes these tests from crashing to skipping on Haswell: - KHR-GL46.direct_state_access.textures_storage_multisample_2d_rgb32f - KHR-GL46.direct_state_access.textures_storage_multisample_3d_rgb32f Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2017-12-15 14:00:09 -08:00
Jason Ekstrand	4b8c9ea46b	intel/tools: Convert aubinator over to the common framework Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2017-12-14 13:27:24 -08:00
Jason Ekstrand	35f9c27be3	intel/batch-decoder: Decode registers Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2017-12-14 13:27:22 -08:00
Jason Ekstrand	81e4ecbc19	intel/batch-decoder: Decode dynamic state Unfortunately, in aubinator and aubinator_error_decode we don't always know how many of a given state we have, so we must guess. One day, we'll come up with a way to annotate the batch to solve this problem. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2017-12-14 13:27:20 -08:00
Jason Ekstrand	4ac2ee9001	intel/batch-decoder: Decode constants, binding tables, and samplers Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2017-12-14 13:27:18 -08:00
Jason Ekstrand	d374423eab	intel/tools: Switch aubinator_error_decode over to the gen_print_batch The shared framework can now do everything that aubinator_error_decode ever did and more. It's time to make the switch. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2017-12-14 13:27:16 -08:00
Jason Ekstrand	c86671c438	intel/batch-decoder: Decode graphics shaders Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2017-12-14 13:27:15 -08:00
Jason Ekstrand	d4081fb778	intel/batch-decoder: Decode vertex and index buffers Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2017-12-14 13:27:13 -08:00
Jason Ekstrand	e27ec208ed	intel/batch-decoder: Decode MEDIA_INTERFACE_DESCRIPTOR_LOAD Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2017-12-14 13:27:12 -08:00
Jason Ekstrand	be20043d00	intel/tools: Add the start of a generic batch decoder Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2017-12-14 13:27:10 -08:00
Jason Ekstrand	4cb96fbd91	intel/decoder: Expose the raw field value in the iterator Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2017-12-14 13:27:09 -08:00
Jason Ekstrand	79269e8f4b	intel/disasm: Take a devinfo in gen_disasm_create Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2017-12-14 13:27:06 -08:00
Jason Ekstrand	a7ae72032f	intel/decoder: Take a bit offset in gen_print_group Previously, if a group was nested in another group such that it didn't start on a dword boundary, we would decode it as if it started at the start of its first dword. This changes things to work even more in terms of bits so that we can properly decode these structs. This affects MOCS, attribute swizzles, and several other things. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2017-12-14 13:27:04 -08:00
Jason Ekstrand	dca8f466ee	intel/decoder: Stop rounding down to the nearest dword Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2017-12-14 13:27:03 -08:00
Jason Ekstrand	f264640693	intel/decoder: Convert the iterator to work entirely in bits Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2017-12-14 13:27:01 -08:00
Jason Ekstrand	ada705b671	intel/decoder: Drop gen_field_decode helper It's unused Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2017-12-14 13:26:44 -08:00
Francisco Jerez	acab52f520	intel/fs/bank_conflicts: Don't touch Gen7 MRF hack registers. Fixes: `af2c320190` "intel/fs: Implement GRF bank conflict mitigation pass." Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104199 Reported-by: Darius Spitznagel <d.spitznagel@goodbytez.de> Reviewed-by: Matt Turner <mattst88@gmail.com>	2017-12-12 12:05:45 -08:00
Samuel Iglesias Gonsálvez	ba4bb0838b	anv: fix bug when using component qualifier in FS outputs We can write to the same output but in different components, like in this example: layout(location = 0, component = 0) out ivec2 dEQP_FragColor_0; layout(location = 0, component = 2) out ivec2 dEQP_FragColor_1; Therefore, they are not two different outputs but only one. Fixes: dEQP-VK.glsl.440.linkage.varying.component.frag_out.* v3: - Remove FRAG_RESULT_MAX. - Add const and use sizeof (Ian). - Do three-pass to set properly the locations of fragment outputs when having arrays (Jason). Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2017-12-12 07:24:55 +01:00
Jason Ekstrand	4c7af87fb9	anv: Enable UBO pushing Push constants on Intel hardware are significantly more performant than pull constants. Since most Vulkan applications don't actively use push constants on Vulkan or at least don't use it heavily, we're pulling way more than we should be. By enabling pushing chunks of UBOs we can get rid of a lot of those pulls. On my SKL GT4e, this improves the performance of Dota 2 and Talos by around 2.5% and improves Aztec Ruins by around 2%. Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>	2017-12-08 15:43:26 -08:00
Jason Ekstrand	f1ce0b905a	i965/fs: Handle !supports_pull_constants and push UBOs properly In Vulkan, we don't support classic pull constants and everything the client asks us to push, we push. However, for pushed UBOs, we still want to fall back to conventional pulls if we run out of space.	2017-12-08 15:43:25 -08:00
Jason Ekstrand	8d34077182	anv/device: Increase the UBO alignment requirement to 32 Push constants work in terms of 32-byte chunks so if we want to be able to push UBOs, everything needs to be 32-byte aligned. Currently, we only require 16-byte alignment, which is too small. Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>	2017-12-08 15:43:25 -08:00
Jason Ekstrand	2f9eb045f3	anv/cmd_buffer: Add support for pushing UBO ranges In order to do this we have to modify push constant set up to handle ranges. We also have to tweak the way we handle dirty bits a bit so that we re-push whenever a descriptor set changes. Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>	2017-12-08 15:43:25 -08:00
Jason Ekstrand	0c879b62b0	anv/cmd_buffer: Add some stage asserts There are several places where we look up opcodes in an array of stages. Assert that we don't end up going out-of-bounds. Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>	2017-12-08 15:43:25 -08:00
Jason Ekstrand	1968cd07a2	anv/cmd_buffer: Add some helpers for working with descriptor sets Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>	2017-12-08 15:43:25 -08:00
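
Commit `74e1d6e20c` above contrasts two SNORM-to-float equations and notes that a 2-bit component can represent 0 only under equation 2.3. The following is a minimal sketch of that arithmetic, not driver code; the function names and the `signed_range` helper are hypothetical and exist only for illustration.

```python
def snorm_to_float_legacy(c, bits):
    """Equation 2.2 from the commit message: f = (2c + 1) / (2^b - 1)."""
    return (2 * c + 1) / (2 ** bits - 1)


def snorm_to_float(c, bits):
    """Equation 2.3 from the commit message: f = max(c / (2^(b-1) - 1), -1.0)."""
    return max(c / (2 ** (bits - 1) - 1), -1.0)


def signed_range(bits):
    """All two's-complement values representable in `bits` bits (illustrative helper)."""
    return range(-(2 ** (bits - 1)), 2 ** (bits - 1))


# The 2-bit component of a 10-10-10-2 format can hold -2, -1, 0, or 1.
legacy = [snorm_to_float_legacy(c, 2) for c in signed_range(2)]
modern = [snorm_to_float(c, 2) for c in signed_range(2)]

print("equation 2.2:", legacy)
print("equation 2.3:", modern)
```

Under equation 2.2 every input maps to a nonzero value, because the numerator 2c + 1 is always odd. Under equation 2.3 the two most negative inputs both clamp to -1.0 and the input 0 maps exactly to 0.0.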

2557 Commits
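
Commit `3d2b157e23` at the top of this log describes a Gen 10 quirk in which a V immediate written to a W-typed destination repeats its bottom four nibbles instead of using the top four. The sketch below is a software model of that description only. It assumes a V immediate packs eight signed 4-bit values with the lowest nibble going to channel 0, and it lists channels lowest first, whereas the commit message appears to list them highest first. Both function names are hypothetical.

```python
def unpack_v_immediate(imm):
    """Expand a 32-bit packed V immediate into eight signed 4-bit values.

    Assumption: nibble i (counting from the least significant) goes to channel i.
    """
    values = []
    for i in range(8):
        nibble = (imm >> (4 * i)) & 0xF
        # Interpret each nibble as a signed 4-bit integer.
        values.append(nibble - 16 if nibble & 0x8 else nibble)
    return values


def unpack_v_immediate_gen10_w_bug(imm):
    """Model of the behavior described in the commit message for W-typed
    destinations: the bottom four nibbles are repeated in the upper channels."""
    low_four = unpack_v_immediate(imm)[:4]
    return low_four + low_four


print(unpack_v_immediate(0x76543210))              # expected channel values
print(unpack_v_immediate_gen10_w_bug(0x76543210))  # modeled buggy result
print(unpack_v_immediate_gen10_w_bug(0x00003210))  # same result, as the commit notes
```

In this model the immediates 0x76543210 and 0x00003210 are indistinguishable, which matches the parenthetical remark in the commit message. The fix in that commit avoids the problem by using a UW type wherever a vector integer immediate is used.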