AlexIndustrial/mesa

Author	SHA1	Message	Date
Dave Airlie	2774d39366	spirv: fix SpvOpBitSize return value. The spir-v spec says this returns a bool. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-05-03 15:22:57 +10:00
Kenneth Graunke	5ff5d0a895	iris: Disable dual source blending when shader doesn't handle it This is a port of Danylo's `eca4a6548d` which fixed the hang on i965. It fixes GPU hangs in his new Piglit test, arb_blend_func_extended-dual-src-blending-discard-without-src1. I avoided my own review feedback here, and decided to simply adjust 3DSTATE_PS_BLEND rather than BLEND_STATE_ENTRY[0]. It has never been clear to me which the hardware uses in every case. However, whacking the enable in 3DSTATE_PS_BLEND seems to be sufficient to fix the hang, and that packet is already dynamic, so it's easy to handle. I'd rather avoid making BLEND_STATE_ENTRY[0] dynamic unless I have to.	2019-05-02 21:14:49 -07:00
Jason Ekstrand	be7e9870d6	anv: Stop including POS in FS input limits It is an input but it comes in as part of the shader payload and doesn't count towards the limits. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-05-02 18:56:51 -05:00
Rob Clark	b73dd91f60	nir: fix nir tex print harder Fixes: `691d5a825a` nir: rework tex instruction printing Reviewed-by: Eric Anholt <eric@anholt.net> Signed-off-by: Rob Clark <robdclark@chromium.org>	2019-05-02 15:06:01 -07:00
Erico Nunes	568e8fc736	lima/ppir: support nir_op_ftrunc Support nir_op_ftrunc by turning it into a mov with a round to integer output modifier. Signed-off-by: Erico Nunes <nunes.erico@gmail.com> Reviewed-by: Qiang Yu <yuq825@gmail.com>	2019-05-02 20:55:56 +00:00
Heinrich	9b80322532	gbm: Improve documentation of BO import - Add GBM_BO_IMPORT_FD_MODIFIER to documentation of supported foreign object types - Add newline before documentation block - Improve language Reviewed-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Daniel Stone <daniels@collabora.com>	2019-05-02 20:36:38 +00:00
Samuel Pitoiset	62001f3dff	radv: only need to force emit the TCS regs on Vega10 and Raven1 Other GFX9 chips aren't affected. Cc: "19.0" "19.1" <mesa-stable@lists.freedesktop.org> Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-05-02 22:29:01 +02:00
Marek Olšák	b3a26d4628	glsl: fix and clean up NV_compute_shader_derivatives support - make sure compute shader derivatives are exposed for all extensions - unify duplicated code Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-05-02 16:09:24 -04:00
Marek Olšák	20909284f2	st/dri: decrease input lag by syncing sooner in SwapBuffers It's done by: - decrease the number of frames in flight by 1 - flush before throttling in SwapBuffers (instead of wait-then-flush, do flush-then-wait) The improvement is apparent with Unigine Heaven. Previously: draw frame 2 wait frame 0 flush frame 2 present frame 2 The input lag is 2 frames. Now: draw frame 2 flush frame 2 wait frame 1 present frame 2 The input lag is 1 frame. Flushing is done before waiting, because otherwise the device would be idle after waiting. Nine is affected because it also uses the pipe cap.	2019-05-02 16:09:24 -04:00
Erik Faye-Lund	28f18915b8	meson: lift driver-collection out into parent build-file This way we can mark the dri_drivers and dri_link arrays as temporary, as all knowledge about them is contained in a single build-file with a clearly visible, limited life-span. Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com> Acked-by: Dylan Baker <dylan@pnwbakers.com>	2019-05-02 18:30:29 +00:00
Rob Clark	8c77e669a8	freedreno/a6xx: smaller hammer for fb barrier We just need to do a sequence of commands to flush the cache. Signed-off-by: Rob Clark <robdclark@chromium.org> Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>	2019-05-02 11:19:22 -07:00
Rob Clark	6fa8a6d60f	freedreno/a6xx: KHR_blend_equation_advanced support Wire up support to sample from the fb (and force GMEM rendering when we have fb reads). The existing GLSL IR lowering for blend_equation_advanced does the rest. Signed-off-by: Rob Clark <robdclark@chromium.org> Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>	2019-05-02 11:19:22 -07:00
Rob Clark	650246523b	freedreno/ir3: fb read support Lower load_output to txf_ms_fb and add support for the new texture fetch instruction. Signed-off-by: Rob Clark <robdclark@chromium.org> Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>	2019-05-02 11:19:22 -07:00
Rob Clark	0704ddb2e5	freedreno/drm: expose GMEM_BASE address Needed for sampling from tile buffer (GMEM). Signed-off-by: Rob Clark <robdclark@chromium.org> Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>	2019-05-02 11:19:22 -07:00
Rob Clark	a99c360a46	nir: add pass to lower fb reads Signed-off-by: Rob Clark <robdclark@chromium.org> Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>	2019-05-02 11:19:22 -07:00
Rob Clark	a2c89a85f4	nir: fix lower_wpos_ytransform in load_frag_coord case Apparently we never hit this path. Or at least haven't for a rather long time. But in either case (load_deref or load_frag_coord), we can just directly use the intrinsic's ssa dest. So stop passing the nir_variable (which would be NULL in the load_frag_coord case) around and instead just use &intr->dest.ssa. (This ofc means we need to setup the cursor to insert after the instruction, which seems to be another bug of the original implementation.) Signed-off-by: Rob Clark <robdclark@chromium.org> Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>	2019-05-02 11:19:22 -07:00
Rob Clark	691d5a825a	nir: rework tex instruction printing The extra comma at the end was annoying me. Signed-off-by: Rob Clark <robdclark@chromium.org> Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>	2019-05-02 11:19:22 -07:00
Rob Clark	ca3eb5db66	freedreno/ir3: add some ubo range related asserts And a comment.. since we are mixing units of bytes/dwords/vec4, hopefully this will avoid some unit confusion. Signed-off-by: Rob Clark <robdclark@chromium.org>	2019-05-02 11:19:22 -07:00
Rob Clark	e941faf3e8	freedreno/ir3: add IR3_SHADER_DEBUG flag to disable ubo lowering It isn't quite as simple as not running the pass, since with packed varyings we get load_ubo for block==0 (ie. the "real" uniforms). So instead run the pass normally but decline to lower anything in block > 0 Signed-off-by: Rob Clark <robdclark@chromium.org>	2019-05-02 11:19:22 -07:00
Rob Clark	f697f61590	freedreno/ir3: fix lowered ubo region alignment Since we emit UBO regions INDIRECTly (ie. not copied into cmdstream but emit by EXT_SRC_ADDR) we need to keep them 4*vec4 aligned. Which the code already mostly did, except for aligning the first UBO region itself (ie. the one after block==0 which is the "real" uniforms). Fixes: `893425a607` freedreno/ir3: Push UBOs to constant file Fixes: `3c8779af32` freedreno/ir3: Enable PIPE_CAP_PACKED_UNIFORMS Signed-off-by: Rob Clark <robdclark@chromium.org>	2019-05-02 11:19:22 -07:00
Rob Clark	32925f4072	freedreno/ir3: fix shader variants vs UBO analysis Otherwise we zero out the state again, but all the UBO loads that we could lower are already lowered. End result is that we didn't emit the uniforms for lowered UBO access in any case where multiple shader variants are used. Fixes: `893425a607` freedreno/ir3: Push UBOs to constant file Fixes: `3c8779af32` freedreno/ir3: Enable PIPE_CAP_PACKED_UNIFORMS Signed-off-by: Rob Clark <robdclark@chromium.org>	2019-05-02 11:19:22 -07:00
Lionel Landwerlin	ff4168c418	vulkan/overlay: add TODO list Keen on having other people contribute. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2019-05-02 17:02:57 +01:00
Lionel Landwerlin	99cb2d325f	vulkan/overlay: make overridden functions static And fix the unused CmdDrawIndirect. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-05-02 17:02:57 +01:00
Lionel Landwerlin	f2afd6bd76	vulkan/overlay: make overlay size configurable Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-05-02 17:02:55 +01:00
Lionel Landwerlin	7d908038ad	vulkan/overlay: add a frame counter option This is useful to normalize the numbers written into the output file as those numbers are accumulated over a period of time and number of frames. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-05-02 17:02:35 +01:00
Lionel Landwerlin	81fd6ba7cc	vulkan/overlay: record all select metrics into output file The output looks something like this (csv style) : fps, frame, frame_timing(us), submit, draw_indexed, pipeline_graphics, acquire_timing(us), vert_invocations, frag_invocations, gpu_timing(ns) 480.55, 242, 501512, 247, 1444, 1204, 714, 5827272, 113043296, 121424174 467.80, 234, 500214, 234, 1412, 1176, 648, 5635680, 109436188, 117743760 424.37, 213, 501923, 213, 2130, 1704, 623, 5132448, 99657292, 105474683 472.15, 237, 501962, 237, 2370, 1896, 667, 5710752, 110924644, 122226004 411.32, 206, 500826, 206, 2060, 1648, 709, 4963776, 96491764, 95333273 458.87, 230, 501228, 230, 2300, 1840, 634, 5542080, 107758204, 123112090 475.01, 238, 501044, 238, 2380, 1904, 631, 5734848, 111477480, 122087426 471.08, 236, 500972, 236, 2360, 1888, 655, 5686656, 110498496, 114816162 Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-05-02 17:02:34 +01:00
Lionel Landwerlin	74a9fdd8a2	vulkan/overlay: add a margin to the size of the window Looks a bit better. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-05-02 17:02:07 +01:00
Lionel Landwerlin	7ba50d8040	vulkan/overlay: add no display option In case you're just interested in data being recorded to the output file. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-05-02 17:02:07 +01:00
Lionel Landwerlin	ea7a6fa980	vulkan/overlay: add pipeline statistic & timestamps support v2: switch to VkBase{In,Out}Structure v3: Add timestamps at begin/end of primary command buffers to estimate gpu time spent per submission (Lionel) Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Acked-by: Eric Engestrom <eric.engestrom@intel.com> (v2)	2019-05-02 17:02:06 +01:00
Lionel Landwerlin	4438188f49	vulkan/overlay: record stats in command buffers and accumulate on exec/submit This significantly reworks how numbers displayed are computed. We accumulate operations written into command buffers and add those to the device when submitted to a queue. These collected values are then used to compute per frame overlay data. We also accumulate the data over the sampling fps period to produce numbers for that period of time. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-05-02 17:02:06 +01:00
Lionel Landwerlin	9eddceef44	vulkan/overlay: update help printout Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2019-05-02 17:02:06 +01:00
Lionel Landwerlin	a1e6b5e9be	vulkan/util: generate a helper function to return pNext struct sizes This will be used to copy chains of structures so that we can alter some of them. v2: Drop vk_util.h include (Eric) Use VkBaseInStructure directly (Eric) v3: Drop --platforms= param to generator script, instead produce a file with #ifdef based on what platforms are compiled. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2019-05-02 17:02:02 +01:00
Tomeu Vizoso	ad7c9ba0ec	panfrost/midgard: Skip liveness analysis for instructions without dest [Alyssa: Add comment explanation] Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com> Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-05-02 15:29:48 +00:00
Tomeu Vizoso	a5dddc2d42	panfrost/midgard: Skip register allocation if there's no work to do Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com> Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-05-02 15:29:41 +00:00
Eric Engestrom	a34ee4dec7	egl: hard-code destroy function instead of passing it around as a pointer Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Emil Velikov <emil.velikov@collabora.com>	2019-05-02 14:44:16 +00:00
Connor Abbott	6ec4ed48fc	nir/search: Add debugging code to dump the pattern matched This was useful while debugging the previous commit. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-05-02 16:14:06 +02:00
Connor Abbott	7ce86e6938	nir/search: Add automaton-based pre-searching nir_opt_algebraic is currently one of the most expensive NIR passes, because of the many different patterns we've added over the years. Even though patterns are already sorted by opcode, there are still way too many patterns for common opcodes like bcsel and fadd, which means that many patterns are tried but only a few actually match. One way to fix this is to add a pre-pass over the code that scans it using an automaton constructed beforehand, similar to the automatons produced by lex and yacc for parsing source code. This automaton has to walk the SSA graph and recognize possible pattern matches. It turns out that the theory to do this is quite mature already, having been developed for instruction selection as well as other non-compiler things. I followed the presentation in the dissertation cited in the code, "Tree algorithms: Two Taxonomies and a Toolkit," trying to keep the naming similar. To create the automaton, we have to perform something like the classical NFA to DFA subset construction used by lex, but it turns out that actually computing the transition table for all possible states would be way too expensive, with the dissertation reporting times of almost half an hour for an example of size similar to nir_opt_algebraic. Instead, we adopt one of the "filter" approaches explained in the dissertation, which trade much faster table generation and table size for a few more table lookups per instruction at runtime. I chose the filter which resulted in the fastest table generation time, with medium table size. Right now, the table generation takes around .5 seconds, despite being implemented in pure Python, which I think is good enough. Based on the numbers in the dissertation, the other choice might make table compilation time 25x slower to get 4x smaller table size, but I don't think that's worth it. 
As of now, we get the following binary size before and after this patch: text data bss dec hex filename 11979455 464720 730864 13175039 c908ff before i965_dri.so text data bss dec hex filename 12037835 616244 791792 13445871 cd2aef after i965_dri.so There are a number of places where I've simplified the automaton by getting rid of details in the LHS patterns rather than complicate things to deal with them. For example, right now the automaton doesn't distinguish between constants with different values. This means that it isn't as precise as it could be, but the decrease in compile time is still worth it -- these are the compilation time numbers for a shader-db run with my (admittedly old) database on Intel skylake: Difference at 95.0% confidence -42.3485 +/- 1.375 -7.20383% +/- 0.229926% (Student's t, pooled s = 1.69843) We can always experiment with making it more precise later. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-05-02 16:14:06 +02:00
Samuel Pitoiset	08be23bfde	radv: set WD_SWITCH_ON_EOP=1 when drawing primitives from a stream output buffer According to RadeonSI, this seems to be required by the hardware to avoid GPU hangs. I think I just forgot to set that bit when I implemented VK_EXT_transform_feedback. This fixes a GPU hang with Space Engineers and DXVK. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110291 Fixes: `b4eb029062` ("radv: implement VK_EXT_transform_feedback") Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-05-02 15:55:46 +02:00
Brian Paul	48107b5a2b	glsl: fix typo in #warning message Trivial. Spotted by Eric Engestrom.	2019-05-02 06:32:57 -06:00
Brian Paul	f0f7c3b03a	svga: add SVGA_NO_LOGGING env var (v2) valgrind crashes when we try to initialize host logging. This env var can be used to disable logging. v2: rebase onto "svga: move host logging to winsys". Cc: mesa-stable@lists.freedesktop.org Reviewed-by: Neha Bhende <bhenden@vmware.com>	2019-05-02 06:09:35 -06:00
Charmaine Lee	9c5f407b0b	svga: move host logging to winsys This patch adds a host_log interface to svga_winsys and moves the host logging code to the winsys layer. Cc: mesa-stable@lists.freedesktop.org Reviewed-by: Brian Paul <brianp@vmware.com> Reviewed-by: Neha Bhende <bhenden@vmware.com>	2019-05-02 06:09:35 -06:00
Eric Engestrom	da8d9e2d88	wsi/wayland: document lack of vkAcquireNextImageKHR timeout support Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>	2019-05-02 11:51:03 +00:00
Daniel Stone	9826e04eca	vulkan/wsi/wayland: Respect non-blocking AcquireNextImage If the client has requested that AcquireNextImage not block at all, with a timeout of 0, then don't make any blocking calls. This will still potentially block infinitely given a non-infinite timeout, but the fix for that is much more involved. Signed-off-by: Daniel Stone <daniels@collabora.com> Cc: mesa-stable@lists.freedesktop.org Cc: Chad Versace <chadversary@chromium.org> Cc: Jason Ekstrand <jason@jlekstrand.net> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108540 Acked-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Chad Versace <chadversary@chromium.org> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2019-05-02 11:51:03 +00:00
Rhys Perry	13c423629e	radv: fix set_output_usage_mask() with composite and 64-bit types It previously used var->type instead of deref_instr->type and didn't handle 64-bit outputs. This fixes lots of transform feedback CTS tests involving transform feedback and geometry shaders (mostly dEQP-VK.transform_feedback.fuzz.random_geometry.*) v2: fix writemask widening when comp != 0 v3: fix 64-bit variables when comp != 0, again Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Cc: 19.0 19.1 <mesa-stable@lists.freedesktop.org> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-05-02 10:24:20 +01:00
Thomas Hellstrom	20b7839392	winsys/svga: Don't abort on EBUSY errors from execbuffer This error code typically indicated that a buffer object that was referenced by the command stream was being used for CPU access by another client. The correct action here is to retry after a while. Use usleep() until we have proper kernel support for this wait. Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com> Reviewed-by: Brian Paul <brianp@vmware.com>	2019-05-02 09:51:15 +02:00
Thomas Hellstrom	c69557c4a2	winsys/svga: Update the drm interface file The file vmwgfx_drm.h was a bit outdated. Update to a recent version, including defines supporting coherent memory. Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com> Reviewed-by: Brian Paul <brianp@vmware.com>	2019-05-02 09:51:07 +02:00
Thomas Hellstrom	978d66e4d5	svga: Avoid bouncing buffer data in malloced buffers Some constant- and texture upload buffer data may bounce in malloced buffers before being transferred to hardware buffers. In the case of texture upload buffers this seems to be an oversight. In the case of constant buffers, code comments indicate that we want to avoid mapping hardware buffers for reading when copying out of buffers that need modification before being passed to hardware. In this case we avoid data bouncing for upload manager buffers but make sure buffers that we read out from stay in malloced memory. Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com> Reviewed-by: Brian Paul <brianp@vmware.com>	2019-05-02 09:51:00 +02:00
Thomas Hellstrom	5961189f4e	winsys/svga: Enable the transfer_from_buffer GPU command for vgpu10 We didn't have the path using this command enabled as typically we take an alternate path using DMA uploads. Enable it so that we can exercise that code-path by turning off the DMA path. Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com> Reviewed-by: Brian Paul <brianp@vmware.com>	2019-05-02 09:50:52 +02:00
Thomas Hellstrom	50e58966fa	winsys/svga: Add an environment variable to force host-backed operation The vmwgfx kernel module has a compatibility mode for user-space that is not guest-backed resource aware. Add an environment variable to facilitate testing of this mode on guest-backed aware kernels: if the environment variable SVGA_FORCE_HOST_BACKED is defined, the driver will use host-backed operation. Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com> Reviewed-by: Deepak Rawat <drawat@vmware.com> Reviewed-by: Brian Paul <brianp@vmware.com>	2019-05-02 09:50:22 +02:00
Samuel Pitoiset	492e828848	ac: tidy up ac_build_llvm8_tbuffer_{load,store} For consistency with ac_build_llvm8_buffer_{load,store}_common helpers and that will help a bit for removing the vec3 restriction. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-05-02 09:24:05 +02:00
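
Commits 81fd6ba7cc and 7d908038ad describe the vulkan/overlay output file as CSV-style rows whose counters are accumulated over a sampling period, with the frame counter there to normalize them. A minimal sketch of that normalization follows, written in Python. It assumes the `frame` column holds the number of frames in the period; the function name `per_frame` and the chosen columns are illustrative only and are not part of the overlay layer. The sample rows are copied from the commit message.

```python
import csv
import io

# Header and first two data rows copied from the message of 81fd6ba7cc.
SAMPLE = """\
fps, frame, frame_timing(us), submit, draw_indexed, pipeline_graphics, acquire_timing(us), vert_invocations, frag_invocations, gpu_timing(ns)
480.55, 242, 501512, 247, 1444, 1204, 714, 5827272, 113043296, 121424174
467.80, 234, 500214, 234, 1412, 1176, 648, 5635680, 109436188, 117743760
"""


def per_frame(text, columns=("draw_indexed", "submit")):
    """Divide accumulated counters by the frame count of each period.

    Assumption: the "frame" column is the number of frames covered by
    the row. The helper name and default columns are hypothetical.
    """
    # skipinitialspace drops the blank that follows each comma.
    reader = csv.DictReader(io.StringIO(text), skipinitialspace=True)
    rows = []
    for row in reader:
        frames = int(row["frame"])
        rows.append({name: int(row[name]) / frames for name in columns})
    return rows


if __name__ == "__main__":
    for row in per_frame(SAMPLE):
        print(row)
```

Dividing by the frame count makes rows comparable even when sampling periods cover different numbers of frames.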
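
Commit 20b7839392 changes the svga winsys to retry after a short sleep when execbuffer returns EBUSY instead of aborting. The general pattern can be sketched as below. This is Python rather than the driver's C, and `submit_with_retry`, `delay_s`, and `max_tries` are hypothetical names; the retry cap in particular is an addition for the sketch and is not described in the commit.

```python
import errno
import time


def submit_with_retry(submit, delay_s=0.001, max_tries=100):
    """Call submit() again after a short sleep while it fails with EBUSY.

    Hypothetical helper: the commit only says to retry "after a while"
    using usleep(); the delay and the cap on tries are assumptions.
    """
    for _ in range(max_tries):
        try:
            return submit()
        except OSError as exc:
            # Any error other than EBUSY is still treated as fatal.
            if exc.errno != errno.EBUSY:
                raise
            time.sleep(delay_s)
    raise TimeoutError("still busy after %d tries" % max_tries)
```

A caller would wrap the actual submission in a function and pass it in, for example `submit_with_retry(lambda: do_submit(cmds))`, where `do_submit` stands in for whatever performs the ioctl.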

102083 Commits