AlexIndustrial/mesa

Author	SHA1	Message	Date
Brian Paul	99effaa965	svga: try to avoid index generation for some primitive types The svga device doesn't directly support quads, quad strips or polygons so we have to convert those types to indexed triangle lists. But we can sometimes avoid that if we're drawing flat/constant-colored prims and we don't have to worry about provoking vertex. Reviewed-by: Charmaine Lee <charmainel@vmware.com> Reviewed-by: José Fonseca <jfonseca@vmware.com>	2015-10-22 17:19:20 -06:00
Brian Paul	129d34da49	svga: avoid provoking vertex conversion when possible Provoking vertex comes into play when doing flat shading. But if we know that all fragments in a primitive are the same color, the provoking vertex doesn't matter. Check for that case and use whichever provoking vertex convention is supported by the device. This avoids generating an index buffer to do the PV conversion. Reviewed-by: Charmaine Lee <charmainel@vmware.com> Reviewed-by: José Fonseca <jfonseca@vmware.com>	2015-10-22 17:19:20 -06:00
Brian Paul	1082735bb6	svga: detect constant color writes in fragment shaders Examine the fragment shader to try to detect TGSI shaders which use "MOV OUT[0], CONST[i]" to write a constant value for the fragment color. In this case, all fragments will have the same color (unless blending is enabled). This is a common case for OpenGL code such as: glColor(), glBegin(), glVertex(), ..., glEnd() when lighting/fog/etc are disabled. In this case, the Mesa/gallium state tracker actually generates a simple "MOV OUT[0], CONST[i]" fragment shader. This will be used by the next commit to avoid provoking vertex conversion (creating/rewriting an index buffer) when drawing flat-shaded primitives. Reviewed-by: Charmaine Lee <charmainel@vmware.com> Reviewed-by: José Fonseca <jfonseca@vmware.com>	2015-10-22 17:19:20 -06:00
Brian Paul	df0f817e31	mesa: check for unchanged line width before error checking Reviewed-by: Matt Turner <mattst88@gmail.com>	2015-10-22 17:19:20 -06:00
Brian Paul	990afdc045	st/mesa: use _mesa_RasterPos() when possible The st_RasterPos() function goes to great pains to implement the rasterpos transformation. It basically uses gallium's draw module to execute the vertex shader to draw a point, then capture that point's attributes. But glRasterPos isn't typically used with a vertex shader so we can usually use the old/fixed-function implementation which is a lot simpler and faster. This can add up for legacy apps that make a lot of calls to glRasterPos. Reviewed-by: Roland Scheidegger <sroland@vmware.com>	2015-10-22 17:19:20 -06:00
Brian Paul	af0399a1ce	tnl: remove t_rasterpos.c Reviewed-by: Roland Scheidegger <sroland@vmware.com>	2015-10-22 17:19:20 -06:00
Brian Paul	234d5320bb	drivers/common: use _mesa_RasterPos instead of _tnl_RasterPos Reviewed-by: Roland Scheidegger <sroland@vmware.com>	2015-10-22 17:19:20 -06:00
Brian Paul	614a743767	mesa: copy rasterpos evaluation code into core Mesa We'll remove it from the tnl module next. By lifting this code into core Mesa we can use it from the gallium state tracker. Reviewed-by: Roland Scheidegger <sroland@vmware.com>	2015-10-22 17:19:20 -06:00
Brian Paul	9919f56099	vbo: optimize vertex copying when 'wrapping' Instead of calling memcpy() 'n' times, we can do it all at once since the source and dest regions are all contiguous. Reviewed-by: Matt Turner <mattst88@gmail.com>	2015-10-22 17:19:20 -06:00
Alex Deucher	7b63658125	radeon/uvd: don't expose HEVC on old UVD hw (v3) The section for UVD 2 and older was not updated when HEVC support was added. Reported by Kano on irc. v2: integrate the UVD2 and older checks into the main switch statement. v3: handle encode checking as well. Encode is already checked in the top case statement, so drop encode checks in the lower case statement. Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: mesa-stable@lists.freedesktop.org	2015-10-22 16:22:44 -04:00
Alejandro Piñeiro	8cf84a7e47	i965/vec4: print predicate control at brw_vec4 dump_instruction v2: externalize pred_ctrl_align16 from brw_disasm.c instead of adding a copy on brw_vec4.c, as suggested by Matt Turner Reviewed-by: Matt Turner <mattst88@gmail.com>	2015-10-22 21:58:03 +02:00
Alejandro Piñeiro	92ae101ed0	i965/vec4: use an envvar to decide to print the assembly on cmod_propagation tests The complete way to do this would be to parse INTEL_DEBUG and print the output if DEBUG_VS (or a new one) is present (see intel_debug.c). But that seems like overkill for the unit tests, whose most common use case, after all, is being run via make check. v2: use the same idea for the fs counterpart too, as suggested by Matt Turner Reviewed-by: Matt Turner <mattst88@gmail.com>	2015-10-22 21:58:03 +02:00
Alejandro Piñeiro	8fc8fcc04f	i965/vec4: Add unit tests for cmod propagation pass This includes the same tests from test_fs_cmod_propagation (non-vector GLSL types included), plus some new ones with vec4 types, inspired by the regressions found while the optimization was a work in progress. Additionally, the check of the number of instructions after the optimization was changed from EXPECT_EQ to ASSERT_EQ. This was done to avoid a crash on failing tests that expected no optimization, as after checking the number of instructions, there were some checks related to the last instruction's opcode/conditional mod. v2: update tests after Matt Turner's review of the optimization pass v3: tweaks on the tests (mostly on the comments), after Matt Turner's review Reviewed-by: Matt Turner <mattst88@gmail.com>	2015-10-22 21:58:03 +02:00
Alejandro Piñeiro	627f94b72e	i965/vec4: adding vec4_cmod_propagation optimization vec4 port of fs_cmod_propagation. Shader-db results (no vec4 grepping): total instructions in shared programs: 6240413 -> 6235841 (-0.07%) instructions in affected programs: 401933 -> 397361 (-1.14%) total loops in shared programs: 1979 -> 1979 (0.00%) helped: 2265 HURT: 0 v2: remove extra space and combine two if blocks, as suggested by Matt Turner v3: add condition check to bail out if current inst and inst being scanned has different writemask, as pointed by Matt Turner v3: updated shader-db numbers v4: remove block from foreach_inst_in_block_*_starting_from after commit `801f151917` Reviewed-by: Matt Turner <mattst88@gmail.com>	2015-10-22 21:58:03 +02:00
Alejandro Piñeiro	a59359ecd2	i965/vec4: track and use independently each flag channel vec4_live_variables now tracks each flag channel independently, so vec4_dead_code_eliminate can update the writemask of null registers, based on which components are alive at the moment. This would allow vec4_cmod_propagation to optimize out several movs involving null registers. v2: added support to track each flag channel independently at vec4 live_variables, as v1 assumed that it was already doing it, as pointed by Francisco Jerez v3: general cleaning after Matt Turner's review Reviewed-by: Matt Turner <mattst88@gmail.com>	2015-10-22 21:58:03 +02:00
Alejandro Piñeiro	8ac3b525c7	i965/vec4: nir_emit_if doesn't need to predicate based on all the channels v2: changed comment, as suggested by Matt Turner Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>	2015-10-22 21:58:03 +02:00
Matt Turner	1095d837dc	i965/vec4/gs: Fix signed/unsigned comparison warning.	2015-10-22 12:27:04 -07:00
Matt Turner	e2707c8765	i965/fs: Emit a single ADD instruction for SET_SAMPLE_ID on Gen8+. Gen8+ lifted the register region restriction that an instruction whose destination spans two registers must have sources that also span two registers. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>	2015-10-22 12:27:00 -07:00
Matt Turner	0f74796e33	i965/fs: Drop unnecessary write-enable-all from SET_SAMPLE_ID. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>	2015-10-22 12:26:57 -07:00
Matt Turner	e2344e11ce	i965/fs: Trim unneeded channels in SampleID setup. The AND and SHR produce a scalar value that we had been replicating across $dispatch_width channels. The immediate MOV produces only four useful channels of data. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>	2015-10-22 12:26:54 -07:00
Matt Turner	e10fc055e7	i965/fs: Use type-W for immediate in SampleID setup. Not a functional difference, but register is loaded with a signed immediate (V) and added to a signed type (D) producing a signed result (D). Also change the type of g0 to allow for compaction. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>	2015-10-22 12:26:49 -07:00
Matt Turner	cfb67c3d06	i965/vec4: Initialize LOD to 0.0f for textureQueryLevels() and texture(). We implement textureQueryLevels (which takes no arguments, save the sampler) using the resinfo message (which takes an argument of LOD). Without initializing it, we'd generate a MOV from the null register to load the LOD argument. Essentially the same logic applies to texture. A vertex shader cannot compute derivatives and so cannot produce an LOD, so TXL with an LOD of 0.0 is used. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>	2015-10-22 10:16:52 -07:00
Matt Turner	65ffaf2740	i965: Note that the UV immediate type is Gen6+.	2015-10-22 10:16:52 -07:00
Jose Fonseca	718249843b	gallivm: Translate all util_cpu_caps bits to LLVM attributes. This should prevent disparity between features Mesa and LLVM believe are supported by the CPU. http://lists.freedesktop.org/archives/mesa-dev/2015-October/thread.html#96990 Tested on a i7-3720QM w/ LLVM 3.3 and 3.6. v2: Increase SmallVector initial size as suggested by Gustaw Smolarczyk. Reviewed-by: Roland Scheidegger <sroland@vmware.com> CC: "10.6 11.0" <mesa-stable@lists.freedesktop.org>	2015-10-22 11:11:40 +01:00
Jordan Justen	627c15cde4	i965/fs: Disable CSE optimization for untyped & typed surface reads An untyped surface read is volatile because it might be affected by a write. In the ES31-CTS.compute_shader.resources-max test, two back to back read/modify/writes of an SSBO variable looked something like this: r1 = untyped_surface_read(ssbo_float) r2 = r1 + 1 untyped_surface_write(ssbo_float, r2) r3 = untyped_surface_read(ssbo_float) r4 = r3 + 1 untyped_surface_write(ssbo_float, r4) And after CSE, we had: r1 = untyped_surface_read(ssbo_float) r2 = r1 + 1 untyped_surface_write(ssbo_float, r2) r4 = r1 + 1 untyped_surface_write(ssbo_float, r4) Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>	2015-10-22 00:36:37 -07:00
Chia-I Wu	13a5805b64	ilo: make sure there is HiZ before resolving We do not want to perform a depth resolve on an MCS enabled surface.	2015-10-22 14:06:21 +08:00
Chia-I Wu	0b6f6ee50f	ilo: fix max thread count for HS on Gen8 It is in DW2 on Gen8.	2015-10-22 14:06:21 +08:00
Ben Widawsky	8eefdacb38	i965: Advertise ARB_shader_stencil_export (gen9+) Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2015-10-21 21:14:44 -07:00
Ben Widawsky	1db44252d0	i965: Implement ARB_shader_stencil_export (gen9+) v2: remove useless source_stencil_to_render_target (Ken) Squash in the actual packing function, which also got to v2: Move the definition of the OPCODE outside of FB_WRITE opcodes (Matt) Reorder the regioning to be in VWH order (Matt) Don't retype src in the backend, just assert instead (Matt) Rename the debug prints to something better (Matt) Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2015-10-21 21:14:44 -07:00
Ben Widawsky	5fa7114652	i965/fs: Enumerate logical fb writes arguments Gen9 adds the ability to write out a stencil value, so we need to expand the virtual payload by one. Abstracting this now makes that change easier to read. I was admittedly confused early on about some of the hardcoding. If people believe the resulting code is inferior, I am not super attached to the patch. v2: Remove explicit numbering from the enumeration (Matt). Use a real naming scheme, and reference it in the opcode definition (Curro) Add a missed hardcoded logical position in get_lowered_simd_width (Ben) Add an assertion to make sure the component numbering is correct (Ben) Cc: Matt Turner <mattst88@gmail.com> Cc: Francisco Jerez <currojerez@riseup.net> Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2015-10-21 21:14:44 -07:00
Brian Paul	18a631eb90	svga: fix clip plane regression after recent tgsi_scan change Before the change "tgsi/scan: use properties for clip/cull distance writemasks", the tgsi_shader_info::num_written_clipdistance field was a multiple of four, now it's an accurate count. In the svga driver, we need a minor change to the loop test. Reviewed-by: Charmaine Lee <charmainel@vmware.com>	2015-10-21 17:12:19 -06:00
Kenneth Graunke	48c76eae8e	i965: Implement gl_InvocationID. It's stored in bits 31:27 of g1 (along with the URB handles). Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>	2015-10-21 14:27:58 -07:00
Kenneth Graunke	c5ae34f38f	i965: Implement nir_intrinsic_load_primitive. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>	2015-10-21 14:27:56 -07:00
Kenneth Graunke	b3ebf03b84	i965: Add a fs_visitor constructor that takes a brw_gs_compile. Unlike the vs/wm structs, brw_gs_compile is actually useful: it contains the input VUE map and information about the control data headers. Passing this in allows us to share that code in brw_gs.c, and calculate them before deciding on vec4 vs. scalar mode, as it's independent of that choice. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>	2015-10-21 14:27:54 -07:00
Kenneth Graunke	55dfd39b5f	i965: Add a brw->scalar_gs flag controlled by INTEL_SCALAR_GS=1. This patch introduces a brw->scalar_gs flag, similar to brw->scalar_vs, which controls whether or not to use SIMD8 geometry shaders. For now, we control it via a new environment variable, INTEL_SCALAR_GS. This provides a convenient way to try it out. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>	2015-10-21 14:27:53 -07:00
Kenneth Graunke	ac0a33666b	i965: Make emit_urb_writes() reserve space for GS header information. Geometry shaders have additional header data at the beginning of their output URB entries. Shaders that use EndPrimitive() or multiple streams have a control data header; shaders with a dynamic vertex count have an additional vec4 slot to hold the 32-bit vertex count (and 96 bits of padding). Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>	2015-10-21 14:27:52 -07:00
Kenneth Graunke	cb755996d9	i965: Make emit_urb_writes() only set EOT for the VS. The GS will emit a bunch of vertices, and we don't want to do an EOT prematurely. We'll emit GS_OPCODE_THREAD_END when we want to terminate the thread. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>	2015-10-21 14:27:50 -07:00
Kenneth Graunke	6ae419b94d	i965: Make fs_visitor::emit_urb_writes reusable for scalar GS. GS doesn't have ClampVertexColor, and we don't want to go through VS structures. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>	2015-10-21 14:27:49 -07:00
Kenneth Graunke	72d84ae7ce	i965: Introduce a brw_vue_prog_data::include_vue_handles flag. Tessellation shaders and SIMD8 geometry shaders may need to resort to the pull model for inputs at times. When set, the state upload code will tell the hardware to provide URB handles for input data. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>	2015-10-21 14:27:48 -07:00
Kenneth Graunke	ac98888afd	i965: Introduce a new SHADER_OPCODE_URB_READ_SIMD8 opcode. In scalar mode, geometry shader inputs can easily take up hundreds of registers. This makes pushing VUE entries impractical; we'll need to resort to the pull model in some cases. To support this, we introduce a new opcode corresponding to the "URB Read SIMD8" message. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>	2015-10-21 14:27:46 -07:00
Kenneth Graunke	bea7522782	i965: Introduce new SHADER_OPCODE_URB_WRITE_SIMD8_MASKED/PER_SLOT opcodes. In the vec4 backend, we have a vec4_instruction::urb_write_flags field. There are many kinds of flags for SIMD4x2 messages. However, there are really only two (per-slot offset, use channel masks) for SIMD8 messages. Rather than adding a boolean flag for per-slot offsets (polluting all instructions), I decided to just make three new opcodes. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>	2015-10-21 14:27:41 -07:00
Jason Ekstrand	0e57694745	i965/gs: Do prog_data setup and other calculations in brw_compile_gs This commit moves the large pile of setup calculations we have to do for geometry shaders out of brw_gs_emit and into brw_compile_gs. This has a couple of nice implications. First, it's less work that the caller of brw_compile_gs has to do. Second, it's consistent with the vertex and fragment stages. Finally, it allows us to put brw_gs_compile back behind the API boundary where it belongs. v2 (Jason Ekstrand): - Pull the changes to use nir info into a separate patch - Put brw_gs_compile into brw_shader.h rather than brw_vec4_gs_visitor.h so that we can use it for scalar GS. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2015-10-21 14:20:32 -07:00
Jason Ekstrand	f3bc73073a	i965/gs: Use NIR info for setting up prog_data Previously, we were pulling bits from GL data structures in order to set up the prog_data. However, in this brave new world of NIR, we want to be pulling it out of the NIR shader whenever possible. This way, we can move all this setup code into brw_compile_gs without depending on the old GL stuff. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2015-10-21 14:20:32 -07:00
Jason Ekstrand	fac9b21e03	i965/gs: Pull prog_data out of brw_gs_compile Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2015-10-21 14:20:32 -07:00
Jason Ekstrand	6ac2bbec16	i965/gs: Use NIR instead of the brw_geometry_program for GS metadata With this, we can remove the geometry program from brw_gs_compile. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2015-10-21 14:20:32 -07:00
Jason Ekstrand	72148de217	i965/gs: Move the mem_ctx argument to brw_compile_gs This makes it better match the other brw_compile_* functions. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2015-10-21 14:20:32 -07:00
Jason Ekstrand	8e8b527b27	i965/gs: Set static_vertex_count unconditionally on GEN8+ We always have NIR, so there's no reason for the check. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2015-10-21 14:20:32 -07:00
Jason Ekstrand	2686477d37	nir: Constify nir_gs_count_vertices Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2015-10-21 14:20:32 -07:00
Jason Ekstrand	4eb84a03be	nir/info: Add more information about geometry shaders Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2015-10-21 14:20:32 -07:00
Ben Widawsky	3c5d24363a	i965: (trivial) rename computes stencil to gen9 All the documentation I can find says that this bit (and functionality) only exists on SKL+. Since the bit isn't yet used, there is no real impact here. The original code was added by Ken here (a surprisingly long time ago): commit `f3c6d6f1e1` Author: Kenneth Graunke <kenneth@whitecape.org> Date: Thu Nov 29 21:00:27 2012 -0800 i965: Update 3DSTATE_PS, 3DSTATE_WM, and add 3DSTATE_PS_EXTRA. Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2015-10-21 11:00:03 -07:00
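
The CSE hazard described in commit 627c15cde4 (two back-to-back read/modify/writes of an SSBO variable) can be illustrated with a small sketch. This is a toy model in Python, not Mesa's actual pass: the instruction tuples, the `cse` and `run` helpers, and the `VOLATILE_OPS` set are all hypothetical names made up for illustration; only the opcode names and the example program come from the commit message. Note that this toy pass is more aggressive than the output shown in the commit message: it also folds the second add, but the observable bug (the second write stores a stale value) is the same.

```python
# Toy local CSE over a straight-line program. Illustration only; this is
# NOT the i965 implementation. Each instruction is (dest, op, args), and
# dest is None for instructions that produce no register (the writes).

# Hypothetical set of opcodes whose results must never be reused.
VOLATILE_OPS = frozenset({"untyped_surface_read"})

PROGRAM = [
    ("r1", "untyped_surface_read", ("ssbo_float",)),
    ("r2", "add", ("r1", 1)),
    (None, "untyped_surface_write", ("ssbo_float", "r2")),
    ("r3", "untyped_surface_read", ("ssbo_float",)),
    ("r4", "add", ("r3", 1)),
    (None, "untyped_surface_write", ("ssbo_float", "r4")),
]


def cse(program, volatile_ops=frozenset()):
    """Drop instructions that recompute an already-available expression."""
    available = {}  # (op, args) -> register already holding that value
    renamed = {}    # eliminated register -> surviving register
    out = []
    for dest, op, args in program:
        # Rewrite operands that refer to registers eliminated earlier.
        args = tuple(renamed.get(a, a) for a in args)
        key = (op, args)
        reusable = dest is not None and op not in volatile_ops
        if reusable and key in available:
            renamed[dest] = available[key]  # reuse the earlier result
            continue
        if reusable:
            available[key] = dest
        out.append((dest, op, args))
    return out


def run(program, ssbo_value):
    """Interpret the program and return the final SSBO contents."""
    regs = {}
    mem = {"ssbo_float": ssbo_value}
    for dest, op, args in program:
        if op == "untyped_surface_read":
            regs[dest] = mem[args[0]]
        elif op == "untyped_surface_write":
            mem[args[0]] = regs[args[1]]
        elif op == "add":
            regs[dest] = regs[args[0]] + args[1]
    return mem["ssbo_float"]


if __name__ == "__main__":
    print("original:        ", run(PROGRAM, 0))
    print("naive CSE:       ", run(cse(PROGRAM), 0))
    print("reads volatile:  ", run(cse(PROGRAM, VOLATILE_OPS), 0))
```

In this sketch, treating the surface read as an ordinary expression lets the second read reuse `r1`, so the second increment is lost. Excluding the read from CSE, which is the approach the commit message describes, leaves all six instructions in place and keeps both increments.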

73773 Commits