AlexIndustrial/mesa

Author	SHA1	Message	Date
Brian Paul	9919f56099	vbo: optimize vertex copying when 'wrapping' Instead of calling memcpy() 'n' times, we can do it all at once since the source and dest regions are all contiguous. Reviewed-by: Matt Turner <mattst88@gmail.com>	2015-10-22 17:19:20 -06:00
Alex Deucher	7b63658125	radeon/uvd: don't expose HEVC on old UVD hw (v3) The section for UVD 2 and older was not updated when HEVC support was added. Reported by Kano on irc. v2: integrate the UVD2 and older checks into the main switch statement. v3: handle encode checking as well. Encode is already checked in the top case statement, so drop encode checks in the lower case statement. Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: mesa-stable@lists.freedesktop.org	2015-10-22 16:22:44 -04:00
Alejandro Piñeiro	8cf84a7e47	i965/vec4: print predicate control at brw_vec4 dump_instruction v2: externalize pred_ctrl_align16 from brw_disasm.c instead of adding a copy on brw_vec4.c, as suggested by Matt Turner Reviewed-by: Matt Turner <mattst88@gmail.com>	2015-10-22 21:58:03 +02:00
Alejandro Piñeiro	92ae101ed0	i965/vec4: use an envvar to decide to print the assembly on cmod_propagation tests The complete way to do this would be parse INTEL_DEBUG and print the output if DEBUG_VS (or a new one) is present (see intel_debug.c). But that seems like an overkill for the unit tests, that after all, the most common use case is being run when calling make check. v2: use the same idea for the fs counterpart too, as suggested by Matt Turner Reviewed-by: Matt Turner <mattst88@gmail.com>	2015-10-22 21:58:03 +02:00
Alejandro Piñeiro	8fc8fcc04f	i965/vec4: Add unit tests for cmod propagation pass This includes the same tests coming from test_fs_cmod_propagation, (non vector glsl types included) plus some new with vec4 types, inspired by the regressions found while the optimization was a work in progress. Additionally, the check of number of instructions after the optimization was changed from EXPECT_EQ to ASSERT_EQ. This was done to avoid a crash on failing tests that expected no optimization, as after checking the number of instructions, there were some checks related to this last instruction opcode/conditional mod. v2: update tests after Matt Turner's review of the optimization pass v3: tweaks on the tests (mostly on the comments), after Matt Turner's review Reviewed-by: Matt Turner <mattst88@gmail.com>	2015-10-22 21:58:03 +02:00
Alejandro Piñeiro	627f94b72e	i965/vec4: adding vec4_cmod_propagation optimization vec4 port of fs_cmod_propagation. Shader-db results (no vec4 grepping): total instructions in shared programs: 6240413 -> 6235841 (-0.07%) instructions in affected programs: 401933 -> 397361 (-1.14%) total loops in shared programs: 1979 -> 1979 (0.00%) helped: 2265 HURT: 0 v2: remove extra space and combine two if blocks, as suggested by Matt Turner v3: add condition check to bail out if current inst and inst being scanned has different writemask, as pointed by Matt Turner v3: updated shader-db numbers v4: remove block from foreach_inst_in_block_*_starting_from after commit `801f151917` Reviewed-by: Matt Turner <mattst88@gmail.com>	2015-10-22 21:58:03 +02:00
Alejandro Piñeiro	a59359ecd2	i965/vec4: track and use independently each flag channel vec4_live_variables tracks now each flag channel independently, so vec4_dead_code_eliminate can update the writemask of null registers, based on which components are alive at the moment. This would allow vec4_cmod_propagation to optimize out several movs involving null registers. v2: added support to track each flag channel independently at vec4 live_variables, as v1 assumed that it was already doing it, as pointed by Francisco Jerez v3: general cleaning after Matt Turner's review Reviewed-by: Matt Turner <mattst88@gmail.com>	2015-10-22 21:58:03 +02:00
Alejandro Piñeiro	8ac3b525c7	i965/vec4: nir_emit_if doesn't need to predicate based on all the channels v2: changed comment, as suggested by Matt Turner Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>	2015-10-22 21:58:03 +02:00
Matt Turner	1095d837dc	i965/vec4/gs: Fix signed/unsigned comparison warning.	2015-10-22 12:27:04 -07:00
Matt Turner	e2707c8765	i965/fs: Emit a single ADD instruction for SET_SAMPLE_ID on Gen8+. Gen8+ lifted the register region restriction that an instruction whose destination spans two registers must have sources that also span two registers. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>	2015-10-22 12:27:00 -07:00
Matt Turner	0f74796e33	i965/fs: Drop unnecessary write-enable-all from SET_SAMPLE_ID. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>	2015-10-22 12:26:57 -07:00
Matt Turner	e2344e11ce	i965/fs: Trim unneeded channels in SampleID setup. The AND and SHR produce a scalar value that we had been replicating across $dispatch_width channels. The immediate MOV produces only four useful channels of data. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>	2015-10-22 12:26:54 -07:00
Matt Turner	e10fc055e7	i965/fs: Use type-W for immediate in SampleID setup. Not a functional difference, but register is loaded with a signed immediate (V) and added to a signed type (D) producing a signed result (D). Also change the type of g0 to allow for compaction. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>	2015-10-22 12:26:49 -07:00
Matt Turner	cfb67c3d06	i965/vec4: Initialize LOD to 0.0f for textureQueryLevels() and texture(). We implement textureQueryLevels (which takes no arguments, save the sampler) using the resinfo message (which takes an argument of LOD). Without initializing it, we'd generate a MOV from the null register to load the LOD argument. Essentially the same logic applies to texture. A vertex shader cannot compute derivatives and so cannot produce an LOD, so TXL with an LOD of 0.0 is used. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>	2015-10-22 10:16:52 -07:00
Matt Turner	65ffaf2740	i965: Note that the UV immediate type is Gen6+.	2015-10-22 10:16:52 -07:00
Jose Fonseca	718249843b	gallivm: Translate all util_cpu_caps bits to LLVM attributes. This should prevent disparity between features Mesa and LLVM believe are supported by the CPU. http://lists.freedesktop.org/archives/mesa-dev/2015-October/thread.html#96990 Tested on a i7-3720QM w/ LLVM 3.3 and 3.6. v2: Increase SmallVector initial size as suggested by Gustaw Smolarczyk. Reviewed-by: Roland Scheidegger <sroland@vmware.com> CC: "10.6 11.0" <mesa-stable@lists.freedesktop.org>	2015-10-22 11:11:40 +01:00
Jordan Justen	627c15cde4	i965/fs: Disable CSE optimization for untyped & typed surface reads An untyped surface read is volatile because it might be affected by a write. In the ES31-CTS.compute_shader.resources-max test, two back to back read/modify/writes of an SSBO variable looked something like this: r1 = untyped_surface_read(ssbo_float) r2 = r1 + 1 untyped_surface_write(ssbo_float, r2) r3 = untyped_surface_read(ssbo_float) r4 = r3 + 1 untyped_surface_write(ssbo_float, r4) And after CSE, we had: r1 = untyped_surface_read(ssbo_float) r2 = r1 + 1 untyped_surface_write(ssbo_float, r2) r4 = r1 + 1 untyped_surface_write(ssbo_float, r4) Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>	2015-10-22 00:36:37 -07:00
Chia-I Wu	13a5805b64	ilo: make sure there is HiZ before resolving We do not want to perform a depth resolve on an MCS enabled surface.	2015-10-22 14:06:21 +08:00
Chia-I Wu	0b6f6ee50f	ilo: fix max thread count for HS on Gen8 It is in DW2 on Gen8.	2015-10-22 14:06:21 +08:00
Jason Ekstrand	82c579e314	anv/gen8: Set the correct maximum number of GS threads This equation was pulled from mesa gen8_gs_state.c	2015-10-21 21:51:18 -07:00
Jason Ekstrand	d0e8c78407	anv/pipeline: set the gs_vertex_count in compile_gs This was missed in the initial enabling commit.	2015-10-21 21:50:47 -07:00
Jason Ekstrand	8af2a09956	anv/pipeline: Make the has_push_constants computation more accurate The computation used to only look for uniforms that weren't samplers. Now it also filters out arrays of samplers.	2015-10-21 21:50:16 -07:00
Jason Ekstrand	0329a252bd	nir/spirv: Add defaults for GS input/output primitive types These are supposed to be specified in the SPIR-V source as SpvExecutionMode enums but glslang isn't giving them to us. A bug has been filed: https://github.com/KhronosGroup/glslang/issues/84	2015-10-21 21:46:22 -07:00
Ben Widawsky	8eefdacb38	i965: Advertise ARB_shader_stencil_export (gen9+) Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2015-10-21 21:14:44 -07:00
Ben Widawsky	1db44252d0	i965: Implement ARB_shader_stencil_export (gen9+) v2: remove useless source_stencil_to_render_target (Ken) Squash in the actual packing function, which also got to v2: Move the definition of the OPCODE outside of FB_WRITE opcodes (Matt) Reorder the regioning to be in VWH order (Matt) Don't retype src in the backend, just assert instead (Matt) Rename the debug prints to something better (Matt) Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2015-10-21 21:14:44 -07:00
Ben Widawsky	5fa7114652	i965/fs: Enumerate logical fb writes arguments Gen9 adds the ability to write out a stencil value, so we need to expand the virtual payload by one. Abstracting this now makes that change easier to read. I was admittedly confused early on about some of the hardcoding. If people believe the resulting code is inferior, I am not super attached to the patch. v2: Remove explicit numbering from the enumeration (Matt). Use a real naming scheme, and reference it in the opcode definition (Curro) Add a missed hardcoded logical position in get_lowered_simd_width (Ben) Add an assertion to make sure the component numbering is correct (Ben) Cc: Matt Turner <mattst88@gmail.com> Cc: Francisco Jerez <currojerez@riseup.net> Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2015-10-21 21:14:44 -07:00
Jason Ekstrand	4032549885	i965/vec4: Handle returns at the end of functions	2015-10-21 20:42:23 -07:00
Jason Ekstrand	5f29dacda2	i965: Move get_hw_prim_for_gl_prim to brw_util.c	2015-10-21 20:40:28 -07:00
Jason Ekstrand	ea23cb3543	nir/spirv: Add capabilities and decorations for basic geometry shaders	2015-10-21 20:36:25 -07:00
Jason Ekstrand	d538fe849d	anv/pipeline: Add back basic geometry shader support Now that we've done the refactoring upstream, it's much easier to get hooked up. We haven't tested things well enough to know that we're setting up the GPU state correctly for them yet but at least we can compile them now.	2015-10-21 18:45:48 -07:00
Jason Ekstrand	164abff0c0	nir/spirv: Add support for more CS system values	2015-10-21 18:39:06 -07:00
Jason Ekstrand	5790ee2bbb	nir/spirv: Add support for various barrier type instructions	2015-10-21 18:17:11 -07:00
Jason Ekstrand	3d35e4361f	Fix a couple of dereferences	2015-10-21 18:16:50 -07:00
Jason Ekstrand	55a7ee730c	spirv/nir: Add more stage asserts	2015-10-21 18:00:05 -07:00
Jason Ekstrand	27393c8630	nir/spirv: Add support for GS metadata	2015-10-21 17:58:34 -07:00
Jason Ekstrand	a8ffd6e72c	nir/gather_info: Add more info for geometry shaders	2015-10-21 17:42:47 -07:00
Jason Ekstrand	fed60e3c73	Merge remote-tracking branch 'mesa-public/master' into vulkan	2015-10-21 17:40:13 -07:00
Brian Paul	18a631eb90	svga: fix clip plane regression after recent tgsi_scan change Before the change "tgsi/scan: use properties for clip/cull distance writemasks", the tgsi_shader_info::num_written_clipdistance field was a multiple of four, now it's an accurate count. In the svga driver, we need a minor change to the loop test. Reviewed-by: Charmaine Lee <charmainel@vmware.com>	2015-10-21 17:12:19 -06:00
Kenneth Graunke	48c76eae8e	i965: Implement gl_InvocationID. It's stored in bits 31:27 of g1 (along with the URB handles). Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>	2015-10-21 14:27:58 -07:00
Kenneth Graunke	c5ae34f38f	i965: Implement nir_intrinsic_load_primitive. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>	2015-10-21 14:27:56 -07:00
Kenneth Graunke	b3ebf03b84	i965: Add a fs_visitor constructor that takes a brw_gs_compile. Unlike the vs/wm structs, brw_gs_compile is actually useful: it contains the input VUE map and information about the control data headers. Passing this in allows us to share that code in brw_gs.c, and calculate them before deciding on vec4 vs. scalar mode, as it's independent of that choice. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>	2015-10-21 14:27:54 -07:00
Kenneth Graunke	55dfd39b5f	i965: Add a brw->scalar_gs flag controlled by INTEL_SCALAR_GS=1. This patch introduces a brw->scalar_gs flag, similar to brw->scalar_vs, which controls whether or not to use SIMD8 geometry shaders. For now, we control it via a new environment variable, INTEL_SCALAR_GS. This provides a convenient way to try it out. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>	2015-10-21 14:27:53 -07:00
Kenneth Graunke	ac0a33666b	i965: Make emit_urb_writes() reserve space for GS header information. Geometry shaders have additional header data at the beginning of their output URB entries. Shaders that use EndPrimitive() or multiple streams have a control data header; shaders with a dynamic vertex count have an additional vec4 slot to hold the 32-bit vertex count (and 96 bits of padding). Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>	2015-10-21 14:27:52 -07:00
Kenneth Graunke	cb755996d9	i965: Make emit_urb_writes() only set EOT for the VS. The GS will emit a bunch of vertices, and we don't want to do an EOT prematurely. We'll emit GS_OPCODE_THREAD_END when we want to terminate the thread. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>	2015-10-21 14:27:50 -07:00
Kenneth Graunke	6ae419b94d	i965: Make fs_visitor::emit_urb_writes reusable for scalar GS. GS doesn't have ClampVertexColor, and we don't want to go through VS structures. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>	2015-10-21 14:27:49 -07:00
Kenneth Graunke	72d84ae7ce	i965: Introduce a brw_vue_prog_data::include_vue_handles flag. Tessellation shaders and SIMD8 geometry shaders may need to resort to the pull model for inputs at times. When set, the state upload code will tell the hardware to provide URB handles for input data. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>	2015-10-21 14:27:48 -07:00
Kenneth Graunke	ac98888afd	i965: Introduce a new SHADER_OPCODE_URB_READ_SIMD8 opcode. In scalar mode, geometry shader inputs can easily take up hundreds of registers. This makes pushing VUE entries impractical; we'll need to resort to the pull model in some cases. To support this, we introduce a new opcode corresponding to the "URB Read SIMD8" message. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>	2015-10-21 14:27:46 -07:00
Kenneth Graunke	bea7522782	i965: Introduce new SHADER_OPCODE_URB_WRITE_SIMD8_MASKED/PER_SLOT opcodes. In the vec4 backend, we have a vec4_instruction::urb_write_flags field. There are many kinds of flags for SIMD4x2 messages. However, there are really only two (per-slot offset, use channel masks) for SIMD8 messages. Rather than adding a boolean flag for per-slot offsets (polluting all instructions), I decided to just make three new opcodes. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>	2015-10-21 14:27:41 -07:00
Jason Ekstrand	0e57694745	i965/gs: Do prog_data setup and other calculations in brw_compile_gs This commit moves the large pile of setup calculations we have to do for geometry shaders out of brw_gs_emit and into brw_compile_gs. This has a couple of nice implications. First, it's less work that the caller of brw_compile_gs has to do. Second, it's consistent with the vertex and fragment stages. Finally, it allows us to put brw_gs_compile back behind the API boundary where it belongs. v2 (Jason Ekstrand): - Pull the changes to use nir info into a separate patch - Put brw_gs_compile into brw_shader.h rather than brw_vec4_gs_visitor.h so that we can use it for scalar GS. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2015-10-21 14:20:32 -07:00
Jason Ekstrand	f3bc73073a	i965/gs: Use NIR info for setting up prog_data Previously, we were pulling bits from GL data structures in order to set up the prog_data. However, in this brave new world of NIR, we want to be pulling it out of the NIR shader whenever possible. This way, we can move all this setup code into brw_compile_gs without depending on the old GL stuff. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2015-10-21 14:20:32 -07:00
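
The idea behind commit 9919f56099 (vbo: optimize vertex copying when 'wrapping') can be sketched roughly as follows. This is not the Mesa code: the function names, the float-based vertex layout, and the parameters `n` and `vertex_size` are assumptions made for illustration. The sketch only shows why one memcpy() can stand in for n of them when the source and destination regions are each contiguous.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy n vertices with one memcpy() per vertex. */
static void copy_vertices_loop(float *dst, const float *src,
                               size_t n, size_t vertex_size)
{
    assert(dst != NULL && src != NULL);
    for (size_t i = 0; i < n; i++) {
        memcpy(dst + i * vertex_size, src + i * vertex_size,
               vertex_size * sizeof(float));
    }
}

/* Hypothetical helper: same result with a single memcpy().
 * Valid only because vertex i starts right where vertex i-1 ends,
 * in both the source and the destination. */
static void copy_vertices_once(float *dst, const float *src,
                               size_t n, size_t vertex_size)
{
    assert(dst != NULL && src != NULL);
    memcpy(dst, src, n * vertex_size * sizeof(float));
}
```

If the vertices were separated by a stride larger than `vertex_size`, the single-call form would copy padding as well, so the per-vertex loop would still be needed in that case.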

74545 Commits
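
The hazard described in commit 627c15cde4 (i965/fs: Disable CSE optimization for untyped & typed surface reads) can be modelled in plain C. This is a minimal sketch under stated assumptions, not driver code: `untyped_surface_read` and `untyped_surface_write` here are hypothetical stand-ins that read and write an ordinary float, and the two functions simply transcribe the before-CSE and after-CSE sequences quoted in the commit message.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the surface messages named in the commit. */
static float untyped_surface_read(const float *ssbo) { return *ssbo; }
static void untyped_surface_write(float *ssbo, float v) { *ssbo = v; }

/* Two back-to-back read/modify/write sequences, as the shader intended. */
static float increment_twice(float *ssbo_float)
{
    assert(ssbo_float != NULL);
    float r1 = untyped_surface_read(ssbo_float);
    float r2 = r1 + 1;
    untyped_surface_write(ssbo_float, r2);
    float r3 = untyped_surface_read(ssbo_float); /* sees the first write */
    float r4 = r3 + 1;
    untyped_surface_write(ssbo_float, r4);
    return *ssbo_float;
}

/* The same sequence after CSE wrongly treats the second read as a
 * duplicate of the first and reuses r1. */
static float increment_twice_after_cse(float *ssbo_float)
{
    assert(ssbo_float != NULL);
    float r1 = untyped_surface_read(ssbo_float);
    float r2 = r1 + 1;
    untyped_surface_write(ssbo_float, r2);
    float r4 = r1 + 1;                           /* stale value reused */
    untyped_surface_write(ssbo_float, r4);
    return *ssbo_float;
}
```

Starting from 0, the first function leaves 2 in the variable and the second leaves 1, which is why a read that an intervening write may affect cannot be merged with an earlier read.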