AlexIndustrial/mesa

Author	SHA1	Message	Date
Jason Ekstrand	80ddfab2f5	intel/cs: Rework the way thread local ID is handled Previously, brw_nir_lower_intrinsics added the param and then emitted a load_uniform intrinsic to load it directly. This commit switches things over to use a specific NIR intrinsic for the thread id. The one thing I don't like about this approach is that we have to copy thread_local_id over to the new visitor in import_uniforms. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>	2017-11-07 10:37:52 -08:00
Jason Ekstrand	25f7453c9e	intel/fs: Mark 64-bit values as being contiguous This isn't often a problem , when we're in a compute shader, we must push the thread local ID so we decrement the amount of available push space by 1 and it's no longer even and 64-bit data can, in theory, span it. By marking those uniforms contiguous, we ensure that they never get split in half between push and pull constants. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Cc: mesa-stable@lists.freedesktop.org	2017-11-07 10:37:52 -08:00
Jason Ekstrand	c4c8cba705	intel/cs: Ignore runtime_check_aads_emit for CS It's only set on gen4-5 which clearly don't support compute shaders. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>	2017-11-07 10:37:52 -08:00
Jason Ekstrand	d4de813d86	intel/cs: Stop setting dispatch_grf_start_reg Nothing ever reads it for compute shaders because it's always 1. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>	2017-11-07 10:37:52 -08:00
Jason Ekstrand	b1a9cdede4	intel/cs: Drop max_dispatch_width checks from compile_cs The only things that adjust fs_visitor::max_dispatch_width are render target writes which don't happen in compute shaders so they're pointless. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>	2017-11-07 10:37:52 -08:00
Jason Ekstrand	1077981eb5	intel/fs: Remove min_dispatch_width from fs_visitor It's 8 for everything except compute shaders. For compute shaders, there's no need to duplicate the computation and it's just a possible source of error. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>	2017-11-07 10:37:52 -08:00
Jason Ekstrand	b299ded02e	intel/fs: use pull constant locations to check for first compile of a shader Before, we bailing in assign_constant_locations based on the minimum dispatch size. The more direct thing to do is simply to check for whether or not we have constant locations and bail if we do. For nir_setup_uniforms, it's completely safe to do it multiple times because we just copy a value from the NIR shader. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>	2017-11-07 10:37:52 -08:00
Jason Ekstrand	103081c9a9	intel/fs: Retype dest to match value in read[First]Invocation This is what we really wanted all along. Always retyping to D works because that's what get_nir_src() always gives us, at least for 32-bit types. The SPIR-V variants of these operations accept arbitrary types and we need this if we're going to handle 64 or 16-bit values. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>	2017-11-07 10:37:52 -08:00
Jason Ekstrand	ebaee9da4a	intel/fs: Uniformize the index in readInvocation The index is any value provided by the shader and this can be called in non-uniform control flow so we can't just take component 0. Found by inspection. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>	2017-11-07 10:37:52 -08:00
Jason Ekstrand	b67230de63	intel/fs: Protect opt_algebraic from OOB BROADCAST indices Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>	2017-11-07 10:37:52 -08:00
Jason Ekstrand	aa4ff4b98c	i965/fs/nir: Don't stomp 64-bit values to D in get_nir_src Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>	2017-11-07 10:37:52 -08:00
Jason Ekstrand	ec8c6649f1	i965/fs/nir: Minor refactor of store_output Stop retyping the output of shuffle_64bit_data_for_32bit_write. It's always BRW_REGISTER_TYPE_D which is perfectly fine for writing out. Also, when we change get_nir_src to return something with a 64-bit type for 64-bit values, the retyping will not be at all what we want. Also, retyping the output based on src.type before we whack it back to 32 bits is a problem because the output is always 32 bits. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>	2017-11-07 10:37:52 -08:00
Jason Ekstrand	030d2b5016	i965/fs: Return a fs_reg from shuffle_64bit_data_for_32bit_write All callers of this function allocate a fs_reg expressly to pass into it. It's much easier if we just let the helper allocate the register. While we're here, we switch it to doing the MOVs with an integer type so that we don't accidentally canonicalize floats on half of a double. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>	2017-11-07 10:37:52 -08:00
Jason Ekstrand	6197a6b7ac	i965/fs/nir: Simplify 64-bit store_output The swizzles weren't doing any good because swiz is just XYZW. Also, we were emitting an extra set of MOVs because shuffle_64bit_data_for_32bit already does a MOV for us. Finally, the temporary was only ever used inside the inner loop so there's no need for it to actually be an array. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>	2017-11-07 10:37:52 -08:00
Jason Ekstrand	18fde36ced	intel/fs: Use the original destination region for int MUL lowering Some hardware (CHV, BXT) have special restrictions on register regions when doing integer multiplication. We want to respect those when we lower to DxW multiplication. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Cc: mesa-stable@lists.freedesktop.org	2017-11-07 10:37:52 -08:00
Jason Ekstrand	d54f8ec744	intel/fs: Fix integer multiplication lowering for src/dst hazards Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Cc: mesa-stable@lists.freedesktop.org	2017-11-07 10:37:52 -08:00
Jason Ekstrand	fd1bcccc2d	intel/fs: Fix MOV_INDIRECT for 64-bit values on little-core The same workaround we need for 64-bit values on little core also takes care of the Ivy Bridge problem and does so a bit more efficiently so we can drop that code while we're here. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Cc: mesa-stable@lists.freedesktop.org	2017-11-07 10:37:52 -08:00
Jason Ekstrand	6041a31e77	intel/eu: Fix broadcast instruction for 64-bit values on little-core We're not using broadcast for any 32-bit types right now since we mostly use it for emit_uniformize on 32-bit buffer indices. However, SPIR-V subgroups are going to need it for 64-bit so let's make it work. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>	2017-11-07 10:37:52 -08:00
Jason Ekstrand	10e4feed39	intel/eu/reg: Add a subscript() helper This is similar to the identically named fs_reg helper. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Cc: mesa-stable@lists.freedesktop.org	2017-11-07 10:37:52 -08:00
Jason Ekstrand	068beb41d8	intel/eu: Just modify the offset in brw_broadcast This means we have to drop const from a variable but it also means that 100% of the code which deals with the offset limit is in one place. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>	2017-11-07 10:37:52 -08:00
Jason Ekstrand	e3bcc86133	intel/compiler: Add some restrictions to MOV_INDIRECT and BROADCAST These restrictions effectively already existed due to the way we use indirect sources but weren't being directly enforced. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>	2017-11-07 10:37:52 -08:00
Jason Ekstrand	1b8ef49f48	intel/fs: Use a pair of 1-wide MOVs instead of SEL for any/all For some reason, the any/all predicates don't work properly with SIMD32. In particular, it appears that a SEL with a QtrCtrl of 2H doesn't read the correct subset of the flag register and you end up getting garbage in the second half. Work around this by using a pair of 1-wide MOVs and scattering the result. This fixes the any/all instructions for SIMD32. Reviewed-by: Matt Turner <mattst88@gmail.com> Cc: mesa-stable@lists.freedesktop.org	2017-11-07 10:37:52 -08:00
Jason Ekstrand	1f41663007	intel/fs: Use an explicit D type for vote any/all/eq intrinsics The any/all intrinsics return a boolean value so D or UD is the correct type. Unfortunately, get_nir_dest has the annoying behavior of returnning a float type by default. This causes format conversion which gives us -1.0f or 0.0f in the register. If the consumer of the result does an integer comparison to zero, it will give you the right boolean value but if we do something more clever based on the 0/~0 assumption for booleans, this will give the wrong value. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Cc: mesa-stable@lists.freedesktop.org	2017-11-07 10:37:52 -08:00
Jason Ekstrand	6c00240bc6	intel/fs: Don't stomp f0.1 in SIMD16 ballot In fragment shaders f0.1 is used for discards so doing ballot after a discard can potentially cause the discard to not happen. However, we don't support SIMD32 fragment shaders yet so this isn't a problem. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Cc: mesa-stable@lists.freedesktop.org	2017-11-07 10:37:52 -08:00
Jason Ekstrand	def013a863	intel/fs: Use ANY/ALL32 predicates in SIMD32 We have ANY/ALL32 predicates and, for the most part, they work just fine. (See the next commit for more details.) Also, due to the way that flag registers are handled in hardware, instruction splitting is able to split the CMP correctly. Specifically, that hardware looks at the execution group and knows to shift it's flag usage up correctly so a 2H instruction will write to f0.1 instead of f0.0. Reviewed-by: Matt Turner <mattst88@gmail.com> Cc: mesa-stable@lists.freedesktop.org	2017-11-07 10:37:52 -08:00
Jason Ekstrand	0d905597fe	intel/fs: Be more explicit about our placement of [un]zip Before, we were careful to place the zip after the last of the split instructions but did unzip on-demand. This changes things so that the unzips go before all of the split instructions and the unzip comes explicitly after all the split instructions. As a side-effect of this change, we now emit the split instruction from highest SIMD group to lowest instead of low to high. We could have kept the old behavior, but it shouldn't matter and this made the code easier. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Cc: mesa-stable@lists.freedesktop.org	2017-11-07 10:37:52 -08:00
Jason Ekstrand	fcd4adb9d0	intel/fs: Pass builders instead of blocks into emit_[un]zip This makes it far more explicit where we're inserting the instructions rather than the magic "before and after" stuff that the emit_[un]zip helpers did based on block and inst. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Cc: mesa-stable@lists.freedesktop.org	2017-11-07 10:37:52 -08:00
Jason Ekstrand	e8c9e65185	intel/fs: Use a pure vertical stride for large register strides Register strides higher than 4 are uncommon but they can happen. For instance, if you have a 64-bit extract_u8 operation, we turn that into UB -> UQ MOV with a source stride of 8. Our previous calculation would try to generate a stride of <32;8,8>:ub which is invalid because the maximum horizontal stride is 4. To solve this problem, we instead use a stride of <8;1,0>. As noted in the comment, this does not work as a destination but that's ok as very few things actually generate that stride. Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> Cc: mesa-stable@lists.freedesktop.org	2017-11-07 10:37:52 -08:00
Chad Versace	3ea37d0a2a	anv: Suffix anv-private 'VK' tokens with 'ANV' I saw VK_IMAGE_ASPECT_ANY_COLOR_BIT while hacking anv_formats.c and got confused. "Huh? What extension added that?". No extension defines it; anv_private.h defines it. To remove confusion, rename the anv-private VK tokens as if they were extension tokens with the ANV vendor suffix. I found only two such tokens: VK_IMAGE_ASPECT_ANY_COLOR_BIT VK_IMAGE_ASPECT_PLANES_BITS Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2017-11-07 09:06:41 -08:00
Chad Versace	012b54c6b1	anv: Remove unused variable 'gen' In anv_physical_device_get_format_properties(). Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2017-11-07 09:06:30 -08:00
Jason Ekstrand	172e8e42c4	intel/fs: Don't allocate a param array for zero push constants Thanks to the ralloc invariant of "any pointer returned from ralloc can be used as a context", calling ralloc_size with a size of zero will cause it to allocate at least a header. If we don't have any push constants, then NULL is perfectly acceptable (and even preferred). Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Tapani Pälli <tapani.palli@intel.com>	2017-11-02 09:55:21 -07:00
Jason Ekstrand	7b4387519c	intel/fs: Alloc pull constants off mem_ctx It doesn't actually matter since the only user of push constants, i965, ralloc_steals it back to NULL but it's more consistent and probably fixes memory leaks in some error cases. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Tapani Pälli <tapani.palli@intel.com> Cc: mesa-stable@lists.freedesktop.org	2017-11-02 09:55:21 -07:00
Lionel Landwerlin	8d8b9d11c9	intel: decoder: enable decoding a single field Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>	2017-11-01 17:23:49 +00:00
Lionel Landwerlin	bb16503542	intel: decoder: expose missing find_enum() Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>	2017-11-01 17:23:49 +00:00
Lionel Landwerlin	ad876f721e	intel: decoder: extract field value computation Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>	2017-11-01 17:23:49 +00:00
Lionel Landwerlin	81aee9fd4b	intel: decoder: rename field() to field_value() We would like to avoid collisions with variables named field. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>	2017-11-01 17:23:49 +00:00
Lionel Landwerlin	69d158573a	intel: decoder: rename internal function to free name Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>	2017-11-01 17:23:49 +00:00
Lionel Landwerlin	20156931bf	intel: decoder: simplify field_is_header() Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>	2017-11-01 17:23:49 +00:00
Lionel Landwerlin	cab93a901e	intel: common: make intel utils available from C++ Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>	2017-11-01 17:23:49 +00:00
Lionel Landwerlin	ea14ba0179	intel: decoder: remove unused platform field Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>	2017-11-01 17:23:49 +00:00
Lionel Landwerlin	938f62a1c7	intel: error-decode: implement a rolling window of programs If we have more programs than what we can store, aubinator_error_decode will assert. Instead let's have a rolling window of programs. v2: Fix overflowing issues (Eric Engestrom) v3: Go through programs starting at idx_program (Scott) Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com> Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>	2017-11-01 17:23:49 +00:00
Lionel Landwerlin	38f338c19a	intel: decoder: extract instruction/structs length Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>	2017-11-01 13:49:12 +00:00
Lionel Landwerlin	279531672e	intel: decoder: pack iterator variable declarations Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>	2017-11-01 13:49:12 +00:00
Lionel Landwerlin	1cf1591abd	intel: decoder: simplify creation of struct when 0-allocated Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>	2017-11-01 13:49:12 +00:00
Lionel Landwerlin	eb00b8b18c	intel: decoder: add destructor for gen_spec This makes use of ralloc to simplify the destruction. We can also store instructions in hash tables. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>	2017-11-01 13:49:12 +00:00
Lionel Landwerlin	de213b4af8	intel: decoder: expose helper to test header fields These fields are of little importance as they're used to recognize instructions. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>	2017-11-01 13:19:20 +00:00
Lionel Landwerlin	68e1853ea3	intel: decoder: don't read qword outside instruction/struct limit We used to print invalid data when the last field was being clamped to 32bits due to Dword Length of the whole instruction. Here is an example where the decoder read part of the next instruction instead of stopping at the 32bit limit: 0x000ce0b4: 0x10000002: MI_STORE_DATA_IMM 0x000ce0b4: 0x10000002 : Dword 0 DWord Length: 2 Store Qword: 0 Use Global GTT: false 0x000ce0b8: 0x00045010 : Dword 1 Core Mode Enable: 0 Address: 0x00045010 0x000ce0bc: 0x00000000 : Dword 2 0x000ce0c0: 0x00000000 : Dword 3 Immediate Data: 8791026489807077376 With this change we have the proper value : 0x000ce0b4: 0x10000002: MI_STORE_DATA_IMM (4 Dwords) 0x000ce0b4: 0x10000002 : Dword 0 DWord Length: 2 Store Qword: 0 Use Global GTT: false 0x000ce0b8: 0x00045010 : Dword 1 Core Mode Enable: 0 Address: 0x00045010 0x000ce0bc: 0x00000000 : Dword 2 0x000ce0c0: 0x00000000 : Dword 3 Immediate Data: 0 Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>	2017-11-01 13:19:20 +00:00
Lionel Landwerlin	f5e5ca1e21	intel: decoder: split out getting the next field and decoding it Due to the new way we handle fields, we need not to forget the first field when decoding instructions. The issue was that the advance function was called first and skipped the first field. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>	2017-11-01 13:19:20 +00:00
Lionel Landwerlin	ffa011d1e3	intel: decoder: move field name copy This should be inside the function that actually decodes fields. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>	2017-11-01 13:19:20 +00:00
Lionel Landwerlin	0698318d1a	intel: decoder: reorder iterator init function Making the next change more readable. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>	2017-11-01 13:19:20 +00:00

1 2 3 4 5 ...

2357 Commits