Commit Graph

2357 Commits

Author SHA1 Message Date
Jason Ekstrand 80ddfab2f5 intel/cs: Rework the way thread local ID is handled
Previously, brw_nir_lower_intrinsics added the param and then emitted a
load_uniform intrinsic to load it directly.  This commit switches things
over to use a specific NIR intrinsic for the thread id.  The one thing I
don't like about this approach is that we have to copy thread_local_id
over to the new visitor in import_uniforms.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2017-11-07 10:37:52 -08:00
Jason Ekstrand 25f7453c9e intel/fs: Mark 64-bit values as being contiguous
This isn't often a problem , when we're in a compute shader, we must
push the thread local ID so we decrement the amount of available push
space by 1 and it's no longer even and 64-bit data can, in theory, span
it.  By marking those uniforms contiguous, we ensure that they never get
split in half between push and pull constants.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Cc: mesa-stable@lists.freedesktop.org
2017-11-07 10:37:52 -08:00
Jason Ekstrand c4c8cba705 intel/cs: Ignore runtime_check_aads_emit for CS
It's only set on gen4-5 which clearly don't support compute shaders.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2017-11-07 10:37:52 -08:00
Jason Ekstrand d4de813d86 intel/cs: Stop setting dispatch_grf_start_reg
Nothing ever reads it for compute shaders because it's always 1.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2017-11-07 10:37:52 -08:00
Jason Ekstrand b1a9cdede4 intel/cs: Drop max_dispatch_width checks from compile_cs
The only things that adjust fs_visitor::max_dispatch_width are render
target writes which don't happen in compute shaders so they're
pointless.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2017-11-07 10:37:52 -08:00
Jason Ekstrand 1077981eb5 intel/fs: Remove min_dispatch_width from fs_visitor
It's 8 for everything except compute shaders.  For compute shaders,
there's no need to duplicate the computation and it's just a possible
source of error.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2017-11-07 10:37:52 -08:00
Jason Ekstrand b299ded02e intel/fs: use pull constant locations to check for first compile of a shader
Before, we bailing in assign_constant_locations based on the minimum
dispatch size.  The more direct thing to do is simply to check for
whether or not we have constant locations and bail if we do.  For
nir_setup_uniforms, it's completely safe to do it multiple times because
we just copy a value from the NIR shader.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2017-11-07 10:37:52 -08:00
Jason Ekstrand 103081c9a9 intel/fs: Retype dest to match value in read[First]Invocation
This is what we really wanted all along.  Always retyping to D works
because that's what get_nir_src() always gives us, at least for 32-bit
types.  The SPIR-V variants of these operations accept arbitrary types
and we need this if we're going to handle 64 or 16-bit values.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2017-11-07 10:37:52 -08:00
Jason Ekstrand ebaee9da4a intel/fs: Uniformize the index in readInvocation
The index is any value provided by the shader and this can be called in
non-uniform control flow so we can't just take component 0.  Found by
inspection.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2017-11-07 10:37:52 -08:00
Jason Ekstrand b67230de63 intel/fs: Protect opt_algebraic from OOB BROADCAST indices
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2017-11-07 10:37:52 -08:00
Jason Ekstrand aa4ff4b98c i965/fs/nir: Don't stomp 64-bit values to D in get_nir_src
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2017-11-07 10:37:52 -08:00
Jason Ekstrand ec8c6649f1 i965/fs/nir: Minor refactor of store_output
Stop retyping the output of shuffle_64bit_data_for_32bit_write.  It's
always BRW_REGISTER_TYPE_D which is perfectly fine for writing out.
Also, when we change get_nir_src to return something with a 64-bit type
for 64-bit values, the retyping will not be at all what we want.  Also,
retyping the output based on src.type before we whack it back to 32 bits
is a problem because the output is always 32 bits.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2017-11-07 10:37:52 -08:00
Jason Ekstrand 030d2b5016 i965/fs: Return a fs_reg from shuffle_64bit_data_for_32bit_write
All callers of this function allocate a fs_reg expressly to pass into
it.  It's much easier if we just let the helper allocate the register.
While we're here, we switch it to doing the MOVs with an integer type so
that we don't accidentally canonicalize floats on half of a double.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2017-11-07 10:37:52 -08:00
Jason Ekstrand 6197a6b7ac i965/fs/nir: Simplify 64-bit store_output
The swizzles weren't doing any good because swiz is just XYZW.  Also, we
were emitting an extra set of MOVs because shuffle_64bit_data_for_32bit
already does a MOV for us.  Finally, the temporary was only ever used
inside the inner loop so there's no need for it to actually be an array.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2017-11-07 10:37:52 -08:00
Jason Ekstrand 18fde36ced intel/fs: Use the original destination region for int MUL lowering
Some hardware (CHV, BXT) have special restrictions on register regions
when doing integer multiplication.  We want to respect those when we
lower to DxW multiplication.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Cc: mesa-stable@lists.freedesktop.org
2017-11-07 10:37:52 -08:00
Jason Ekstrand d54f8ec744 intel/fs: Fix integer multiplication lowering for src/dst hazards
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Cc: mesa-stable@lists.freedesktop.org
2017-11-07 10:37:52 -08:00
Jason Ekstrand fd1bcccc2d intel/fs: Fix MOV_INDIRECT for 64-bit values on little-core
The same workaround we need for 64-bit values on little core also takes
care of the Ivy Bridge problem and does so a bit more efficiently so we
can drop that code while we're here.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Cc: mesa-stable@lists.freedesktop.org
2017-11-07 10:37:52 -08:00
Jason Ekstrand 6041a31e77 intel/eu: Fix broadcast instruction for 64-bit values on little-core
We're not using broadcast for any 32-bit types right now since we mostly
use it for emit_uniformize on 32-bit buffer indices.  However, SPIR-V
subgroups are going to need it for 64-bit so let's make it work.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2017-11-07 10:37:52 -08:00
Jason Ekstrand 10e4feed39 intel/eu/reg: Add a subscript() helper
This is similar to the identically named fs_reg helper.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Cc: mesa-stable@lists.freedesktop.org
2017-11-07 10:37:52 -08:00
Jason Ekstrand 068beb41d8 intel/eu: Just modify the offset in brw_broadcast
This means we have to drop const from a variable but it also means that
100% of the code which deals with the offset limit is in one place.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2017-11-07 10:37:52 -08:00
Jason Ekstrand e3bcc86133 intel/compiler: Add some restrictions to MOV_INDIRECT and BROADCAST
These restrictions effectively already existed due to the way we use
indirect sources but weren't being directly enforced.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2017-11-07 10:37:52 -08:00
Jason Ekstrand 1b8ef49f48 intel/fs: Use a pair of 1-wide MOVs instead of SEL for any/all
For some reason, the any/all predicates don't work properly with SIMD32.
In particular, it appears that a SEL with a QtrCtrl of 2H doesn't read
the correct subset of the flag register and you end up getting garbage
in the second half.  Work around this by using a pair of 1-wide MOVs and
scattering the result.  This fixes the any/all instructions for SIMD32.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Cc: mesa-stable@lists.freedesktop.org
2017-11-07 10:37:52 -08:00
Jason Ekstrand 1f41663007 intel/fs: Use an explicit D type for vote any/all/eq intrinsics
The any/all intrinsics return a boolean value so D or UD is the correct
type.  Unfortunately, get_nir_dest has the annoying behavior of
returnning a float type by default.  This causes format conversion which
gives us -1.0f or 0.0f in the register.  If the consumer of the result
does an integer comparison to zero, it will give you the right boolean
value but if we do something more clever based on the 0/~0 assumption
for booleans, this will give the wrong value.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Cc: mesa-stable@lists.freedesktop.org
2017-11-07 10:37:52 -08:00
Jason Ekstrand 6c00240bc6 intel/fs: Don't stomp f0.1 in SIMD16 ballot
In fragment shaders f0.1 is used for discards so doing ballot after a
discard can potentially cause the discard to not happen.  However, we
don't support SIMD32 fragment shaders yet so this isn't a problem.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Cc: mesa-stable@lists.freedesktop.org
2017-11-07 10:37:52 -08:00
Jason Ekstrand def013a863 intel/fs: Use ANY/ALL32 predicates in SIMD32
We have ANY/ALL32 predicates and, for the most part, they work just
fine.  (See the next commit for more details.)  Also, due to the way
that flag registers are handled in hardware, instruction splitting is
able to split the CMP correctly.  Specifically, that hardware looks at
the execution group and knows to shift it's flag usage up correctly so a
2H instruction will write to f0.1 instead of f0.0.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Cc: mesa-stable@lists.freedesktop.org
2017-11-07 10:37:52 -08:00
Jason Ekstrand 0d905597fe intel/fs: Be more explicit about our placement of [un]zip
Before, we were careful to place the zip after the last of the split
instructions but did unzip on-demand.  This changes things so that the
unzips go before all of the split instructions and the unzip comes
explicitly after all the split instructions.  As a side-effect of this
change, we now emit the split instruction from highest SIMD group to
lowest instead of low to high.  We could have kept the old behavior, but
it shouldn't matter and this made the code easier.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Cc: mesa-stable@lists.freedesktop.org
2017-11-07 10:37:52 -08:00
Jason Ekstrand fcd4adb9d0 intel/fs: Pass builders instead of blocks into emit_[un]zip
This makes it far more explicit where we're inserting the instructions
rather than the magic "before and after" stuff that the emit_[un]zip
helpers did based on block and inst.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Cc: mesa-stable@lists.freedesktop.org
2017-11-07 10:37:52 -08:00
Jason Ekstrand e8c9e65185 intel/fs: Use a pure vertical stride for large register strides
Register strides higher than 4 are uncommon but they can happen.  For
instance, if you have a 64-bit extract_u8 operation, we turn that into
UB -> UQ MOV with a source stride of 8.  Our previous calculation would
try to generate a stride of <32;8,8>:ub which is invalid because the
maximum horizontal stride is 4.  To solve this problem, we instead use a
stride of <8;1,0>.  As noted in the comment, this does not work as a
destination but that's ok as very few things actually generate that
stride.

Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Cc: mesa-stable@lists.freedesktop.org
2017-11-07 10:37:52 -08:00
Chad Versace 3ea37d0a2a anv: Suffix anv-private 'VK' tokens with 'ANV'
I saw VK_IMAGE_ASPECT_ANY_COLOR_BIT while hacking anv_formats.c and got
confused. "Huh? What extension added that?". No extension defines it;
anv_private.h defines it.

To remove confusion, rename the anv-private VK tokens as if they were
extension tokens with the ANV vendor suffix.

I found only two such tokens:

    VK_IMAGE_ASPECT_ANY_COLOR_BIT
    VK_IMAGE_ASPECT_PLANES_BITS

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-11-07 09:06:41 -08:00
Chad Versace 012b54c6b1 anv: Remove unused variable 'gen'
In anv_physical_device_get_format_properties().

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-11-07 09:06:30 -08:00
Jason Ekstrand 172e8e42c4 intel/fs: Don't allocate a param array for zero push constants
Thanks to the ralloc invariant of "any pointer returned from ralloc can
be used as a context", calling ralloc_size with a size of zero will
cause it to allocate at least a header.  If we don't have any push
constants, then NULL is perfectly acceptable (and even preferred).

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2017-11-02 09:55:21 -07:00
Jason Ekstrand 7b4387519c intel/fs: Alloc pull constants off mem_ctx
It doesn't actually matter since the only user of push constants, i965,
ralloc_steals it back to NULL but it's more consistent and probably
fixes memory leaks in some error cases.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Cc: mesa-stable@lists.freedesktop.org
2017-11-02 09:55:21 -07:00
Lionel Landwerlin 8d8b9d11c9 intel: decoder: enable decoding a single field
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>
2017-11-01 17:23:49 +00:00
Lionel Landwerlin bb16503542 intel: decoder: expose missing find_enum()
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>
2017-11-01 17:23:49 +00:00
Lionel Landwerlin ad876f721e intel: decoder: extract field value computation
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>
2017-11-01 17:23:49 +00:00
Lionel Landwerlin 81aee9fd4b intel: decoder: rename field() to field_value()
We would like to avoid collisions with variables named field.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>
2017-11-01 17:23:49 +00:00
Lionel Landwerlin 69d158573a intel: decoder: rename internal function to free name
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>
2017-11-01 17:23:49 +00:00
Lionel Landwerlin 20156931bf intel: decoder: simplify field_is_header()
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>
2017-11-01 17:23:49 +00:00
Lionel Landwerlin cab93a901e intel: common: make intel utils available from C++
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>
2017-11-01 17:23:49 +00:00
Lionel Landwerlin ea14ba0179 intel: decoder: remove unused platform field
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>
2017-11-01 17:23:49 +00:00
Lionel Landwerlin 938f62a1c7 intel: error-decode: implement a rolling window of programs
If we have more programs than what we can store,
aubinator_error_decode will assert. Instead let's have a rolling
window of programs.

v2: Fix overflowing issues (Eric Engestrom)

v3: Go through programs starting at idx_program (Scott)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>
2017-11-01 17:23:49 +00:00
Lionel Landwerlin 38f338c19a intel: decoder: extract instruction/structs length
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>
2017-11-01 13:49:12 +00:00
Lionel Landwerlin 279531672e intel: decoder: pack iterator variable declarations
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>
2017-11-01 13:49:12 +00:00
Lionel Landwerlin 1cf1591abd intel: decoder: simplify creation of struct when 0-allocated
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>
2017-11-01 13:49:12 +00:00
Lionel Landwerlin eb00b8b18c intel: decoder: add destructor for gen_spec
This makes use of ralloc to simplify the destruction. We can also
store instructions in hash tables.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>
2017-11-01 13:49:12 +00:00
Lionel Landwerlin de213b4af8 intel: decoder: expose helper to test header fields
These fields are of little importance as they're used to recognize
instructions.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>
2017-11-01 13:19:20 +00:00
Lionel Landwerlin 68e1853ea3 intel: decoder: don't read qword outside instruction/struct limit
We used to print invalid data when the last field was being clamped to
32bits due to Dword Length of the whole instruction. Here is an
example where the decoder read part of the next instruction instead of
stopping at the 32bit limit:

0x000ce0b4:  0x10000002:  MI_STORE_DATA_IMM
0x000ce0b4:  0x10000002 : Dword 0
    DWord Length: 2
    Store Qword: 0
    Use Global GTT: false
0x000ce0b8:  0x00045010 : Dword 1
    Core Mode Enable: 0
    Address: 0x00045010
0x000ce0bc:  0x00000000 : Dword 2
0x000ce0c0:  0x00000000 : Dword 3
    Immediate Data: 8791026489807077376

With this change we have the proper value :

0x000ce0b4:  0x10000002:  MI_STORE_DATA_IMM (4 Dwords)
0x000ce0b4:  0x10000002 : Dword 0
    DWord Length: 2
    Store Qword: 0
    Use Global GTT: false
0x000ce0b8:  0x00045010 : Dword 1
    Core Mode Enable: 0
    Address: 0x00045010
0x000ce0bc:  0x00000000 : Dword 2
0x000ce0c0:  0x00000000 : Dword 3
    Immediate Data: 0

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>
2017-11-01 13:19:20 +00:00
Lionel Landwerlin f5e5ca1e21 intel: decoder: split out getting the next field and decoding it
Due to the new way we handle fields, we need *not* to forget the first
field when decoding instructions. The issue was that the advance
function was called first and skipped the first field.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>
2017-11-01 13:19:20 +00:00
Lionel Landwerlin ffa011d1e3 intel: decoder: move field name copy
This should be inside the function that actually decodes fields.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>
2017-11-01 13:19:20 +00:00
Lionel Landwerlin 0698318d1a intel: decoder: reorder iterator init function
Making the next change more readable.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>
2017-11-01 13:19:20 +00:00