Commit Graph

3923 Commits

Author SHA1 Message Date
Jason Ekstrand 8993e0973f intel/nir: Drop an unneeded lower_constant_initializers call
Even though this is technically a step in the function inlining process
as laid out in nir_inline_functions.c, it's not really needed.  We
already have constant initializers lowered here and no new ones are
added by appending the softfp64 functions.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-03-06 17:24:57 +00:00
Jason Ekstrand fa4824c1db intel/debug: Add a debug flag to force software fp64
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-03-06 17:24:57 +00:00
Ian Romanick 55e6454d5e intel/fs: Fix extract_u8 of an odd byte from a 64-bit integer
In the old code, we would generate the exact same instruction for
extract_u8(some_u64, 0) and extract_u8(some_u64, 1).  The mask-a-word
trick only works for even numbered bytes.

This fixes the (new) piglit test
tests/spec/arb_gpu_shader_int64/execution/fs-ushr-and-mask.shader_test.

v2: Use a SHR instead of an AND.  This saves an instruction compared to
using two moves.  Suggested by Jason.

Fixes: 6ac2d16901 ("i965/fs: Fix extract_i8/u8 to a 64-bit destination")
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-03-06 08:35:45 -08:00
Ian Romanick 4aaf139ea4 intel/fs: nir_op_extract_i8 extracts a byte, not a word
Fixes: 6ac2d16901 ("i965/fs: Fix extract_i8/u8 to a 64-bit destination")
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-03-06 08:35:42 -08:00
Ian Romanick bbf20a1ca3 intel/compiler: Silence unused parameter warning in brw_interpolation_map.c
The parameter is never used, and it's not part of a common interface
idiom.  Remove it.

src/intel/compiler/brw_interpolation_map.c: In function ‘brw_setup_vue_interpolation’:
src/intel/compiler/brw_interpolation_map.c:62:59: warning: unused parameter ‘devinfo’ [-Wunused-parameter]
                             const struct gen_device_info *devinfo)
                                                           ^~~~~~~

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-03-06 08:35:36 -08:00
Ian Romanick dea19138dd intel/compiler: Silence many unused parameter warnings in brw_eu.h
In file included from src/intel/compiler/brw_eu_util.c:34:0:
src/intel/compiler/brw_eu.h: In function ‘brw_message_desc_header_present’:
src/intel/compiler/brw_eu.h:288:63: warning: unused parameter ‘devinfo’ [-Wunused-parameter]
 brw_message_desc_header_present(const struct gen_device_info *devinfo,
                                                               ^~~~~~~
src/intel/compiler/brw_eu.h: In function ‘brw_message_ex_desc’:
src/intel/compiler/brw_eu.h:296:51: warning: unused parameter ‘devinfo’ [-Wunused-parameter]
 brw_message_ex_desc(const struct gen_device_info *devinfo,
                                                   ^~~~~~~
src/intel/compiler/brw_eu.h: In function ‘brw_message_ex_desc_ex_mlen’:
src/intel/compiler/brw_eu.h:303:59: warning: unused parameter ‘devinfo’ [-Wunused-parameter]
 brw_message_ex_desc_ex_mlen(const struct gen_device_info *devinfo,
                                                           ^~~~~~~
src/intel/compiler/brw_eu.h: In function ‘brw_sampler_desc_binding_table_index’:
src/intel/compiler/brw_eu.h:337:68: warning: unused parameter ‘devinfo’ [-Wunused-parameter]
 brw_sampler_desc_binding_table_index(const struct gen_device_info *devinfo,
                                                                    ^~~~~~~
src/intel/compiler/brw_eu.h: In function ‘brw_sampler_desc_sampler’:
src/intel/compiler/brw_eu.h:344:56: warning: unused parameter ‘devinfo’ [-Wunused-parameter]
 brw_sampler_desc_sampler(const struct gen_device_info *devinfo, uint32_t desc)
                                                        ^~~~~~~
src/intel/compiler/brw_eu.h: In function ‘brw_sampler_desc_return_format’:
src/intel/compiler/brw_eu.h:371:62: warning: unused parameter ‘devinfo’ [-Wunused-parameter]
 brw_sampler_desc_return_format(const struct gen_device_info *devinfo,
                                                              ^~~~~~~
src/intel/compiler/brw_eu.h: In function ‘brw_dp_desc_binding_table_index’:
src/intel/compiler/brw_eu.h:405:63: warning: unused parameter ‘devinfo’ [-Wunused-parameter]
 brw_dp_desc_binding_table_index(const struct gen_device_info *devinfo,
                                                               ^~~~~~~
src/intel/compiler/brw_eu.h: In function ‘brw_dp_a64_untyped_atomic_desc’:
src/intel/compiler/brw_eu.h:754:41: warning: unused parameter ‘exec_size’ [-Wunused-parameter]
                                unsigned exec_size, /**< 0 for SIMD4x2 */
                                         ^~~~~~~~~
src/intel/compiler/brw_eu.h: In function ‘brw_dp_a64_untyped_atomic_float_desc’:
src/intel/compiler/brw_eu.h:775:47: warning: unused parameter ‘exec_size’ [-Wunused-parameter]
                                      unsigned exec_size,
                                               ^~~~~~~~~

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-03-06 08:35:31 -08:00
Timothy Arceri 81ee2cd8ba glsl: rename is_record() -> is_struct()
Replace was done using:
find ./src -type f -exec sed -i -- \
's/is_record(/is_struct(/g' {} \;

Acked-by: Karol Herbst <kherbst@redhat.com>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
2019-03-06 13:10:02 +11:00
Caio Marcelo de Oliveira Filho 69cc6272fb anv: Implement VK_EXT_external_memory_host
v2: Ignore the import if handleType == 0. (Jason)

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-03-05 12:59:50 -08:00
Jason Ekstrand 43f40dc7cb anv: Implement VK_EXT_inline_uniform_block
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-03-05 10:06:50 -06:00
Jason Ekstrand 61e009d2c4 spirv: Use the same types for resource indices as pointers
We need more space than just a 32-bit scalar and we have to burn all
that space anyway so we may as well expose it to the driver.  This also
fixes a subtle bug when UBOs and SSBOs have different pointer types.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-03-05 10:06:50 -06:00
Jason Ekstrand c520f4dec9 anv: Add a concept of a descriptor buffer
This buffer goes along side the CPU data structure and may contain
pointers, bindless handles, or any other descriptor information.
Currently, all descriptors are size zero and nothing goes in the buffer
but this commit sets up the framework we will need later.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-03-05 10:06:50 -06:00
Jason Ekstrand 5c30fffeec anv: Take references to push descriptor set layouts
Technically, descriptor set layouts aren't required to survive past the
function they're passed into so we need to reference them.

Cc: "19.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-03-05 10:06:50 -06:00
Jason Ekstrand 8ab95b849e anv: Refactor descriptor pushing a bit
Pull the common code out of the two entrypoints into the helper which
fetches the push descriptor set for us.  Now that it does more than just
get a thing, call it anv_cmd_buffer_push_descriptor_set.

Cc: "19.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-03-05 10:06:50 -06:00
Jason Ekstrand cab064bc10 anv: drop add_var_binding from anv_nir_apply_pipeline_layout.c
It has exactly one caller.  Just inline it.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-03-05 10:06:50 -06:00
Jason Ekstrand 49cf61c6aa anv: Clean up descriptor set layouts
The descriptor set layout code in our driver has undergone many changes
over the years.  Some of the fields which were once essential are now
useless or nearly so.  The has_dynamic_offsets field was completely
unused accept for the code to set and hash it.  The per-stage indices
were only being used to determine if a particular binding had images,
samplers, etc.  The fact that it's per-stage also doesn't matter because
that binding should never be accessed by a shader of the wrong stage.

This commit deletes a pile of cruft and replaces it all with a
descriptive bitfield which states what a particular descriptor contains.
This merely describes the data available and doesn't necessarily dictate
how it will be lowered in anv_nir_apply_pipeline_layout.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-03-05 10:06:50 -06:00
Jason Ekstrand 4c50b7c92c anv: Count image param entries rather than images
This is what we're actually storing in the descriptor set and consuming
when we bind surface states.  This commit renames image_count to
image_param_count a few places and moves the decision to not count image
params on gen9+ into anv_descriptor_set.c when we build the layout.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-03-05 10:06:50 -06:00
Jason Ekstrand 3822c7495a anv: Stop allocating buffer views for dynamic buffers
We emit the surface states for those on-the-fly so we don't need the
buffer view.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-03-05 10:06:50 -06:00
Jason Ekstrand 8c6d410a50 anv: Rework arguments to anv_descriptor_set_write_*
Make them all take a device followed by a set.  This is consistent
with how the actual Vulkan entrypoint parameters are laid out.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-03-05 10:06:50 -06:00
Jason Ekstrand 5b7a9e7398 anv/descriptor_set: Refactor alloc/free of descriptor sets
This commit just puts the free list code together as part of the pool
instead of having it inlined into the descriptor set create code.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-03-05 10:06:50 -06:00
Eric Engestrom 3d4238d26c anv: use the platform defines in vk.xml instead of hard-coding them
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-03-05 11:57:10 +00:00
Lionel Landwerlin e21c201c96 anv: update supported patch version
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2019-03-05 10:39:17 +00:00
Tapani Pälli 3bb8768b9d anv: toggle on support for VK_EXT_ycbcr_image_arrays
We already propagate coord_components correctly and did not have
layer restrictions for ycbcr formats.

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-03-05 10:39:17 +00:00
Tapani Pälli 33bf3d510c anv: retain the is_array state in create_plane_tex_instr_implicit
This does not seem to fix anything ATM but is the right thing todo.

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Fixes: f3e91e78a3 ("anv: add nir lowering pass for ycbcr textures")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-03-05 10:38:31 +00:00
Jason Ekstrand 0010d0348a anv/pipeline: Drop anv_fill_binding_table
We zero out the prog data anyway and, now that bias is always zero, this
function is accomplishing nothing.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-03-04 23:56:40 +00:00
Jason Ekstrand 65ee5cc0da anv: Use an actual binding for gl_NumWorkgroups
This commit moves our handling of gl_NumWorkgroups over to work like our
handling of other special bindings in the Vulkan driver.  We give it a
magic descriptor set number and teach emit_binding_tables to handle it.
This is better than the bias mechanism we were using because it allows
us to do proper accounting through the bind map mechanism.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-03-04 23:56:40 +00:00
Jason Ekstrand 5c96120b5c intel,nir: Lower TXD with min_lod when the sampler index is not < 16
When we have a larger sampler index, we get into the "high sampler"
scenario and need an instruction header.  Even in SIMD8, this pushes the
instruction over the sampler message size maximum of 11 registers.
Instead, we have to lower TXD to TXL.

Fixes: cb98e0755f "intel/fs: Support min_lod parameters on texture..."
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2019-03-04 23:56:39 +00:00
Jason Ekstrand 5049fbddb4 anv: Count surfaces for non-YCbCr images in GetDescriptorSetLayoutSupport
We were accidentally not counting those surfaces

Fixes: ddc4069122 "anv: Implement VK_KHR_maintenance3"
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-03-04 23:56:39 +00:00
Sagar Ghuge e551040c60 nir/glsl: Add another way of doing lower_imul64 for gen8+
On Gen 8 and 9, "mul" instruction supports 64 bit destination type. We
can reduce our 64x64 int multiplication from 4 instructions to 3.

Also instead of emitting two mul instructions, we can emit single mul
instuction and extract low/high 32 bits from 64 bit result for
[i/u]mulExtended

v2: 1) Allow lower_mul_high64 to use new opcode (Jason Ekstrand)
    2) Add lower_mul_2x32_64 flag (Matt Turner)
    3) Remove associative property as bit size is different (Connor
       Abbott)

v3: Fix indentation and variable naming convention (Jason Ekstrand)

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-03-04 15:50:25 -08:00
Mauro Rossi ec0f465bc5 android: anv: fix libexpat shared dependency
Fixes undefined reference building errors for XML_* functions

Signed-off-by: Mauro Rossi <issor.oruam@gmail.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Cc: "19.0" <mesa-stable@lists.freedesktop.org>
2019-03-04 20:53:59 +01:00
Mauro Rossi 14e7e26a09 android: anv: fix generated files depedencies (v2)
Fix anv_extrypoints.{c,h} and anv_extensions.{c,h} missing dependencies
Rename the variable labels according to targets and python scripts
Align the building rules as per Automake for simplification

Fixes building errors during rebuils due to missing dependencies

(v2) Fixed a missing $(VULKAN_API_XML) reference

Fixes: 9a508b7 ("android: anv/extensions: fix generated sources build")
Fixes: dd088d4bec ("anv/extensions: Generate a header file with extension tables")
Signed-off-by: Mauro Rossi <issor.oruam@gmail.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Cc: "19.0" <mesa-stable@lists.freedesktop.org>
2019-03-04 20:53:51 +01:00
Jordan Justen 10c5579921 intel/compiler: Move int64/doubles lowering options
Instead of calculating the int64 and doubles lowering options each
time a shader is preprocessed, save and use the values in
nir_shader_compiler_options.

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-03-02 14:33:44 -08:00
Ian Romanick d1d56f5f9a intel/fs: Don't assert on b2f with a saturate modifier
This ran afoul of Iris's use of nir_lower_clamp_color_outputs which
applies fsat() before writes to vertex shader color outpus.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Fixes: 7725d60938 ("intel/fs: Emit better code for b2f(inot(a)) and b2i(inot(a))")
2019-03-02 13:58:50 -08:00
Lionel Landwerlin 32ffd90002 anv: add support for INTEL_DEBUG=bat
As requested by Ken ;)

v2: Also decode simple batches (Caio)
    Fix u_vector usage issues (Lionel)

v3: Make binding/instruction/state/surface available (Lionel)

v4: Going through device pools for simple batches (Lionel)
    Centralize search BO callbacks into anv_device.c (Lionel)

v5: Clear decoded batch buffer var after use (Caio)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-03-02 12:53:21 +00:00
Matt Turner e0148bbcfd intel/compiler: Add commas on final values of compaction table arrays
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2019-03-01 13:56:25 -08:00
Ian Romanick 1edf67fc3f intel/fs: Generate if instructions with inverted conditions
Per-platform results were all over the place, so I have included all the
results here.  There is an important note at the bottom of the commit
message.

Skylake
total instructions in shared programs: 15184683 -> 15184679 (<.01%)
instructions in affected programs: 2786 -> 2782 (-0.14%)
helped: 4
HURT: 0
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 0.05% max: 0.84% x̄: 0.44% x̃: 0.44%
95% mean confidence interval for instructions value: -1.00 -1.00
95% mean confidence interval for instructions %-change: -0.96% 0.07%
Inconclusive result (%-change mean confidence interval includes 0).

total cycles in shared programs: 370961367 -> 370961173 (<.01%)
cycles in affected programs: 205867 -> 205673 (-0.09%)
helped: 5
HURT: 1
helped stats (abs) min: 1 max: 149 x̄: 39.60 x̃: 16
helped stats (rel) min: 0.02% max: 1.05% x̄: 0.45% x̃: 0.55%
HURT stats (abs)   min: 4 max: 4 x̄: 4.00 x̃: 4
HURT stats (rel)   min: 0.03% max: 0.03% x̄: 0.03% x̃: 0.03%
95% mean confidence interval for cycles value: -93.01 28.34
95% mean confidence interval for cycles %-change: -0.82% 0.08%
Inconclusive result (value mean confidence interval includes 0).

Broadwell
total instructions in shared programs: 15465366 -> 15465362 (<.01%)
instructions in affected programs: 2799 -> 2795 (-0.14%)
helped: 4
HURT: 0
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 0.04% max: 0.84% x̄: 0.44% x̃: 0.44%
95% mean confidence interval for instructions value: -1.00 -1.00
95% mean confidence interval for instructions %-change: -0.96% 0.07%
Inconclusive result (%-change mean confidence interval includes 0).

total cycles in shared programs: 410938419 -> 410938531 (<.01%)
cycles in affected programs: 566028 -> 566140 (0.02%)
helped: 18
HURT: 17
helped stats (abs) min: 1 max: 16 x̄: 3.50 x̃: 1
helped stats (rel) min: <.01% max: 1.05% x̄: 0.13% x̃: <.01%
HURT stats (abs)   min: 1 max: 12 x̄: 10.29 x̃: 12
HURT stats (rel)   min: <.01% max: 0.16% x̄: 0.08% x̃: 0.09%
95% mean confidence interval for cycles value: 0.31 6.09
95% mean confidence interval for cycles %-change: -0.10% 0.05%
Inconclusive result (%-change mean confidence interval includes 0).

Haswell
total instructions in shared programs: 13749760 -> 13749759 (<.01%)
instructions in affected programs: 2241 -> 2240 (-0.04%)
helped: 1
HURT: 0

total cycles in shared programs: 385398913 -> 385398363 (<.01%)
cycles in affected programs: 554914 -> 554364 (-0.10%)
helped: 31
HURT: 1
helped stats (abs) min: 1 max: 453 x̄: 18.00 x̃: 6
helped stats (rel) min: <.01% max: 0.25% x̄: 0.03% x̃: 0.05%
HURT stats (abs)   min: 8 max: 8 x̄: 8.00 x̃: 8
HURT stats (rel)   min: 0.06% max: 0.06% x̄: 0.06% x̃: 0.06%
95% mean confidence interval for cycles value: -45.88 11.51
95% mean confidence interval for cycles %-change: -0.05% -0.02%
Inconclusive result (value mean confidence interval includes 0).

Ivy Bridge
total cycles in shared programs: 180663626 -> 180663881 (<.01%)
cycles in affected programs: 472350 -> 472605 (0.05%)
helped: 15
HURT: 30
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: <.01% max: <.01% x̄: <.01% x̃: <.01%
HURT stats (abs)   min: 8 max: 10 x̄: 9.00 x̃: 9
HURT stats (rel)   min: 0.06% max: 0.14% x̄: 0.10% x̃: 0.10%
95% mean confidence interval for cycles value: 4.21 7.12
95% mean confidence interval for cycles %-change: 0.05% 0.08%
Cycles are HURT.

Sandy Bridge
total cycles in shared programs: 154568664 -> 154569225 (<.01%)
cycles in affected programs: 356486 -> 357047 (0.16%)
helped: 1
HURT: 31
helped stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2
helped stats (rel) min: 0.02% max: 0.02% x̄: 0.02% x̃: 0.02%
HURT stats (abs)   min: 4 max: 33 x̄: 18.16 x̃: 8
HURT stats (rel)   min: 0.05% max: 0.23% x̄: 0.14% x̃: 0.10%
95% mean confidence interval for cycles value: 12.19 22.87
95% mean confidence interval for cycles %-change: 0.10% 0.16%
Cycles are HURT.

Iron Lake
total instructions in shared programs: 8206589 -> 8206565 (<.01%)
instructions in affected programs: 3024 -> 3000 (-0.79%)
helped: 12
HURT: 0
helped stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2
helped stats (rel) min: 0.75% max: 0.83% x̄: 0.80% x̃: 0.80%
95% mean confidence interval for instructions value: -2.00 -2.00
95% mean confidence interval for instructions %-change: -0.82% -0.77%
Instructions are helped.

total cycles in shared programs: 187657428 -> 187656228 (<.01%)
cycles in affected programs: 95748 -> 94548 (-1.25%)
helped: 12
HURT: 0
helped stats (abs) min: 80 max: 120 x̄: 100.00 x̃: 100
helped stats (rel) min: 1.00% max: 1.66% x̄: 1.27% x̃: 1.21%
95% mean confidence interval for cycles value: -113.27 -86.73
95% mean confidence interval for cycles %-change: -1.43% -1.11%
Cycles are helped.

GM45
total instructions in shared programs: 5037569 -> 5037557 (<.01%)
instructions in affected programs: 1521 -> 1509 (-0.79%)
helped: 6
HURT: 0
helped stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2
helped stats (rel) min: 0.75% max: 0.83% x̄: 0.79% x̃: 0.79%
95% mean confidence interval for instructions value: -2.00 -2.00
95% mean confidence interval for instructions %-change: -0.83% -0.75%
Instructions are helped.

total cycles in shared programs: 128101478 -> 128100758 (<.01%)
cycles in affected programs: 52746 -> 52026 (-1.37%)
helped: 6
HURT: 0
helped stats (abs) min: 120 max: 120 x̄: 120.00 x̃: 120
helped stats (rel) min: 1.16% max: 1.66% x̄: 1.41% x̃: 1.41%
95% mean confidence interval for cycles value: -120.00 -120.00
95% mean confidence interval for cycles %-change: -1.70% -1.12%
Cycles are helped.

This change has almost no effect right now.  However, removing this
patch (but leaving the patch "nir/algebraic: Replace a bcsel of a b2f
with a b2f(!(a || b))") after adding a patch that removes !(a < b) -> (a
>= b) optimizations (like
https://patchwork.freedesktop.org/patch/264787/) has the following
results on Skylake:

Skylake
total instructions in shared programs: 15071022 -> 15089710 (0.12%)
instructions in affected programs: 1022219 -> 1040907 (1.83%)
helped: 1
HURT: 3937
helped stats (abs) min: 41 max: 41 x̄: 41.00 x̃: 41
helped stats (rel) min: 1.01% max: 1.01% x̄: 1.01% x̃: 1.01%
HURT stats (abs)   min: 1 max: 256 x̄: 4.76 x̃: 4
HURT stats (rel)   min: 0.05% max: 11.18% x̄: 2.59% x̃: 2.60%
95% mean confidence interval for instructions value: 4.56 4.93
95% mean confidence interval for instructions %-change: 2.54% 2.64%
Instructions are HURT.

total cycles in shared programs: 369777134 -> 370092923 (0.09%)
cycles in affected programs: 17516573 -> 17832362 (1.80%)
helped: 115
HURT: 3624
helped stats (abs) min: 1 max: 1721 x̄: 81.18 x̃: 28
helped stats (rel) min: <.01% max: 10.74% x̄: 1.24% x̃: 0.65%
HURT stats (abs)   min: 1 max: 12640 x̄: 89.71 x̃: 54
HURT stats (rel)   min: <.01% max: 28.24% x̄: 4.72% x̃: 4.52%
95% mean confidence interval for cycles value: 75.21 93.71
95% mean confidence interval for cycles %-change: 4.43% 4.64%
Cycles are HURT.

total spills in shared programs: 9450 -> 9442 (-0.08%)
spills in affected programs: 166 -> 158 (-4.82%)
helped: 2
HURT: 0

total fills in shared programs: 21115 -> 21094 (-0.10%)
fills in affected programs: 438 -> 417 (-4.79%)
helped: 2
HURT: 0

LOST:   1
GAINED: 0

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-03-01 12:42:14 -08:00
Ian Romanick 7725d60938 intel/fs: Emit better code for b2f(inot(a)) and b2i(inot(a))
Since Boolean values are either -1 (true) or 0 (false), b2f(inot(a))
maps -1 => 0.0 and 0 => 1.0.  This is equivalent to 1.0 +
float(boolBitsToInt(a)).  On Intel GPUs, ADD is one of the few
instructions that can type-convert during write to destination, so we
can achieve this in a single instruction:

    add    g47F, g26D, 1D

v2: Fix swizzles.

v3: Fix typos in comments.  Noticed by Ken.

All Gen6+ platforms had similar results. (Skylake shown)
Skylake
total instructions in shared programs: 15185583 -> 15184683 (<.01%)
instructions in affected programs: 239389 -> 238489 (-0.38%)
helped: 899
HURT: 1
helped stats (abs) min: 1 max: 2 x̄: 1.00 x̃: 1
helped stats (rel) min: 0.15% max: 1.85% x̄: 0.49% x̃: 0.44%
HURT stats (abs)   min: 2 max: 2 x̄: 2.00 x̃: 2
HURT stats (rel)   min: 0.09% max: 0.09% x̄: 0.09% x̃: 0.09%
95% mean confidence interval for instructions value: -1.01 -0.99
95% mean confidence interval for instructions %-change: -0.51% -0.48%
Instructions are helped.

total cycles in shared programs: 370964249 -> 370961508 (<.01%)
cycles in affected programs: 1487586 -> 1484845 (-0.18%)
helped: 420
HURT: 268
helped stats (abs) min: 1 max: 232 x̄: 22.41 x̃: 6
helped stats (rel) min: 0.05% max: 22.60% x̄: 1.30% x̃: 0.41%
HURT stats (abs)   min: 1 max: 230 x̄: 24.90 x̃: 10
HURT stats (rel)   min: <.01% max: 21.60% x̄: 1.45% x̃: 0.52%
95% mean confidence interval for cycles value: -7.61 -0.36
95% mean confidence interval for cycles %-change: -0.44% -0.02%
Cycles are helped.

No changes on Iron Lake or GM45.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-03-01 12:42:14 -08:00
Ian Romanick cb3e21cd19 intel/fs: Use De Morgan's laws to avoid logical-not of a logic result on Gen8+
Instead of emitting ~(a & b), emit (~a | ~b) since logical-not of
operands is free on Gen8+.

v2: Fix swizzles.  Fix types for cmod propagation.

v3: Simplify logic for inverting source of inot(ixor(a, b)).  Suggested
by Ken.

Skylake and Broadwell had similar results. (Skylake shown)
Skylake
total instructions in shared programs: 15185593 -> 15185583 (<.01%)
instructions in affected programs: 5673 -> 5663 (-0.18%)
helped: 12
HURT: 1
helped stats (abs) min: 1 max: 2 x̄: 1.17 x̃: 1
helped stats (rel) min: 0.30% max: 5.88% x̄: 1.50% x̃: 0.70%
HURT stats (abs)   min: 4 max: 4 x̄: 4.00 x̃: 4
HURT stats (rel)   min: 0.12% max: 0.12% x̄: 0.12% x̃: 0.12%
95% mean confidence interval for instructions value: -1.66 0.13
95% mean confidence interval for instructions %-change: -2.60% -0.15%
Inconclusive result (value mean confidence interval includes 0).

total cycles in shared programs: 370977726 -> 370964249 (<.01%)
cycles in affected programs: 869987 -> 856510 (-1.55%)
helped: 15
HURT: 2
helped stats (abs) min: 2 max: 6640 x̄: 902.20 x̃: 16
helped stats (rel) min: <.01% max: 4.92% x̄: 1.71% x̃: 1.53%
HURT stats (abs)   min: 14 max: 42 x̄: 28.00 x̃: 28
HURT stats (rel)   min: 1.08% max: 3.18% x̄: 2.13% x̃: 2.13%
95% mean confidence interval for cycles value: -1654.87 69.34
95% mean confidence interval for cycles %-change: -2.29% -0.23%
Inconclusive result (value mean confidence interval includes 0).

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-03-01 12:42:14 -08:00
Ian Romanick 8eb36c9129 intel/fs: Emit logical-not of operands on Gen8+
On Gen8+ specifying negation of a logical operation such as AND actually
performs a logical-not.  Take advantage of this to generate fewer
instructions.

v2: Major rebase.  Use nir_src_as_alu_instr.  Fix swizzle handling.

No changes on any pre-Gen8 platform.

Skylake and Broadwell had similar results. (Broadwell shown)
total instructions in shared programs: 15466902 -> 15466274 (<.01%)
instructions in affected programs: 1262953 -> 1262325 (-0.05%)
helped: 682
HURT: 4
helped stats (abs) min: 1 max: 5 x̄: 1.02 x̃: 1
helped stats (rel) min: 0.03% max: 2.40% x̄: 0.18% x̃: 0.04%
HURT stats (abs)   min: 1 max: 62 x̄: 17.50 x̃: 3
HURT stats (rel)   min: 0.03% max: 1.89% x̄: 0.53% x̃: 0.10%
95% mean confidence interval for instructions value: -1.10 -0.73
95% mean confidence interval for instructions %-change: -0.19% -0.15%
Instructions are helped.

total cycles in shared programs: 410996093 -> 410950440 (-0.01%)
cycles in affected programs: 144389048 -> 144343395 (-0.03%)
helped: 519
HURT: 51
helped stats (abs) min: 1 max: 1060 x̄: 104.46 x̃: 140
helped stats (rel) min: 0.01% max: 10.98% x̄: 0.34% x̃: 0.03%
HURT stats (abs)   min: 1 max: 4060 x̄: 167.90 x̃: 22
HURT stats (rel)   min: <.01% max: 8.20% x̄: 0.96% x̃: 0.25%
95% mean confidence interval for cycles value: -97.16 -63.02
95% mean confidence interval for cycles %-change: -0.32% -0.13%
Cycles are helped.

total spills in shared programs: 95311 -> 95329 (0.02%)
spills in affected programs: 881 -> 899 (2.04%)
helped: 0
HURT: 4

total fills in shared programs: 93629 -> 93634 (<.01%)
fills in affected programs: 794 -> 799 (0.63%)
helped: 1
HURT: 2

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-03-01 12:42:14 -08:00
Ian Romanick 06eaaf2de9 intel/fs: Refactor ALU source and destination handling to a separate function
Other places will need to do this soon to properly handle source
swizzles.  The patch looks a little odd, but the change is pretty
straight forward.  All of the swizzle and mask handling is moved out,
but the code for handling move instructions and vecN instructions
remains in nir_emit_alu.

I'm not terribly pleased with the "need_dest" parameter, but
get_nir_dest is (somewhat surprisingly) destructive.  I am open to
suggestions of alternatives.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-03-01 12:42:14 -08:00
Ian Romanick fb3ca9109c intel/fs: Handle OR source modifiers in algebraic optimization
Found by inspection.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-03-01 12:42:14 -08:00
Ian Romanick c9d5bd050c intel/fs: Relax type matching rules in cmod propagation from MOV instructions
To allow cmod propagation from a MOV in a sequence like:

    and(16)         g31<1>UD       g20<8,8,1>UD   g22<8,8,1>UD
    mov.nz.f0(16)   null<1>F       g31<8,8,1>D

A similar change to the vec4 backend had no effect.

Somewhere between c1ec582059 and 40fc4b5acd (1,094 commits) the
effectiveness of this patch diminished, and as of commit d7e0d47b9d
(nir: Add a bunch of b2[if] optimizations) this optimization no longer
has any effect on any platform.

A later patch "intel/fs: Use De Morgan's laws to avoid logical-not of a
logic result on Gen8+," generates some instruction sequences that
require this change in order for cmod propagation to make progress.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-03-01 12:42:14 -08:00
Ian Romanick d2056ab993 intel/vec4: Emit constants for some ALU sources as immediate values
In some cases of flow control, the constant propagation is not able to
determine that the source of an instruction must be a constant value.
When we still have NIR SSA values, we can easily determine this.  Emit
the immediate value during code generation to possible avoid spurious
loads of constants into registers.

I wrote this patch to prevent a couple trivial regressions in vec4
shaders caused by "nir/algebraic: Replace i2b used by bcsel or
if-statement with comparison".  The final result was quite a bit better
than that...

No shader-db changes on any Gen8+ platform.

v2: Assert that we never get a negation source modifier on Gen8+.
Suggested by Ken.  This should never happen because we don't normally
use vec4 for Gen8+ (requires and environment variable to force it), and
there's no code to generate these negations.  Still, erring on the side
of caution is better.

Haswell
total instructions in shared programs: 13776218 -> 13764783 (-0.08%)
instructions in affected programs: 663931 -> 652496 (-1.72%)
helped: 3495
HURT: 1
helped stats (abs) min: 1 max: 30 x̄: 3.28 x̃: 2
helped stats (rel) min: 0.21% max: 10.00% x̄: 1.79% x̃: 1.49%
HURT stats (abs)   min: 24 max: 24 x̄: 24.00 x̃: 24
HURT stats (rel)   min: 12.24% max: 12.24% x̄: 12.24% x̃: 12.24%
95% mean confidence interval for instructions value: -3.39 -3.15
95% mean confidence interval for instructions %-change: -1.84% -1.75%
Instructions are helped.

total cycles in shared programs: 386818984 -> 386511910 (-0.08%)
cycles in affected programs: 20379636 -> 20072562 (-1.51%)
helped: 3052
HURT: 476
helped stats (abs) min: 2 max: 12516 x̄: 110.40 x̃: 6
helped stats (rel) min: 0.05% max: 24.68% x̄: 1.58% x̃: 0.69%
HURT stats (abs)   min: 2 max: 416 x̄: 62.76 x̃: 24
HURT stats (rel)   min: 0.10% max: 10.75% x̄: 4.03% x̃: 2.18%
95% mean confidence interval for cycles value: -115.57 -58.51
95% mean confidence interval for cycles %-change: -0.93% -0.73%
Cycles are helped.

total spills in shared programs: 100482 -> 100480 (<.01%)
spills in affected programs: 79 -> 77 (-2.53%)
helped: 3
HURT: 1

total fills in shared programs: 96883 -> 96877 (<.01%)
fills in affected programs: 85 -> 79 (-7.06%)
helped: 4
HURT: 0

Ivy Bridge
total instructions in shared programs: 12000562 -> 11990113 (-0.09%)
instructions in affected programs: 572581 -> 562132 (-1.82%)
helped: 3106
HURT: 0
helped stats (abs) min: 1 max: 30 x̄: 3.36 x̃: 2
helped stats (rel) min: 0.21% max: 10.00% x̄: 1.86% x̃: 1.49%
95% mean confidence interval for instructions value: -3.49 -3.23
95% mean confidence interval for instructions %-change: -1.91% -1.81%
Instructions are helped.

total cycles in shared programs: 180958504 -> 180664500 (-0.16%)
cycles in affected programs: 19991810 -> 19697806 (-1.47%)
helped: 2654
HURT: 486
helped stats (abs) min: 2 max: 12516 x̄: 121.61 x̃: 6
helped stats (rel) min: 0.05% max: 20.66% x̄: 1.48% x̃: 0.68%
HURT stats (abs)   min: 2 max: 396 x̄: 59.18 x̃: 24
HURT stats (rel)   min: 0.05% max: 9.62% x̄: 3.82% x̃: 2.16%
95% mean confidence interval for cycles value: -125.62 -61.64
95% mean confidence interval for cycles %-change: -0.76% -0.56%
Cycles are helped.

Sandy Bridge
total instructions in shared programs: 10842336 -> 10835438 (-0.06%)
instructions in affected programs: 395340 -> 388442 (-1.74%)
helped: 1926
HURT: 0
helped stats (abs) min: 1 max: 22 x̄: 3.58 x̃: 2
helped stats (rel) min: 0.10% max: 9.68% x̄: 1.78% x̃: 1.42%
95% mean confidence interval for instructions value: -3.73 -3.43
95% mean confidence interval for instructions %-change: -1.84% -1.72%
Instructions are helped.

total cycles in shared programs: 154590074 -> 154569050 (-0.01%)
cycles in affected programs: 8159932 -> 8138908 (-0.26%)
helped: 1670
HURT: 228
helped stats (abs) min: 2 max: 260 x̄: 18.13 x̃: 6
helped stats (rel) min: 0.02% max: 8.70% x̄: 0.74% x̃: 0.28%
HURT stats (abs)   min: 2 max: 1798 x̄: 40.58 x̃: 14
HURT stats (rel)   min: 0.03% max: 12.97% x̄: 1.04% x̃: 0.31%
95% mean confidence interval for cycles value: -13.51 -8.64
95% mean confidence interval for cycles %-change: -0.60% -0.46%
Cycles are helped.

Iron Lake and GM45 had similar results. (Iron Lake shown)
total instructions in shared programs: 8212357 -> 8206587 (-0.07%)
instructions in affected programs: 323664 -> 317894 (-1.78%)
helped: 1457
HURT: 0
helped stats (abs) min: 1 max: 12 x̄: 3.96 x̃: 3
helped stats (rel) min: 0.33% max: 11.49% x̄: 1.86% x̃: 1.44%
95% mean confidence interval for instructions value: -4.14 -3.78
95% mean confidence interval for instructions %-change: -1.93% -1.78%
Instructions are helped.

total cycles in shared programs: 187668016 -> 187657422 (<.01%)
cycles in affected programs: 14856234 -> 14845640 (-0.07%)
helped: 1372
HURT: 83
helped stats (abs) min: 2 max: 24 x̄: 7.92 x̃: 6
helped stats (rel) min: 0.02% max: 1.14% x̄: 0.12% x̃: 0.08%
HURT stats (abs)   min: 2 max: 14 x̄: 3.20 x̃: 2
HURT stats (rel)   min: 0.03% max: 0.60% x̄: 0.12% x̃: 0.12%
95% mean confidence interval for cycles value: -7.65 -6.91
95% mean confidence interval for cycles %-change: -0.11% -0.10%
Cycles are helped.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-03-01 12:41:46 -08:00
Eric Engestrom 2793417ec6 anv: fix typo
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-03-01 11:20:28 +00:00
Eric Engestrom 258e463db5 anv: remove spaces around kwargs assignment
pylint complains:
> C0326: No space allowed around keyword argument assignment

Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-03-01 11:20:28 +00:00
Eric Engestrom 7b704fd2fd anv: drop unused parameter
I'm guessing a previous version of this script used an index-based map
of entrypoints, but that's not the case anymore.

Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-03-01 11:20:28 +00:00
Eric Engestrom b503d4e458 anv: simplify chained comparison
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-03-01 11:20:28 +00:00
Jason Ekstrand e8f863e718 intel/compiler: Re-prefix non-logical surface opcodes with VEC4
The scalar back-end uses SHADER_OPCODE_SEND for all surface messages so
we no longer need the non-logical opcodes there.  Prefix them VEC4 so
it's clear that they're only used by the vec4 back-end.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-02-28 16:58:20 -06:00
Jason Ekstrand 95ae400abc intel/schedule_instructions: Move some comments
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-02-28 16:58:20 -06:00
Jason Ekstrand aeaba24fcb intel/compiler: Drop unused surface opcodes
The unused typed surface read/write support in the vec4 back-end has
been dropped and the fs back-end now uses SHADER_OPCODE_SEND for all
image and buffer ops.  There's no reason to keep these opcodes around
anymore.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-02-28 16:58:20 -06:00
Jason Ekstrand a04c737215 intel/fs: Get rid of the IMAGE_SIZE opcode
Since switching to SHADER_OPCODE_SEND for image operations, we no longer
need the non-logical opcode.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-02-28 16:58:20 -06:00