Commit Graph

10573 Commits

Author SHA1 Message Date
Jason Ekstrand 2ad9917e18 anv/blorp: Fix a comment as per Nanley's review feedback
This accidentally didn't make it into 62378c5e9e
2018-09-01 09:12:08 -05:00
Jason Ekstrand 62378c5e9e anv/blorp: Do more flushing around HiZ clears
We make the flush after a HiZ clear unconditional and add a flush/stall
before the clear as well.

Cc: mesa-stable@lists.freedesktop.org
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107760
Reviewed-by: Chad Versace <chadversary@chromium.org>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
2018-09-01 09:08:36 -05:00
Ian Romanick 82530ce1b5 i965/vec4: Clamp indirect tes input array reads with 0x0fffffff
Page 190 of "Volume 7: 3D Media GPGPU Engine (Haswell)" says the valid
range of the offset is [0, 0FFFFFFFh].

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: mesa-stable@lists.freedesktop.org
2018-09-01 00:23:45 -07:00
Ian Romanick 75666605c9 i965/vec4: Correctly handle uniform sources in generate_tes_add_indirect_urb_offset
Fixes failure in the new piglit test
tes-patch-input-array-vec2-index-invalid-rd.shader_test.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: mesa-stable@lists.freedesktop.org
2018-09-01 00:23:43 -07:00
Rodrigo Vivi e8c42ed4ab intel: Introducing Amber Lake platform
Amber Lake uses the same gen graphics as Kaby Lake, including a id
that were previously marked as reserved on Kaby Lake, but that
now is moved to AML page.

This follows the ids and approach used on kernel's commit
e364672477a1 ("drm/i915/aml: Introducing Amber Lake platform")

Reported-by: Timo Aaltonen <timo.aaltonen@canonical.com>
Cc: José Roberto de Souza <jose.souza@intel.com>
Cc: Anuj Phogat <anuj.phogat@gmail.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2018-08-31 13:57:52 -07:00
Rodrigo Vivi 886a048feb intel: aubinator: Adding missed platforms to the error message.
Many new platforms got added to gen_device_name_to_pci_device_id()
but the error message inside aubinator didn't reflected those
changes. So syncing on the same order to be sure that we are not
missing any now.

Cc: Anuj Phogat <anuj.phogat@gmail.com>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Jordan Justen <jordan.l.justen@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2018-08-31 13:57:41 -07:00
Kenneth Graunke b147254d36 Revert "intel/tools/aubwrite: Always use physical addresses for traces."
This reverts commit f8cfc77660.

This appears to break intel_dump_gpu for Gen9 systems - I can load them
in the simulator, but nothing happens.  Reverting the patch makes the
simulator properly execute our commands and shaders again.
2018-08-30 14:36:28 -07:00
Jason Ekstrand a0f18f2142 intel/nir: Lowering image loads and stores trashes all metadata
This fixes the GL_ARB_fragment_shader_interlock piglit test on gen8
platforms where the lack of metadata dirtying was causing another pass
to accidentally delete a much needed loop.

https://bugs.freedesktop.org/show_bug.cgi?id=107745
Fixes: 37f7983bcc "intel/compiler: Do image load/store lowering..."
Jason Ekstrand <jason@jlekstrand.net> writes:
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2018-08-30 14:06:31 -05:00
Jason Ekstrand d8033d4083 intel/compiler: Remove surface_idx from brw_image_param
Now that the drivers are lowering to surface indices themselves, we no
longer need to push the surface index into the shader.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2018-08-29 14:04:03 -05:00
Jason Ekstrand 3cbc02e469 intel: Use TXS for image_size when we have a typed surface
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2018-08-29 14:04:03 -05:00
Jason Ekstrand 09f1de97a7 anv,i965: Lower away image derefs in the driver
Previously, the back-end compiler turn image access into magic uniform
reads and there was a complex contract between back-end compiler and
driver about setting up and filling out those params.  As of this
commit, both drivers now lower image_deref_load_param_intel intrinsics
to load_uniform intrinsics controlled by the driver and lower the other
image_deref_* intrinsics to image_* intrinsics which take an actual
binding table index.  There are still "magic" uniforms but they are now
added and controlled entirely by the driver and that contract no longer
spans components.

This also has the side-effect of making most image use compile-time
binding table indices.  Previously, all image access pulled the binding
table index from a uniform.  Part of the reason for this was that the
magic uniforms made it difficult to decouple binding table indices from
the uniforms and, since they are indexed completely differently
(especially in Vulkan), it was hard to pull them apart.  Now that the
driver is handling both, it's trivial to decouple the two and provide
actual binding table indices.

Shader-db results on Kaby Lake:

    total instructions in shared programs: 15166872 -> 15164293 (-0.02%)
    instructions in affected programs: 115834 -> 113255 (-2.23%)
    helped: 191
    HURT: 0

    total cycles in shared programs: 571311495 -> 571196465 (-0.02%)
    cycles in affected programs: 4757115 -> 4642085 (-2.42%)
    helped: 73
    HURT: 67

    total spills in shared programs: 10951 -> 10926 (-0.23%)
    spills in affected programs: 742 -> 717 (-3.37%)
    helped: 7
    HURT: 0

    total fills in shared programs: 22226 -> 22201 (-0.11%)
    fills in affected programs: 1146 -> 1121 (-2.18%)
    helped: 7
    HURT: 0

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2018-08-29 14:04:03 -05:00
Jason Ekstrand 3942943819 nir: Use a bitfield for image access qualifiers
This commit expands the current memory access enum to contain the extra
two bits provided for images.  We choose to follow the SPIR-V convention
of NonReadable and NonWriteable because readonly implies that you *can*
read so readonly + writeonly doesn't make as much sense as NonReadable +
NonWriteable.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2018-08-29 14:04:02 -05:00
Jason Ekstrand 4289143899 intel/compiler: Use two components for 1D array image sizes
Having the array length component stored in .z was a small convenience
for the ISL image param filling code and an annoyance in the NIR
lowering code.  The only convenience of treating 1D arrays like 2D
arrays in the lowering code is in the address calculation code so let's
put all the complexity there as well.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2018-08-29 14:04:02 -05:00
Jason Ekstrand b1c414ef28 isl: Use the view array length for the image size
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2018-08-29 14:04:02 -05:00
Jason Ekstrand 37f7983bcc intel/compiler: Do image load/store lowering to NIR
This commit moves our storage image format conversion codegen into NIR
instead of doing it in the back-end.  This has the advantage of letting
us run it through NIR's optimizer which is pretty effective at shrinking
things down.  In the common case of rgba8, the number of instructions
emitted after NIR is done with it is half of what it was with the
lowering happening in the back-end.  On the downside, the back-end's
lowering is able to directly use predicates and the NIR lowering has to
use IFs.

Shader-db results on Kaby Lake:

    total instructions in shared programs: 15166910 -> 15166872 (<.01%)
    instructions in affected programs: 5895 -> 5857 (-0.64%)
    helped: 15
    HURT: 0

Clearly, we don't have that much image_load_store happening in the
shaders in shader-db....

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2018-08-29 14:04:02 -05:00
Jason Ekstrand f2d0a2b110 anv/pipeline: Remove dead image loads in lower_input_attacnments
Dead code will get rid of them eventually but it's better if they're
just gone so we guarantee they won't trip up later passes.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2018-08-29 14:04:02 -05:00
Jason Ekstrand 152fdeddbb nir/format_convert: Rename nir_format_bitcast_uint_vec
We have a name for that, it's called a uvec.  This just makes the
function name a bit shorter.  While we're here, we also add an assert
for one of the assumptions this function makes.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2018-08-29 14:04:02 -05:00
Sagar Ghuge 40fc4b5acd intel/tools: new i965_disasm tool
Adds a new i965 instruction disassemble tool

v2: 1) fix a few nits (Matt Turner)
    2) Remove i965_disasm header (Matt Turner)

v3: 1) Redirect output to correct file descriptors (Matt Turner)
    2) Refactor code (Matt Turner)
    3) Use better formatting style (Matt Turner)

Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
2018-08-29 11:19:55 -07:00
Jason Ekstrand cdea5d996e anv: Free the app and engine name
Fixes: 8c048af589 "anv: Copy the appliation info into the instance"
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2018-08-29 11:24:57 -05:00
Lionel Landwerlin 5a1c23d150 anv: blorp: support multiple aspect blits
Newer blit tests are enabling depth&stencils blits. We currently don't
support it but can do by iterating over the aspects masks (copy some
logic from the CopyImage function).

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 9f44745eca ("anv: Use blorp to implement VkBlitImage")
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-08-29 10:31:06 +01:00
Ian Romanick c836326a29 i965/vec4: Emit BRW_AOP_INC or BRW_AOP_DEC for atomicAdd of +1 or -1
No shader-db changes on any Intel platform.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2018-08-28 15:35:50 -07:00
Ian Romanick c856403868 i965/fs: Emit BRW_AOP_INC or BRW_AOP_DEC for imageAtomicAdd of +1 or -1
v2: Refactor selection of atomic opcode to a separate function.
Suggested by Jason.

No changes on any other Intel platforms.

Skylake
total instructions in shared programs: 14304261 -> 14304241 (<.01%)
instructions in affected programs: 1625 -> 1605 (-1.23%)
helped: 4
HURT: 0
helped stats (abs) min: 1 max: 8 x̄: 5.00 x̃: 5
helped stats (rel) min: 1.01% max: 14.29% x̄: 5.86% x̃: 4.07%
95% mean confidence interval for instructions value: -10.66 0.66
95% mean confidence interval for instructions %-change: -15.91% 4.19%
Inconclusive result (value mean confidence interval includes 0).

total cycles in shared programs: 527531226 -> 527531194 (<.01%)
cycles in affected programs: 92204 -> 92172 (-0.03%)
helped: 2
HURT: 0

Haswell and Broadwell had similar results. (Broadwell shown)
total instructions in shared programs: 14615730 -> 14615710 (<.01%)
instructions in affected programs: 1838 -> 1818 (-1.09%)
helped: 4
HURT: 0
helped stats (abs) min: 1 max: 8 x̄: 5.00 x̃: 5
helped stats (rel) min: 0.89% max: 13.04% x̄: 5.37% x̃: 3.78%
95% mean confidence interval for instructions value: -10.66 0.66
95% mean confidence interval for instructions %-change: -14.59% 3.85%
Inconclusive result (value mean confidence interval includes 0).

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-08-28 15:35:46 -07:00
Ian Romanick b6e247cf0e i965/fs: Refactor image atomics to be a bit more like other atomics
This greatly simplifies the next patch.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2018-08-28 15:35:46 -07:00
Ian Romanick fabe3ead57 i965/fs: Emit BRW_AOP_INC or BRW_AOP_DEC for atomicAdd of +1 or -1
Funny story... a single shader was hurt for instructions, spills, fills.
That same shader was also the most helped for cycles.  #GPUsAreWeird

No changes on any other Intel platform.

v2: Refactor selection of atomic opcode to a separate function.
Suggested by Jason.

Haswell, Broadwell, and Skylake had similar results. (Skylake shown)
total instructions in shared programs: 14304116 -> 14304261 (<.01%)
instructions in affected programs: 12776 -> 12921 (1.13%)
helped: 19
HURT: 1
helped stats (abs) min: 1 max: 16 x̄: 2.32 x̃: 1
helped stats (rel) min: 0.05% max: 7.27% x̄: 0.92% x̃: 0.55%
HURT stats (abs)   min: 189 max: 189 x̄: 189.00 x̃: 189
HURT stats (rel)   min: 4.87% max: 4.87% x̄: 4.87% x̃: 4.87%
95% mean confidence interval for instructions value: -12.83 27.33
95% mean confidence interval for instructions %-change: -1.57% 0.31%
Inconclusive result (value mean confidence interval includes 0).

total cycles in shared programs: 527552861 -> 527531226 (<.01%)
cycles in affected programs: 1459195 -> 1437560 (-1.48%)
helped: 16
HURT: 2
helped stats (abs) min: 2 max: 21328 x̄: 1353.69 x̃: 6
helped stats (rel) min: 0.01% max: 5.29% x̄: 0.36% x̃: 0.03%
HURT stats (abs)   min: 12 max: 12 x̄: 12.00 x̃: 12
HURT stats (rel)   min: 0.03% max: 0.03% x̄: 0.03% x̃: 0.03%
95% mean confidence interval for cycles value: -3699.81 1295.92
95% mean confidence interval for cycles %-change: -0.94% 0.30%
Inconclusive result (value mean confidence interval includes 0).

total spills in shared programs: 8025 -> 8033 (0.10%)
spills in affected programs: 208 -> 216 (3.85%)
helped: 1
HURT: 1

total fills in shared programs: 10989 -> 11040 (0.46%)
fills in affected programs: 444 -> 495 (11.49%)
helped: 1
HURT: 1

Ivy Bridge
total instructions in shared programs: 11709181 -> 11709153 (<.01%)
instructions in affected programs: 3505 -> 3477 (-0.80%)
helped: 3
HURT: 0
helped stats (abs) min: 1 max: 23 x̄: 9.33 x̃: 4
helped stats (rel) min: 0.11% max: 1.16% x̄: 0.63% x̃: 0.61%

total cycles in shared programs: 254741126 -> 254738801 (<.01%)
cycles in affected programs: 919067 -> 916742 (-0.25%)
helped: 3
HURT: 0
helped stats (abs) min: 21 max: 2144 x̄: 775.00 x̃: 160
helped stats (rel) min: 0.03% max: 0.90% x̄: 0.32% x̃: 0.03%

total spills in shared programs: 4536 -> 4533 (-0.07%)
spills in affected programs: 40 -> 37 (-7.50%)
helped: 1
HURT: 0

total fills in shared programs: 4819 -> 4813 (-0.12%)
fills in affected programs: 94 -> 88 (-6.38%)
helped: 1
HURT: 0

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> [v1]
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-08-28 15:35:38 -07:00
Ian Romanick 41399f4bc7 intel/compiler: Silence unused parameter warnings in brw_eu.h
All of the other brw_*_desc functions take a devinfo parameter, and all
of the others at least have an assert that uses it.  Keep the parameter,
but mark it as unused.

Silences 37 warnings like:

In file included from src/intel/common/gen_disasm.c:27:0:
src/intel/compiler/brw_eu.h: In function ‘brw_pixel_interp_desc’:
src/intel/compiler/brw_eu.h:377:53: warning: unused parameter ‘devinfo’ [-Wunused-parameter]
 brw_pixel_interp_desc(const struct gen_device_info *devinfo,
                                                     ^~~~~~~

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2018-08-28 15:35:38 -07:00
Jason Ekstrand c92a463d23 anv: Claim to support depthBounds for ID games
Cc: "18.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2018-08-28 13:05:54 -05:00
Jason Ekstrand 8c048af589 anv: Copy the appliation info into the instance
Cc: "18.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2018-08-28 13:05:54 -05:00
Kevin Rogovin 03ecec9ed2 i965: Add INTEL_fragment_shader_ordering support.
Adds suppport for INTEL_fragment_shader_ordering. We achieve
the fragment ordering by using the same instruction as for
beginInvocationInterlockARB() which is by issuing a memory
fence via sendc.

Signed-off-by: Kevin Rogovin <kevin.rogovin@intel.com>
Reviewed-by: Plamena Manolova <plamena.manolova@intel.com>
2018-08-28 17:15:10 +03:00
Sagar Ghuge a1e3305f75 intel/eu: print bytes instead of 32 bit hex value
INTEL_DEBUG=hex prints 32 bit hex value and due to endianness of CPU
byte order is reversed. In order to disassemble binary files, print
each byte instead of 32 bit hex value.

v2: Print blank spaces in order to vertically align output of compacted
    instructions hex value with uncompacted instructions hex value.
    (Matt Turner)

v3: Fix line wrap at correct length

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2018-08-27 11:07:39 -07:00
Lionel Landwerlin 440a988bd1 intel: decoder: handle 0 sized structs
Gen7.5 has a BLEND_STATE of size 0 which includes a variable length
group. We did not deal with that very well, leading to an endless
loop.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107544
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-08-27 18:33:18 +01:00
Samuel Iglesias Gonsálvez 59a8e0dbf8 anv: Add support for protected memory properties on anv_GetPhysicalDeviceProperties2()
VkPhysicalDeviceProtectedMemoryProperties structure is new on Vulkan 1.1.

Fixes Vulkan CTS CL#2849.

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2018-08-27 09:07:52 +02:00
Jason Ekstrand aad501f15e intel/tools: Add 0x in front of a couple of hex values
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2018-08-25 18:47:08 -05:00
Jason Ekstrand 76b0e4d8c9 anv: Fill holes in the VF VUE to zero
This fixes a GPU hang in DOOM 2016 running under wine.

Cc: mesa-stable@lists.freedesktop.org
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104809
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2018-08-25 18:47:08 -05:00
Kai Wasserbäch b2313ef4a8 intel: tools: Fix aubinator_error's fprintf call (format-security)
The recent commit 4616639b49 introduced
the new function aubinator_error() which is a trivial wrapper around
fprintf() to STDERR. The call to fprintf() however is passed the message
msg directly:
  fprintf(stderr, msg);

This is a format-security violation and leads to an FTBFS with
-Werror=format-security (GCC 8):
  ../../../src/intel/tools/aubinator.c: In function 'aubinator_error':
  ../../../src/intel/tools/aubinator.c:74:4: error: format not a string literal and no format arguments [-Werror=format-security]
      fprintf(stderr, msg);
      ^~~~~~~

This patch fixes this trivially by introducing a catch-all "%s" format
argument.

Fixes: 4616639b49 ("intel: tools: split aub parsing from aubinator")
Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Kai Wasserbäch <kai@dev.carbon-project.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2018-08-25 16:52:12 +01:00
Jason Ekstrand 70de31d0c1 intel/batch_decoder: Print blend states properly
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2018-08-25 07:50:45 -05:00
Jason Ekstrand cbd4bc1346 intel/batch_decoder: Fix dynamic state printing
Instead of printing addresses like everyone else, we were accidentally
printing the offset from state base address.  Also, state_map is a void
pointer so we were incrementing in bytes instead of dwords and every
state other than the first was wrong.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2018-08-25 07:50:43 -05:00
Jason Ekstrand d1971be6ea intel/decoder: Print ISL formats for vertex elements
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2018-08-25 07:50:40 -05:00
Jason Ekstrand 2abd7ae189 intel/decoder: Clean up field iteration and fix sub-dword fields
First of all, setting iter->name in advance_field is unnecessary because
it gets set by gen_decode_field which gets called immediately after
gen_decode_field in the one call-site.  Second, we weren't properly
initializing start_bit and end_bit in the initial condition of
gen_field_iterator_next so the first field of a struct would get printed
wrong if it doesn't start on the first bit.  This is fixed by adding a
iter_start_field helper which sets the field and also sets up the other
bits we need.  This fixes decoding of 3DSTATE_SBE_SWIZ.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2018-08-25 07:50:36 -05:00
Lionel Landwerlin f430a37fa7 intel: decoder: unify MI_BB_START field naming
The batch decoder looks for a field with a particular name to decide
whether an MI_BB_START leads into a second batch buffer level. Because
the names are different between Gen7.5/8 and the newer generation we
fail that test and keep on reading (invalid) instructions.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107544
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-08-24 23:10:08 +01:00
Emil Velikov cff80b6c15 Revert "configure: allow building with python3"
This reverts commit ae7898dfdb.

Turns out the python scripts are _not_ fully python 3 compatible.
As Ilia reported using get_xmlpool.py with LANG=C produces some weird
output - see the link for details.

Even though the issue was spotted with the autoconf build, it exposes a
genuine problem with the script (and lack of lang handling of the meson
build.)

https://lists.freedesktop.org/archives/mesa-dev/2018-August/203508.html
2018-08-24 11:14:15 +01:00
Jason Ekstrand 8d8222461f intel/nir: Enable nir_opt_find_array_copies
We have to be a bit careful with this one because we want it to run in
the optimization loop but only in the first brw_nir_optimize call.
Later calls assume that we've lowered away copy_deref instructions and
we don't want to introduce any more.

Shader-db results on Kaby Lake:

    total instructions in shared programs: 15176942 -> 15176942 (0.00%)
    instructions in affected programs: 0 -> 0
    helped: 0
    HURT: 0

In spite of the lack of any shader-db improvement, this patch completely
eliminates spilling in the Batman: Arkham City tessellation shaders.
This is because we are now able to detect that the temporary array
created by DXVK for storing TCS inputs is a copy of the input arrays and
use indirect URB reads instead of making a copy of 4.5 KiB of input data
and then indirecting on it with if-ladders.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2018-08-23 21:47:51 -05:00
Jason Ekstrand a4a9c07549 intel/nir: Use nir_shrink_vec_array_vars
Shader-db results on Kaby Lake:

    total instructions in shared programs: 15177605 -> 15176765 (<.01%)
    instructions in affected programs: 4259 -> 3419 (-19.72%)
    helped: 1
    HURT: 0

    total spills in shared programs: 10954 -> 10855 (-0.90%)
    spills in affected programs: 295 -> 196 (-33.56%)
    helped: 1
    HURT: 0

    total fills in shared programs: 22222 -> 22117 (-0.47%)
    fills in affected programs: 417 -> 312 (-25.18%)
    helped: 1
    HURT: 0

The helped shader is from the OglCSDof synmark test.  On my Kaby Lake
laptop, the actual framerate of the benchmark didn't appear to improve
beyond the noise.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2018-08-23 21:46:56 -05:00
Jason Ekstrand 02a5442dd7 intel/nir: Use the new structure and array splitting passes
We call structure splitting once because it is guaranteed to split all
the structures in the entire shader in one go.  We call array splitting
in the loop in case future optimizations turn indirects into direct
dereferences and we can split more arrays.

Shader-db results on Kaby Lake:

    total instructions in shared programs: 15177605 -> 15177605 (0.00%)
    instructions in affected programs: 0 -> 0
    helped: 0
    HURT: 0

This is unsurprising because nir_lower_vars_to_ssa already effectively
does structure and array splitting internally.  It doesn't actually
split the variables but it's ability to reason about aliasing in the
presence of arrays and structures and pick out scalars or vectors to be
lowered to SSA values is fairly advanced.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2018-08-23 21:44:14 -05:00
Kenneth Graunke 578e45ab7b intel/decoder: Decode SFIXED values.
This lets us example SAMPLER_STATE's LOD Bias field, among other things.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2018-08-23 13:04:53 -07:00
Emil Velikov ae7898dfdb configure: allow building with python3
Pretty much all of the scripts are python2+3 compatible.
Check and allow using python3, while adjusting the PYTHON2 refs.

Note:
 - python3.4 is used as it's the earliest supported version
 - python3 chosen prior to python2

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Acked-by: Eric Engestrom <eric.engestrom@intel.com>
2018-08-23 17:00:13 +01:00
Ian Romanick d515c75463 intel/compiler: Implement untyped atomic float min, max, and compare-swap dataport messages
v2: Split changes to the message type field to another patch.  Suggested
by Caio.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2018-08-22 20:31:32 -07:00
Ian Romanick f347348f8a intel/compiler: Expand untyped atomic message type field by a bit
This is necessary for a new Gen9 message type that will be added in the
next patch.  There are also Gen8 message types that need the extra bit
(mostly for bindless).

v2: Split off from the next patch.  Suggested by Caio.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2018-08-22 20:31:32 -07:00
Ian Romanick d628642a34 intel/compiler: Silence unused parameter warnings
src/intel/compiler/brw_disasm_info.c: In function ‘nir_print_instr’:
src/intel/compiler/brw_disasm_info.c:30:61: warning: unused parameter ‘instr’ [-Wunused-parameter]
 __attribute__((weak)) void nir_print_instr(const nir_instr *instr, FILE *fp) {}
                                                             ^~~~~
src/intel/compiler/brw_disasm_info.c:30:74: warning: unused parameter ‘fp’ [-Wunused-parameter]
 __attribute__((weak)) void nir_print_instr(const nir_instr *instr, FILE *fp) {}
                                                                          ^~
src/intel/compiler/brw_disasm.c: In function ‘src_ia1’:
src/intel/compiler/brw_disasm.c:850:18: warning: unused parameter ‘_reg_file’ [-Wunused-parameter]
         unsigned _reg_file,
                  ^~~~~~~~~
src/intel/compiler/brw_fs_surface_builder.cpp: In function ‘void brw::surface_access::emit_byte_scattered_write(const brw::fs_builder&, const fs_reg&, const fs_reg&, const fs_reg&, unsigned int, unsigned int, unsigned int, brw_predicate)’:
src/intel/compiler/brw_fs_surface_builder.cpp:193:57: warning: unused parameter ‘size’ [-Wunused-parameter]
                                 unsigned dims, unsigned size,
                                                         ^~~~

v2: Update commit message.  brw_fs_generator.cpp warnings were already
fixed by another patch.  Noticed by Caio.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2018-08-22 20:31:32 -07:00
Nanley Chery 6d80b0b4ba intel/isl: Avoid tiling some 16K-wide render targets
Fix rendering issues on BDW and SKL.

Fixes: 0288fe8d04
("i965/miptree: Use the correct BLT pitch")

Fixes the following regressions seen

exclusively on SKL:
* KHR-GL46.texture_barrier_ARB.disjoint-texels
* KHR-GL46.texture_barrier_ARB.overlapping-texels
* KHR-GL46.texture_barrier.disjoint-texels
* KHR-GL46.texture_barrier.overlapping-texels

and both on BDW and SKL:
* GTF-GL46.gtf21.GL2FixedTests.buffer_corners.buffer_corners
* GTF-GL46.gtf21.GL2FixedTests.stencil_plane_corners.stencil_plane_corners

v2: Note the fixed tests (Andres).
    Don't cause failures with multisampled buffers (Andres).
    Don't hamper SKL GT4 (Ken).
v3: Fix the Fixes tag (Dylan).

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107359
Cc: <mesa-stable@lists.freedesktop.org>
Tested-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-08-22 13:53:19 -07:00
Rafael Antognolli f8cfc77660 intel/tools/aubwrite: Always use physical addresses for traces.
It looks like we can't rely on the simulator to always translate virtual
addresses to physical ones correctly. So let's use physical everywhere.

Since our current GGTT maps virtual to physical addresses in a 1:1 way,
no further changes are required.

Additionally, we have other address spaces not in use right now. So
let's make it easier to switch which one we are using but putting the
default one into the aub_file struct.

Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2018-08-22 12:52:41 -07:00