Commit Graph

3344 Commits

Author SHA1 Message Date
Kenneth Graunke de57926dc9 blorp: Properly handle Z24X8 blits.
One of the reasons we didn't notice that R24_UNORM_X8_TYPELESS
destinations were broken was that an earlier layer was swapping it
out for B8G8R8A8_UNORM.  That made Z24X8 -> Z24X8 blits work.

However, R32_FLOAT -> R24_UNORM_X8_TYPELESS was still totally broken.
The old code only considered one format at a time, without thinking
that format conversion may need to occur.

This patch moves the translation out to a place where it can consider
both formats.  If both are Z24X8, we continue using B8G8R8A8_UNORM to
avoid having to do shader math workarounds.  If we have a Z24X8
destination, but a non-matching source, we use our shader hacks to
actually render to it properly.

Fixes: 804856fa57 (intel/blorp: Handle more exotic destination formats)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-08-11 12:34:01 -07:00
Kenneth Graunke 8a29086285 blorp: Don't try to use R32_UNORM for R24_UNORM_X8_TYPELESS rendering.
The hardware doesn't support rendering to R24_UNORM_X8_TYPELESS, so
Jason decided to fake it with a bit of shader math and R32_UNORM RTs.

The only problem is that R32_UNORM isn't renderable either...so we've
just traded one bad format for another.

This patch makes us use R32_UINT instead.

Fixes: 804856fa57 (intel/blorp: Handle more exotic destination formats)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-08-11 12:33:27 -07:00
Jason Ekstrand a9f7bcfdf9 intel: Switch the order of the 2x MSAA sample positions
The Vulkan 1.1.82 spec flipped the order to better match D3D.

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2018-08-11 10:58:12 -05:00
Mathieu Bridon 2ee1c86d71 meson: Build with Python 3
Now that all the build scripts are compatible with both Python 2 and 3,
we can flip the switch and tell Meson to use the latter.

Since Meson already depends on Python 3 anyway, this means we don't need
two different Python stacks to build Mesa.

Signed-off-by: Mathieu Bridon <bochecha@daitauha.fr>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
2018-08-10 15:15:09 -07:00
Kenneth Graunke 08a5c395ab intel: Fix SIMD16 unaligned payload GRF reads on Gen4-5.
When the SIMD16 Gen4-5 fragment shader payload contains source depth
(g2-3), destination stencil (g4), and destination depth (g5-6), the
single register of stencil makes the destination depth unaligned.

We were generating this instruction in the RT write payload setup:

   mov(16)   m14<1>F   g5<8,8,1>F   { align1 compr };

which is illegal, instructions with a source region spanning more than
one register need to be aligned to even registers.  This is because the
hardware implicitly does (nr | 1) instead of (nr + 1) when splitting the
compressed instruction into two mov(8)'s.

I believe this would cause the hardware to load g5 twice, replicating
subspan 0-1's destination depth to subspan 2-3.  This showed up as 2x2
artifact blocks in both TIS-100 and Reicast.

Normally, we rely on the register allocator to even-align our virtual
GRFs.  But we don't control the payload, so we need to lower SIMD widths
to make it work.  To fix this, we teach lower_simd_width about the
restriction, and then call it again after lower_load_payload (which is
what generates the offending MOV).

Fixes: 8aee87fe4c (i965: Use SIMD16 instead of SIMD8 on Gen4 when possible.)
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107212
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=13728
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Tested-by: Diego Viola <diego.viola@gmail.com>
2018-08-09 12:33:41 -07:00
Eric Engestrom fcf259ef97 anv: set error in all failure paths
Cc: Jason Ekstrand <jason.ekstrand@intel.com>
Fixes: 5b196f39bd "anv/pipeline: Compile to NIR in compile_graphics"
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2018-08-09 11:20:27 +01:00
Eric Engestrom aac80f7597 intel/tools: add missing variable initialisation
Fixes: 6a60beba40 "intel/tools: Add an error state to aub translator"
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2018-08-09 11:20:18 +01:00
Mathieu Bridon e1b88aee68 python: Fix rich comparisons
Python 3 doesn't call objects __cmp__() methods any more to compare
them. Instead, it requires implementing the rich comparison methods
explicitly: __eq__(), __ne(), __lt__(), __le__(), __gt__() and __ge__().

Fortunately Python 2 also supports those.

This commit only implements the comparison methods which are actually
used by the build scripts.

Signed-off-by: Mathieu Bridon <bochecha@daitauha.fr>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
2018-08-07 13:10:34 -07:00
Lionel Landwerlin 303e7b39b5 intel: don't build tools without -Dtools=intel
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107487
Fixes: 4334196ab325c6w ("intel: tools: simplify meson build")
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
2018-08-07 11:58:47 +01:00
Tapani Pälli 5eb4b384d9 anv: add more swapchain formats
This change helps with some of the dEQP-VK.wsi.android.* tests that
try to create swapchain with using such formats.

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
2018-08-06 09:25:11 +03:00
Lionel Landwerlin 4334196ab3 intel: tools: simplify meson build
Remove the if tools condition and just put it through the install:
parameter.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
2018-08-04 09:45:34 +01:00
Lionel Landwerlin 87a3c97781 intel: aubinator: simplify decoding
Since we don't support streaming an aub file, we can drop the decoding
status enum.

v2: include stdbool (Eric)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
2018-08-04 09:40:14 +01:00
Lionel Landwerlin 02ebc064ea intel: common: add missing stdint include
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
2018-08-04 09:39:01 +01:00
Lionel Landwerlin db4770ee57 intel: decoder: remove unused variable
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
2018-08-04 09:38:58 +01:00
Lionel Landwerlin 7471286bb0 intel: tools: aubwrite: reuse canonical address helper
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
2018-08-04 09:38:44 +01:00
Lionel Landwerlin 35955afa7a intel: aubinator: fix read the context/ring
Up to now we've been lucky that the buffer returned was always exactly
at the address we requested.

Fixes: 144b40db54 ("intel: aubinator: drop the 1Tb GTT mapping")
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
2018-08-04 09:38:34 +01:00
Jason Ekstrand 1d900e55fd anv/pipeline: Disable FS dispatch for pointless fragment shaders
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2018-08-03 05:52:23 -07:00
Andres Gomez 2d4d139877 intel/tools: add error2aub creation into autotools
Tarball distribution is done through "make distcheck". We include the
meson targets also into autotools so they won't fail when building
from the tarball.

Fixes: 6a60beba40 ("intel/tools: Add an error state to aub translator")
Cc: Jason Ekstrand <jason.ekstrand@intel.com>
Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: Dylan Baker <dylan.c.baker@intel.com>
Signed-off-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
2018-08-02 21:15:57 +03:00
Jason Ekstrand 7ef6cd0ee8 anv/pipeline: Do cross-stage linking optimizations
This appears to help the Aztec Ruins benchmark by about 2% on my Kaby
Lake gt2 laptop.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2018-08-02 10:29:20 -07:00
Jason Ekstrand a5bffa061d anv/pipeline: Pull most of the anv_pipeline_compile_* into common code
This leaves us with a series of little anv_pipeline_compile_* functions
which each take a compiler object, a mem_ctx, the stage to compile, and
the previous stage for VUE linking purposes.  Some of them do
interesting things but most are little more than wrappers around
brw_compile_*.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2018-08-02 10:29:20 -07:00
Jason Ekstrand 5351339554 anv/pipeline: Add a separate "link" stage
This breaks compilation up a bit into "link" and "compile".  In the
"link" stage, new anv_pipeline_link_* helpers are called which are
responsible for setting up the binding table and doing anything needed
to properly link with the next stage in the pipeline if one exists.
They are called in reverse order starting with the fragment shader so
you can assume linking in later stages is already done.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2018-08-02 10:29:20 -07:00
Jason Ekstrand 5b196f39bd anv/pipeline: Compile to NIR in compile_graphics
This pulls the SPIR-V to NIR step out into common code.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2018-08-02 10:29:20 -07:00
Jason Ekstrand 946fcd02a9 anv/pipeline: Recompile all shaders if any are missing from the cache
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2018-08-02 10:29:20 -07:00
Jason Ekstrand f76d6d8a63 anv/pipeline: Drop anv_pipeline_add_compiled_stage
We can set active_stages much more directly and then it's just candy
around setting pipeline->stages[stage].

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2018-08-02 10:29:20 -07:00
Jason Ekstrand 703a24932a anv/pipeline: Pull shader compilation out into a helper.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2018-08-02 10:29:20 -07:00
Jason Ekstrand f3c59ca947 anv/pipeline: Call anv_pipeline_compile_* in a loop
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2018-08-02 10:29:20 -07:00
Jason Ekstrand bdc3565c8c anv/pipeline: Hash the entire pipeline in one go
Instead of hashing each stage separately (and TES and TCS together), we
hash the entire pipeline.  This means we'll get fewer cache hits if
they, for instance, re-use the same VS over and over again but it also
means we can now safely do cross-stage optimizations.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2018-08-02 10:29:20 -07:00
Jason Ekstrand 4a8236ae17 anv/pipeline: Populate keys up-front
Instead of having each anv_pipeline_compile_* function populate the
shader key, make it part of the anv_pipeline_stage struct and fill it
out up-front.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2018-08-02 10:29:20 -07:00
Jason Ekstrand 76503b319a anv/pipline: Add a helper struct for per-stage info
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2018-08-02 10:29:20 -07:00
Jordan Justen 8fcdb71d8c intel/compiler: Add brw_get_compiler_config_value for disk cache
During code review, Jason pointed out that:

2b3064c073 "i965, anv: Use INTEL_DEBUG for disk_cache driver flags"

Didn't account for INTEL_SCALER_* environment variables.

To fix this, let the compiler return the disk_cache driver flags.

Another possible fix would be to pull the INTEL_SCALER_* into
INTEL_DEBUG bits, but as we are currently using 41 of 64 bits, I
didn't think it was a good use of 4 more of these bits. (5 since
INTEL_PRECISE_TRIG needs to be accounted for as well.)

Cc: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2018-08-01 23:49:16 -07:00
Jordan Justen 3887700dfd i965: Disable shader cache with INTEL_DEBUG=shader_time
Shader time hard codes an index of the shader time buffer within the
gen program.

In order to support shader time in the disk shader cache, we'd need to
add the shader time index into the program key. This should work, but
probably is not worth it for this particular debug feature.

Therefore, let's just disable the disk shader cache if the shader time
debug feature is used.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106382
Fixes: 96fe36f7ac "i965: Enable disk shader cache by default"
Cc: Eero Tamminen <eero.t.tamminen@intel.com>
Cc: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2018-08-01 23:30:49 -07:00
Jason Ekstrand de9e5cf35a anv/pipeline: Add populate_tcs/tes_key helpers
They don't really do anything interesting, but it's more consistent this
way.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2018-08-01 18:02:28 -07:00
Jason Ekstrand e621f57556 anv/pipeline: Rework the parameters to populate_wm_prog_key
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2018-08-01 18:02:28 -07:00
Jason Ekstrand b2e0b0dad6 anv/pipeline: More aggressively optimize away color attachments
Instead of just looking at the number of color attachments, look at
which ones are actually used by the subpass.  This lets us potentially
throw away chunks of the fragment shader.  In DXVK, for example, all
subpasses have 8 attachments and most are VK_ATTACHMENT_UNUSED so this
is very helpful in that case.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2018-08-01 18:02:28 -07:00
Jason Ekstrand 80bc0b728c anv: Restrict the number of color regions to those actually written
The back-end compiler emits the number of color writes specified by
wm_prog_key::nr_color_regions regardless of what nir_store_outputs we
have.  Once we've gone through and figured out which render targets
actually exist and are written by the shader, we should restrict the key
to avoid extra RT write messages.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2018-08-01 18:02:28 -07:00
Jason Ekstrand 4d57e543b8 anv/pipeline: Fix up deref modes if we delete a FS output
With the new deref instructions, we have to keep the modes consistent
between the derefs and the variables they reference.  Since we remove
outputs by changing them to local variables, we need to run the fixup
pass to fix the modes.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2018-08-01 18:02:28 -07:00
Jason Ekstrand 4434591bf5 intel/nir: Call nir_lower_io_to_scalar_early
Shader-db results on Kaby Lake:

    total instructions in shared programs: 15166953 -> 15073611 (-0.62%)
    instructions in affected programs: 2390284 -> 2296942 (-3.91%)
    helped: 16469
    HURT: 505

    total loops in shared programs: 4954 -> 4951 (-0.06%)
    loops in affected programs: 3 -> 0
    helped: 3
    HURT: 0

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2018-08-01 18:02:28 -07:00
Jason Ekstrand b0bb547f78 intel/nir: Split IO arrays into elements
The NIR nir_lower_io_arrays_to_elements pass attempts to split I/O
variables which are arrays or matrices into a sequence of separate
variables.  This can help link-time optimization by allowing us to
remove varyings at a more granular level.

Shader-db results on Kaby Lake:

    total instructions in shared programs: 15177645 -> 15168494 (-0.06%)
    instructions in affected programs: 79857 -> 70706 (-11.46%)
    helped: 392
    HURT: 0

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2018-08-01 18:02:28 -07:00
Jason Ekstrand 57804efa88 i965/fs: Flag all slots of a flat input as flat
Otherwise, only the first vec4 of a matrix or other complex type will
get marked as flat and we'll interpolate the others.  This was caught by
a dEQP test which started failing because it did a SSO vs. non-SSO
comparison.  Previously, we did the interpolation wrong consistently in
both versions.  However, with one of Tim Arceri's NIR linkingpatches, we
started splitting the matrix input into vectors at link time in the
non-SSO version and it started getting correctly interpolated which
didn't match the broken SSO version.  As of this commit, they both get
correctly interpolated.

Fixes: e61cc87c75 "i965/fs: Add a flat_inputs field to prog_data"
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2018-08-01 18:02:28 -07:00
Jason Ekstrand 4e060385e9 intel/nir: Use the correct scalar stage for consumers when linking
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2018-08-01 18:02:28 -07:00
Lionel Landwerlin 2477e516d9 intel: tools: aubwrite: split gen[89] from gen10+
Gen10+ has an additional bit in MI_BATCH_BUFFER_END to signal the end
of the context image.

We select the largest size for the context image regardless of the
generation.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
2018-08-01 15:31:56 +01:00
Mathieu Bridon a71df20855 python: Explicitly use byte strings
In both Python 2 and 3, zlib.Compress.compress() takes a byte string,
and returns a byte string as well.

In Python 2, the script was working because:

1. string literalls were byte strings;
2. opening a file in unicode mode, reading from it, then passing the
   unicode string to compress() would automatically encode to a byte
   string;

On Python 3, the above two points are not valid any more, so:

1. zlib.Compress.compress() refuses the passed unicode string;
2. compressed_data, defined as an empty unicode string literal, can't be
   concatenated with the byte string returned by compress();

This commit fixes this by explicitly using byte strings where
appropriate, so that the script works on both Python 2 and 3.

Signed-off-by: Mathieu Bridon <bochecha@daitauha.fr>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
2018-08-01 14:26:19 +01:00
Mathieu Bridon c24d826968 python: Open file in binary mode
The XML parser wants byte strings, not unicode strings.

In both Python 2 and 3, opening a file without specifying the mode will
open it for reading in text mode ('r').

On Python 2, the read() method of the file object will return byte
strings, while on Python 3 it will return unicode strings.

Explicitly specifying the binary mode ('rb') makes the behaviour
identical in both Python 2 and 3, returning what the XML parser
expects.

Signed-off-by: Mathieu Bridon <bochecha@daitauha.fr>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
2018-08-01 14:26:19 +01:00
Mathieu Bridon 12eb5b496b python: Better get character ordinals
In Python 2, iterating over a byte-string yields single-byte strings,
and we can pass them to ord() to get the corresponding integer.

In Python 3, iterating over a byte-string directly yields those
integers.

Transforming the byte string into a bytearray gives us a list of the
integers corresponding to each byte in the string, removing the need to
call ord().

This makes the script compatible with both Python 2 and 3.

Signed-off-by: Mathieu Bridon <bochecha@daitauha.fr>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
2018-08-01 14:26:19 +01:00
Iago Toral Quiroga 471bce5689 intel/compiler: implement 8-bit constant load
Fixes VK-GL-CTS CL#2567

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-08-01 08:08:15 +02:00
Iago Toral Quiroga 7e6c8b0cb7 intel/compiler: add setup_imm_(u)b helpers
The hardware doesn't support byte immediates, so similar to setup_imm_df()
for doubles, these helpers work by loading the constant value into a
VGRF.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-08-01 08:08:15 +02:00
Topi Pohjolainen a5889d70f2 i965/icl: Disable binding table prefetching
Gen 11 workarounds table #2056 WABTPPrefetchDisable suggests to
disable prefetching of binding tables for ICLLP A0 and B0
steppings. It fixes multiple gpu hangs in
ext_framebuffer_multisample* tests on ICLLP B0 h/w.

Anuj: Add comments and commit message.
      Add gen 11 checks in the code.

Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
2018-07-27 11:05:04 -07:00
Iago Toral Quiroga 615aaedb93 intel/compiler: fix lower conversions to account for predication
The pass can create a temporary result for the instruction and then
moves from it to the original destination, however, if the original
instruction was predicated, the mov has to be predicated as well.

Reviewed-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com>
2018-07-27 14:48:29 +02:00
Kenneth Graunke 488972222c i965: Combine both gl_PatchVerticesIn lowering passes.
Until now, we had separate passes for lowering gl_PatchVerticesIn to
a statically known constant (for TES inputs when linked against a TCS),
and a uniform in the other cases.  Annoyingly, one had to be run before
nir_lower_system_values, and the other afterward.  This simplified the
passes, but made life painful for the callers.

This patch combines both into a single pass.  If you give it a non-zero
static count, it uses that.  If you give it Mesa state slots, it turns
it back into a built-in uniform.  Otherwise, it does nothing.

This also moves the i965 uniform lowering out to shared code.

v2: Make token arrays const.

Reviewed-by: Eric Anholt <eric@anholt.net>
2018-07-26 21:51:36 -07:00
Kenneth Graunke 8794fe3e30 intel/compiler: Delete dead VS intrinsic handling.
These are lowered by brw_nir_lower_vs_inputs().  If they weren't, we
would have already hit the unreachable() in emit_system_values_block().

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-07-26 11:45:34 -07:00