Commit Graph

10573 Commits

Author SHA1 Message Date
Samuel Iglesias Gonsálvez dee31311eb Revert "i965/fs: Don't emit SEL instructions for type-converting MOVs."
This reverts commit 7dccd38b40.

The d2x pass fixes SEL instructions that involve a type conversion
by doing a SEL without type conversion and then converting the result.
This pass also takes non-uniform control flow into account.

Therefore, 7dccd38b40 is no longer needed.

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-04-14 14:56:07 -07:00
Samuel Iglesias Gonsálvez aeecc82d05 i965/fs: generalize the legalization d2x pass
Generalize it to lower any unsupported narrower conversion.

v2 (Curro):
- Add supports_type_conversion()
- Reuse existing instruction instead of cloning it.
- Generalize d2x to narrower and equal size conversions.

v3 (Curro):
- Make supports_type_conversion() const and improve it.
- Use foreach_block_and_inst to process added instructions.
- Simplify code.
- Add assert and improve comments.
- Remove redundant mov.
- Remove useless comment.
- Remove saturate == false assert and add support for saturation
  when fixing the conversion.
- Add get_exec_type() function.

v4 (Curro):
- Use get_exec_type() function to get sources' type.

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2017-04-14 14:56:07 -07:00
Matt Turner 94ffeb7fa2 i965: Use <0,2,1> region for scalar DF sources on IVB/BYT.
On HSW+, scalar DF sources can be accessed using the normal <0,1,0>
region, but on IVB and BYT DF regions must be programmed in terms of
floats. A <0,2,1> region accomplishes this.

v2:
- Apply region <0,2,1> in brw_reg_from_fs_reg() (Curro).

v3:
- Added comment explaining the reason (Curro).

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2017-04-14 14:56:07 -07:00
Samuel Iglesias Gonsálvez 82d17615f4 i965/fs: clamp exec_size when an instruction has a scalar DF source
Then the SIMD lowering pass will get rid of any compressed instructions with scalar
source (whether force_writemask_all or not) and we avoid hitting the Gen7 region
decompression bug.

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Suggested-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2017-04-14 14:56:07 -07:00
Juan A. Suarez Romero 0f1316d4db i965/fs: double regioning parameters and execsize for DF in IVB/BYT
In IVB and BYT, both regioning parameters and execution sizes are measured in
units of 32-bit elements.

So when we have something like:

mov(8) g2<1>DF g3<4,4,1>DF

We are not actually moving 8 doubles (our intention), but 4 doubles.

We need to double the parameters to cope with this issue. However,
horizontal strides don't behave as they're supposed to on IVB
for DF regions: they cause each 32-bit half of DF sources to be
strided individually, so doubling the value won't make any difference.

v2:
- Use devinfo directly (Matt).
- Use Baytrail instead of Valleview (Matt).
- Use IvyBridge instead of Ivy (Matt)
- Double the exec_size in code emission (Curro)

v3:
- Replace hstride doubling with an assert and fix commit log (Curro).
- Substitute remaining compiler->devinfo by devinfo (Curro).

v4:
- Fix comment (Curro).

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2017-04-14 14:56:07 -07:00
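
A minimal C sketch of the doubling described in 0f1316d4db, assuming a unit horizontal stride. The `struct region` type and both helper names are hypothetical, and the values are plain element counts rather than hardware encodings; this is not the driver's actual code.

```c
#include <assert.h>

/* Hypothetical region description, counted in elements. */
struct region {
    unsigned vstride;
    unsigned width;
    unsigned hstride;
};

/* Rescale the region of a 64-bit (DF) operand for hardware that counts
 * regioning parameters in 32-bit elements. */
static struct region
df_region_in_32bit_units(struct region r)
{
    /* Doubling hstride would not help, since each 32-bit half would be
     * strided individually, so this sketch only accepts unit stride. */
    assert(r.hstride == 1);
    r.vstride *= 2;
    r.width *= 2;
    return r;
}

/* The execution size is counted in 32-bit elements as well. */
static unsigned
df_exec_size_in_32bit_units(unsigned exec_size)
{
    return exec_size * 2;
}
```

Under these assumptions, a 4-wide DF move with a <4,4,1> source region would be emitted with an execution size of 8 and a <8,8,1> region.
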
Juan A. Suarez Romero 79af256388 i965/fs: add helper to retrieve instruction execution type
The execution data size is the biggest type size of any instruction
operand.

We will use it to know if the instruction deals with DF, because on
IvyBridge we need to double the execution size and regioning parameters.

v2:
- Fix typo in commit log (Matt)
- Use static inline function instead of fs_inst's method (Curro).
- Define the result as a constant (Curro).
- Fix indentation (Matt).
- Add braces to nested control flow (Matt).

v3 (Curro):
- Add get_exec_type() and other auxiliary functions and use them to
  calculate its size.

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
[ Francisco Jerez: Fix bogus 'type != BAD_FILE' check.  Fix deduced
  execution type for integer vector types.  Take destination type as
  execution type where there is no valid source.  Assert-fail if the
  deduced execution type is byte.  Move into brw_ir_fs.h header for
  consistency with the VEC4 back-end. ]
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2017-04-14 14:56:07 -07:00
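
The rule from 79af256388 can be illustrated with a small C sketch. It follows the notes in the commit message (largest source size, destination as fallback, assert on a byte result), but the function name and the convention that a size of 0 marks an invalid source are assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: return the execution data size in bytes.
 * A source size of 0 stands for "no valid source" in this sketch. */
static unsigned
exec_data_size(const unsigned *src_sizes, size_t num_srcs, unsigned dst_size)
{
    unsigned size = 0;

    /* Take the biggest type size among the sources. */
    for (size_t i = 0; i < num_srcs; i++) {
        if (src_sizes[i] > size)
            size = src_sizes[i];
    }

    /* With no valid source, fall back to the destination type. */
    if (size == 0)
        size = dst_size;

    /* A byte-sized execution type is not expected. */
    assert(size != 1);
    return size;
}
```

An instruction is then treated as a DF instruction when the returned size is 8.
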
Matt Turner fd349d29e4 i965: Handle IVB DF differences in the validator.
On IVB/BYT, region parameters and execution size for DF are in terms of
32-bit elements, so they are doubled. For evaluating the validity of an
instruction, we halve them.

v2 (Sam):
- Add comments.

Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2017-04-14 14:56:07 -07:00
Iago Toral Quiroga fbac8b1f94 i965/disasm: also print nibctrl in IVB for execsize=8
4-wide DF operations where NibCtrl applies require an execsize of 8
in IvyBridge/BayTrail.

v2:
- Refactor NibCtrl printing (Matt)

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2017-04-14 14:56:06 -07:00
Jason Ekstrand 220974b38d anv/blorp: Properly handle VK_ATTACHMENT_UNUSED
The Vulkan driver was originally written under the assumption that
VK_ATTACHMENT_UNUSED was basically just for depth-stencil attachments.
However, the way things fell together, VK_ATTACHMENT_UNUSED can be used
anywhere in the subpass description.  The blorp-based clear and resolve
code has a bunch of places where we walk lists of attachments and we
weren't handling VK_ATTACHMENT_UNUSED everywhere.  This commit should
fix all of them.

Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Cc: <mesa-stable@lists.freedesktop.org>
2017-04-14 14:20:42 -07:00
Jason Ekstrand 21d2ca72d8 anv/cmd_buffer: Use the null surface state for ATTACHMENT_UNUSED
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Cc: <mesa-stable@lists.freedesktop.org>
2017-04-14 14:20:42 -07:00
Jason Ekstrand 02eca8b6f8 anv/cmd_buffer: Always set up a null surface state
We're about to start requiring it in yet another case and calculating
exactly when one is needed is starting to get prohibitively expensive.
A single surface state doesn't take up that much space so we may as well
create one all the time.

Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Cc: <mesa-stable@lists.freedesktop.org>
2017-04-14 14:20:42 -07:00
Jason Ekstrand e1f6fb8021 anv/cmd_buffer: Flush the VF cache at the top of all primaries
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: "13.0 17.0" <mesa-stable@lists.freedesktop.org>
2017-04-14 13:35:02 -07:00
Jason Ekstrand 939337e49f anv/blorp: Flush the texture cache in UpdateBuffer
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: "13.0 17.0" <mesa-stable@lists.freedesktop.org>
2017-04-14 13:35:02 -07:00
Jason Ekstrand 475bab0330 anv: Limit VkDeviceMemory objects to 2GB
Reviewed-by: Juan A. Suarez Romero <jasuarez@igalia.com>
2017-04-14 13:35:02 -07:00
Jason Ekstrand 4495b917e2 intel/blorp: Add a blorp_emit_dynamic macro
This makes it much easier to throw together a bit of dynamic state.  It
also automatically handles flushing so you don't accidentally forget.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
2017-04-14 13:35:02 -07:00
Matt Turner ab18578b03 anv: Only define wsi_cbs when VK_USE_PLATFORM_WAYLAND_KHR defined
2017-04-12 11:00:39 -07:00
Francisco Jerez 147e71242c i965/fs: Take into account lower frequency of conditional blocks in spilling cost heuristic.
The individual branches of an if/else/endif construct will be executed
some unknown number of times between 0 and 1 relative to the parent
block.  Use some factor in between as weight while approximating the
cost of spill/fill instructions within a conditional if-else branch.
This favors spilling registers used within conditional branches which
are likely to be executed less frequently than registers used at the
top level.

Improves the framerate of the SynMark2 OglCSDof benchmark by ~1.9x on
my SKL GT4e.  Should have a comparable effect on other platforms.  No
significant regressions.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-04-11 15:28:54 -07:00
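
A rough C sketch of the weighting idea in 147e71242c. The instruction representation, the function name, and the factor of 0.5 are assumptions chosen for illustration; the commit message only says that some factor between 0 and 1 is used.

```c
#include <assert.h>
#include <stddef.h>

enum op { OP_OTHER, OP_IF, OP_ELSE, OP_ENDIF };

struct inst {
    enum op op;
    int reg;   /* register touched by the instruction, or -1 for none */
};

/* Assumed weight for code inside an if/else branch. */
#define BRANCH_WEIGHT 0.5f

/* Accumulate an estimated spill/fill cost per register. Uses inside
 * conditional branches count less than uses at the top level. */
static void
add_spill_costs(const struct inst *insts, size_t num_insts,
                float *costs, size_t num_regs)
{
    float block_scale = 1.0f;

    for (size_t i = 0; i < num_insts; i++) {
        if (insts[i].op == OP_IF)
            block_scale *= BRANCH_WEIGHT;
        else if (insts[i].op == OP_ENDIF)
            block_scale /= BRANCH_WEIGHT;

        if (insts[i].reg >= 0) {
            assert((size_t)insts[i].reg < num_regs);
            costs[insts[i].reg] += block_scale;
        }
    }
}
```

A register used twice at the top level accumulates a cost of 2.0, while one used once in each branch of an if/else accumulates 1.0, which makes the latter the cheaper spill candidate.
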
Juan A. Suarez Romero 8d7a82ae32 anv: remove needless VALGRIND_MAKE_MEM_DEFINED
This is already invoked in the following VG_NOACCESS_READ() call.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-04-11 17:21:57 +02:00
Jason Ekstrand da2ac19511 intel/blorp: Use ISL for emitting depth/stencil/hiz
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-04-10 07:57:21 -07:00
Jason Ekstrand d3785dcb2f intel/blorp: Emit 3DSTATE_STENCIL_BUFFER before HIER_DEPTH
We're about to replace blorp's emit code with ISL and it emits them in
the other order.  This makes diffing the aubs easier.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-04-10 07:57:21 -07:00
Jason Ekstrand f93dc5beee anv: Use ISL for emitting depth/stencil/hiz
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-04-10 07:57:21 -07:00
Jason Ekstrand bf95f7c209 intel/isl: Add support for emitting depth/stencil/hiz
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-04-10 07:57:21 -07:00
Jason Ekstrand 098ca9949d intel/isl: Use genx_bits.h instead of a hand-rolled table
This gets rid of one piece of ugliness with the way ISL handles
emitting surface states.  I've never liked that hand-rolled table but it
was the best we had at the time.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-04-07 22:34:04 -07:00
Jason Ekstrand b85d75b3e8 intel/genxml/bits: Emit per-container _length helpers
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-04-07 22:34:04 -07:00
Jason Ekstrand f97e251ab2 intel/genxml/bits: Emit per-field _start helpers
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-04-07 22:34:04 -07:00
Jason Ekstrand 430e697868 intel/genxml/bits: Pull the function emit code into a helper block
The helper block is extremely general.  It takes a string property name
and an object that supports three methods: has_prop, iter_prop, and
get_prop.  This way we can easily generalize it to emit more different
types of getter functions.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-04-07 22:34:04 -07:00
Jason Ekstrand 2d52e65d03 intel/genxml/bits: Refactor to add a container class
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-04-07 22:34:04 -07:00
Jason Ekstrand bc68aa42bd anv: Use subpass dependencies for flushes
Instead of figuring it all out ourselves, just use the information given
to us by the client.

Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
2017-04-07 19:24:14 -07:00
Jason Ekstrand e5bbf8be36 anv/pass: Record required pipe flushes
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
2017-04-07 19:24:14 -07:00
Jason Ekstrand 0039d0cf27 anv/pass: Use anv_multialloc for allocating the anv_pass
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
2017-04-07 19:24:14 -07:00
Jason Ekstrand 415633a722 anv/descriptor_set: Use anv_multialloc for descriptor set layouts
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
2017-04-07 19:24:14 -07:00
Jason Ekstrand e5c29b8c27 anv: Add a helper for doing mass allocations
We tend to try to reduce the number of allocation calls the Vulkan
driver uses by doing a single allocation whenever possible for a data
structure.  While this has certain downsides (usually code complexity),
it does mean error handling and cleanup is much easier.  This commit
adds a nice little helper struct for getting rid of some of that
complexity.

Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
2017-04-07 19:24:14 -07:00
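
The single-allocation pattern described in e5c29b8c27 might look roughly like the following C sketch. The `multialloc` struct and function names here are hypothetical and are not the actual anv_multialloc interface; the sketch also assumes every requested alignment is a power of two no larger than what malloc() guarantees.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define MULTIALLOC_MAX 8

/* Hypothetical helper that collects several sub-allocations and
 * satisfies them with one malloc() call. */
struct multialloc {
    size_t size;                      /* running total including padding */
    unsigned count;
    size_t offsets[MULTIALLOC_MAX];
    void **ptrs[MULTIALLOC_MAX];
};

/* Register a sub-allocation; *ptr is filled in by multialloc_alloc(). */
static void
multialloc_add(struct multialloc *ma, void **ptr, size_t size, size_t align)
{
    assert(ma->count < MULTIALLOC_MAX);
    assert(align != 0 && (align & (align - 1)) == 0);

    /* Round the current end up to the requested alignment. */
    size_t offset = (ma->size + align - 1) & ~(align - 1);
    ma->offsets[ma->count] = offset;
    ma->ptrs[ma->count] = ptr;
    ma->count++;
    ma->size = offset + size;
}

/* Allocate once and hand out the pointers. On failure nothing is
 * assigned; on success a single free() of the result releases all. */
static void *
multialloc_alloc(struct multialloc *ma)
{
    char *base = malloc(ma->size);
    if (base == NULL)
        return NULL;

    for (unsigned i = 0; i < ma->count; i++)
        *ma->ptrs[i] = base + ma->offsets[i];
    return base;
}
```

Because there is only one allocation, the error path has a single failure point and cleanup is a single free().
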
Jason Ekstrand 82695c32b6 anv: Add helpers for converting access flags to pipe bits
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
2017-04-07 19:24:14 -07:00
Jason Ekstrand 4e17b59f6c anv/query: Use snooping on !LLC platforms
Commit b2c97bc789 which made us start
using a busy-wait for individual query results also messed up cache
flushing on !LLC platforms.  For one thing, I forgot the mfence after
the clflush so memory access wasn't properly getting fenced.  More
importantly, however, was that we were clflushing the whole query range
and then waiting for individual queries and then trying to read the
results without clflushing again.  Getting the clflushing both correct
and efficient is very subtle and painful.  Instead, let's side-step the
problem by just snooping.

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
2017-04-07 12:17:20 -07:00
Emil Velikov 5318d1ff94 anv: provide anv_gem_busy() stub for the tests
Otherwise linking may fail.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100600
Fixes: f195d40eca ("anv/device: Add a helper for querying whether a BO is busy")
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Tested-by: Vinson Lee <vlee@freedesktop.org>
2017-04-07 19:45:58 +01:00
Samuel Iglesias Gonsálvez 1c934bc71b anv/blorp: sample input attachments with resolves on BDW
On Broadwell we still need to do a resolve between the subpass
that writes and the subpass that reads when there is a
self-dependency, because the HW cannot see fast-clears and works
on the render cache as if it were a regular non-fast-clear surface.

Fixes 16 tests on BDW:

dEQP-VK.renderpass.formats.*.input.clear.store.self_dep*

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-04-07 07:49:43 +02:00
Jordan Justen 0370350d11 intel/aubinator: Stop searching after a custom handler is found
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-04-06 13:26:08 -07:00
Jordan Justen d5bd0e411e intel/gen_decoder: return -1 for unknown command formats
Decoding with aubinator encountered a command of 0xffffffff. With the
previous code, it caused aubinator to jump 255 + 2 dwords to start
decoding again.

Instead we can attempt to detect the known instruction formats. If the
format is not recognized, then we can advance just 1 dword.

v2:
 * Update aubinator_error_decode
 * Actually convert the length variable returned into a *signed* integer
   in aubinator.c, intel_batchbuffer.c and aubinator_error_decode.c.

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-04-06 13:26:08 -07:00
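
The decoding loop change in d5bd0e411e can be sketched in C as below. The rule inside `command_length()` is a toy placeholder, not the real command format detection, and both function names are made up for this example. The point is only the signed return value and the one-dword fallback.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder length decoder: returns the command length in dwords, or
 * -1 when the header is not recognized. The "top three bits equal 3"
 * test is purely illustrative. */
static int
command_length(uint32_t header)
{
    uint32_t type = header >> 29;

    if (type != 3)
        return -1;
    return (int)(header & 0xff) + 2;
}

/* Walk a batch and count how many decode steps were taken. */
static size_t
count_decode_steps(const uint32_t *batch, size_t num_dwords)
{
    size_t steps = 0;
    size_t i = 0;

    while (i < num_dwords) {
        /* The length must be kept signed so that -1 survives. */
        int length = command_length(batch[i]);

        /* Unknown format: advance a single dword instead of trusting
         * a length field that may be garbage. */
        i += length > 0 ? (size_t)length : 1;
        steps++;
    }
    return steps;
}
```

With this shape, a stray 0xffffffff dword costs one step rather than skipping 255 + 2 dwords of possibly valid commands.
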
Jordan Justen 7c33372f82 intel/gen_decoder: Fix length for Media State/Object commands
From BDW PRM, Volume 6: Command Stream Programming, 'Render Command
Header Format'.

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-04-06 13:26:08 -07:00
Jordan Justen 3c77a57222 intel/aubinator_error_decode: Fix structure decode data
The call to gen_print_group should provide a pointer to the beginning
of the structure data, not the start of the batch data.

Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-04-06 13:25:38 -07:00
Jason Ekstrand b2c97bc789 anv/query: Busy-wait for available query entries
Before, we were just looking at whether or not the user wanted us to
wait and waiting on the BO.  Some clients, such as the Serious engine,
use a single query pool for hundreds of individual query results where
the writes for those queries may be split across several command
buffers.  In this scenario, the individual query we're looking for may
become available long before the BO is idle so waiting on the query pool
BO to be finished is wasteful. This commit makes us instead busy-loop on
each query until it's available.

This significantly reduces pipeline bubbles and improves performance of
The Talos Principle on medium settings (where the GPU isn't overloaded
with drawing) by around 20% on my SkyLake gt4.

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Tested-by: Eero Tamminen <eero.t.tamminen@intel.com>
Tested-by: Grazvydas Ignotas <notasas@gmail.com>
2017-04-05 21:17:11 -07:00
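
A simplified C sketch of per-query polling as described in b2c97bc789. The slot layout, the function name, and the poll bound are assumptions for illustration, and the sketch ignores cache coherency and memory ordering, which a real driver has to handle.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical layout of one query slot in the pool. */
struct query_slot {
    volatile uint64_t available;   /* written non-zero when the result is ready */
    uint64_t value;
};

/* Poll a single slot instead of waiting for the whole pool buffer to
 * go idle. Returns false if the slot never became available within
 * max_polls checks. */
static bool
wait_for_query(const struct query_slot *slot, unsigned max_polls,
               uint64_t *result)
{
    for (unsigned i = 0; i < max_polls; i++) {
        if (slot->available) {
            *result = slot->value;
            return true;
        }
    }
    return false;
}
```

The caller can read each query as soon as its own slot is marked available, even if later queries in the same pool are still pending.
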
Jason Ekstrand f195d40eca anv/device: Add a helper for querying whether a BO is busy
This is a bit more efficient than using GEM_WAIT with a timeout of 0.

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
2017-04-05 21:17:11 -07:00
Emil Velikov a6840efc09 anv: provide required gem stubs for the tests
Introduce stubs to anv_gem_stub.c that match the anv_gem.c ones.
Otherwise we may get link-time errors, when building the tests.

v2: Introduce all the missing stubs at once.

Cc: Jason Ekstrand <jason@jlekstrand.net>
Cc: Vinson Lee <vlee@freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100574
Fixes: c964f0e485 ("anv: Query the kernel for reset status")
Fixes: 651ec926fc ("anv: Add support for 48-bit addresses")
Fixes: 060a6434ec ("anv: Advertise larger heap sizes")
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
---
I've intentionally kept the order identical to the one in anv_gem.c.
This way we can easily grep & diff in the future ;-)
2017-04-05 17:54:38 +01:00
Emil Velikov e664cfc5a7 intel: genxml: automake: include gen_bits_header.py in the tarball
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2017-04-05 13:16:28 +01:00
Emil Velikov e180680980 intel: genxml: automake: polish automake rules
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2017-04-05 13:16:28 +01:00
Jason Ekstrand 060a6434ec anv: Advertise larger heap sizes
Instead of just advertising the aperture size, we do something more
intelligent.  On systems with a full 48-bit PPGTT, we can address 100%
of the available system RAM from the GPU.  In order to keep clients from
burning 100% of your available RAM for graphics resources, we have a
nice little heuristic (which has received exactly zero tuning) to keep
things under a reasonable level of control.

Reviewed-by: Kristian H. Kristensen <krh@bitplanet.net>
2017-04-04 18:33:52 -07:00
Jason Ekstrand 651ec926fc anv: Add support for 48-bit addresses
This commit adds support for using the full 48-bit address space on
Broadwell and newer hardware.  Thanks to certain limitations, not all
objects can be placed above the 32-bit boundary.  In particular, general
and state base address need to live within 32 bits.  (See also
Wa32bitGeneralStateOffset and Wa32bitInstructionBaseOffset.)  In order
to handle this, we add a supports_48bit_address field to anv_bo and only
set EXEC_OBJECT_SUPPORTS_48B_ADDRESS if that bit is set.  We set the bit
for all client-allocated memory objects but leave it false for
driver-allocated objects.  While this is more conservative than needed,
all driver allocations should easily fit in the first 32 bits of address
space, and this keeps things simple because we don't have to think about
whether or not any given one of our allocation data structures will be
used in a 48-bit-unsafe way.

Reviewed-by: Kristian H. Kristensen <krh@bitplanet.net>
2017-04-04 18:33:52 -07:00
Jason Ekstrand 439da38d18 anv: Replace anv_bo::is_winsys_bo with a uint32_t flags
Reviewed-by: Kristian H. Kristensen <krh@bitplanet.net>
2017-04-04 18:33:52 -07:00
Jason Ekstrand 5d1ba2cb04 anv/blorp: Align vertex buffers to 64B
This fixes issues seen when adding support for full 48-bit addresses.
The 48-bit addresses themselves have nothing to do with it other than
that it caused the kernel to place buffers slightly differently so they
interacted differently with the caches.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: "13.0 17.0" <mesa-stable@lists.freedesktop.org>
2017-04-04 18:33:52 -07:00
Jason Ekstrand c964f0e485 anv: Query the kernel for reset status
When a client causes a GPU hang (or experiences issues due to a hang in
another client) we want to let it know as soon as possible.  In
particular, if it submits work with a fence and calls vkWaitForFences or
vkQueueWaitIdle and it returns VK_SUCCESS, then the client should be
able to trust the results of that rendering.  In order to provide this
guarantee, we have to ask the kernel for context status in a few key
locations.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-04-04 18:33:52 -07:00