Commit Graph

4854 Commits

Author SHA1 Message Date
Erik Faye-Lund dd77bdb34b anv: remove incorrect polygonMode=point early-out
This is incorrect, because polygonMode only applies if the final
primitive type is a polygon; polygonMode doesn't apply to
line-primitives as the comment suggests.

The Vulkan 1.1 spec, section 26.11, "Polygons" defines that polygons are
separate from points and line segments:

" A polygon results from the decomposition of a triangle strip, triangle
  fan or a series of independent triangles. Like points and line segments,
  polygon rasterization is controlled by several variables in the
  VkPipelineRasterizationStateCreateInfo structure. "

Further, section 26.11.2, "Polygon Mode", only define polygonMode to
apply to polygons:

" Possible values of the VkPipelineRasterizationStateCreateInfo::polygonMode
  property of the currently active pipeline, specifying the method of
  rasterization for polygons, are: "

This seems to clearly define that polygonMode doesn't apply to points
and lines, so let's make sure that we don't early out with the wrong
value.

Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-11-01 07:26:03 +00:00
Jason Ekstrand f60ef0fff4 anv: Move the RT BTI flush workaround to begin_subpass
Now that we're no longer compacting binding table entries, the only time
they can possibly change is when we actually switch subpasses.

Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
2019-10-31 21:07:15 +00:00
Jason Ekstrand 6a8f43030c anv: Stop compacting render targets in the binding table
Instead, always emit one entry for every color attachment in the subpass
or one NULL if there are no color attachments.  This will let us adjust
an Ice Lake workaround so we don't get a stall on every draw call.

Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
2019-10-31 21:07:15 +00:00
Jason Ekstrand c765e2156a anv: Don't claim the null RT as a valid color target
If it's NULL, we can let the compiler go ahead and delete it or flag it
as NULL.

Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
2019-10-31 21:07:15 +00:00
Jason Ekstrand df7a730b4f anv: Don't delete fragment shaders that write sample mask
Also, use color_outputs_valid rather than nr_color_outputs since it
should be a bit more accurate.

Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
2019-10-31 21:07:15 +00:00
Jason Ekstrand 02d9403067 anv: Use the new BO alloc API for Android
Fixes: a44f5ee0d8 "anv: Rework the internal BO allocation API"
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-10-31 15:46:39 +00:00
Eric Engestrom 791ece114e anv: add missing xmlconfig headers dependency
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Acked-by: Dylan Baker <dylan@pnwbakers.com>
2019-10-31 15:29:06 +00:00
Jason Ekstrand 0ca0ad1252 anv: Zero released anv_bo structs
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-10-31 13:46:09 +00:00
Jason Ekstrand b3c0b1b218 anv: Use a bitset for tracking residency
Now that we can conveniently map between GEM handles and struct anv_bo
pointers, we can use a simple bitset for residency tracking instead of
the complex hash set.  This shaves about 3% off of a CPU-limited example
running with the Dawn WebGPU implementation.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-10-31 13:46:09 +00:00
Jason Ekstrand 9ef198c59a anv: Set the batch allocator for compute pipelines
Otherwise relocations just up and crash.

Fixes: a3153162a9 "anv: Delay allocation of relocation lists"
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-10-31 13:46:09 +00:00
Jason Ekstrand 9f665d9c1c anv: Add a device parameter to anv_execbuf_add_bo
We're about to start needing to lookup BO pointers by GEM handle so we
need access to the device.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-10-31 13:46:09 +00:00
Jason Ekstrand 63d7a38630 anv: Drop anv_bo_init and anv_bo_init_new
BOs are now only ever allocated through the BO cache so there's no need
to have these exposed.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-10-31 13:46:09 +00:00
Jason Ekstrand 853d3b59fd anv: Allocate misc BOs from the cache
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-10-31 13:46:09 +00:00
Jason Ekstrand d0ec55d5a3 anv: Allocate scratch BOs from the cache
While we're here, we get rid of the locking and use a lock-free
algorithm.  The chances of spilling contention are low and this is
actually a bit simpler in some ways.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-10-31 13:46:09 +00:00
Jason Ekstrand ee77938733 anv: Allocate batch and fence buffers from the cache
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-10-31 13:46:09 +00:00
Jason Ekstrand 0a6d2593b8 anv: Allocate descriptor buffers from the BO cache
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-10-31 13:46:09 +00:00
Jason Ekstrand e0ee23660f anv: Set more flags on descriptor pool buffers
the ASYNC flag, in particular, has the potential to help performance
because it means less sync tracking in the kernel.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-10-31 13:46:09 +00:00
Jason Ekstrand c3eb4b3ba5 anv: Allocate query pool BOs from the cache
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-10-31 13:46:09 +00:00
Jason Ekstrand 0d2787f7c9 anv: Use the query_slot helper in vkResetQueryPoolEXT
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-10-31 13:46:09 +00:00
Jason Ekstrand 3119b96bdf anv: Allocate block pool BOs from the cache
This commit switches block pools over to being allocated from the BO
cache rather than being allocated manually by the block pool.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-10-31 13:46:09 +00:00
Jason Ekstrand cc972d72c7 anv/tests: Initialize the BO cache and device mutex
We're about to start depending on the BO cache in the state and block
pools so we need them properly initialized for the tests to work.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-10-31 13:46:09 +00:00
Jason Ekstrand 9076e9f375 anv/tests: Zero-initialize instances
Some of the tests were actually relying on some of those uninitialized
bits to be non-zero.  In particular, a couple want use_softpin = true.

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-10-31 13:46:09 +00:00
Jason Ekstrand 5c664dff75 anv: Choose BO flags internally in anv_block_pool
All block pools are allocated with the same flags.  There's no good
reason why it needs to be configurable.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-10-31 13:46:09 +00:00
Jason Ekstrand a44f5ee0d8 anv: Rework the internal BO allocation API
This makes a number of changes to the current API:

 1. Everything is renamed to anv_device_* instead of anv_bo_cache_*
    because the BO cache is soon going to be the sole BO allocation path
    and not some special case to make import/export work.

 2. Drop the cache parameter.  It's totally redundant with the device
    and just annoying to keep typing.

 3. Rework flags so that they go the convenient direction for usage in
    ANV rather than whichever awkward way the i915 specified it to
    maintain backwards compatibility.  This also gives us the
    opportunity to set some defaults.

 4. Add flags for mapping and coherency.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-10-31 13:46:09 +00:00
Jason Ekstrand 1be2e4c0ef anv: Use anv_block_pool_foreach_bo in get_bo_from_pool
While we're at it, use gen_48b_address().

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-10-31 13:46:08 +00:00
Jason Ekstrand 3178e583c8 anv: Rework anv_block_pool_expand_range
The growing algorithms for the softpin case and the userptr version are
almost entirely different.  Having this weird join doesn't make the code
more comprehensible.  This rework does a few things:

 1. Move the comment about 48-bit addresses to anv_device_init where we
    actually unset the EXEC_OBJECT_SUPPORTS_48B_ADDRESS flag.

 2. Separate the paths in anv_block_pool_expand_range so it's easier to
    see what happens in the two different cases.

 3. Use the anv_block_poo::bos array for storing all allocated BOs in
    both paths rather than using the cleanup list in both paths.  This
    lets us make the cleanups array only used for mmaps of the memfd for
    the userptr case.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-10-31 13:46:08 +00:00
Jason Ekstrand bb257e1852 anv: Fix a potential BO handle leak
Fixes: 731c4adcf9 "anv/allocator: Add support for non-userptr"
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-10-31 13:46:08 +00:00
Jason Ekstrand 6f4fa81769 anv: Handle state pool relocations using "wrapper" BOs
Instead of depending on a mutable BO in the state pool for handling
growing state pools, add a concept of "wrapper" BOs which just wrap an
actual BO.  This way, the wrapper can exist once for all of time and we
can put it in relocation lists even if the actual BO it references gets
swapped out.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-10-31 13:46:08 +00:00
Jason Ekstrand b781c85c79 anv: Replace ANV_BO_EXTERNAL with anv_bo::is_external
We're not THAT strapped for space that we can't burn one extra bit for
a boolean.  If we're really worried about it, we can always shrink the
flags field to 16 bits because the kernel only uses 7 currently.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-10-31 13:46:08 +00:00
Jason Ekstrand 5534358ef6 anv: Inline anv_block_pool_get_bo
It has exactly one caller and we're about to change some of the dynamics
which would make this confusing as a separate function.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-10-31 13:46:08 +00:00
Jason Ekstrand c0a4722f29 anv: Declare the bo in the anv_block_pool_foreach_bo loop
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-10-31 13:46:08 +00:00
Jason Ekstrand 325345b2bd anv: Stop storing the GEM handle in anv_reloc_list_add
We have to go through and rewrite them all anyway so it doesn't do us
any good to put them in the list in anv_reloc_list_add.  Also, for state
pools the handles are likely wrong by the time vkQueueSubmit is called.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-10-31 13:46:08 +00:00
Jason Ekstrand c4be72934e anv: Fix a relocation race condition
Previously, we would read the offset from the BO in anv_reloc_list_add
to generate the presumed offset and then again in the caller to compute
the 64-bit address to write into the buffer.  However, if the offset
somehow changed between these two points, the presumed offset would no
longer match the written offset.  This is unlikely to actually ever be a
problem in practice because the presumed offset gets recorded first and
so if the written address is wrong then the presumed offset is almost
certainly wrong and the relocation will trigger.  However, it's much
safer to simply have anv_reloc_list_add return the 64-bit address.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-10-31 13:46:08 +00:00
Jason Ekstrand bbf389013f anv: Use a util_sparse_array for the GEM handle -> BO map
This lets us do less allocation because the anv_bo's are now embedded in
the sparse array and it also allows lock-free translation from GEM
handle to BO which will be useful in future commits.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-10-31 13:46:08 +00:00
Jason Ekstrand 821ce7be36 anv: Move refcount to anv_bo
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-10-31 13:46:08 +00:00
Lionel Landwerlin b087b7bd90 intel/perf: fix Android build
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 15b7b56eb2 ("intel/perf: add TGL support")
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-By: Tapani Pälli <tapani.palli@intel.com>
2019-10-31 11:20:30 +00:00
Bas Nieuwenhuizen 3e86d553a4 anv: Remove _mesa_locale_init/fini calls.
The resulting locale is not used for Vulkan, and it is not reference
counted, giving issues when multiple instances are created.

CC: 19.2 19.3 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-10-31 09:47:56 +00:00
Lionel Landwerlin 15b7b56eb2 intel/perf: add TGL support
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
2019-10-31 09:13:20 +00:00
Ian Romanick 7b3f38ef69 intel/compiler: Report the number of non-spill/fill SEND messages on vec4 too
This make shader-db's report.py work on Haswell and earlier platforms.
The problem is that the script would detect the "sends" output for
scalar shaders and expect in in vec4 shaders too.  When it didn't find
it, the script would fail with:

    Traceback (most recent call last):
      File "./report.py", line 351, in <module>
        main()
      File "./report.py", line 182, in main
        before_count = before[p][m]
    KeyError: 'sends'

Fixes: f192741ddd ("intel/compiler: Report the number of non-spill/fill SEND messages")

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-10-30 21:27:03 -07:00
Lionel Landwerlin e02c181bfd intel/dev: set default num_eu_per_subslice on gen12
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 8125d7960b ("intel/dev: Add preliminary device info for Tigerlake")
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
2019-10-30 22:30:09 +00:00
Jordan Justen 2b186264cc intel/eu/validate/gen12: Add TGL to eu_validate tests.
These reworks were combined into this patch:

 * Matt Turner: i965: Disable NoDDChk/NoDDClr test on Gen12+
 * Francisco Jerez: intel/eu/validate/gen12: Disable
   qword_low_power_no_depctrl eu_validate test.

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-10-30 14:08:51 -07:00
Jordan Justen 8125d7960b intel/dev: Add preliminary device info for Tigerlake
Reworks:
 * adjust 64-bit support, hiz (Jason Ekstrand)
 * sim-id (Lionel Landwerlin)
 * adjust threads, urb size (Rafael Antognolli)
 * adjust urb size (Kenneth Graunke)

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-10-30 14:08:48 -07:00
Lionel Landwerlin 632995227c intel/dump_gpu: handle context create extended ioctl
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
2019-10-30 21:58:31 +02:00
Rafael Antognolli 3c317e8187 anv: Add Tile Cache Flush for Unified Cache. 2019-10-30 19:51:03 +00:00
Rafael Antognolli a99c67b690 blorp: Add Tile Cache Flush for Unified Cache. 2019-10-30 19:51:03 +00:00
Jordan Justen f573cd4757 intel/genxml: Add gen12 tile cache flush bit
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
2019-10-30 19:51:03 +00:00
Rafael Antognolli e51722a7c7 anv: Align fast clear color state buffer to a page.
On gen11 and older, compressed images are tiled and aligned to 4K. On
gen12 this 4K alignment restriction was removed. However, only aligning
the fast clear color buffer to 64B (a cacheline, as it's on the
documentation) is causing some bugs where the fast clear color is not
converted during the fast clear operation. Aligning things to 4K seems
to fix it.

v2: Assert that image->planes[plane].offset is 4K aligned (Nanley)

Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
2019-10-30 19:41:29 +00:00
Matt Turner 12d3b11908 intel/compiler: Add instruction compaction support on Gen12
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-10-30 11:11:50 -07:00
Matt Turner c8fbc8823f intel/compiler: Make separate src0/src1 index tables
TGL uses different data (and even a different format!) for each source.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-10-30 11:11:50 -07:00
Matt Turner cde73625f8 intel/compiler: Inline get_src_index()
TGL will have separate tables for src0 and src1, so the shared function
will no longer make sense.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-10-30 11:11:50 -07:00