This is actually a non-threaded implementation. I'd summarize this
as event-based submission.
When submit happens we walk a tree of submissions that depend on
the syncobj signal operations to be submitted and if those submission
we no other dependencies we start to execute them immediately.
Or, well I still use a list to avoid issues with long chains and
the stacksize when using recursion.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
This does not fully do wait-before-submit, to be done in a follow
up patch.
For kernels without support for timeline syncobjs, this adds an
implementation of non-shareable timelines using legacy syncobjs.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
This will lead to fewer pipelines in the cache, which is assumed to
become our most unavoidable performance bottle-neck down the line.
Reviewed-by: Dave Airlie <airlied@redhat.com>
This is a function with timeout support for reading from the pipe
between processes used for secure compile.
Initially we hardcode the timeout to 5 seconds. We can adjust the
timeout limit in future if needed.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
This will be used in the following patch to support timeouts for
reading the pipe between processes.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
This skips touching %ebx most times and it shows that glGetString performance
increased from 114M/s to 120M/s on my desktop.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Lepton Wu <lepton@chromium.org>
Remove hard coded 16 and use entry_generate_or_patch to patch
public stubs. The generated code actually is sightly tighter
than before since the "nop" instructions before the final "jmp"
get removed.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Lepton Wu <lepton@chromium.org>
The code works exactly the same with before. Just split this function
out so we can reuse it.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Lepton Wu <lepton@chromium.org>
The x86 assembly language stub in src/mapi/entry_x86_tsd.h does not
generate PIC (position-independent code). This causes text relocations
which bring troubles on recent versions of FreeBSD, OpenBSD, Android.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108541
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Lepton Wu <lepton@chromium.org>
We have to resolve destination surfaces if we are bliting to and from
the same surface.
v2: Revert unrelated change (Nanley Chery)
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Let aux surface state tracker track the stencil buffer's aux state while
clearing depth stencil buffer.
v2: Fix condition check (Nanley Chery)
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Even though stencil buffer compression looks like regular lossless color
compression w/o fast clear support, we have to resolve stencil buffer
with WM_HZ_OP packet.
v2: Check if resource is stencil with helper function (Nanley Chery)
v3: Remove unnecessary included file (Nanley Chery)
v4: (Nanley Chery)
- Avoid stencil buffer aux state transition by improving condition check
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
On Gen12+, Stencil buffer's lossless compression should be resolved
with WM_HZ_OP packet.
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
We never saw any failures regarding this typo but it's good to assign
correct stencil view while constructing blorp_params.
Fixes: 0cabf93b80 "intel/blorp: Add an entrypoint for clearing depth and stencil"
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
On Gen12, the CCS buffer address doesn't have to be referenced in state
packets. In the case of a stencil buffer with CCS, the kernel won't know
the location of the CCS unless an extra call is made to pin its address.
To avoid this extra call, make the CCS part of the main surface.
v2. Update comment above bo_size. (Jordan)
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
The functions used during aux buffer configuration and creation only
return false for exceptional errors. Don't proceed with surface creation
in those cases.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
The original value of 256 was under the assumption that you're a batch
buffer which is likely going to have a large number of relocations.
However, pipeline objects on Gen7 will have at most 6 relocations (one
per shader stage and one for the workaround BO) so this is a lot of
per-pipeline wasted space.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
The old relocation list code always allocated 256 relocations and a hash
set up-front without knowing whether or not we really need them. In
particular, in the softpin case, this is two fairly large allocations
that we don't need to be making. Also, for pipeline objects on haswell
where we don't have softpin, we don't need relocations unless scratch is
used so this is extra data per-pipeline. Instead, we should do it
on-demand. This shaves 3.5% off of a cpu-limited example running with
the Dawn WebGPU implementation.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
For gen12 we set the streamout buffers using 4 separate
commands instead of 3DSTATE_SO_BUFFER.
Signed-off-by: Plamena Manolova <plamena.manolova@intel.com>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>