We were not properly writing page tables when the virtual address
range spans multiple subtrees of the tables.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
The implementation of CreateRenderPass2 uses the helpers we broke out in
previous commits. The implementations of the new vkCmd functions just
call the old versions.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
This makes certain checks a bit easier and means that we don't have
the attachment information duplicated in the attachment list and in
depth_stencil_attachment.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
This helps us to compact original instruction:
mul(8) g3<1>D g6<8,8,1>UD 0x00000006UD { align1 1Q };
So now we emit:
mul(8) g3<1>UD g6<8,8,1>UD 0x00000006UD { align1 1Q compacted };
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
At the time of commit 7bc6e455e2 (i965: Add support for saturating
immediates.) we thought mixed type saturates would be impossible. We
were only thinking about type converting moves from D to F, for
example. However, type converting moves w/saturate from F to DF are
definitely possible. This change minimally relaxes the restriction to
allow cases that I have been able trigger via piglit tests.
Fixes new piglit tests:
- arb_gpu_shader_fp64/execution/built-in-functions/fs-sign-sat-neg-abs.shader_test
- arb_gpu_shader_fp64/execution/built-in-functions/vs-sign-sat-neg-abs.shader_test
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
This is achived by copying the sign(abs(x)) optimization from the FS
backend.
On Gen7 an earlier platforms, this fixes new piglit tests:
- glsl-1.10/execution/vs-sign-neg-abs.shader_test
- glsl-1.10/execution/vs-sign-sat-neg-abs.shader_test
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
In Python 2, `print` was a statement, but it became a function in
Python 3.
Using print functions everywhere makes the script compatible with Python
versions >= 2.6, including Python 3.
Signed-off-by: Mathieu Bridon <bochecha@daitauha.fr>
Acked-by: Eric Engestrom <eric.engestrom@intel.com>
Acked-by: Dylan Baker <dylan@pnwbakers.com>
The bug fixed by the previous commit went undetected because extra
stderr messages are not flagged by the CI. Copy the solution from
fs_visitor::nir_emit_instr and mark the default case unreachable.
An alternate solution is to delete the default case so that the compiler
will issue a warning. That may require more work since there are other
(impossible) cases that exist.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Some of the lowering passes, nir_lower_locals_to_regs for example, can
cause some previously live code to be dead. This pass in particular
leaves a bunch of nir_instr_type_deref instructions floating around.
This causes shader-db runs on Gen5 through Haswell to spew tons of
messages like:
VS instruction not yet implemented by NIR->vec4
UnrealEngine4/EffectsCaveDemo/239.shader_test is one shader that
generates these messages. Cleaning up the dead code fixes that.
To verify, I did a shader-db before and after. Even though all the
messages are gone, the results make my brain hurt. :(
Haswell
total cycles in shared programs: 411890163 -> 411891145 (<.01%)
cycles in affected programs: 57016 -> 57998 (1.72%)
helped: 3
HURT: 11
helped stats (abs) min: 2 max: 154 x̄: 96.67 x̃: 134
helped stats (rel) min: 0.08% max: 2.23% x̄: 1.42% x̃: 1.96%
HURT stats (abs) min: 18 max: 686 x̄: 115.64 x̃: 20
HURT stats (rel) min: 0.81% max: 7.12% x̄: 1.87% x̃: 0.93%
95% mean confidence interval for cycles value: -51.39 191.67
95% mean confidence interval for cycles %-change: -0.14% 2.46%
Inconclusive result (value mean confidence interval includes 0).
Ivy Bridge
total cycles in shared programs: 259114802 -> 259115032 (<.01%)
cycles in affected programs: 24034 -> 24264 (0.96%)
helped: 1
HURT: 9
helped stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2
helped stats (rel) min: 0.08% max: 0.08% x̄: 0.08% x̃: 0.08%
HURT stats (abs) min: 18 max: 48 x̄: 25.78 x̃: 20
HURT stats (rel) min: 0.80% max: 1.94% x̄: 1.08% x̃: 0.80%
95% mean confidence interval for cycles value: 12.42 33.58
95% mean confidence interval for cycles %-change: 0.54% 1.38%
Cycles are HURT.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Fixes: 5a02ffb733 nir: Rework lower_locals_to_regs to use deref instructions
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
On Python 2, the default JSON separators are ', ' for items and ': ' for
dicts.
On Python 3, the default is the same when no indent is specified, but if
one is (and we do specify one) then the default items separator becomes
',' (the dict separator remains unchanged).
This change explicitly specifies the Python 3 default, which helps
ensuring that the output is identical, whether it was generated by
Python 2 or 3.
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
For gen8+, write out PPGTT tables in aub files so that full 48-bit
addresses can be serialized.
v2: Fix handling of `end` index in map_ppgtt
v3: Correctly mark GGTT entry as present (Rafael)
Signed-off-by: Scott D Phillips <scott.d.phillips@intel.com>
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
With PPGTT mappings, our aubinator implementation can be quite slow if
we request a buffer that doesn't exist. Instead of doing a PPGTT walk
for invalid addresses (0 lengths), wait until we're sure we want to
decode the data.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
v2: by Lionel
Fix memfd_create compilation issue
Fix pml4 address stored on 32 instead of 64bits
Return no buffer if first ppgtt page is not mapped
v3: Drop additional memfd_create() (Rafael)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
We use memfd to store physical pages as they get read/written to and
the GGTT entries translating virtual address to physical pages.
Based on a commit by Scott Phillips.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
Now that we're softpinning the address of our BOs in anv & i965, the
addresses selected start at the top of the addressing space. This is a
problem for the current implementation of aubinator which uses only a
40bit mmapped address space.
This change keeps track of all the memory writes from the aub file and
fetch them on request by the batch decoder. As a result we can get rid
of the 1<<40 mmapped address space and only rely on the mmap aub file
\o/
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
On a follow up commit in this series, we stop copying the data from
the mmap'ed file into our big gtt mmap, and start referencing data in
it directly. So reallocating the read buffer and adding more data from
stdin wouldn't work. For that reason, let's stop supporting stdin
process.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
In 6f5abf3146 this code was fixed to calculate the maximum size of
an attribute in a seperate pass and then allocate the registers to
that size. However this wasn’t taking into account ranges that overlap
but don’t have the same starting location. For example:
layout(location = 0, component = 0) out float a[4];
layout(location = 2, component = 1) out float b[4];
Previously, if ‘a’ was processed first then it would allocate a
register of size 4 for location 0 and it wouldn’t allocate another
register for location 2 because it would already be covered by the
range of 0. Then if something tries to write to b[2] it would try to
write past the end of the register allocated for ‘a’ and it would hit
an assert.
This patch changes it to scan for any overlapping ranges that start
within each range to calculate the maximum extent and allocate that
instead.
Fixed Piglit’s arb_enhanced_layouts/execution/component-layout/
vs-fs-array-interleave-range.shader_test
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Fixes: 6f5abf3146 "i965: Fix output register sizes when multiple variables
share a slot."
The Vulkan API provides a mechanism for applications to cache their own
shaders and manage on-disk pipeline caching themselves. Generally, this
is what I would recommend to application developers and I've resisted
implementing driver-side transparent caching in the Vulkan driver for a
long time. However, not all applications do this and, for some
use-cases, it's just not practical.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Before, we were only hashing the shader if we had a shader cache to
cache things in. This means that if we ever get it wrong, we could end
up trying to cache a shader with an undefined hash. Since not having a
shader cache is an extremely uncommon case, let's optimize for code
clarity and obvious correctness over avoiding a hash operation.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
If a client is dumb enough to not specify a pipeline cache, give it a
default. We have to create one anyway for blorp so we may as well let
the client cache shaders in it.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Previously, we just hashed the entire descriptor set layout verbatim.
This meant that a bunch of extra stuff such as pointers and reference
counts made its way into the cache. It also meant that we weren't
properly hashing in the Y'CbCr conversion information information from
bound immutable samplers.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
According to RenderDoc, this shaves 99.6% of the run time off of the
ambient occlusion pass in Skyrim Special Edition when running under DXVK
and shaves 92% off the runtime for a reasonably representative frame.
When running the actual game, Skyrim goes from being a slide-show to a
very stable and playable framerate on my SKL GT4e machine.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Running VK-CTS in batch execution mode was raising the
VK_ERROR_INITIALIZATION_FAILED error in multiple tests. But when the
same failing tests were run isolated they always passed.
createDevice and destroyDevice were called before and after every
tests. Because the binding_table_pool was never closed, we reached the
maximum number of open file descriptors (ulimit -n) and when that
happened every call to createDevice implied a
VK_ERROR_INITIALIZATION_FAILED error.
Fixes: c7db0ed4e9
("anv: Use a separate pool for binding tables when soft pinning")
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>