The `device` can be set earlier, either by a command line option or by
intercepting an early ioctl call from the application that queries
I915_PARAM_CHIPSET_ID. In both cases `aub_file` and `devinfo` would
not be initialized.
Fix by splitting the conditions:
- `device == 0`: use the FD to get both device and devinfo.
- Or `devinfo.gen == 0`: use `device` to initialize it.
Separately, initialize `aub_file` the first time it is needed.
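The split described above can be sketched roughly as follows. This is only an illustration of the control flow, not the actual patch: `struct sketch_devinfo`, `sketch_devinfo_from_fd()`, `sketch_devinfo_from_id()`, and the placeholder ID and gen values are all made up for the example.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for the real device info struct. */
struct sketch_devinfo {
   int gen;              /* 0 means "not initialized yet" */
   uint32_t chipset_id;
};

static uint32_t device;               /* 0 means "not known yet" */
static struct sketch_devinfo devinfo;
static FILE *aub_file;                /* NULL until first needed */

/* Hypothetical helper: pretend to query the kernel through the fd. */
static bool
sketch_devinfo_from_fd(int fd, struct sketch_devinfo *info)
{
   (void)fd;
   info->chipset_id = 0x1234; /* made-up ID for the sketch */
   info->gen = 9;             /* made-up generation */
   return true;
}

/* Hypothetical helper: pretend to look up an already known ID. */
static bool
sketch_devinfo_from_id(uint32_t id, struct sketch_devinfo *info)
{
   info->chipset_id = id;
   info->gen = 9;             /* made-up generation */
   return true;
}

/* The two conditions are checked separately instead of together. */
static bool
ensure_device_info(int fd)
{
   if (device == 0) {
      /* Nothing known yet: the fd provides both device and devinfo. */
      if (!sketch_devinfo_from_fd(fd, &devinfo))
         return false;
      device = devinfo.chipset_id;
   } else if (devinfo.gen == 0) {
      /* device was set earlier (option or ioctl), devinfo was not. */
      if (!sketch_devinfo_from_id(device, &devinfo))
         return false;
   }
   return true;
}

/* The output file is created lazily, independent of the device state. */
static FILE *
ensure_aub_file(void)
{
   if (aub_file == NULL)
      aub_file = tmpfile();
   return aub_file;
}
```

With this shape, a `device` set from the command line still ends up with a populated `devinfo`, and the file is opened on first use regardless of which path set the device.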
Fixes: d594d2a052 ("intel/tools: use device info initializer")
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
If the pixel pipes have a different number of subslices, emit a slice
hashing table that will ensure proper workload distribution.
v2: Don't need to set the mask - it's mbo (Ken).
Add these fields and the 3DSTATE_SLICE_TABLE_STATE_POINTERS instruction
so we can properly configure the slice and subslice hashing on ICL+
v2: Make 'Mask' field a mbo (Ken).
We don't need it for state setup but it's a useful statistic we want to
pass on to developers.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
This commit is all annoying plumbing work which just adds support for a
new brw_compile_stats struct. This struct provides a binary,
driver-readable form of the same statistics we dump out to stderr when
INTEL_DEBUG is set with a shader stage.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
While NIR's lower_imul64() solves the case of 64 bit integer multiplications
generated early, we don't have a way to lower such instructions when they are
generated by our own backend, such as the scan/reduce intrinsics. We'll need
this soon, so implement it now.
An easy way to test this is to simply disable nir_lower_imul64 to let
those operations reach the backend.
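For reference, the arithmetic behind lowering a 64-bit multiply to dword operations can be sketched in plain C. This is not the instruction sequence the backend emits; `mul64_from_dwords()` is a hypothetical name used only to show why a 64b = 32b * 32b multiply is needed for the low dwords while the cross terms can stay 32 bits wide.

```c
#include <stdint.h>

/* Low 64 bits of a * b, computed from 32-bit halves only. */
static uint64_t
mul64_from_dwords(uint64_t a, uint64_t b)
{
   uint32_t al = (uint32_t)a, ah = (uint32_t)(a >> 32);
   uint32_t bl = (uint32_t)b, bh = (uint32_t)(b >> 32);

   /* 64b = 32b * 32b: the full product of the low dwords is needed,
    * because its upper half carries into the high dword of the result.
    */
   uint64_t low = (uint64_t)al * bl;

   /* The cross terms only affect the high dword, so 32-bit wrapping
    * multiplies and adds are enough. ah * bh would land entirely above
    * bit 63 and is dropped.
    */
   uint32_t cross = al * bh + ah * bl;

   return low + ((uint64_t)cross << 32);
}
```

Because the result is taken modulo 2^64, the same sequence works for signed operands reinterpreted as unsigned.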
v2:
- Fix Q/UQ copy/paste errors (Caio).
- Transform an 'if' into 'else if' (Caio).
- Add an extra comment to clarify the need for 64b = 32b * 32b
(Caio).
- Make private functions private (Caio).
v3:
- Remove ambiguity with 'b' and 'd' variables (Caio).
- Allocate potentially less regs for the dwords (Caio).
Cc: Jason Ekstrand <jason.ekstrand@intel.com>
Cc: Matt Turner <matt.turner@intel.com>
Cc: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Invert the logic of how progress is handled: remove the continue
statements and mark progress inside the places where it actually
happens.
We're going to add a new lowering that also looks for BRW_OPCODE_MUL,
so inverting the logic here makes the resulting code much easier to
follow.
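A minimal sketch of the inverted structure, assuming made-up types (`struct sketch_inst`, `enum sketch_opcode`) rather than the backend's real instruction classes:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical opcodes and instruction type, for illustration only. */
enum sketch_opcode { SKETCH_OP_MUL, SKETCH_OP_MACH, SKETCH_OP_ADD };

struct sketch_inst {
   enum sketch_opcode opcode;
   bool lowered;
};

/* Progress is marked exactly where a change happens, so a new case can
 * be added as another branch without touching any early "continue".
 */
static bool
lower_instructions(struct sketch_inst *insts, size_t count)
{
   bool progress = false;

   for (size_t i = 0; i < count; i++) {
      if (insts[i].opcode == SKETCH_OP_MUL && !insts[i].lowered) {
         insts[i].lowered = true;
         progress = true;
      } else if (insts[i].opcode == SKETCH_OP_MACH && !insts[i].lowered) {
         insts[i].lowered = true;
         progress = true;
      }
      /* Anything else falls through untouched; no "continue" needed. */
   }

   return progress;
}
```

The previous shape would skip uninteresting instructions with `continue` and set progress unconditionally at the bottom of the loop, which gets harder to read once more than one lowering shares the loop.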
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Don't instantiate a builder for each instruction during
lower_integer_multiplication(). Instantiate one only when needed.
On the other hand, these unneeded builders don't seem to cost much to
init, so I don't expect any significant difference in performance:
this is mostly about code organization.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
The lower_integer_multiplication() function is already a little too
big. I want to add more to it, so let's reorganize the existing code
first. Let's start with just extracting the current code to
subfunctions. Later we'll change them a little more.
v2: Make private functions private (Caio).
v3: Fix typo (Caio).
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
v2: add to series
v3: update Makefile.sources
v4: don't remove a comment and break statement
v4: use nir_can_move_instr
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
See "i965/gen9: Optimize slice and subslice load balancing behavior."
for the rationale. According to Jason, this improves Aztec Ruins
performance by 2.7%.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> (v1)
v2: Undo CPU performance micro-optimization done in i965 and iris due
to lack of data justifying it on anv. Use
cmd_buffer_apply_pipe_flushes wrapper instead of emitting pipe
control command directly. (Jason)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
The version bump adds a proper features struct.
Fixes: d10de25309 "anv: Implement VK_EXT_subgroup_size_control"
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Android 9 loader conditionally advertises VK_KHR_shared_presentable_image
extension based on this property and it looks like it does not
initialize the struct before query.
Pragmas are added to ignore warnings about Android-specific structure
types, in the same manner as commit 8d386e6eef did.
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
vkpipeline-db for my Skylake GPU:
total instructions in shared programs: 8847602 -> 8847896 (<.01%)
instructions in affected programs: 10165 -> 10459 (2.89%)
helped: 8
HURT: 2
total cycles in shared programs: 1606273555 -> 1606251634 (<.01%)
cycles in affected programs: 2201803 -> 2179882 (-1.00%)
helped: 7
HURT: 3
The shaders with more instructions are due to a loop over a shared array
in Three Kingdoms being unrolled (and creating a lot of nested ifs). Not sure
if that's good or bad.
One of the shaders with worse cycles is only worse by 0.04% and the other
two are the shaders with loops unrolled.
v2: add patch
v4: don't set spirv_options.shared_addr_format
v4: move comment concerning the shared address format used and NULL
v4: add vkpipeline-db results
v5: rename to nir_lower_vars_to_explicit_types
v5: move setting of total_shared to outside brw_compile_cs
v6: set shared_addr_format
v6: formatting changes
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> (v5)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
The new function supports gralloc1 usage flags that get set separately
for producer and consumer. As we still need to support the old method too,
let's share common code and use the android_convertGralloc0To1Usage helper.
Bump the VK_ANDROID_native_buffer version to indicate support for the
new call.
Changes were tested on Android Celadon P with Basemark GPU and various
Sascha Willems Vulkan demos.
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
INTEL_DEBUG=perfmon will iterate over the perf queries, printing
information about the state of each query. Some of this information
will be private to intel/perf, so it needs a dump routine that can be
called from i965.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Now that all references from i965 have been moved to perf, we can make
internal methods private again.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
By encapsulating this implementation within perf, we can eventually
make struct gen_perf_ctx private.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This refactor moves several helper functions for get_query_data as
well:
- accumulate_oa_reports
- read_gt_frequency
- get_pipeline_stats_data
- get_oa_counter_data
Functions which are no longer referenced in brw_performance_query.c
have been removed.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The following methods have duplicate implementation of read_oa_samples_until in
brw_performance_query.c:
- read_oa_samples_for_query
- read_oa_samples_until
They are still referenced by other methods in the file and will be
removed in the subsequent commit.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
To move more operations into intel/perf, several state items are
needed. Save references to that state in the perf_ctxt, rather than
passing them in for every operation.
This commit includes an initializer for gen_perf_context, to set those
references and also encapsulate the initialization of the sample
buffer state.
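The idea of saving references once instead of passing them to every operation can be sketched as below. The struct layout and the names `sketch_perf_context`, `sketch_perf_init_context()`, and `sketch_perf_sample_count()` are assumptions for illustration and do not reflect the real gen_perf_context fields.

```c
#include <stddef.h>

/* Hypothetical context: state captured once at init time. */
struct sketch_perf_context {
   void *driver_ctx;   /* opaque driver context */
   void *bufmgr;       /* opaque buffer manager */
   int drm_fd;

   /* Sample buffer bookkeeping, also set up by the initializer. */
   struct {
      void *head;
      size_t count;
   } sample_buffers;
};

/* Store the references and reset the sample buffer state in one place. */
static void
sketch_perf_init_context(struct sketch_perf_context *perf,
                         void *driver_ctx, void *bufmgr, int drm_fd)
{
   perf->driver_ctx = driver_ctx;
   perf->bufmgr = bufmgr;
   perf->drm_fd = drm_fd;
   perf->sample_buffers.head = NULL;
   perf->sample_buffers.count = 0;
}

/* Later operations take only the context, not each piece of state. */
static size_t
sketch_perf_sample_count(const struct sketch_perf_context *perf)
{
   return perf->sample_buffers.count;
}
```

Callers then pass a single pointer around, which is what makes it possible to hide the struct definition later.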
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>