It should be incremented by one according to
how it is calculated by 'emit_vertex_buffer_state':
"\#if GEN_GEN < 8
.BufferAccessType = step_rate ? INSTANCEDATA : VERTEXDATA,
.InstanceDataStepRate = step_rate,
\#if GEN_GEN >= 5
.EndAddress = ro_bo(bo, end_offset - 1),
\#endif
\#endif"
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109449
Signed-off-by: Andrii Simiklit <andrii.simiklit@globallogic.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Use L3 configuration specified in h/w specification.
V2: Drop configs which do under allocation of l3 cache.
Bump up the comment above table.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
The engine to which the batch was sent to is now set to the decoder context when
decoding the batch. This is needed so that we can distinguish between
instructions as the render and video pipe share some of the instruction opcodes.
v2: The engine is now in the decoder context and the batch decoder uses a local
function for finding the instruction for an engine.
v3: Spec uses engine_mask now instead of engine, replaced engine class enums
with the definitions from UAPI.
v4: Fix up aubinator_viewer (Lionel)
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Preliminary work for adding handling of different pipes to gen_decoder. Each
instruction needs to have a definition describing which engine it is meant for.
If left undefined, by default, the instruction is defined for all engines.
v2: Changed to use the engine class definitions from UAPI
v3: Changed I915_ENGINE_CLASS_TO_MASK to use BITSET_BIT, change engine to
engine_mask, added check for incorrect engine and added the possibility to
define an instruction to multiple engines using the "|" as a delimiter in the
engine attribute.
v4: Fixed the memory leak.
v5: Removed an unnecessary ralloc_free().
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
There are some cases where the VS is the only stage enabled, it uses the
entire URB, and the URB is large enough that placing later stages after
the VS exceeds the number of bits for "URB Starting Address".
For example, on Icelake GT2, "varying-packing-simple mat2x4 array" from
Piglit is getting a starting offset of 128 for the GS/HS/DS. But the
field is only large enough to hold an offset of 127.
i965 doesn't hit any genxml assertions because it's still using the old
OUT_BATCH mechanism. 128 << GEN7_URB_STARTING_ADDRESS_SHIFT (57) == 0,
with the extra bit falling off the end. So we place the disabled stage
at the beginning of the URB (overlapping with push constants). This is
likely okay since it's a zero size region (0 entries).
It seems like the Vulkan driver might hit this assertion, however, and
the situation seems harmless. To work around this, always place
disabled stages at the start of the URB, so the last enabled stage can
fill the remaining space without overflowing the field.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Use the 'DWord Length' and 'bias' fields from the instruction definition to
parse the packet length from the command stream when possible. The hardcoded
mechanism is used whenever an instruction doesn't have this field.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
This function was there when the file was introduced in commit
38f10d5a03 "intel: tools: add aubinator viewer", but was
never actually used.
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Pointer arithmetic...
v2: s/4/sizeof(uint32_t)/ (Eric)
v3: Give bytes to print_batch() in error_decode (Lionel)
Make clear what values we're dealing with in error_decode (Lionel)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com> (v2)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
STATE_BASE_ADDRESS only modifies various bases if the "modify" bit is
set. Otherwise, we want to keep the existing base address.
Iris uses this for updating Surface State Base Address while leaving the
others as-is.
v2: Also update aubinator_viewer_decoder (caught by Lionel)
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
construct correct gen xml filename when we try to load hardware xml
description from a given path
v2: remove temporary variable (Francesco Ansanelli)
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
The "gen_group_get_length" function can return a negative value
and it can lead to the out of bounds group_iter.
v2: printing of "unknown command type" was added
v3: just the asserts are added
Signed-off-by: Andrii Simiklit <andrii.simiklit@globallogic.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Instead of printing addresses like everyone else, we were accidentally
printing the offset from state base address. Also, state_map is a void
pointer so we were incrementing in bytes instead of dwords and every
state other than the first was wrong.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
First of all, setting iter->name in advance_field is unnecessary because
it gets set by gen_decode_field which gets called immediately after
gen_decode_field in the one call-site. Second, we weren't properly
initializing start_bit and end_bit in the initial condition of
gen_field_iterator_next so the first field of a struct would get printed
wrong if it doesn't start on the first bit. This is fixed by adding a
iter_start_field helper which sets the field and also sets up the other
bits we need. This fixes decoding of 3DSTATE_SBE_SWIZ.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Only used, when asserts are enabled.
Fixes an unused-variable warning with GCC 8:
../../../src/intel/common/gen_decoder.c: In function 'gen_spec_load':
../../../src/intel/common/gen_decoder.c:535:47: warning: variable 'total_length' set but not used [-Wunused-but-set-variable]
uint32_t text_offset = 0, text_length = 0, total_length;
^~~~~~~~~~~~
Signed-off-by: Kai Wasserbäch <kai@dev.carbon-project.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Shader time hard codes an index of the shader time buffer within the
gen program.
In order to support shader time in the disk shader cache, we'd need to
add the shader time index into the program key. This should work, but
probably is not worth it for this particular debug feature.
Therefore, let's just disable the disk shader cache if the shader time
debug feature is used.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106382
Fixes: 96fe36f7ac "i965: Enable disk shader cache by default"
Cc: Eero Tamminen <eero.t.tamminen@intel.com>
Cc: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The various base addresses are simply addresses. There may or may not
be a buffer located at those addresses. So, it doesn't make much sense
to request one. Just save the raw address so we can add it later, when
asking about BOs at the final <base + offset> address.
Suggested-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Normally, i965 programs STATE_BASE_ADDRESS every batch, and puts all
state for a given base in a single buffer.
I'm working on a prototype which emits STATE_BASE_ADDRESS only once at
startup, where each base address is a fixed 4GB region of the PPGTT.
State may live in many buffers in that 4GB region, even if there isn't
a buffer located at the actual base address itself.
To handle this, we need to save the STATE_BASE_ADDRESS values across
multiple batches, rather than assuming we'll see the command each time.
Then, each time we see a pointer, we need to ask the driver for the BO
map for that data. (We can't just use the map for the base address, as
state may be in multiple buffers, and there may not even be a buffer
at the base address to map.)
v2: Fix things caught in review by Lionel:
- Drop bogus bind_bo.size check.
- Drop "get the BOs again" code - we just get the BOs as needed
- Add a message about interface descriptor data being unavailable
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Since various options within INTEL_DEBUG could impact code generation,
we need to set the disk cache driver_flags parameter based on the
INTEL_DEBUG flags in use.
An example that will affect the program generated by i965 is the
INTEL_DEBUG=nocompact option.
The DEBUG_DISK_CACHE_MASK value is added to mask the settings of
INTEL_DEBUG that can affect program generation.
v2:
* Use driver_flags (Tim)
* Also update Anvil (Jason)
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Code assumes that all the necessary fields will exist, but compiler
doesn't know about this. Provide zero as default values, like in other
decoding functions.
Fixes warnings
../../src/intel/common/gen_batch_decoder.c: In function ‘handle_media_interface_descriptor_load’:
../../src/intel/common/gen_batch_decoder.c:347:7: warning: ‘binding_entry_count’ may be used uninitialized in this function [-Wmaybe-uninitialized]
dump_binding_table(ctx, binding_table_offset, binding_entry_count);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
../../src/intel/common/gen_batch_decoder.c:347:7: warning: ‘binding_table_offset’ may be used uninitialized in this function [-Wmaybe-uninitialized]
../../src/intel/common/gen_batch_decoder.c:346:7: warning: ‘sampler_count’ may be used uninitialized in this function [-Wmaybe-uninitialized]
dump_samplers(ctx, sampler_offset, sampler_count);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
../../src/intel/common/gen_batch_decoder.c:346:7: warning: ‘sampler_offset’ may be used uninitialized in this function [-Wmaybe-uninitialized]
../../src/intel/common/gen_batch_decoder.c:343:7: warning: ‘ksp’ may be used uninitialized in this function [-Wmaybe-uninitialized]
ctx_disassemble_program(ctx, ksp, "compute shader");
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
../../src/intel/common/gen_batch_decoder.c: In function ‘decode_dynamic_state_pointers’:
../../src/intel/common/gen_batch_decoder.c:663:54: warning: ‘state_offset’ may be used uninitialized in this function [-Wmaybe-uninitialized]
const uint32_t *state_map = ctx->dynamic_base.map + state_offset;
~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~
../../src/intel/common/gen_batch_decoder.c: In function ‘gen_print_batch’:
../../src/intel/common/gen_batch_decoder.c:856:13: warning: ‘next_batch.map’ may be used uninitialized in this function [-Wmaybe-uninitialized]
if (next_batch.map == NULL) {
^
../../src/intel/common/gen_batch_decoder.c:860:13: warning: ‘next_batch.addr’ may be used uninitialized in this function [-Wmaybe-uninitialized]
gen_print_batch(ctx, next_batch.map, next_batch.size,
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
next_batch.addr);
~~~~~~~~~~~~~~~~
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
strncpy() doesn't guarantee the terminator NUL, so we would need to
set ourselves. Just use snprintf() instead.
Fixes the warnings
../../src/intel/common/gen_decoder.c: In function ‘iter_decode_field’:
../../src/intel/common/gen_decoder.c:897:7: warning: ‘strncpy’ specified bound 128 equals destination size [-Wstringop-truncation]
strncpy(iter->name, iter->field->name, sizeof(iter->name));
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In function ‘iter_advance_field’,
inlined from ‘gen_field_iterator_next’ at ../../src/intel/common/gen_decoder.c:1015:9:
../../src/intel/common/gen_decoder.c:844:7: warning: ‘strncpy’ specified bound 128 equals destination size [-Wstringop-truncation]
strncpy(iter->name, iter->field->name, sizeof(iter->name));
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Our attempt to restart the loop with the second level batch worked at
one point but got broken at some point. It was too fragile anyway and
we're not likely to have enough secondaries to actually overflow the
stack so we may as well recurse in both cases.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
With PPGTT mappings, our aubinator implementation can be quite slow if
we request a buffer that doesn't exist. Instead of doing a PPGTT walk
for invalid addresses (0 lengths), wait until we're sure we want to
decode the data.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
A later patch will make use of this in other places. Also, remove
dependency on undefined behavior of left-shifting a signed value.
v2: - move function into a separate header (Chris)
v3: (by Ken) Add new header to the various build systems.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
There is a compile warning from Android 8 (API version 26) from "include cutils/log.h"
warning: "Deprecated: don't include cutils/log.h, use either android/log.h or log/log.h"-W#warnings,
Change to include "log/log.h" on Android 8 or later major version to avoid this warning
Signed-off-by: jenny.q.cao <jenny.q.cao@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
First, this was iterating over the 3DSTATE_CONSTANT_* instruction
but trying to process fields of the 3DSTATE_CONSTANT_BODY substructure.
Secondly, the fields have been called Buffer[0] and Read Length[0],
for a while now, and we were not handling the subscripts correctly.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Given an arbitrary batch, we don't always know what the size of certain
things are, such as how many entries are in a binding table. But it's
easy for the driver to track that information, so with a simple callback
we can calculate this correctly for INTEL_DEBUG=bat.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Making these part of libintel_common allows us to use them in the DRI
driver. The standalone tool binaries already link against the common
library, too, so it's no harder for them.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Struct fields might span several dwords, but iter_dword is incremented
up to the last dword of the current field before we print out the
struct's fields. We can't use iter_dword for computing the offset into
the pointer of data to decode.
v2: Fix displayed offset number (Ken)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
<register> & <struct> elements always have fixed length. The
get_length() method implies that we're dealing with an instruction in
which the length is encoded into the variable data but the field
iterator uses it without checking what kind of gen_group it is dealing
with.
Let's make get_length() report the correct length regardless of the
gen_group (register, struct or instruction).
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>