Use fstat() only to pre-allocate a big enough buffer.
This fixes a race where if the file grows between fstat() and read()
we would be missing the end of the file, and if the file slims down
read() would just fail.
Fixes: 316964709e "util: add os_read_file() helper"
Reported-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
We want to be able to call ra_allocate() and, when it fails, mutate the
graph and try again rather than re-building the graph from scratch.
This commit moves all the scratch bits except the final register
allocation (which is really an out value not scratch) into sub-structs
named "tmp" to make it clear which things are scratch. It also adds
bits to the ra_select() initialization loop to initialize things (since
we can't trust rzalloc anymore) and copy q_test and forced_reg over.
Reviewed-by: Eric Anholt <eric@anholt.net>
Unfortunately, we can't quite follow the standard C conventions for
these because ralloc doesn't know the sizes of pointers.
Reviewed-by: Eric Anholt <eric@anholt.net>
The most expensive part of register allocation is the ra_simplify step
which is a fixed-point algorithm with a worst-case complexity of O(n^2)
which adds the registers to a stack which we then use later to do the
actual allocation. This commit uses bit sets and changes the core loop
of ra_simplify to first walk 32-node chunks and then walk each chunk.
This lets us skip whole 32-node chunks in one go based on bit operations
and compute the minimum q value potentially 32x as fast. Of course, the
algorithm still has the same fundamental O(n^2) run-time but the
constant is now much lower.
In the nasty Aztec Ruins compute shader, this shaves a full four seconds
off the 30s compile time for a release build of mesa. In a debug build
(needed for accurate stack traces), perf says that ra_select takes 20%
of runtime before this patch and only 5-6% of runtime after this patch.
It also makes shader-db runs faster.
Shader-db results on Kaby Lake:
total instructions in shared programs: 15311100 -> 15311100 (0.00%)
instructions in affected programs: 0 -> 0
helped: 0
HURT: 0
total cycles in shared programs: 355468050 -> 355468050 (0.00%)
cycles in affected programs: 0 -> 0
helped: 0
HURT: 0
Total CPU time (seconds): 2602.37 -> 2524.31 (-3.00%)
Reviewed-by: Eric Anholt <eric@anholt.net>
We only use q_total if the reg is not assigned so there's no point in
updating it if the reg is not assigned. This has no known perf benefit
but it will reduce churn in a future commit.
Reviewed-by: Eric Anholt <eric@anholt.net>
This shaves about half a second off the 30 second compile time of one of
the compute shaders in Aztec ruins.
Reviewed-by: Eric Anholt <eric@anholt.net>
The V3D 4.2 HW has a limit to MSAA texture sizes of 4096. With non-MSAA,
we can go up to 7680 (actually probably 8138, but that hasn't been
validated by the HW team). Exposing 7680 in X11 will allow dual 4k displays.
Often times you don't know how big a set will be and you want the code
to just grow it as needed. However, sometimes you do know and you can
avoid a lot of rehashing if you just specify a size up-front.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
This function is identical to _mesa_set_add except that it takes an
extra out parameter that lets the caller detect if a replacement
happened.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
The MASK macro is used in the RANGE macro, and it should
return the pre-bitset word mask for the (b) value.
i.e.
BITSET_MASK(0) should be undefined since it's meaningless.
BITSET_MASK(31) should give 0x7fffffff
BITSET_MASK(32) should give 0xffffffff
BITSET_MASK(33) should give 0x00000001
BITSET_MASK(64) should give 0xffffffff
However then BITSET_RANGE ends up broken for cases where
it's (b) value is the 0,32,64 value as in that case the lower
mask would be 0 not 0xffffffff.
This fixes the unit tests that I've added, and my code that
uses bitsets.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Fixes: bb38cadb1c "More GLSL code"
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
The last test here currently fails as there is a bug in bitset.h
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
I want to be able to do BITSET_TEST() != BITSET_TEST() and this isn't
currently possible because BITSET_TEST() returns a random bit. Compare
to zero to get an actual Boolean.
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This #include is needed for `NULL`, which is used on all OSes, not just Linux.
Reported-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Fixes: 316964709e "util: add os_read_file() helper"
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
dynamic textures seem to have predictable stride. This stride
should be the same as for a ram buffer.
It seems some game don't check the actual stride value, assuming
it to be the expected one.
Thus this workaround (protected by drirc option) is to use an intermediate
ram buffer.
Fixes Rayman Legends texture issues when enabled.
Signed-off-by: Axel Davy <davyaxel0@gmail.com>
One special case, `src/util/xmlpool/.gitignore` is not entirely deleted,
as `xmlpool.pot` still gets generated (eg. by `ninja xmlpool-pot`).
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Move the definition of radeonsi_clear_db_cache_before_clear there,
as well as radeonsi_enable_nir.
This removes the AMD_DEBUG=nir option.
We currently still have two places for options: the driconf machinery
and AMD_DEBUG/R600_DEBUG. If we are to have a single place for options,
then the driconf machinery should be preferred since it's more flexible.
The only downside of the driconf machinery was that adding new options
was quite inconvenient. With this change, a simple boolean option can
be added with a single line of code, same as for AMD_DEBUG.
One technical limitation of this particular implementation is that while
almost all driconf features are available, the translation machinery doesn't
pick up the description strings for options added in si_debvug_options. In
practice, translations haven't been provided anyway, and this is intended
for developer options, so I'm not too worried. It could always be added
later if anybody really cares.
v2:
- use bool instead of uint8_t for options
- si_debug_options.inc -> si_debug_options.h
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This suppresses warning about calling a non-virtual destructor in a
non-final class with virtual functions:
src/compiler/glsl/ast.h:53:4: warning: destructor called on non-final 'ast_node' that has virtual functions but non-virtual destructor [-Wdelete-non-virtual-dtor]
DECLARE_LINEAR_ZALLOC_CXX_OPERATORS(ast_node);
Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Budgie Window Manager is an increasingly used alternative to GNOME and MATE.
Default in Solus OS, also used in other distros.
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
This is for the case that user only know a max size
it wants to append to the array and enlarge the array
capacity before writing into it.
v2:
- rename newsize to newcap
- rename util_dynarray_enlarge to util_dynarray_grow_cap
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
We consider it acceptable, but let's still document it in case people
notice it and are not sure why it's there.
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
__builtin_types_compatible_p() is GCC-specific and breaks the
MSVC build.
This intrinsic has been in u_vector_foreach() for a long time, but
that macro has only recently been used in code
(nir/nir_opt_comparison_pre.c) that's built with MSVC.
Fixes: 2cf59861a ("nir: Add partial redundancy elimination for compares")
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
There are multiple `goto path_fail` with an open fd, but none that go to
`fail:` without going through `path_fail:` first, so let's just move the
`close(fd)` there.
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
In Android O, MESA needs to statically link libexpat so that
it's in same VNDK namespace.
v2: apply change also to anv driver (Tapani)
v3: use += in anv change (Eric Engestrom)
Change-Id: I82b0be5c817c21e734dfdf5bfb6a9aa1d414ab33
Signed-off-by: Kishore Kadiyala <kishore.kadiyala@intel.com>
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
This function is replicated across vc4/v3d/freedreno and is needed in
Panfrost; let's make this shared code.
v2: Supply generic util_array_contains_u64 version (Eric Engestrom). Add
missing stdbool.h include (Eric Anholt). Mark inline (Christian
Gmeiner).
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>