In this case, we say an entrypoint is supported if ANY of the extensions
is supported. This is because, in the XML, entrypoints don't require
extensions so much as extensions require entrypoints.
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
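A minimal sketch of the rule described above, not the actual generator code: an entrypoint counts as supported as soon as any extension that lists it is enabled. The function name, parameter names, and extension names are hypothetical, and the sketch does not cover entrypoints that belong to no extension.

```python
def entrypoint_supported(required_by, enabled_extensions):
    """Return True if any extension that lists the entrypoint is enabled.

    required_by: iterable of extension names that list this entrypoint.
    enabled_extensions: set of extension names currently enabled.
    """
    return any(ext in enabled_extensions for ext in required_by)


# Hypothetical data: one entrypoint listed by two extensions.
required_by = ["EXT_A", "EXT_B"]
print(entrypoint_supported(required_by, {"EXT_B"}))  # one match is enough
print(entrypoint_supported(required_by, set()))      # nothing enabled
```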
The original string map assumed that the mapping from strings to
entrypoints was a bijection. This will not be true the moment we
add entrypoint aliasing. This reworks things to be an arbitrary map
from strings to non-negative signed integers. The old one also had a
potential bug if we ever had a hash collision because it didn't do the
strcmp inside the lookup loop. While we're at it, we break things out
into a helpful class.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
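The two properties mentioned above (several strings may map to the same integer, and the lookup must compare the string itself rather than trust the hash) can be sketched as follows. This is an illustrative open-addressing map, not the class added by the commit; the class name, hash function, table size, and sample keys are all assumptions.

```python
class StringIntMap:
    """Open-addressing hash map from strings to non-negative integers."""

    EMPTY = -1  # returned when a string is not present

    def __init__(self, size=16):
        # size must be a power of two so that masking acts as modulo
        assert size > 0 and size & (size - 1) == 0
        self.mask = size - 1
        self.keys = [None] * size
        self.values = [self.EMPTY] * size

    @staticmethod
    def _hash(string):
        # Simple multiplicative hash truncated to 32 bits (illustrative).
        h = 0
        for ch in string:
            h = (h * 31 + ord(ch)) & 0xFFFFFFFF
        return h

    def add(self, string, value):
        assert value >= 0
        i = self._hash(string) & self.mask
        for _ in range(self.mask + 1):
            if self.keys[i] is None or self.keys[i] == string:
                self.keys[i] = string
                self.values[i] = value
                return
            i = (i + 1) & self.mask  # linear probing
        raise RuntimeError("map is full")

    def lookup(self, string):
        i = self._hash(string) & self.mask
        for _ in range(self.mask + 1):
            if self.keys[i] is None:
                return self.EMPTY
            # Compare the stored string, not just the slot, so a hash
            # collision cannot return another key's value.
            if self.keys[i] == string:
                return self.values[i]
            i = (i + 1) & self.mask
        return self.EMPTY


# Two names may share one integer, so the map need not be a bijection.
m = StringIntMap()
m.add("name", 3)
m.add("name_alias", 3)
print(m.lookup("name"), m.lookup("name_alias"), m.lookup("missing"))
```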
Our previous handling of barriers always used the big hammer and didn't
correctly emit memory barriers when specified along with a control
barrier. This commit completely reworks the way we emit barriers to
make things both more precise and more correct.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
When looking up known glsl_type instances in the various hash tables, we
end up leaking the key instances used for the lookup, as the glsl_type
constructor allocates memory on the global mem_ctx. This patch changes
glsl_type to manage its own memory, which fixes the leak and also allows
getting rid of the global mem_ctx and its mutex.
v2: remove lambda usage (Tapani)
(+keep ASSERT_BITFIELD_SIZE, modify dummy ctor to initialize mem_ctx)
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104884
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Simon Hausmann <simon.hausmann@qt.io>
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This capability allows gl_ViewportIndex and gl_Layer to also be used
as outputs in Vertex and Tessellation shaders.
v2: Make conditional on the capability, add gl_Layer, add tessellation
shaders. (Iago)
v3: Don't export to tessellation control shader.
v4: Add Reviewed-by tag.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Fixes the following build errors:
external/mesa/src/intel/vulkan/anv_device.c:300: error: undefined reference to 'gen_get_pci_device_id_override'
external/mesa/src/intel/vulkan/anv_device.c:312: error: undefined reference to 'gen_get_device_name'
external/mesa/src/intel/vulkan/anv_device.c:313: error: undefined reference to 'gen_get_device_info'
clang.real: error: linker command failed with exit code 1 (use -v to see invocation)
Fixes: 272bef0601 "intel: Split gen_device_info out into libintel_dev"
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
This reverts commit 2d36efdb7f.
This raised limit turns out to be harmful for more complex shaders:
it causes excessive spilling in some Bioshock Infinite shaders.
The fps for the ssao demo on radv remains unchanged when reverting
this.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
If it's zero but we put it in args, we still end up consuming a
register for it.
This fixes some spilling in the NIR paths in Dirt Rally that
isn't seen with TGSI.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
These transformations are inexact because section 4.7.1 (Range and
Precision) says:
Operations and built-in functions that operate on a NaN are not
required to return a NaN as the result.
The fmin or fmax might not return NaN in cases where the original
expression would be required to return NaN.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
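To illustrate why such a rewrite is inexact, the sketch below compares a comparison-and-select expression with a model of an fmin that returns the non-NaN operand. The commit text does not show the exact expressions involved, so the select form and the helper names here are assumptions chosen only to make the NaN difference visible.

```python
import math


def fmin_ignoring_nan(a, b):
    """Model of an fmin that returns the non-NaN operand when one is NaN."""
    if math.isnan(a):
        return b
    if math.isnan(b):
        return a
    return min(a, b)


def open_coded_min(a, b):
    """Comparison-and-select form; any comparison with NaN is false."""
    return a if a < b else b


a, b = 1.0, float("nan")
print(open_coded_min(a, b))     # the select falls through to b, a NaN
print(fmin_ignoring_nan(a, b))  # the model returns a instead
```

For operands that are not NaN the two forms agree, which is why the difference only matters when exact NaN propagation is required.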
This transformation is inexact because section 4.7.1 (Range and
Precision) says:
Operations and built-in functions that operate on a NaN are not
required to return a NaN as the result.
The fmin or fmax might not return NaN in cases where the original
expression would be required to return NaN.
v2: Reorder operands and mark as inexact. The latter suggested by
Jason.
shader-db results:
Haswell, Broadwell, and Skylake had similar results. (Skylake shown)
total instructions in shared programs: 14514817 -> 14514808 (<.01%)
instructions in affected programs: 229 -> 220 (-3.93%)
helped: 3
HURT: 0
helped stats (abs) min: 1 max: 4 x̄: 3.00 x̃: 4
helped stats (rel) min: 2.86% max: 4.12% x̄: 3.70% x̃: 4.12%
total cycles in shared programs: 533145211 -> 533144939 (<.01%)
cycles in affected programs: 37268 -> 36996 (-0.73%)
helped: 8
HURT: 0
helped stats (abs) min: 2 max: 134 x̄: 34.00 x̃: 2
helped stats (rel) min: 0.02% max: 14.22% x̄: 3.53% x̃: 0.05%
Sandy Bridge and Ivy Bridge had similar results. (Ivy Bridge shown)
total cycles in shared programs: 257618409 -> 257618403 (<.01%)
cycles in affected programs: 12582 -> 12576 (-0.05%)
helped: 3
HURT: 0
helped stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2
helped stats (rel) min: 0.05% max: 0.05% x̄: 0.05% x̃: 0.05%
No changes on Iron Lake or GM45.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
v2: Refactor out screen functions to st/omx
This allows keeping all the code under st/omx (st/omx/tizonia and
st/omx/bellagio).
Reverts targets/omx_bellagio to omx, as additions to existing files
are enough to compile for both bellagio and tizonia.
* autotools changes:
--enable-omx -> --enable-omx-bellagio
* meson changes:
-Dgallium-omx=false -> -Dgallium-omx=disabled
-Dgallium-omx=true -> -Dgallium-omx=bellagio
Acked-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Julien Isorce <julien.isorce@gmail.com>
This allows us to generate, for example,
"exp param0 v0, off, off, off" if only the first channel is needed.
Not sure if this improves performance but it's worth trying.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
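As a rough illustration of the export text quoted above, the sketch below builds the operand list from a 4-bit channel mask and writes "off" for every channel that is not needed. The helper name and the register names are hypothetical; this is not the driver's code generation path.

```python
def format_export(target, regs, mask):
    """Format an export, replacing channels missing from mask with 'off'."""
    operands = [
        reg if mask & (1 << chan) else "off"
        for chan, reg in enumerate(regs)
    ]
    return "exp {} {}".format(target, ", ".join(operands))


# Only the first channel is needed, so the mask is 0x1.
print(format_export("param0", ["v0", "v1", "v2", "v3"], 0x1))
# prints: exp param0 v0, off, off, off
```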
When the mask is not 0xf we need to update the number of
enabled channels, otherwise the hardware won't emit the
components that are combined.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Currently, it's always 0xf, but an upcoming patch will reduce the
number of channels for parameter exports.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
This adds a missing library to the i965/Android.mk file, and updates
intel/Android.mk to include the new library. Without this, mesa does not
build on Android.
Fixes: 272bef0601 "intel: Split gen_device_info out into libintel_dev"
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
On Android, surface/swapchain extensions are implemented by the loader.
This patch modifies both the anv and radv extension scripts to disable the
ones currently exposed. See also earlier commit 9f763c1f9b.
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Just like commit 2ffe395 does for radv.
Fixes the following dEQP test on i965:
dEQP-VK.api.info.android.no_unknown_extensions
v2: make it !ANDROID since this extension is not about
surfaces/swapchain
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Some state trackers require 128.
(There are no plans to increase PIPE_MAX_SAMPLERS as well, since with the
gl state tracker it's unlikely that more than 32 will be needed. If you
need more, use bindless.)
The comment said it will only represent the lowest 32 regs. This was
not entirely true in practice, since at least on x86 you'll get
masked shifts (unless the compiler could recognize it already and toss
it out). It turns out this actually works out alright (presumably
no one uses it for temp regs) when increasing max sampler views, so
make that behavior explicit.
It feels a bit hacky, but in any case explicit behavior there is
better than undefined behavior.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
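The masked-shift behavior described above can be modelled as follows. This is a sketch under the assumption that "explicit" means masking the register index to its low five bits before shifting; the function name is hypothetical and the real change may be written differently.

```python
def reg_bit(index):
    """Bit for a register index, wrapping explicitly every 32 registers."""
    return 1 << (index & 31)


# Indices 32 apart share a bit, so the mask tracks usage modulo 32
# rather than only the lowest 32 registers.
print(hex(reg_bit(1)), hex(reg_bit(33)))
print(hex(reg_bit(31)), hex(reg_bit(32)))
```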