We were previously not doing at least some of the checks. This uses the
same logic that is used in glTexImage*.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Each BSD has slightly different sysctl for retrieving per-CPU times.
FreeBSD returns long while NetBSD returns uint64_t. On OpenBSD return
type differs between summation and per-CPU times. DragonFly is
compatible with FreeBSD.
Signed-off-by: Jan Beich <jbeich@FreeBSD.org>
Based on the vc4 implementation.
Fixes Android RenderEngine::flush() routine:
android.googlesource.com/platform/frameworks/native/+/refs/tags/android-o-mr1-iot-release-smart-clock-fcs/services/surfaceflinger/RenderEngine/RenderEngine.cpp#225
Signed-off-by: Roman Stratiienko <roman.stratiienko@globallogic.com>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Try more aggressive approach with cloning uniform and coord loads.
Uniform load can be inserted into any instruction, so let's do that. ARM site
claim that penalty for cache miss is one clock, so we don't lose anything if
we merge it into instruction that uses the result. As side effect we can also
pipeline it and thus decrease reg pressure.
Do the same for varyings that hold texture coords, but for different reason:
looks like there's a special path for coords that increases precision if
varying that holds it is pipelined. If we don't pipeline it and load coords
from a register its precision is fp16 and thus only 10 bits which is not enough
to accurately sample textures of size 1024 or larger.
Since instruction can hold only one uniform load and one varying load,
node_to_instr now creates a move using helper introduced in previous commit if
slot is already taken. As side effect of this change we can also try to
pipeline texture loads and create a move if attempt fails.
Reviewed-by: Erico Nunes <nunes.erico@gmail.com>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
It can load value from varying directly as well. Also load_regs is the
only op that has a source, so add src_num field to load node and set it
accordingly.
Reviewed-by: Erico Nunes <nunes.erico@gmail.com>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
When lowering from ubo, use the constant base field in the load_uniform
instruction for the constant part of the offset. Doesn't change much
for constant indexing, but this will help for indirect indexing because
constant-folding can't completely clean up the result.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
This looks like clear copy-and-pasteos, and fixes:
dEQP-GLES2.functional.draw.random.40
(on A307 and A630, both tested in the new CI farm)
Reviewed-by: Rob Clark <robdclark@chromium.org>
We can get all the information we need from NIR. It's slightly less
accurate, but radeonsi doesn't use the extra information. The old code
also overcounted atomic counters, which led to problems when everything
was used at once.
Fixes KHR-GL45.compute_shader.resources-max.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Otherwise it's impossible to know the maximum SSBO index for both
internal TGSI shaders from TTN (which don't have any notion of atomic
counters and no offset) as well as shaders from GLSL.
I fixed everything I could find while grepping for num_ssbos and
num_abos, which hopefully is everything (iris was the only user I could
find that uses it in a meaningful way).
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This adds a bit of unneccesary code on radeonsi, since whether
unnormalized coordinates are used is known at compile time with GL, but
I wasn't sure if it was worth the few instructions to plumb everything
through, especially for something so rare -- my shader-db doesn't have
any instances where this changes anything.
Fixes CTS tests I created at
https://github.com/cwabbott0/VK-GL-CTS/tree/unnorm-gather-tests
Acked-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
The workaround was originally written based on amdgpu-pro traces, but
since then radeonsi has got its own slightly different version. Use the
radeonsi version instead, to be consistent and because it'll be slightly
more convenient for handling unnormalized coordinates.
Acked-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Technically, the user might have set EGL_DISPLAY instead of
EGL_PLATFORM, but since the former is deprecated let's just mention the
latter in the warning message.
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
This routine was made obsolete over a series of reworks of memory
allocation; Tomeu's changes to shader memory allocation finally made
this unused as cppcheck noted.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
swr_shader.cpp: In function ‘void (* swr_compile_gs(swr_context*, swr_jit_gs_key&))(HANDLE, HANDLE, SWR_GS_CONTEXT*)’:
swr_shader.cpp:732:44: error: ‘make_unique’ was not declared in this scope
ctx->gs->map.insert(std::make_pair(key, make_unique<VariantGS>(builder.gallivm, func)));
^~~~~~~~~~~
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Jan Zielinski <jan.zielinski@intel.com>
While the documentation for _BitScanReverse64 on MSDN says that it's
available on ARM, this isn't true. It's only available on ARM64. So
let's match reality.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Acked-by: Matt Turner <mattst88@gmail.com>
This code generates CVTSD2SI, which requires SSE2. So let's fix the
required SSE-version.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Fixes: 5de29ae (util: try to use SSE instructions with MSVC and 32-bit gcc)
Reviewed-by: Matt Turner <mattst88@gmail.com>
This has been unused since 183db3a645 ("glsl: move half<->float
convertion to util"), Oct 10 2015. Let's drop needlessly including it.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>