Marek Olšák
2891c4b2e2
st/mesa: save currently bound vertex samplers and sampler views in st_context
...
for st_draw_feedback.c
Reviewed-by: Dave Airlie <airlied@redhat.com>
2019-12-09 21:09:28 -05:00
Marek Olšák
226e7aee70
st/mesa: support UBOs for Selection/Feedback/RasterPos
...
Reviewed-by: Dave Airlie <airlied@redhat.com>
2019-12-09 21:09:28 -05:00
Marek Olšák
60db75cb77
gallivm: implement LOAD with CONSTBUF but don't enable it for llvmpipe
...
This is already used in st_draw_feedback.c, because it uses shaders
generated for drivers.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2019-12-09 21:09:28 -05:00
Marek Olšák
525c8b90c7
llvmpipe: implement TEX_LZ and TXF_LZ opcodes
...
gallivm receives these opcodes anyway because st_draw_feedback.c uses
shaders that were assembled for drivers, not llvmpipe.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2019-12-09 21:09:28 -05:00
Gurchetan Singh
3c8ddc8f4b
drirc: set allow_higher_compat_version for Faster Than Light
...
With 781a78 ("mesa: enable ARB_direct_state_access in compat for
GL3.1+"), it's possible to have DSA with GL3.1+.
FTL creates a GL3.1 compat context, but fails the
_mesa_has_geometry_shaders(..) check in frame_buffer_texture.
Bump the compat version to pass the check.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-12-09 15:27:02 -08:00
Roland Scheidegger
23f1b78e8f
util/atomic: Fix p_atomic_add for unlocked and msvc paths
...
Braces mismatch (flagged by CI, untested).
Fixes: 385d13f26d "util/atomic: Add a _return variant of p_atomic_add"
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-12-09 15:02:58 -08:00
Eric Anholt
0470a03769
freedreno: Track the set of UBOs to be uploaded in UBO analysis.
...
We were iterating over the entire 32-entry array each time, when we
can just use a bitset to know that we're only uploading from the first
entry normally.
Knocks ir3_emit_user_consts down from ~.5% of CPU to .1% on WebGL
fishtank.
Reviewed-by: Rob Clark <robdclark@chromium.org>
2019-12-09 14:13:50 -08:00
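The bitset idea described in the freedreno UBO commit above can be sketched roughly as follows. This is a minimal illustration, not the driver's code: `struct ubo_state`, `mark_ubo`, and `upload_enabled_ubos` are hypothetical names, and the per-slot `sizes` payload is invented for the example. The point is only that the loop cost follows the number of set bits rather than the full 32-entry array.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_UBOS 32 /* the 32-entry array size mentioned in the commit */

/* Hypothetical state: one bit per UBO slot that needs uploading. */
struct ubo_state {
    uint32_t enabled_mask;
    int sizes[MAX_UBOS]; /* illustrative per-slot payload */
};

static void mark_ubo(struct ubo_state *s, unsigned slot, int size)
{
    assert(slot < MAX_UBOS);
    s->enabled_mask |= UINT32_C(1) << slot;
    s->sizes[slot] = size;
}

/* Visit only the slots whose bit is set; returns how many were visited. */
static unsigned upload_enabled_ubos(const struct ubo_state *s, int *total_size)
{
    uint32_t mask = s->enabled_mask;
    unsigned visited = 0;

    *total_size = 0;
    while (mask) {
        /* Portable search for the lowest set bit. */
        unsigned slot = 0;
        while (!((mask >> slot) & 1u))
            slot++;

        mask &= mask - 1; /* clear the lowest set bit */
        *total_size += s->sizes[slot];
        visited++;
    }
    return visited;
}
```

When only slot 0 is marked, which the commit describes as the normal case, the outer loop runs once instead of 32 times.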
Eric Anholt
10da0a9d18
freedreno: Stop forcing ALLOW_MAPPED_BUFFERS_DURING_EXEC off.
...
The default is to not throw GL errors when drawing with mapped
buffers, but we were forcing it on for unclear reasons. Internally we
keep all our buffers mapped anyway, so it should be a no-op other than
reducing CPU overhead (.23% in a perf report for WebGL fishtank).
Reviewed-by: Rob Clark <robdclark@chromium.org>
2019-12-09 14:13:47 -08:00
Rob Clark
dc791d3c68
freedreno/fdperf: use drmOpen()
...
Signed-off-by: Rob Clark <robdclark@chromium.org>
2019-12-09 13:09:58 -08:00
Alyssa Rosenzweig
a37822f5f7
gallium/util: Support POLYGON in u_stream_outputs_for_vertices
...
u_decomposed_prims_for_vertices cannot support POLYGON, but POLYGON is
trivial to support as a special case directly (since we have the number
of vertices directly).
Fixes aborts in Panfrost in apps using GL_POLYGON.
Fixes: e881aa8c12 ("gallium/util: Add u_stream_outputs_for_vertices helper")
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-12-09 21:09:05 +00:00
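The POLYGON special case described in the commit above might look something like the sketch below. It is a simplified stand-in, not the Mesa helper: `enum prim`, `decomposed_prims`, and `stream_outputs` are hypothetical names, only triangle-based primitives are modeled, and trimming of extraneous vertices is left out.

```c
#include <assert.h>

/* Hypothetical, reduced primitive set for illustration. */
enum prim { PRIM_TRIANGLES, PRIM_TRIANGLE_STRIP, PRIM_POLYGON };

/* Number of triangles a draw decomposes into; polygons are not handled here. */
static unsigned decomposed_prims(enum prim p, unsigned nr)
{
    switch (p) {
    case PRIM_TRIANGLES:
        return nr / 3;
    case PRIM_TRIANGLE_STRIP:
        return nr >= 3 ? nr - 2 : 0;
    default:
        assert(!"unsupported primitive");
        return 0;
    }
}

/* Number of vertices written to the stream output buffers. */
static unsigned stream_outputs(enum prim p, unsigned nr)
{
    /* A polygon is one primitive with many vertices, so the vertex
     * count is already the answer. */
    if (p == PRIM_POLYGON)
        return nr;

    return decomposed_prims(p, nr) * 3; /* 3 vertices per triangle */
}
```

Handling the polygon before calling the decomposition helper keeps that helper's unsupported-primitive assertion from firing.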
Anuj Phogat
1a32fbd48c
intel: Add pci-ids for Jasper Lake
...
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-12-09 12:22:57 -08:00
Anuj Phogat
11fdd5f52c
intel: Add device info for 1x4x6 Jasper Lake
...
Also removing the FIXME comments after matching the numbers with
updated documentation.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-12-09 12:22:56 -08:00
Vasily Khoruzhick
9f5fa496cb
lima: expose tiled format modifier in query_dmabuf_modifiers()
...
Fixes: 8c12f4e5f2 ("lima: enable tiling")
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
2019-12-09 15:21:55 +00:00
Vasily Khoruzhick
01a451b04d
lima: handle DRM_FORMAT_MOD_INVALID in resource_from_handle()
...
Assume that resource is tiled if we get DRM_FORMAT_MOD_INVALID
in resource_from_handle() and we don't have RO.
Fixes: 8c12f4e5f2 ("lima: enable tiling")
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
2019-12-09 15:21:55 +00:00
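The fallback rule stated in the lima commit above could be sketched as below. Everything here is illustrative: the `MOD_*` values are stand-ins (real code would take its modifier definitions from `drm_fourcc.h`), and `enum layout` and `pick_layout` are hypothetical names. The commit states only the no-RO case; treating the render-only case as linear is an assumption made for this sketch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in modifier values for illustration only. */
#define MOD_LINEAR  UINT64_C(0)
#define MOD_TILED   UINT64_C(1)
#define MOD_INVALID UINT64_C(0x00ffffffffffffff)

enum layout { LAYOUT_LINEAR, LAYOUT_TILED, LAYOUT_UNSUPPORTED };

/* Decide how to interpret an imported buffer from its format modifier. */
static enum layout pick_layout(uint64_t modifier, bool has_render_only)
{
    switch (modifier) {
    case MOD_LINEAR:
        return LAYOUT_LINEAR;
    case MOD_TILED:
        return LAYOUT_TILED;
    case MOD_INVALID:
        /* No explicit modifier: assume tiled unless a render-only (RO)
         * object exists. The linear branch is this sketch's assumption. */
        return has_render_only ? LAYOUT_LINEAR : LAYOUT_TILED;
    default:
        return LAYOUT_UNSUPPORTED;
    }
}
```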
Jonathan Marek
9d78cf4584
turnip: add hw binning
...
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-12-09 08:22:18 -05:00
Samuel Pitoiset
86dfe92bd0
radv: do not use VK_TRUE/VK_FALSE
...
For consistency.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-12-09 09:21:26 +01:00
Dave Airlie
d7dc14628a
gallivm: add bitfield reverse and ufind_msb
...
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Krzysztof Raszkowski <krzysztof.raszkowski@intel.com>
2019-12-09 06:05:02 +10:00
Roland Scheidegger
1c7693e3bd
gallium/scons: fix graw_gdi build
...
Fixes: 44a6b0107b (gallivm: add nir->llvm translation (v2))
Reviewed-by: Dave Airlie <Airlied@redhat.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2019-12-07 17:50:53 +01:00
Daniel Schürmann
8259c97b2d
aco: propagate temporaries into expanded vectors
...
Gives a very slight decrease in code size:
Totals from affected shaders:
Code Size: 1708488 -> 1702768 (-0.33 %) bytes
Max Waves: 2858 -> 2855 (-0.10 %)
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
2019-12-07 11:23:11 +01:00
Daniel Schürmann
df3e674fb3
aco: improve readfirstlane after uniform ssbo loads on GFX7
...
pipeline-db changes for GFX7:
80310 shaders in 40472 tests
Totals:
SGPRS: 3655900 -> 3643916 (-0.33 %)
VGPRS: 2678324 -> 2686324 (0.30 %)
Spilled SGPRs: 1730 -> 1634 (-5.55 %)
Spilled VGPRs: 14 -> 21 (50.00 %)
Scratch size: 15540 -> 15536 (-0.03 %) dwords per thread
Code Size: 136106120 -> 135457616 (-0.48 %) bytes
LDS: 1259 -> 1259 (0.00 %) blocks
Max Waves: 601014 -> 600206 (-0.13 %)
Totals from affected shaders:
SGPRS: 307832 -> 295848 (-3.89 %)
VGPRS: 267864 -> 275864 (2.99 %)
Spilled SGPRs: 770 -> 674 (-12.47 %)
Spilled VGPRs: 14 -> 21 (50.00 %)
Scratch size: 16 -> 12 (-25.00 %) dwords per thread
Code Size: 22007488 -> 21358984 (-2.95 %) bytes
LDS: 65 -> 65 (0.00 %) blocks
Max Waves: 28668 -> 27860 (-2.82 %)
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
2019-12-07 11:23:11 +01:00
Daniel Schürmann
0837471463
aco: use soffset for MUBUF instructions on SI/CI
...
pipeline-db changes for GFX7:
80310 shaders in 40472 tests
Totals:
SGPRS: 3655300 -> 3655900 (0.02 %)
VGPRS: 2677732 -> 2678324 (0.02 %)
Spilled SGPRs: 1730 -> 1730 (0.00 %)
Spilled VGPRs: 14 -> 14 (0.00 %)
Scratch size: 15540 -> 15540 (0.00 %) dwords per thread
Code Size: 136488364 -> 136106120 (-0.28 %) bytes
LDS: 1259 -> 1259 (0.00 %) blocks
Max Waves: 601039 -> 601014 (-0.00 %)
Totals from affected shaders:
SGPRS: 316312 -> 316912 (0.19 %)
VGPRS: 273844 -> 274436 (0.22 %)
Spilled SGPRs: 770 -> 770 (0.00 %)
Spilled VGPRs: 14 -> 14 (0.00 %)
Scratch size: 16 -> 16 (0.00 %) dwords per thread
Code Size: 22724904 -> 22342660 (-1.68 %) bytes
LDS: 114 -> 114 (0.00 %) blocks
Max Waves: 30861 -> 30836 (-0.08 %)
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
2019-12-07 11:23:11 +01:00
Daniel Schürmann
7b38d95b32
radv: Enable ACO on GFX7 (Sea Islands)
...
This patch also disables AMD_shader_ballot on GFX7 by default if ACO is used.
Note that shader_ballot works correctly, but performance seems inferior.
To enable shader_ballot use RADV_PERFTEST=shader_ballot.
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
2019-12-07 11:23:11 +01:00
Daniel Schürmann
28c95cc402
aco: return to loop_active mask at continue_or_break blocks
...
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
2019-12-07 11:23:11 +01:00
Daniel Schürmann
0f9447ccb0
radv: disable Youngblood app profile if ACO is used
...
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
2019-12-07 11:23:11 +01:00
Daniel Schürmann
746165e540
aco: implement exclusive scan for SI/CI
...
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
2019-12-07 11:23:11 +01:00
Daniel Schürmann
7ae227effd
aco: implement inclusive_scan for SI/CI
...
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
2019-12-07 11:23:11 +01:00
Daniel Schürmann
f895a8b1df
aco: implement (clustered) reductions for SI/CI
...
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
2019-12-07 11:23:11 +01:00
Daniel Schürmann
9254fb4fc7
aco: don't use a scalar temporary for reductions on GFX10
...
This patch also adds the scalar temporary for scans on SI/CI.
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
2019-12-07 11:23:11 +01:00
Daniel Schürmann
8ad43d8838
aco: flush denorms after fmin/fmax on pre-GFX9
...
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
2019-12-07 11:23:11 +01:00
Daniel Schürmann
21f67a3bdc
radv: only flush scalar cache for SSBO writes with ACO on GFX8+
...
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
2019-12-07 11:23:11 +01:00
Daniel Schürmann
79ce6c1b33
aco: disable disassembly for SI/CI due to lack of support by LLVM
...
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
2019-12-07 11:23:11 +01:00
Daniel Schürmann
1c4afe38f2
aco: implement 64bit ine/ieq for SI/CI
...
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
2019-12-07 11:23:11 +01:00
Daniel Schürmann
1e1356b2ad
aco: implement 64bit i2b for SI/CI
...
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
2019-12-07 11:23:11 +01:00
Daniel Schürmann
da7ff58835
aco: make 1/2*PI a literal constant on SI/CI
...
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
2019-12-07 11:23:11 +01:00
Daniel Schürmann
90fad7360d
aco: implement 64bit VGPR shifts for SI/CI
...
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
2019-12-07 11:23:11 +01:00
Daniel Schürmann
6a586a6006
aco: split read/writelane opcode into VOP2/VOP3 version for SI/CI
...
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
2019-12-07 11:23:11 +01:00
Daniel Schürmann
23319add93
aco: fix disassembly of writelane instructions.
...
ACO writes an unused 3rd operand for internal usage,
which makes LLVM recognize it as an illegal instruction.
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
2019-12-07 11:23:11 +01:00
Daniel Schürmann
6fc9ddfef8
aco: recognize SI/CI SMRD hazards
...
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
2019-12-07 11:23:11 +01:00
Daniel Schürmann
3eed4d2be5
aco: implement quad swizzles for SI/CI
...
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
2019-12-07 11:23:11 +01:00
Daniel Schürmann
bde9c1e3a1
aco: move buffer_store data to VGPR if needed
...
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
2019-12-07 11:23:11 +01:00
Daniel Schürmann
a8195bdf2e
aco: implement nir_op_isign on SI/CI
...
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
2019-12-07 11:23:11 +01:00
Daniel Schürmann
b8783973cd
aco: only use scalar loads for readonly buffers on SI/CI
...
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
2019-12-07 11:23:11 +01:00
Daniel Schürmann
f27783a667
aco: implement nir_op_fquantize2f16 for SI/CI
...
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
2019-12-07 11:23:11 +01:00
Daniel Schürmann
caea4bbfdc
aco: fix SMEM offsets for SI/CI
...
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
2019-12-07 11:23:11 +01:00
Daniel Schürmann
8aab92b393
aco: fix sampler aniso on SI/CI
...
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
2019-12-07 11:23:11 +01:00
Dave Airlie
9b533a2ca3
aco: handle gfx7 int8/10 clamping on exports
...
Co-authored-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
2019-12-07 11:23:11 +01:00
Daniel Schürmann
0d42e4d7a0
aco: Initial GFX7 Support
...
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
2019-12-07 11:23:11 +01:00
Daniel Schürmann
3177346bfc
aco: refactor visit_store_fs_output() to use the Builder
...
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
2019-12-07 11:23:11 +01:00
Jason Ekstrand
0f60aa4037
anv: Re-emit all compute state on pipeline switch
...
It's a very odd case to hit in the real world. However, there are some
CTS tests which switch back and forth between dispatch and clear without
changing the pipeline.
Fixes: bc612536eb "anv: Emit a dummy MEDIA_VFE_STATE before switching..."
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-12-07 04:03:35 +00:00
Jason Ekstrand
bce1c3c668
anv: Re-capture all batch and state buffers
...
When we moved from allocating BOs directly to using the BO cache, we
lost the EXEC_OBJECT_CAPTURE flag on all our state buffers.
Fixes: 3119b96bdf "anv: Allocate block pool BOs from the cache"
Fixes: ee77938733 "anv: Allocate batch and fence buffers from..."
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-12-07 04:03:35 +00:00