Commit Graph

103413 Commits

Author SHA1 Message Date
Jordan Justen
1c8a045bfb i965: Support saving the gen program with glGetProgramBinary
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2018-07-09 23:02:33 -07:00
Jordan Justen
eb5b4b0fd1 i965: Add flag_state param to brw_search_cache
This allows brw_search_cache to be used to find programs without
causing extra state to be emitted in the case where the program isn't
being made active. (For example, to find the program to save out with
the ARB_get_program_binary interface.)

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2018-07-09 23:02:33 -07:00
Jordan Justen
48ce7745dc mesa: Add gl_shader_program param to ProgramBinarySerializeDriverBlob
This might be required because some stages might generate different
programs depending on the other stages in the program. For example,
the i965 driver's tessellation control stage depends on the
tessellation evaluation shader.

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2018-07-09 23:02:33 -07:00
Jordan Justen
36dd15f8b3 i965: Add brw_populate_default_key
We will need to populate the default key for ARB_get_program_binary to
allow us to retrieve the default gen program to store in the program
binary.

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2018-07-09 23:02:33 -07:00
Jordan Justen
65f2014740 i965: Replace brw_setup_tex_for_precompile brw with devinfo
Trying to make sure the setup of the default program key is not
dependent on the GL state.

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2018-07-09 23:02:33 -07:00
Jordan Justen
e426286e21 i965: Regenerate blob without gen program for shader cache
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2018-07-09 23:02:33 -07:00
Jordan Justen
3a133223b3 compiler/blob: Add blob_skip_bytes
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2018-07-09 23:02:33 -07:00
Jordan Justen
8e7ee7433e i965: Add support for driver cache blob containing the gen program
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2018-07-09 23:02:33 -07:00
Jordan Justen
05bb4b4849 i965: Use brw_prog_key_set_id in disk cache load/store code
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2018-07-09 23:02:33 -07:00
Jordan Justen
170d76de9f i965: Add brw_prog_key_set_id helper to set the program id on any stage
For saving programs (shader cache; get program binary) it is useful to
set the id to 0, with the stage being a parameter.

For restoring programs it is useful to set the id to the id allocated
to the program at creation time.

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2018-07-09 23:02:33 -07:00
Jordan Justen
1c1a7d11c8 i965: Add brw_stage_cache_id to map gl stages to brw cache_ids
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2018-07-09 23:02:32 -07:00
Jordan Justen
b9f9b35431 i965: Add brw_(read|write)_blob_program_data functions
We will want to use these for both the disk shader cache, and for the
ARB_get_program_binary.

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2018-07-09 23:02:32 -07:00
Jordan Justen
1777c23abf i965: Add brw_program_deserialize_driver_blob
brw_program_deserialize_driver_blob will be a more generic form of
brw_program_deserialize_nir. In addition to nir, it will also be able
to extract gen binaries and upload them to the program cache.

In this commit, it continues to only support nir.

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2018-07-09 23:02:32 -07:00
Jordan Justen
f4c154afc1 i965: Move brw_program_*serialize_nir to brw_program_binary.c
This will allow get_program_binary to add the gen program into its
serialization in addition to just the nir program.

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2018-07-09 23:02:32 -07:00
Jordan Justen
cce3994dee mesa: Always call ProgramBinarySerializeDriverBlob
The driver may prefer to have a different blob for
ARB_get_program_binary compared to the version saved out for the disk
shader cache.

Since they both use the driver_cache_blob field, we need to always
give the driver the opportunity to fill in the driver_cache_blob when
saving the program binary.

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2018-07-09 23:02:32 -07:00
Jordan Justen
6497be42b7 i965: Use ShaderCacheSerializeDriverBlob driver function
This function is called just before the gl_program::driver_cache_blob
is saved out as part of the gl_program serialization.

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2018-07-09 23:02:32 -07:00
Jordan Justen
450f00e39d st/mesa: Use ShaderCacheSerializeDriverBlob driver function
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2018-07-09 23:02:32 -07:00
Jordan Justen
c510dd22a9 st/mesa: Skip serializing driver_cache_blob if it exists
Previously the mesa core code would not call to serialize the
driver_cache_blob if it existed. We will update it to always call to
serialize the driver_cache_blob meaning we should avoid re-serializing
it under mesa/state_tracker.

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2018-07-09 23:02:32 -07:00
Jordan Justen
2a55553be3 mesa: Add disk shader cache driver blob callback
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2018-07-09 23:02:28 -07:00
Iago Toral Quiroga
213491600a intel/compiler: emit actual barriers for working-group level barriers
Until now we have assumed that we could skip emitting these barriers
in the general case based on empirical testing and a few assumptions
detailed in a comment in the driver code, however, recent CTS tests
have showed that we actually need them to produce correct behavior.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-07-10 07:46:34 +02:00
Dave Airlie
0cab6e51e3 radv: add some cxxflags for new c++ file
Looks like I broke intel CI compiles.

Fixes: 6f3aee40f9 (radv: using tls to store llvm related info and speed up compiles (v10))
Tested-by: Clayton Craft <clayton.a.craft@intel.com>
2018-07-10 10:48:03 +10:00
Jason Ekstrand
dc1d10b396 anv,radv: Add support for VK_KHR_get_display_properties2
Reviewed-by: Keith Packard <keithp@keithp.com>
2018-07-09 17:09:41 -07:00
Jason Ekstrand
c0a27c5946 intel/aubinator_error_decode: Allow for more sections
Error states coming from actual Vulkan applications tend to have fairly
long command buffers and lots of chained batches.  30 total BOs isn't
nearly enough.  This commit bumps it to 256, makes some things use the
actual number of sections instead of the #define, and adds asserts if we
ever go over 256 sections.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2018-07-09 16:40:54 -07:00
Jason Ekstrand
5009e73bb1 intel/batch_decoder: Recurse for all 2nd level batches
Our attempt to restart the loop with the second level batch worked at
one point but got broken at some point.  It was too fragile anyway and
we're not likely to have enough secondaries to actually overflow the
stack so we may as well recurse in both cases.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2018-07-09 16:40:54 -07:00
Dave Airlie
45e25adfe8 virgl/vtest: add support to vtest for new cap getting.
The vtest protocol is pretty simple but also pretty dumb, and
the v1 caps query was fixed size, with no nice way to expand it,
however the server also ignores any command it doesn't understand.

So we can query v2 caps by sending a v2 followed by a v1, if the
v2 is ignored we know it's an old vtest server, and the we get
a v2 answer then we can just read the v1 answer and discard it.

Acked-by: Jakob Bornecrantz <jakob@collabora.com> (sounds good)
2018-07-10 09:07:37 +10:00
Anuj Phogat
2badf0e85b i965/icl: Don't set float blend optimization bit in CACHE_MODE_SS
CACHE_MODE_SS is not listed in gfxspecs table for user mode
non-privileged registers. So, making any changes from Mesa
will do nothing. Kernel is already setting this bit in
CACHE_MODE_SS register which is saved/restored to/from
the HW context image.

Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2018-07-09 15:38:42 -07:00
Anuj Phogat
c1d8300117 anv/icl: Don't set float blend optimization bit in CACHE_MODE_SS
CACHE_MODE_SS is not listed in gfxspecs table for user mode
non-privileged registers. So, making any changes from Mesa
will do nothing. Kernel is already setting this bit in
CACHE_MODE_SS register which is saved/restored to/from
the HW context image.

Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2018-07-09 15:38:42 -07:00
Jason Ekstrand
227dabc266 anv: Implement VK_EXT_vertex_attribute_divisor
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2018-07-09 15:37:51 -07:00
Jason Ekstrand
2caf6c0392 anv/pipeline: Add a per-VB instance divisor
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2018-07-09 15:37:51 -07:00
Jason Ekstrand
32f4feb5a0 anv/pipeline: Use a per-VB struct instead of separate arrays
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2018-07-09 15:37:51 -07:00
Jose Maria Casanova Crespo
6db20229ab anv: Enable SPV_KHR_8bit_storage and VK_KHR_8bit_storage
Enables SPV_KHR_8bit_storage and VK_KHR_8bit_storage on gen 8+
using the VK_KHR_get_physical_device_properties2 functionality
to expose if the extension is supported or not.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-07-10 00:14:50 +02:00
Jose Maria Casanova Crespo
0c01bf70e0 spirv/nir: Add support for SPV_KHR_8bit_storage
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-07-10 00:14:50 +02:00
Jose Maria Casanova Crespo
f29c19cd5c spirv: Include headers and grammar for SPV_KHR_8bit_storage
Updates headers and grammar to ff684ffc6a35d2a58f0f63108877d0064ea33feb

Acked-by: Jason Ekstrand <jason@jlekstrand.net>
2018-07-10 00:14:50 +02:00
Jose Maria Casanova Crespo
cd0afab99b i965/fs: Enable store_ssbo for 8-bit types.
v2: Update comment according to this patch. (Jason Ekstrand)

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-07-10 00:14:50 +02:00
Jose Maria Casanova Crespo
11c904d0d3 intel/compiler: relax brw_eu_validate for byte raw movs
When the destination is a BYTE type allow raw movs
even if the stride is not exact multiple of destination
type and exec type, execution type is Word and its size is 2.

This restriction was only allowing stride==2 destinations
for 8-bit types.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-07-10 00:14:49 +02:00
Jose Maria Casanova Crespo
87fc9af3fc i965/fs: Enable conversions to 8-bit integers
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-07-10 00:14:49 +02:00
Jose Maria Casanova Crespo
030472c1f0 i965: Support for 8-bit base types in helper functions
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-07-10 00:14:49 +02:00
Jose Maria Casanova Crespo
232ed89802 i965/fs: Register allocator shoudn't use grf127 for sends dest
Since Gen8+ Intel PRM states that "r127 must not be used for return
address when there is a src and dest overlap in send instruction."

This patch implements this restriction creating new grf127_send_hack_node
at the register allocator. This node has a fixed assignation to grf127.

For vgrf that are used as destination of send messages we create node
interfereces with the grf127_send_hack_node. So the register allocator
will never assign to these vgrf a register that involves grf127.

If dispatch_width > 8 we don't create these interferences to the because
all instructions have node interferences between sources and destination.
That is enough to avoid the r127 restriction.

This fixes CTS tests that raised this issue as they were executed as SIMD8:

dEQP-VK.spirv_assembly.instruction.graphics.8bit_storage.8struct_to_32struct.storage_buffer_*int_geom

Shader-db results on Skylake:
   total instructions in shared programs: 7686798 -> 7686797 (<.01%)
   instructions in affected programs: 301 -> 300 (-0.33%)
   helped: 1
   HURT: 0

   total cycles in shared programs: 337092322 -> 337091919 (<.01%)
   cycles in affected programs: 22420415 -> 22420012 (<.01%)
   helped: 712
   HURT: 588

Shader-db results on Broadwell:

   total instructions in shared programs: 7658574 -> 7658625 (<.01%)
   instructions in affected programs: 19610 -> 19661 (0.26%)
   helped: 3
   HURT: 4

   total cycles in shared programs: 340694553 -> 340676378 (<.01%)
   cycles in affected programs: 24724915 -> 24706740 (-0.07%)
   helped: 998
   HURT: 916

   total spills in shared programs: 4300 -> 4311 (0.26%)
   spills in affected programs: 333 -> 344 (3.30%)
   helped: 1
   HURT: 3

   total fills in shared programs: 5370 -> 5378 (0.15%)
   fills in affected programs: 274 -> 282 (2.92%)
   helped: 1
   HURT: 3

v2: Avoid duplicating register classes without grf127. Let's use a node
    with a fixed assignation to grf127 and create interferences to send
    message vgrf destinations. (Eric Anholt)
v3: Update reference to CTS VK_KHR_8bit_storage failing tests.
    (Jose Maria Casanova)

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Cc: 18.1 <mesa-stable@lists.freedesktop.org>
2018-07-10 00:14:49 +02:00
Jose Maria Casanova Crespo
0e47ecb29a intel/compiler: grf127 can not be dest when src and dest overlap in send
Implement at brw_eu_validate the restriction from Intel Broadwell PRM,
vol 07, section "Instruction Set Reference", subsection "EUISA
Instructions", Send Message (page 990):

"r127 must not be used for return address when there is a src and
dest overlap in send instruction."

v2: Style fixes (Matt Turner)

Reviewed-by: Matt Turner <mattst88@gmail.com>
Cc: 18.1 <mesa-stable@lists.freedesktop.org>
2018-07-10 00:14:49 +02:00
Dave Airlie
6f3aee40f9 radv: using tls to store llvm related info and speed up compiles (v10)
This uses the common compiler passes abstraction to help radv
avoid fixed cost compiler overheads. This uses a linked list per
thread stored in thread local storage, with an entry in the list
for each target machine.

This should remove all the fixed overheads setup costs of creating
the pass manager each time.

This takes a demo app time to compile the radv meta shaders on nocache
and exit from 1.7s to 1s. It also has been reported to take the startup
time of uncached shaders on RoTR from 12m24s to 11m35s (Alex)

v2: fix llvm6 build, inline emit function, handle multiple targets
in one thread
v3: rebase and port onto new structure
v4: rename some vars (Bas)
v5: drag all code into radv for now, we can refactor it out later
for radeonsi if we make it shareable
v6: use a bit more C++ in the wrapper
v7: logic bugs fixed so it actually runs again.
v8: rebase on top of radeonsi changes.
v9: drop some C++ headers, cleanup list entry
v10: use pop_back (didn't have enough caffeine)

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-07-10 07:58:03 +10:00
Adam Jackson
c1ec582059 swrast: Fix eglMakeCurrent(dpy, NULL, NULL, ctx) (v2)
Fixes 14 piglits, mostly in egl_khr_create_context.

v2: Also short-circuit the same-context-no-drawables case (Eric Anholt)

Fixes: https://github.com/anholt/libepoxy/issues/177
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Adam Jackson <ajax@redhat.com>
2018-07-09 16:09:58 -04:00
Lionel Landwerlin
7205bdf41f intel: tools: dump_gpu: fix ppgtt mapping
We were not properly writing page tables when the virtual address
range spans multiple subtrees of the tables.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
2018-07-09 21:08:08 +01:00
Eric Anholt
beeb94402f v3d: Implement noperspective varyings on V3D 4.x.
Fixes a bunch of piglit interpolation tests, and reduces my concern about
some MSAA blit shaders with noperspective varyings.
2018-07-09 11:48:32 -07:00
Eric Anholt
4b4795be9d v3d: Refactor flat shade/centroid flag emission.
The logic was duplicated in a pretty gross way, when what we really need
is just a helper function for stuffing the values in the packet.  This
will make implementing noperspective easier.
2018-07-09 11:48:32 -07:00
Eric Anholt
93f437d128 v3d: Fix typo in dither mode offset.
We weren't using the field yet, so it didn't affect anything.

Fixes: c0476d964a ("v3d: Express dithering mode in the same way that the CLIF parser does.")
2018-07-09 11:48:32 -07:00
zhaowei yuan
73ec437627 glsl: Treat sampler2DRect and sampler2DRectShadow as reserved in ES2
"sampler2DRect" and "sampler2DRectShadow" are specified as
reserved from GLSL 1.1 and GLSL ES 1.0

Signed-off-by: zhaowei yuan <zhaowei.yuan@samsung.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106906
Reviewed-by: Eric Anholt <eric@anholt.net>
Fixes: 34f7e761bc ("glsl/parser: Track built-in types using the glsl_type directly")
2018-07-09 11:37:08 -07:00
Charmaine Lee
097952abaa st/wgl: check for NULL piAttribList in wglCreatePbufferARB()
Java2d opengl pipeline passes NULL piAttribList to
wglCreatePbufferARB(). So skip parsing the attribute list
if it is NULL.

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Neha Bhende <bhenden@vmware.com>
2018-07-06 17:32:49 -07:00
Jason Ekstrand
a695de5845 anv: Add support for VK_KHR_create_renderpass2
The implementation of CreateRenderPass2 uses the helpers we broke out in
previous commits.  The implementations of the new vkCmd functions just
call the old versions.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2018-07-09 10:11:53 -07:00
Jason Ekstrand
208be8eafa anv: Make subpass::depth_stencil_attachment a pointer
This makes certain checks a bit easier and means that we don't have
the attachment information duplicated in the attachment list and in
depth_stencil_attachment.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2018-07-09 10:11:53 -07:00
Jason Ekstrand
75e308fc44 anv/pass: Move implicit dependency setup to anv_render_pass_compile
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2018-07-09 10:11:53 -07:00