Commit Graph

74545 Commits

Author SHA1 Message Date
Julien Isorce 42a5e143a8 vl/buffers: add RGBX and BGRX to the supported formats
Useful if one wants to create RGBX or BGRX surfaces.
The infrastructure is such that it required just a
few definitions to support these formats.

Signed-off-by: Julien Isorce <j.isorce@samsung.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
2015-11-06 17:33:38 +00:00
Julien Isorce bf6acbb2db st/va: properly use brackets in vlVaAcquireBufferHandle's switch
In "switch (mem_type)" the brackets were surrounding "case+default"
instead of "case" only.

Signed-off-by: Julien Isorce <j.isorce@samsung.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
2015-11-06 17:33:16 +00:00
Julien Isorce bfc245e9ac st/va: properly indent buffer.c, config.c, image.c and picture.c
Some lines were using 4 indentation spaces instead of 3.

Signed-off-by: Julien Isorce <j.isorce@samsung.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
2015-11-06 17:33:01 +00:00
Rob Clark 6459e780ae freedreno/a4xx: fix blend color
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-11-06 11:19:04 -05:00
Rob Clark 7465e16124 freedreno: update generated headers
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-11-06 11:18:47 -05:00
Guillaume Charifi 6f5e0c08a4 freedreno: add a305 support
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-11-06 11:17:58 -05:00
Boyan Ding 8f55ebe802 freedreno/ir3: Use nir_foreach_variable
Signed-off-by: Boyan Ding <boyan.j.ding@gmail.com>
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-11-06 11:17:53 -05:00
Rob Clark 99597d033a nir: some small cleanups
The various cf nodes all get allocated w/ shader as their ralloc_parent,
so let's make this more explicit.  Plus a couple of other corrections/
clarifications.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-11-06 11:15:41 -05:00
Ilia Mirkin d68226087c nvc0: reintroduce BGRA4 format support
Commit 342e68dc60 (nvc0: remove BGRA4 format support) removed the
support to fix a WoW trace. However after further experimentation, I was
able to get the blit to work by using a different "fake" format in the
2d engine.

The reason why this worked on nv50 is that nv50 falls back to the 3d
blit path in case either the src or the dst aren't "faithfully"
supported, while nvc0 only does it for the dst format. RG8 is better
supported by the nvc0 2d engine than R16.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-11-06 00:47:44 -05:00
Brian Paul 581111c4d6 mesa: report enum name in glClientActiveTexture() error string
As we do for glActiveTexture().  Trivial.
2015-11-05 20:12:33 -07:00
Chad Versace 16119ad884 anv/meta: Finish load clears for stencil attachments
Tested by Crucible "func.depthstencil.stencil_triangles.*" in

  commit c194292d5eadb84e9d7489fc01ce0b653cdd4ca5 (HEAD -> master)
  Author: Chad Versace <chad.versace@intel.com>
  Date:   Wed Nov 4 16:19:24 2015 -0800
  Subject: func.depthstencil: Remove stencil clear workaround for Mesa
2015-11-05 15:45:43 -08:00
Julien Isorce 497bde6727 st/va: fix memory leak on error in vlVaCreateSurfaces2
Found by coverity: CID #1337953

Signed-off-by: Julien Isorce <j.isorce@samsung.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
2015-11-05 23:39:45 +00:00
Julien Isorce e0b896c86c st/va: indent vlVaQuerySurfaceAttributes and vlVaCreateSurfaces2
Some lines were using 4 indentation spaces instead of 3.

Signed-off-by: Julien Isorce <j.isorce@samsung.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
2015-11-05 23:39:43 +00:00
Kenneth Graunke 8dcf807cb4 i965: Fix scalar VS float[] and vec2[] output arrays.
The scalar VS backend has never handled float[] and vec2[] outputs
correctly (my original code was broken).  Outputs need to be padded
out to vec4 slots.

In fs_visitor::nir_setup_outputs(), we tried to process each vec4 slot
by looping from 0 to ALIGN(type_size_scalar(type), 4) / 4.  However,
this is wrong: type_size_scalar() for a float[2] would return 2, or
for vec2[2] it would return 4.  This looked like a single slot, even
though in reality each array element would be stored in separate vec4
slots.

Because of this bug, outputs[] and output_components[] would not get
initialized for the second element's VARYING_SLOT, which meant
emit_urb_writes() would skip writing them.  Nothing used those values,
and dead code elimination threw a party.

To fix this, we introduce a new type_size_vec4_times_4() function which
pads array elements correctly, but still counts in scalar components,
generating correct indices in store_output intrinsics.
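
The slot miscount described above can be illustrated with a small C sketch. The struct and helper names here (out_type, size_scalar, size_vec4, size_vec4_times_4, slots_old) are hypothetical stand-ins rather than the Mesa functions, and the one-line form of size_vec4_times_4 is an assumption based on the v2 note. The sketch only covers arrays of vectors with at most four components.

```c
#include <assert.h>

/* Hypothetical description of an output: an array of `length` elements,
 * each a vector with `components` scalar components (1..4). */
struct out_type {
   unsigned components;
   unsigned length;
};

/* Round v up to the next multiple of 4. */
static unsigned align4(unsigned v)
{
   return (v + 3u) & ~3u;
}

/* Tightly packed size in scalar components: float[2] -> 2, vec2[2] -> 4. */
static unsigned size_scalar(struct out_type t)
{
   assert(t.components >= 1 && t.components <= 4);
   return t.components * t.length;
}

/* Size in vec4 slots: every array element occupies its own slot. */
static unsigned size_vec4(struct out_type t)
{
   assert(t.components >= 1 && t.components <= 4);
   return t.length;
}

/* Padded size, still counted in scalar components (assumed one-liner). */
static unsigned size_vec4_times_4(struct out_type t)
{
   return size_vec4(t) * 4;
}

/* The old loop bound: ALIGN(type_size_scalar(type), 4) / 4. */
static unsigned slots_old(struct out_type t)
{
   return align4(size_scalar(t)) / 4;
}
```

For float[2] and vec2[2], slots_old() yields 1 while size_vec4() yields 2, which is the missing second slot the message describes.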

Normally, varying packing avoids this problem by turning varyings into
vec4s.  So this doesn't actually fix any Piglit or dEQP tests today.
However, if varying packing is disabled, things would be broken.
Tessellation shaders can't use varying packing, so this fixes various
tcs-input Piglit tests on a branch of mine.

v2: Shorten the implementation of type_size_4x to a single line (caught
    by Connor Abbott), and rename it to type_size_vec4_times_4()
    (renaming suggested by Jason Ekstrand).  Use type_size_vec4
    rather than using type_size_vec4_times_4 and then dividing by 4.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-11-05 15:26:07 -08:00
Roland Scheidegger 5ae37ae615 llvmpipe: disable texture cache
There are some weird problems with 8-wide vectors.
2015-11-05 18:00:42 +01:00
Ilia Mirkin ba093a099a nouveau: send back a debug message when waiting for a fence to complete
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-11-05 11:22:19 -05:00
Ilia Mirkin 4f6cd5fad0 nv50,nvc0: provide debug messages with shader compilation stats
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-11-05 11:22:19 -05:00
Ilia Mirkin 4335b28840 nouveau: add support for sending debug messages via KHR_debug
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-11-05 11:22:19 -05:00
Ilia Mirkin 6706cc1671 st/clover: provide a path for drivers to call through to pfn_notify
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>

[ Francisco Jerez: Clean up clover::context interface by passing
  around a function object. ]
2015-11-05 11:22:19 -05:00
Ilia Mirkin c93c9d220b st/mesa: set debug callback for debug contexts
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2015-11-05 11:22:19 -05:00
Ilia Mirkin fc76cc05e3 gallium: expose a debug message callback settable by context owner
This will allow gallium drivers to send messages to KHR_debug endpoints

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2015-11-05 11:22:18 -05:00
Ilia Mirkin e587590a83 st/mesa: account for texture views when doing CopyImageSubData
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2015-11-05 11:22:18 -05:00
Iago Toral Quiroga eea3c907cc i965/fs: Do not mark used surfaces in FS_OPCODE_GET_BUFFER_SIZE
Do it in the visitor, like we do for other opcodes.

v2: use const, get rid of useless surf_index temporary (Curro)

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2015-11-05 16:11:52 +01:00
Iago Toral Quiroga eca4c43a33 i965/vec4: Do not mark used surfaces in VS_OPCODE_GET_BUFFER_SIZE
Do it in the visitor, like we do for other opcodes.

v2: use const, get rid of useless surf_index temporary (Curro)

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2015-11-05 16:11:52 +01:00
Iago Toral Quiroga 6105d1d0a0 i965/vec4: Do not mark used direct surfaces in VS_OPCODE_PULL_CONSTANT_LOAD
Right now the generator marks direct surfaces as used but leaves marking of
indirect surfaces to the caller. Just make the callers handle marking in both
cases for consistency.

v2: Use const, do not add unnecessary temporary (Curro)

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2015-11-05 16:11:52 +01:00
Iago Toral Quiroga d7013988fb i965/fs: Do not mark used direct surfaces in UNIFORM_PULL_CONSTANT_LOAD
Right now the generator marks direct surfaces as used but leaves marking of
indirect surfaces to the caller. Just make the callers handle marking in both
cases for consistency.

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2015-11-05 16:11:52 +01:00
Iago Toral Quiroga 027b64a55a i965/fs: Do not mark direct used surfaces in VARYING_PULL_CONSTANT_LOAD
Right now the generator marks direct surfaces as used but leaves marking of
indirect surfaces to the caller. Just make the callers handle marking in both
cases for consistency.

v2: Use const and remove useless surf_index temporary (Curro)

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2015-11-05 16:11:52 +01:00
Neil Roberts 6c5f371a27 i965/skl+: Enable support for 16x multisampling
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
2015-11-05 10:33:21 +01:00
Neil Roberts aa3f9aaf31 mesa/meta: Use interpolateAtOffset for 16x MSAA copy blit
Previously there was a problem in i965 where if 16x MSAA is used then
some of the sample positions are exactly on the 0 x or y axis. When
the MSAA copy blit shader interpolates the texture coordinates at
these sample positions it was possible that it would jump to a
neighboring texel due to rounding errors. It is likely that these
positions would be used on 16x MSAA because that is where they are
defined to be in D3D.

To fix that this patch makes it use interpolateAtOffset in the blit
shader whenever 16x MSAA is used and the GL_ARB_gpu_shader5 extension
is available. This forces it to interpolate the texture coordinates at
the pixel center to avoid these problematic positions.

This fixes ext_framebuffer_multisample-unaligned-blit and
ext_framebuffer_multisample-clip-and-scissor-blit with 16x MSAA on
SKL+.

v2: Use interpolateAtOffset instead of interpolateAtSample
v3: Always try to enable GL_ARB_gpu_shader5 in the shader
    [Ian Romanick]

Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2015-11-05 10:33:16 +01:00
Neil Roberts b080b3d54d meta/blit: Always try to enable GL_ARB_sample_shading
Previously this extension was only enabled when blitting between two
multisampled buffers. However I don't think it does any harm to just
enable it all the time. The ‘enable’ option is used instead of
‘require’ so that the shader will still compile if the extension isn't
available in the cases where it isn't used. This will make the next
patch simpler because it wants to add another optional extension.

Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2015-11-05 10:33:16 +01:00
Neil Roberts 2dd76ec16e meta: Support 16x MSAA in the multisample scaled blit shader
v2: Fix the x_scale in the shader. Remove the doubts in the commit
    message.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2015-11-05 10:33:16 +01:00
Neil Roberts 1a22b12fc5 i965/meta: Support 16x MSAA in the meta stencil blit
The destination rectangle is now drawn at 4x4 the size and the shader
code to calculate the sample number is adjusted accordingly.

Acked-by: Ben Widawsky <ben@bwidawsk.net>
2015-11-05 10:33:16 +01:00
Neil Roberts a680465428 i965/fs/skl+: Fix calculating gl_SampleID for 16x MSAA
In order to accommodate 16x MSAA, the starting sample pair index is now
3 bits rather than 2 on SKL+.
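
The bit-width change follows from simple arithmetic, sketched below in C. This assumes the index counts pairs of samples (so 8 samples give 4 pairs and 16 samples give 8 pairs); the function name sample_pair_index_bits is hypothetical and this is not the driver code.

```c
#include <assert.h>

/* Number of bits needed to index every pair of samples.
 * Assumes `samples` is an even count of at least 2. */
static unsigned sample_pair_index_bits(unsigned samples)
{
   assert(samples >= 2 && samples % 2 == 0);

   unsigned pairs = samples / 2;
   unsigned bits = 0;

   /* Smallest `bits` such that 2^bits >= pairs. */
   while ((1u << bits) < pairs)
      bits++;

   return bits;
}
```

With 8 samples this gives 2 bits, and with 16 samples it gives 3 bits.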

Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2015-11-05 10:33:16 +01:00
Neil Roberts bf6bd7eaf0 i965: Support allocating the MCS buffer for 16x MSAA
When 16 samples are used the MCS buffer needs 64 bits per pixel.

Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
2015-11-05 10:33:16 +01:00
Neil Roberts b4c2e6054f i965: Support calculating the bits needed to set up 16x MSAA
The gen7_surface_msaa_bits function already returns the right values
for 16 samples but it just needs its assert to be relaxed.

Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
2015-11-05 10:33:16 +01:00
Neil Roberts 1a97cac767 i965/fs: Add a sampler program key for whether the texture is 16x MSAA
When 16x MSAA is used for sampling with texelFetch the compiler needs
to use a different instruction which passes more arguments for the MCS
data. Previously on skl+ it was unconditionally using this new
instruction. However since 16x MSAA is probably going to be pretty
rare, it is probably worthwhile to avoid using this instruction for
the other sample counts. In order to do that this patch adds a new
member to brw_sampler_prog_key_data to track when a sampler refers to
a buffer with 16 samples.

Note that this isn't done for the vec4 backend because it wouldn't
change how many registers it uses.

Acked-by: Ben Widawsky <ben@bwidawsk.net>
2015-11-05 10:33:16 +01:00
Neil Roberts 4ef27745c8 i965/vec4/skl+: Use ld2dms_w instead of ld2dms
In order to support 16x MSAA, skl+ has a wider version of ld2dms that
takes two parameters for the MCS data. The MCS data in the response
still fits in a single register so we just need to ensure we copy both
values rather than just the lower one.

Acked-by: Ben Widawsky <ben@bwidawsk.net>
2015-11-05 10:33:16 +01:00
Neil Roberts e386fb0dee i965/fs/skl+: Use ld2dms_w instead of ld2dms
In order to support 16x MSAA, skl+ has a wider version of ld2dms that
takes two parameters for the MCS data. The MCS data retrieved from the
ld_mcs instruction already returns 4 or 8 registers and is documented
to return zeroes for the mcsh value when the sample count is less than
16.

v2: Use get_lowered_simd_width to fall back to SIMD8 instructions when
    the message length would be too long in SIMD16.
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
2015-11-05 10:33:16 +01:00
Neil Roberts 20250e854e i965: Program 16x MSAA sample positions.
This is the standard pattern used by the other 3D graphics API.

BDW has slots for these values, but they aren't actually used until
SKL. Even though the documentation for BDW says they must be zero, it
doesn't seem to cause any harm to program them anyway.

The comment above for the 8x sample positions says that the hardware
implements centroid interpolation by picking the centre-most sample
that is inside the primitive. That implies that it might be worthwhile
to pick a pattern that includes 0.5,0.5. However by experimentation
this doesn't seem to actually be the case. With the sample positions
in this patch, if I modify the piglit test below so that it instead
reports the centroid position, it reports 0.492188,0.421875 which
doesn't match any of the positions. If I modify the sample positions
so that they include one at exactly 0.5,0.5 it doesn't help and it
reports another position which is even further from the center for
some reason.

arb_gpu_shader5-interpolateAtSample-different

Kenneth Graunke experimented with some other patterns that have a
higher standard deviation but I think after some discussion it was
decided that it would be better to pick the same pattern as the other
graphics API in case there are games that rely on this pattern.

(Based on a patch by Kenneth Graunke)

Cc: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ben Widawsky <ben at bwidawsk.net>
2015-11-05 10:33:15 +01:00
Kenneth Graunke 5048da974e i965: Handle 16x MSAA in IMS dimension munging code.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Neil Roberts <neil@linux.intel.com>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
2015-11-05 10:33:15 +01:00
Kenneth Graunke b9f8e729c8 nir: Rename nir_live_variables.c to nir_liveness.c.
It doesn't actually operate on variables.

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-11-05 00:09:40 -08:00
Kenneth Graunke 5c6f21579d nir: Rename live_variables to live_ssa_defs.
This computes liveness of SSA values, not nir_variables.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-11-05 00:09:40 -08:00
Alejandro Piñeiro 56774e6302 i965/vec4: select predicate based on writemask for sel emissions
Equivalent to commit 8ac3b525c but with sel operations. In this case
we select the PredCtrl based on the writemask.

This patch helps on cases like this:
 1: cmp.l.f0.0 vgrf40.0.x:F, vgrf0.zzzz:F, vgrf7.xxxx:F
 2: cmp.nz.f0.0 null:D, vgrf40.xxxx:D, 0D
 3: (+f0.0) sel vgrf41.0.x:UD, vgrf6.xxxx:UD, vgrf5.xxxx:UD

In this case, cmod propagation can't optimize instruction #2, because
instructions #1 and #2 have different writemasks, and we can't update
directly instruction #2 writemask because our code thinks that sel at
instruction #3 reads all four channels of the flag, when it actually
only reads .x.

So, with this patch, the previous case becomes this:
 1: cmp.l.f0.0 vgrf40.0.x:F, vgrf0.zzzz:F, vgrf7.xxxx:F
 2: cmp.nz.f0.0 null:D, vgrf40.xxxx:D, 0D
 3: (+f0.0.x) sel vgrf41.0.x:UD, vgrf6.xxxx:UD, vgrf5.xxxx:UD

Now only the x channel of the flag is used, allowing dead code
elimination to update the writemask at the second instruction:
 1: cmp.l.f0.0 vgrf40.0.x:F, vgrf0.zzzz:F, vgrf7.xxxx:F
 2: cmp.nz.f0.0 null.x:D, vgrf40.xxxx:D, 0D
 3: (+f0.0.x) sel vgrf41.0.x:UD, vgrf6.xxxx:UD, vgrf5.xxxx:UD

So now cmod propagation can simplify out #2:
 1: cmp.l.f0.0 vgrf40.0.x:F, attr18.wwww:F, vgrf7.xxxx:F
 2: (+f0.0.x) sel vgrf41.0.x:UD, vgrf6.xxxx:UD, vgrf5.xxxx:UD

Shader-db numbers:
total instructions in shared programs: 6235835 -> 6228008 (-0.13%)
instructions in affected programs:     219850 -> 212023 (-3.56%)
total loops in shared programs:        1979 -> 1979 (0.00%)
helped:                                1192
HURT:                                  0
2015-11-05 08:57:23 +01:00
Jason Ekstrand a40f682c71 anv/cmd_buffer: Fix SURFACE_STATE for non-view buffer bindings
We were treating it as if it's a BufferView and weren't taking the offset
into account properly.
2015-11-04 19:56:18 -08:00
Jason Ekstrand 1b68120760 anv/cmd_buffer: Don't use an anv_state pointer in emit_binding_table
The anv_state is supposed to be a flyweight so we're not really saving
anything by using a pointer.  Also, we were creating one, setting a pointer
to it, and then having it go out-of-scope which is bad.
2015-11-04 19:56:16 -08:00
Ilia Mirkin bb73fc4cb8 nouveau: relax fence emit space assert
We also have the "reserved for kick" space available. Some of my earlier
changes can probably be removed, but this is a quick fix for some of the
rarer fallout.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: <mesa-stable@lists.freedesktop.org>
2015-11-04 22:43:56 -05:00
Chad Versace d259af3fbb anv: Remove unused anv_render_pass members
Remove members
  num_color_clear_attachments
  has_depth_clear_attachment
  has_stencil_clear_attachment

The new clear code in anv_meta_clear.c does not use them.
2015-11-04 15:54:38 -08:00
Chad Versace a9a3071fc4 anv/meta: Rewrite clear code
Fixes Crucible test "func.clear.load-clear.attachments-8".

The old clear code, when clearing attachments for
VK_ATTACHMENT_LOAD_OP_CLEAR, suffered from some fundamental bugs. The
bugs were not fixable with the old code's approach.

    - It assumed that a VkRenderPass contained at most one depthstencil
      attachment.

    - It tried to clear all attachments (color and the sole
      depthstencil) with a single instanced draw call, using the VUE
      header's RenderTargetArrayIndex to specify the instance's target
      color attachment. But the RenderTargetArrayIndex does not select
      entries in the binding table; it only selects an array index of
      a single layered surface.

    - If at least one attachment of VkRenderPass had
      VK_ATTACHMENT_LOAD_OP_CLEAR, then the old code cleared *all*
      attachments. This was a consequence of using a single draw call
      and single pipeline for the clear.

The new clear code fixes those bugs by making a separate draw call for
each attachment, and using one pipeline when clearing color attachments
and a different pipeline for depth attachments.
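
The per-attachment approach can be sketched roughly as follows. All types and names (attachment, draw, record_clears, the enums) are hypothetical and only model the control flow: one recorded draw per attachment that requests a clear, with the pipeline chosen by attachment kind. Stencil is left out, as in the commit.

```c
#include <assert.h>
#include <stddef.h>

enum load_op  { LOAD_OP_LOAD, LOAD_OP_CLEAR };
enum att_kind { ATT_COLOR, ATT_DEPTH };
enum pipeline { PIPELINE_COLOR_CLEAR, PIPELINE_DEPTH_CLEAR };

/* Hypothetical, simplified render pass attachment. */
struct attachment {
   enum att_kind kind;
   enum load_op load_op;
};

/* One recorded clear draw: which attachment, with which pipeline. */
struct draw {
   size_t attachment;
   enum pipeline pipeline;
};

/* Record one draw per attachment that asks for a clear and return the
 * number of draws written. `out` must have room for `count` entries. */
static size_t record_clears(const struct attachment *atts, size_t count,
                            struct draw *out)
{
   assert(atts != NULL && out != NULL);

   size_t n = 0;
   for (size_t i = 0; i < count; i++) {
      /* Attachments without a clear load op are left untouched. */
      if (atts[i].load_op != LOAD_OP_CLEAR)
         continue;

      out[n].attachment = i;
      out[n].pipeline = atts[i].kind == ATT_COLOR ? PIPELINE_COLOR_CLEAR
                                                  : PIPELINE_DEPTH_CLEAR;
      n++;
   }
   return n;
}
```

Because each attachment gets its own draw, an attachment that does not use the clear load op is simply skipped, and a pass with several depth attachments needs no special casing.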

The new code, like the old code, does not clear stencil attachments. It
is left as a FINISHME.
2015-11-04 15:20:52 -08:00
Chad Versace 49c96a14c5 anv/meta: Clear color attribute is always flat
No behavioral change. This patch just removes an unneeded function
parameter.
2015-11-04 15:15:19 -08:00
Chad Versace 7f82cc718f anv/meta: Use consistent naming for dynamic state mask
Consistently rename bitmasks of Vulkan dynamic state to 'dynamic_mask'.

  anv_meta_saved_state::dynamic_flags -> dynamic_mask
  anv_meta_save(dynamic_state)        -> dynamic_mask
2015-11-04 15:15:19 -08:00