67380 Commits

Author SHA1 Message Date
Jason Ekstrand d43e16b163 i965/fs: Use regs_read/written for post-RA scheduling in calculate_deps
Previously, we were assuming that everything read/wrote exactly 1 logical
GRF (1 in SIMD8 and 2 in SIMD16).  This isn't actually true.  In
particular, the PLN instruction reads 2 logical registers in one of the
components.  This commit changes post-RA scheduling to use regs_read and
regs_written instead so that we add enough dependencies.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92770
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2015-11-07 08:41:48 -08:00
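Illustration for d43e16b163 (not part of the commit): a minimal Python sketch of why per-operand register counts matter when building read-after-write dependencies. It is not the i965 scheduler code. The Inst class, the raw_deps helper, the register numbers and the three-instruction program are all hypothetical, chosen only so that one source reads two consecutive registers as the PLN example in the message does.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Inst:
    name: str
    dst: Optional[int]            # first register written, or None
    regs_written: int = 1         # consecutive registers written
    # each source is (first register read, number of registers read)
    srcs: List[Tuple[int, int]] = field(default_factory=list)


def raw_deps(insts, use_sizes=True):
    """Return the set of (producer, consumer) read-after-write edges.

    With use_sizes=False every operand is assumed to touch exactly one
    register, which models the old assumption described in the commit.
    """
    last_writer = {}
    deps = set()
    for i, inst in enumerate(insts):
        for first, count in inst.srcs:
            n = count if use_sizes else 1
            for reg in range(first, first + n):
                if reg in last_writer:
                    deps.add((last_writer[reg], i))
        if inst.dst is not None:
            n = inst.regs_written if use_sizes else 1
            for reg in range(inst.dst, inst.dst + n):
                last_writer[reg] = i
    return deps


# Hypothetical program: the last instruction reads g4 and g5 through a
# single two-register source, as a PLN-like instruction would.
program = [
    Inst("mov g4", dst=4),
    Inst("mov g5", dst=5),
    Inst("pln g6", dst=6, srcs=[(2, 1), (4, 2)]),
]

print(sorted(raw_deps(program, use_sizes=True)))
print(sorted(raw_deps(program, use_sizes=False)))
```

With sizes taken into account the third instruction depends on both writes; with the one-register assumption the dependency on the write to g5 is lost, so a scheduler would be free to reorder it incorrectly.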
Jason Ekstrand c839174d55 nir/validate: Add better validation of load/store types
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2015-11-07 08:41:35 -08:00
Marek Olšák d57ede92b7 radeonsi: add register definitions for Stoney
There are a few non-stoney changes too.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
2015-11-07 10:22:13 +01:00
Marek Olšák 2658777f46 radeonsi: add workarounds for CP DMA to stay on the fast path
v2: set emit_scratch_reloc, add a NULL check

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-11-07 10:22:13 +01:00
Marek Olšák fc0416ef5d radeonsi: unify CP DMA preparation logic
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-11-07 10:22:13 +01:00
Marek Olšák 89da3b4458 radeonsi: unify CP DMA code determining various flags
v2: don't call get_flush_flags twice per function

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-11-07 10:22:12 +01:00
Marek Olšák c3e527f93d radeonsi: only enable write confirmation on the last CP DMA packet
This should improve performance for big copies that need to be split.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-11-07 10:22:12 +01:00
Ilia Mirkin 8e9ade7eb3 nv50/ir: allow emission of immediates in imul/imad ops
Nothing actually uses this yet (due to complications), but the emission
logic is right.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-11-07 00:42:15 -05:00
Ilia Mirkin 393d0c336b nv50/ir: properly set the type of the constant folding result
This removes the hack used for merge, which only covers a fraction of
the cases.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-11-06 19:39:32 -05:00
Ilia Mirkin 2f9aaed749 nv50/ir: add support for const-folding OP_CVT with F64 source/dest
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-11-06 19:39:32 -05:00
Ilia Mirkin 76957389fc nv50/ir: add fp64 opcode emission support for G200 (NVA0)
Need to emulate rcp/rsq before providing full fp64 support

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-11-06 18:36:25 -05:00
Hans de Goede f979d3cfec nv50/ir: Add support for 64bit immediates to checkSwapSrc01
Now that we support 64 bit immediates in insnCanLoad, we need to swap
64 bit immediate sources too for optimal effect.

Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-11-06 18:13:31 -05:00
Hans de Goede 9f2f8bda6e nvc0/ir: Teach insnCanLoad about double immediates
Teach insnCanLoad about double immediates. Together with the commit
"Add support for merge-s to the ConstantFolding pass", this turns the
following (nvc0) code:
  1: mov u32 $r2 0x00000000 (8)
  2: mov u32 $r3 0x3fe00000 (8)
  3: add f64 $r0d $r0d $r2d (8)

Into:
  1: add f64 $r0d $r0d 0.500000 (8)

Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-11-06 18:13:31 -05:00
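Illustration for 9f2f8bda6e (not part of the commit): the two 32-bit moves in the example load the low and high words of one IEEE-754 double. The sketch below, which assumes $r2 holds the low word and $r3 the high word of $r2d, merges them and confirms that the result is the 0.5 immediate shown in the folded instruction. The helper name merge_to_double is hypothetical.

```python
import struct


def merge_to_double(lo, hi):
    """Combine two 32-bit words into one 64-bit pattern and read it as a double."""
    bits = (hi << 32) | lo
    return struct.unpack("<d", struct.pack("<Q", bits))[0]


# Values taken from the instruction listing in the commit message.
value = merge_to_double(0x00000000, 0x3FE00000)
print(value)
```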
Hans de Goede 428506ece2 nv50/ir: Add support for merge-s to the ConstantFolding pass
This allows later passes like LoadPropagation to properly deal with 64
bit immediates.

If the new 64 bit load this introduces does not get optimized away then
split64BitOpPostRA() will split this into 2 instructions again.

Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-11-06 18:13:31 -05:00
Ilia Mirkin 2437f00853 nv50/ir: disallow 64-bit immediates on nv50 targets
No instructions are able to load short immediates like nvc0 can.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-11-06 18:13:31 -05:00
Ilia Mirkin 11e3dac36e nv50/ir: allow movs with TYPE_F64 destinations to be split
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-11-06 18:13:31 -05:00
Hans de Goede b487b55f7d gm107/ir: Add support for double immediates
Add support for encoding double immediates (up to 20 bits of precision)
into the generated gm107 machine-code.

Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-11-06 17:22:40 -05:00
Hans de Goede 12c850d01c nvc0/ir: Add support for double immediates
Add support for encoding double immediates (up to 20 bits of precision)
into the generated nvc0 machine-code.

Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-11-06 17:22:40 -05:00
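Illustration for b487b55f7d and 12c850d01c (not part of the commits): the messages only say that double immediates carry "up to 20 bits of precision". The sketch below assumes this means the encoding keeps the 20 most significant bits of the 64-bit pattern (sign, 11 exponent bits, 8 mantissa bits), so a constant is encodable only when its remaining 44 bits are zero. That reading and the helper names are assumptions, not taken from the driver source.

```python
import struct


def double_bits(value):
    """Return the IEEE-754 binary64 bit pattern of value as an integer."""
    return struct.unpack("<Q", struct.pack("<d", value))[0]


def fits_short_immediate(value, kept_bits=20):
    """True if only the top kept_bits bits of the pattern are non-zero."""
    low_mask = (1 << (64 - kept_bits)) - 1
    return (double_bits(value) & low_mask) == 0


for v in (0.5, 1.0 + 2.0 ** -8, 1.0 + 2.0 ** -9, 0.1):
    print(v, fits_short_immediate(v))
```

Under this assumption 0.5 and 1 + 2^-8 are encodable, while 1 + 2^-9 and 0.1 need mantissa bits beyond the eighth and would have to be loaded some other way.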
Francisco Jerez 5169407221 i965/nir/fs: Add comment for no-op memory barrier functions
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2015-11-06 13:19:56 -08:00
Jordan Justen faa1193070 i965/nir/fs: Implement new barrier functions for compute shaders
For these nir intrinsics, we emit the same code as
nir_intrinsic_memory_barrier:

 * nir_intrinsic_memory_barrier_atomic_counter
 * nir_intrinsic_memory_barrier_buffer
 * nir_intrinsic_memory_barrier_image

We treat these nir intrinsics as no-ops:
 * nir_intrinsic_group_memory_barrier
 * nir_intrinsic_memory_barrier_shared

v3:
 * Add comment for no-op cases (curro)

v4:
 * Moving comment to a separate patch authored by curro

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2015-11-06 13:16:11 -08:00
Jordan Justen 9d65f3208b nir: Add new barrier functions for compute shaders
When these functions are called in glsl-ir, we create a corresponding
nir intrinsic function call.

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2015-11-06 13:15:16 -08:00
Jordan Justen 91f188710a glsl: Add new barrier functions for compute shaders
When these functions are called in GLSL code, we create an intrinsic
function call:

 * groupMemoryBarrier => __intrinsic_group_memory_barrier
 * memoryBarrierAtomicCounter => __intrinsic_memory_barrier_atomic_counter
 * memoryBarrierBuffer => __intrinsic_memory_barrier_buffer
 * memoryBarrierImage => __intrinsic_memory_barrier_image
 * memoryBarrierShared => __intrinsic_memory_barrier_shared

v2:
 * Consolidate with memoryBarrier function/intrinsic creation (curro)

v3:
 * Instead of add_memory_barrier_function, add an intrinsic_name
   parameter to _memory_barrier (curro)

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2015-11-06 13:14:44 -08:00
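Illustration for 91f188710a (not part of the commit): the five GLSL-to-intrinsic names listed in the message follow a regular pattern, camelCase converted to snake_case with an "__intrinsic_" prefix. The sketch below only checks that pattern against the list. It says nothing about how the GLSL compiler stores these names, and the helper intrinsic_name is hypothetical.

```python
import re


def intrinsic_name(glsl_name):
    """Convert a camelCase GLSL builtin name to the listed intrinsic spelling."""
    # Insert "_" before every capital letter except at the start, then lowercase.
    snake = re.sub(r"(?<!^)(?=[A-Z])", "_", glsl_name).lower()
    return "__intrinsic_" + snake


# Mapping copied from the commit message.
listed = {
    "groupMemoryBarrier": "__intrinsic_group_memory_barrier",
    "memoryBarrierAtomicCounter": "__intrinsic_memory_barrier_atomic_counter",
    "memoryBarrierBuffer": "__intrinsic_memory_barrier_buffer",
    "memoryBarrierImage": "__intrinsic_memory_barrier_image",
    "memoryBarrierShared": "__intrinsic_memory_barrier_shared",
}

for glsl, expected in listed.items():
    print(glsl, "=>", intrinsic_name(glsl), intrinsic_name(glsl) == expected)
```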
Boyuan Zhang 6bad554d98 radeon/uvd: fix VC-1 simple/main profile decode v2
We just needed to set the extra width/height fields to get this working.

v2 (chk): rebased, CC stable added, commit message added, fixed coding style

Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Cc: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
2015-11-06 20:07:23 +01:00
Boyuan Zhang ed55def44f st/vaapi: fix vaapi VC-1 simple/main corruption v2
Apply the start code fix only to advanced profile.

v2 (chk): add commit message

Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Cc: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
2015-11-06 20:07:23 +01:00
Julien Isorce cc1e5c972e st/va: add support for RGBX and BGRX in VPP
Previously it was only possible to convert an NV12 surface to
RGBA or BGRA. This patch uses the same post-processing
function, "handleVAProcPipelineParameterBufferType", but
adds definitions for RGBX and BGRX.

This patch also makes vlVaQuerySurfaceAttributes more generic
to avoid copy and pasting the same lines.

Signed-off-by: Julien Isorce <j.isorce@samsung.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
2015-11-06 17:33:45 +00:00
Julien Isorce 42a5e143a8 vl/buffers: add RGBX and BGRX to the supported formats
Useful if one wants to create RGBX or BGRX surfaces.
The infrastructure is such that it required just a
few definitions to support these formats.

Signed-off-by: Julien Isorce <j.isorce@samsung.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
2015-11-06 17:33:38 +00:00
Julien Isorce bf6acbb2db st/va: properly use brackets in vlVaAcquireBufferHandle's switch
In "switch (mem_type)" the brackets were surrounding "case+default"
instead of "case" only.

Signed-off-by: Julien Isorce <j.isorce@samsung.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
2015-11-06 17:33:16 +00:00
Julien Isorce bfc245e9ac st/va: properly indent buffer.c, config.c, image.c and picture.c
Some lines were using 4 indentation spaces instead of 3.

Signed-off-by: Julien Isorce <j.isorce@samsung.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
2015-11-06 17:33:01 +00:00
Rob Clark 6459e780ae freedreno/a4xx: fix blend color
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-11-06 11:19:04 -05:00
Rob Clark 7465e16124 freedreno: update generated headers
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-11-06 11:18:47 -05:00
Guillaume Charifi 6f5e0c08a4 freedreno: add a305 support
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-11-06 11:17:58 -05:00
Boyan Ding 8f55ebe802 freedreno/ir3: Use nir_foreach_variable
Signed-off-by: Boyan Ding <boyan.j.ding@gmail.com>
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-11-06 11:17:53 -05:00
Rob Clark 99597d033a nir: some small cleanups
The various cf nodes all get allocated w/ shader as their ralloc_parent,
so let's make this more explicit.  Plus a couple of other corrections/
clarifications.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-11-06 11:15:41 -05:00
Ilia Mirkin d68226087c nvc0: reintroduce BGRA4 format support
Commit 342e68dc60 (nvc0: remove BGRA4 format support) removed the
support to fix a WoW trace. However after further experimentation, I was
able to get the blit to work by using a different "fake" format in the
2d engine.

The reason why this worked on nv50 is that nv50 falls back to the 3d
blit path in case either the src or the dst aren't "faithfully"
supported, while nvc0 only does it for the dst format. RG8 is better
supported by the nvc0 2d engine than R16.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-11-06 00:47:44 -05:00
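Illustration for d68226087c (not part of the commit): the second paragraph of the message describes when each driver falls back to the 3d blit path. The sketch below restates that rule as a predicate. The function name, the string target names and the boolean "faithful" flags are hypothetical simplifications of whatever the drivers actually check.

```python
def uses_3d_blit(target, src_faithful, dst_faithful):
    """Return True if the blit falls back to the 3d path, per the commit text."""
    if target == "nv50":
        # nv50: fall back if either format is not faithfully supported.
        return not (src_faithful and dst_faithful)
    if target == "nvc0":
        # nvc0: only the destination format is considered.
        return not dst_faithful
    raise ValueError("unknown target: " + target)


# A source format that the 2d engine does not support faithfully.
print(uses_3d_blit("nv50", src_faithful=False, dst_faithful=True))
print(uses_3d_blit("nvc0", src_faithful=False, dst_faithful=True))
```

In this model the same blit takes the 3d path on nv50 but stays on the 2d engine on nvc0, which matches the message's explanation of why the problem was seen only on nvc0.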
Brian Paul 581111c4d6 mesa: report enum name in glClientActiveTexture() error string
As we do for glActiveTexture().  Trivial.
2015-11-05 20:12:33 -07:00
Julien Isorce 497bde6727 st/va: fix memory leak on error in vlVaCreateSurfaces2
Found by coverity: CID #1337953

Signed-off-by: Julien Isorce <j.isorce@samsung.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
2015-11-05 23:39:45 +00:00
Julien Isorce e0b896c86c st/va: indent vlVaQuerySurfaceAttributes and vlVaCreateSurfaces2
Some lines were using 4 indentation spaces instead of 3.

Signed-off-by: Julien Isorce <j.isorce@samsung.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
2015-11-05 23:39:43 +00:00
Kenneth Graunke 8dcf807cb4 i965: Fix scalar VS float[] and vec2[] output arrays.
The scalar VS backend has never handled float[] and vec2[] outputs
correctly (my original code was broken).  Outputs need to be padded
out to vec4 slots.

In fs_visitor::nir_setup_outputs(), we tried to process each vec4 slot
by looping from 0 to ALIGN(type_size_scalar(type), 4) / 4.  However,
this is wrong: type_size_scalar() for a float[2] would return 2, or
for vec2[2] it would return 4.  This looked like a single slot, even
though in reality each array element would be stored in separate vec4
slots.

Because of this bug, outputs[] and output_components[] would not get
initialized for the second element's VARYING_SLOT, which meant
emit_urb_writes() would skip writing them.  Nothing used those values,
and dead code elimination threw a party.

To fix this, we introduce a new type_size_vec4_times_4() function which
pads array elements correctly, but still counts in scalar components,
generating correct indices in store_output intrinsics.

Normally, varying packing avoids this problem by turning varyings into
vec4s.  So this doesn't actually fix any Piglit or dEQP tests today.
However, if varying packing is disabled, things would be broken.
Tessellation shaders can't use varying packing, so this fixes various
tcs-input Piglit tests on a branch of mine.

v2: Shorten the implementation of type_size_4x to a single line (caught
    by Connor Abbott), and rename it to type_size_vec4_times_4()
    (renaming suggested by Jason Ekstrand).  Use type_size_vec4
    rather than using type_size_vec4_times_4 and then dividing by 4.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-11-05 15:26:07 -08:00
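Illustration for 8dcf807cb4 (not part of the commit): the slot arithmetic in the message can be checked with a simplified model. The sketch below handles only arrays of scalars or vectors with at most four components, not matrices or structs. Its function signatures are hypothetical and do not match the real helpers, which take a GLSL type.

```python
def align(value, alignment):
    """Round value up to the next multiple of alignment."""
    return (value + alignment - 1) // alignment * alignment


def type_size_scalar(components, array_len):
    """Scalar components, tightly packed."""
    return components * array_len


def type_size_vec4(components, array_len):
    """vec4 slots: each array element of up to 4 components takes one slot."""
    return array_len


def type_size_vec4_times_4(components, array_len):
    """Padded size, still counted in scalar components."""
    return type_size_vec4(components, array_len) * 4


def old_slot_count(components, array_len):
    """The loop bound the commit describes as wrong."""
    return align(type_size_scalar(components, array_len), 4) // 4


for label, comps, length in (("float[2]", 1, 2), ("vec2[2]", 2, 2)):
    print(label,
          "scalar size:", type_size_scalar(comps, length),
          "old slots:", old_slot_count(comps, length),
          "vec4 slots:", type_size_vec4(comps, length),
          "padded size:", type_size_vec4_times_4(comps, length))
```

For both float[2] and vec2[2] the old bound yields one slot while the array actually occupies two vec4 slots, which is why the second element's outputs were never initialized.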
Roland Scheidegger 5ae37ae615 llvmpipe: disable texture cache
There are some weird problems with 8-wide vectors.
2015-11-05 18:00:42 +01:00
Ilia Mirkin ba093a099a nouveau: send back a debug message when waiting for a fence to complete
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-11-05 11:22:19 -05:00
Ilia Mirkin 4f6cd5fad0 nv50,nvc0: provide debug messages with shader compilation stats
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-11-05 11:22:19 -05:00
Ilia Mirkin 4335b28840 nouveau: add support for sending debug messages via KHR_debug
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-11-05 11:22:19 -05:00
Ilia Mirkin 6706cc1671 st/clover: provide a path for drivers to call through to pfn_notify
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>

[ Francisco Jerez: Clean up clover::context interface by passing
  around a function object. ]
2015-11-05 11:22:19 -05:00
Ilia Mirkin c93c9d220b st/mesa: set debug callback for debug contexts
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2015-11-05 11:22:19 -05:00
Ilia Mirkin fc76cc05e3 gallium: expose a debug message callback settable by context owner
This will allow gallium drivers to send messages to KHR_debug endpoints

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2015-11-05 11:22:18 -05:00
Ilia Mirkin e587590a83 st/mesa: account for texture views when doing CopyImageSubData
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2015-11-05 11:22:18 -05:00
Iago Toral Quiroga eea3c907cc i965/fs: Do not mark used surfaces in FS_OPCODE_GET_BUFFER_SIZE
Do it in the visitor, like we do for other opcodes.

v2: use const, get rid of useless surf_index temporary (Curro)

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2015-11-05 16:11:52 +01:00
Iago Toral Quiroga eca4c43a33 i965/vec4: Do not mark used surfaces in VS_OPCODE_GET_BUFFER_SIZE
Do it in the visitor, like we do for other opcodes.

v2: use const, get rid of useless surf_index temporary (Curro)

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2015-11-05 16:11:52 +01:00
Iago Toral Quiroga 6105d1d0a0 i965/vec4: Do not mark used direct surfaces in VS_OPCODE_PULL_CONSTANT_LOAD
Right now the generator marks direct surfaces as used but leaves marking of
indirect surfaces to the caller. Just make the callers handle marking in both
cases for consistency.

v2: Use const, do not add unnecessary temporary (Curro)

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2015-11-05 16:11:52 +01:00
Iago Toral Quiroga d7013988fb i965/fs: Do not mark used direct surfaces in UNIFORM_PULL_CONSTANT_LOAD
Right now the generator marks direct surfaces as used but leaves marking of
indirect surfaces to the caller. Just make the callers handle marking in both
cases for consistency.

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2015-11-05 16:11:52 +01:00