Commit Graph

57759 Commits

Author SHA1 Message Date
Ilia Mirkin dfb0ca1606 nv50/ir: fix phi/union sources when their def has been merged
In a situation where double-register values are used, the phi nodes can
still end up being u32 values. They all get merged into one RA node
though. When fixing up the merge (which comes after the phi node), the
phi node's def would get fixed, but not its sources which would remain
at the low register value.

This maintains the invariant that a phi node's defs and sources are
allocated the same register.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2014-07-24 08:26:41 -04:00
Ilia Mirkin 32702cceed nv50/ir: fix hard-coded TYPE_U32 sized register
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2014-07-24 08:26:41 -04:00
Ilia Mirkin 3f6b34bacc nvc0: mark shader header if fp64 is used
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2014-07-24 08:26:41 -04:00
Ilia Mirkin b21a28797c nv50/ir: keep track of whether the program uses fp64
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2014-07-24 08:26:41 -04:00
Ilia Mirkin 47e5a8d7a2 nvc0: make sure that the local memory allocation is aligned to 0x10
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: <mesa-stable@lists.freedesktop.org>
2014-07-24 08:26:41 -04:00
Ilia Mirkin 637b6c2478 mesa: add ARB_clear_texture.xml to file list, remove duplicate decls
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
2014-07-24 08:26:41 -04:00
Chia-I Wu 9d6166880d ilo: check the tilings of imported handles
Just to be cautious.
2014-07-24 13:38:51 +08:00
Chia-I Wu cbc943c43e ilo: clean up resource bo renaming
s/alloc_bo/rename_bo/ as that is what the functions do.  Simplify bo
allocation and move the complexity to bo renaming.
2014-07-24 13:21:35 +08:00
Chia-I Wu cf8c9947a8 ilo: share some code between {tex,buf}_create_bo
Add resource_get_bo_name() and resource_get_bo_initial_domain() for use by
both functions.
2014-07-24 10:49:02 +08:00
Chia-I Wu c1a1a627c4 ilo: use native 3-component vertex formats on GEN7.5+
GEN7.5 gains support for those formats natively.
2014-07-24 09:54:20 +08:00
Chia-I Wu 2126541b0b ilo: allow for device-dependent format translation
Pass ilo_dev_info to all format translation functions.
2014-07-24 09:33:33 +08:00
Jason Ekstrand 6bac86cd85 i965: Accelerate uploads of RGBA and BGRA GL_UNSIGNED_INT_8_8_8_8_REV textures
Since intel is always going to be little-endian,
GL_UNSIGNED_INT_8_8_8_8_REV is the same as GL_UNSIGNED_BYTE for RGBA and
BGRA textures, so the same acceleration code will work.  We might as well
use it.

Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2014-07-23 16:48:35 -07:00
Ian Romanick 5072d0e7fc mesa: Fix the name in the error message
Obvious copy-and-paste bug.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2014-07-23 16:42:47 -07:00
Ian Romanick 3f04a1532e glsl: Fix some bad indentation
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2014-07-23 16:42:47 -07:00
Kenneth Graunke d4d886a0bc i965/fs: Set LastRT on the final FB write on Broadwell.
In Piglit's EXT_framebuffer_multisample/alpha-to-coverage-dual-src-blend
test, key->nr_color_regions == 2, but the dual source blend FB write has
ir->target set to 0.  So we failed to set "Last Render Target Select" on
any FB write message.

We only emit one FB write per render target, so my comment about setting
LastRT on every FB write directed at the last color region is a bit...
misinformed.  According to the documentation, depth buffer writes and
scoreboard updates happen on the FB write with LastRT set, so I believe
we want to set it only once.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Cc: "10.2" <mesa-stable@lists.freedesktop.org>
2014-07-23 15:44:37 -07:00
Kenneth Graunke 36a4a6bbdc i965: Port INTEL_DEBUG=optimizer to the vec4 backend.
Largely via copy and paste.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2014-07-23 15:44:16 -07:00
Kenneth Graunke 8d2e95bd4b i965: Save the gl_shader_stage enum in backend_visitor.
This will be useful for INTEL_DEBUG=optimizer in the vec4 backend, which
needs to know whether it's currently processing a VS or GS.  It isn't
worth adding virtual methods for this case.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2014-07-23 15:44:14 -07:00
Kenneth Graunke d6d3e6027d i965: Don't print WE_normal in disassembly.
Dropping this helps most lines fit in an 80 column terminal.  The
absence of WE_normal also helps call attention to WE_all, where
something unusual is going on.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2014-07-23 15:44:08 -07:00
Rob Clark 2f181bc391 freedreno/a3xx/compiler: fix p0 (kill, etc)
Don't assert (debug builds) or assign random uninitialized value for
predicate register (p0).. that screws up kill, etc.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2014-07-23 15:10:53 -04:00
Tom Stellard fb237ba746 Revert "r600g/compute: Fix warnings"
This reverts commit 467f1585e2.

This breaks the build on some systems.
2014-07-23 11:52:05 -04:00
Grigori Goronzy 2a766b0b64 radeon/llvm: fix formatting
Use K&R and same indent as most other code. No functional change
intended.

Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
2014-07-23 10:40:41 -04:00
Grigori Goronzy 0e9cdedd2e radeon/llvm: enable unsafe math for graphics shaders
Accuracy of some operations was recently improved in the R600 backend,
at the cost of slower code. This is required for compute shaders,
but not for graphics shaders. Add unsafe-fp-math hint to make LLVM
generate faster but possibly less accurate code.

Piglit didn't indicate any regressions.

Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
2014-07-23 10:40:33 -04:00
Tom Stellard 467f1585e2 r600g/compute: Fix warnings 2014-07-23 10:29:17 -04:00
Glenn Kennard 2fa6d659c3 r600g: Use hardware sqrt instruction
Piglit quick tests including sqrt pass, no other regressions,
tested on radeon 6670.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
2014-07-23 10:29:17 -04:00
Bruno Jiménez dbaf0bc388 r600g/compute: Remove unneeded code from compute_memory_promote_item
Now that we know that the pool is defragmented, we positively know
that allocated + unallocated will be the total size of the
current pool plus all the items that will be promoted. So we only
need to grow the pool once.

This will allow us to just add the new items to the end of the
item_list without the need of looking for a place to the new item.

Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
2014-07-23 10:29:17 -04:00
Bruno Jiménez e7bda844e6 r600g/compute: Quick exit if there's nothing to add to the pool
This way we can avoid defragmenting the pool, even if it is needed
to defragment it, and looping again through the list of unallocated
items.

Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
2014-07-23 10:29:17 -04:00
Bruno Jiménez 90d7b09ed2 r600g/compute: Defrag the pool if it's necesary
This patch adds a new member to the pool to track its status.
For now it is used only for the 'fragmented' status, but if
needed it could be used for more statuses.

The pool will be considered fragmented if: An item that isn't
the last is freed or demoted.

This 'strategy' has a problem, although it shouldn't cause any bug.
If for example we have two items, A and B. We choose to free A first,
now the pool will have the 'fragmented' status. If we now free B,
the pool will retain its 'fragmented' status even if it isn't
fragmented.

Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
2014-07-23 10:29:17 -04:00
Bruno Jiménez d8b6f0dacb r600g/compute: Add a function for defragmenting the pool
This new function will move items forward in the pool, so that
there's no gap between them, effectively defragmenting the pool.

For now this function is a bit dumb as it just moves items
forward without trying to see if other items in the pool could
fit in the gaps.

Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
2014-07-23 10:29:17 -04:00
Bruno Jiménez 1f705b2bee r600g/compute: Add a function for moving items in the pool
This function will be used in the future by compute_memory_defrag
to move items forward in the pool.

It does so by first checking for overlaping ranges, if the ranges
don't overlap it will copy the contents directly. If they overlap
it will try first to make a temporary buffer, if this buffer fails
to allocate, it will finally fall back to a mapping.

Note that it will only be needed to move items forward, it only
checks for overlapping ranges in that case. If needed, it can
easily be added by changing the first if.

Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
2014-07-23 10:29:17 -04:00
Rob Clark 23ae2db854 freedreno/a3xx: more vtx formats
Actually what we currently handle is just the SCALED versions, and not
the int versions.  The difference probably matters more when we actually
support integer in the compiler.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2014-07-23 09:03:10 -04:00
Rob Clark a5ac36a75f freedreno/a3xx/compiler: const file relative addressing
Teach new compiler scheduling and register assignment how to deal with
relative addressing.  This gets us what we need to avoid falling back to
old compiler for CONST[ADDR[0].x+n].  It is also a prerequisite for temp
file relative addressing, although that is going to also need some
cleverness in register assignment to keep arrays grouped together.

NOTE: doing address calculation in full precision and then narrowing to
s16 in the mov to addr reg seems to sometimes cause lockups (and
sometimes work?!).  It seems more reliable to do the address calculation
in s16, like the blob does.  Which means teaching RA how to deal with
mixed half and full precision allocation.  Fortunately that didn't turn
out to be too hard, so that is a nice bonus which we could probably take
better advantage of elsewhere.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2014-07-23 09:03:10 -04:00
Rob Clark c18ae9c293 freedreno/a3xx/compiler: move function
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2014-07-23 09:03:09 -04:00
Rob Clark 3a7da7f5ec freedreno/a3xx: add back a few stalls
Technically we should not need these.  CP_LOAD_STATE can be pipelined.
But removing them broke a few piglit tests, like fbo-depth-
GL_DEPTH_COMPONENT24-readpixels.  I expect these are just masking a
problem elsewhere, or perhaps they are only needed under some more
specific circumstances.  But until that is understood properly, give
back a bit of the perf boost we got from c63450e8.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2014-07-23 09:03:09 -04:00
Rob Clark 9f6dfd16e3 targets/dri: fix freedreno targets
The kernel driver name is either "kgsl" (downstream/android) or "msm"
(upstream).

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2014-07-23 09:03:09 -04:00
Rob Clark c357e8475a freedreno: update generated headers
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2014-07-23 09:03:09 -04:00
Neil Roberts 0779f37e15 meta: Add a meta implementation of GL_ARB_clear_texture
Adds an implementation of the ClearTexSubImage driver entry point that tries
to set up an FBO to render to the texture and then calls glClearBuffer with a
scissor to perform the actual clear. If an FBO can't be created for the
texture then it will fall back to using _mesa_store_ClearTexSubImage.

When used in combination with _mesa_store_ClearTexSubImage this should provide
an implementation that works for all DRI-based drivers. However as this has
only been tested with the i965 driver it is currently only enabled there.

v2: Only enable the extension for the i965 driver instead of all DRI drivers.
    Remove an unnecessary goto. Don't require GL_ARB_framebuffer_object. Add
    some more comments.

v3: Use glClearBuffer* to avoid having to modify glClearColor and friends.
    Handle sRGB textures. Explicitly disable dithering.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen at intel.com>
2014-07-23 11:50:38 +01:00
Neil Roberts 05b52efbc9 meta: Add a state flag for the GL_DITHER
The Meta implementation of glClearTexSubImage is going to want to ensure that
dithering is disabled so that it can get a consistent color across the whole
texture when clearing. This adds a state flag to easily save it and set it to
the default value when performing meta operations.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2014-07-23 11:50:38 +01:00
Neil Roberts df9945ca26 texstore: Add a generic implementation of GL_ARB_clear_texture
Adds an implmentation of the ClearTexSubImage driver entry point that just
maps the texture and writes the values in. The extension is not yet enabled by
default because it doesn't work with multisample textures as they don't have a
simple linear layout.

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2014-07-23 11:50:38 +01:00
Neil Roberts fbbbf7529c mesa/main: Add generic bits of ARB_clear_texture implementation
This adds the driver entry point for glClearTexSubImage and fills in the
_mesa_ClearTexImage and _mesa_ClearTexSubImage functions that call it.

v2: Don't clear some of the images if only one of them makes an error

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2014-07-23 11:50:38 +01:00
Neil Roberts 2e63f91e60 teximage: Add utility func for format/internalFormat compatibility check
In texture_error_check() there was a snippet of code to check whether the
given format and internal format are basically compatible. This has been split
out into its own static helper function so that it can be used by an
implementation of glClearTexImage too.
2014-07-23 11:50:38 +01:00
Ilia Mirkin c4067acd90 mesa/main: add ARB_clear_texture entrypoints
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Neil Roberts <neil@linux.intel.com>
2014-07-23 11:50:37 +01:00
Michel Dänzer 07c65b85ea r600g/radeonsi: Use write-combined CPU mappings of some BOs in GTT
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2014-07-23 18:55:50 +09:00
Michel Dänzer 37d43ebb28 winsys/radeon: Use separate caching buffer managers for VRAM and GTT
Should reduce overhead because the caching buffer manager doesn't need to
consider buffers of the wrong type.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2014-07-23 15:43:04 +09:00
Anuj Phogat 9548ba6e7b mesa: Don't use memcpy() in _mesa_texstore() for float depth texture data
because float depth texture data needs clamping to [0.0, 1.0]. Let the
_mesa_texstore() fallback to slower path.

Fixes Khronos GLES3 CTS tests:
shadow_execution_vert
shadow_execution_frag

V2: Move the check to _mesa_texstore_can_use_memcpy() function.
    Add check for floating point data types.

Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2014-07-21 18:33:29 -07:00
Kenneth Graunke 29af97f280 i965/fs: Fix gl_SampleMask handling for SIMD16 on Gen8+.
We actually want to use mov(16), not mov(8).

Fixes 7 Piglit tests: ARB_sample_shading/builtin-gl-sample-mask [2468]
and ARB_sample_shading/builtin-gl-sample-mask-simple [468].

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=80991
Reviewed-by: Matt Turner <mattst88@gmail.com>
Cc: "10.2" <mesa-stable@lists.freedesktop.org>
2014-07-21 14:59:13 -07:00
Kenneth Graunke 38ffef7840 i965/fs: Fix gl_SampleID for 2x MSAA and SIMD16 mode.
We might be able to do this without an extra program key field, but this
is non-invasive and fixes the bug, for now.

This fixes the following Piglit tests on Broadwell:
- ARB_sample_shading/builtin-gl-sample-id 2
- ARB_sample_shading/builtin-gl-sample-position 2
- EXT_framebuffer_multisample/multisample-blit 2 color
- EXT_framebuffer_multisample/multisample-blit 2 color linear
- EXT_framebuffer_multisample/multisample-blit 2 depth
- EXT_framebuffer_multisample/no-color 2 depth combined
- EXT_framebuffer_multisample/no-color 2 depth separate
- EXT_framebuffer_multisample/no-color 2 depth single
- EXT_framebuffer_multisample/no-color 2 depth-computed combined
- EXT_framebuffer_multisample/no-color 2 depth-computed separate
- EXT_framebuffer_multisample/no-color 2 depth-computed single
- EXT_framebuffer_multisample/unaligned-blit 2 color msaa
- EXT_framebuffer_multisample/unaligned-blit 2 depth msaa

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=80991
Reviewed-by: Matt Turner <mattst88@gmail.com>
Cc: "10.2" <mesa-stable@lists.freedesktop.org>
2014-07-21 14:59:12 -07:00
Kenneth Graunke 4cf47c80fc i965: Add missing persample_shading field to brw_wm_debug_recompile.
Otherwise, the performance warning for shader recompiles will just say
"something else".

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2014-07-21 11:19:44 -07:00
Kenneth Graunke caf8c07dd4 i965/disasm: Don't disassemble the URB complete field on Broadwell.
It doesn't exist, so attempting to read it will trigger generation
assertions in the brw_inst API.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2014-07-21 11:19:17 -07:00
Kenneth Graunke 662f1ccc24 i965: Disable hex offset printing in disassembly.
Printing the hex offsets makes it basically impossible to diff assembly:
if you add even a single instruction, the entire shader shows up as a
difference.  So, every time I want to compare assembly, I have to strip
this out.

The hex offsets might be useful when debugging compaction, or when
inspecting the program cache buffer.  Since it's occasionally useful,
but uncommon, this patch disables it by default, but makes it easy to
re-enable it temporarily when the need arises.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2014-07-21 11:19:08 -07:00
Matt Turner 3e9105f7ee i965/vec4: Use foreach_inst_in_block a couple more places.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2014-07-21 10:35:41 -07:00