Commit Graph

102232 Commits

Author SHA1 Message Date
Ian Romanick
1f1007a4ed nir: Initialize lower_flrp_progress everywhere
I don't know why I thought NIR_PASS always set the progress variable.
Derp.

Fixes: d41cdef2a5 ("nir: Use the flrp lowering pass instead of nir_opt_algebraic")
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Coverity CID: 1444996
Coverity CID: 1444995
Coverity CID: 1444994
Coverity CID: 1444993
Coverity CID: 1444991
Coverity CID: 1444989
2019-05-09 10:03:51 -07:00
Eric Engestrom
8b3baa2744 gallium: fix typo in comment
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
2019-05-09 11:14:37 +01:00
Eric Engestrom
6c6af0c8b0 i965_asm: avoid free()ing uninitialized pointers
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-05-09 10:03:15 +00:00
Eric Engestrom
51597eca84 i965_asm: fix memleak
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-05-09 10:03:15 +00:00
Samuel Pitoiset
53dfff1c4d radv: fix setting the number of rectangles when it's dyanmic
We need to know the number of rectangles.

This fixes new CTS dEQP-VK.draw.discard_rectangles.dynamic_*.

Fixes: 5db0bf9994 ("radv: Implement VK_EXT_discard_rectangles.")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-05-09 11:42:25 +02:00
Chris Wilson
8b81256469 iris: Reorganise execbuf to have a single point of failure
Propagate the failure from GEM_EXECBUFFER2, cleanup then report failure
if need be. We retain the current behaviour to abort() at the first sign
of trouble -- for a non-robustness context, arguably this is the right
thing to do as the client cannot recover, and the system state is lost.
How to properly integrate with KHR_robustness and reset-strategy is
left as a future exercise.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-05-08 17:21:07 -07:00
Dave Airlie
0a42d5b98b kmsro: add _dri.so to two of the kmsro drivers.
Fixes: 8cfc17bdda (kmsro: Add the rest of the current set of tinydrm drivers.)

Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
2019-05-09 07:15:26 +10:00
Kenneth Graunke
d9b9bb91ff iris: Report the same video memory settings as i965.
This just copy and pastes Ian's code from i965.
2019-05-08 12:43:08 -07:00
Brian Paul
a17c1ae165 gallium/util: fix two MSVC compiler warnings
Remove stray const qualifier.
s/unsigned/enum tgsi_semantic/

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2019-05-08 10:05:42 -06:00
Brian Paul
4f54e550e9 gallium/pp: s/uint/enum tgsi_semantic/ to fix MSVC warning
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2019-05-08 10:05:42 -06:00
Brian Paul
cf5c7beb63 noop: s/enum pipe_transfer_usage/unsigned/ to fix MSVC warning
The function pointer declaration in pipe_context uses unsigned
for the bitmask.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2019-05-08 10:05:41 -06:00
Brian Paul
bc517dbbf7 ddebug: fix a few MSVC compiler warnings
Don't return an expression in void functions.
Replace an unsigned int with proper enum.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2019-05-08 10:05:41 -06:00
Brian Paul
2e28983ed2 glsl: s/GLboolean/bool/ to silence MSVC compiler warning
It complains about mixing GLboolean and bool in the |= expression.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2019-05-08 10:05:41 -06:00
Ian Romanick
ed5f024515 nir/flrp: Reassociate add in flrp(±1, b, c) lowering path
With this reassociation, this lowering path is still beneficial.

Ice Lake
total instructions in shared programs: 17220191 -> 17207181 (-0.08%)
instructions in affected programs: 999871 -> 986861 (-1.30%)
helped: 3703
HURT: 17
helped stats (abs) min: 1 max: 686 x̄: 3.52 x̃: 3
helped stats (rel) min: 0.09% max: 51.97% x̄: 2.21% x̃: 1.35%
HURT stats (abs)   min: 1 max: 9 x̄: 1.47 x̃: 1
HURT stats (rel)   min: 0.08% max: 4.55% x̄: 0.78% x̃: 0.55%
95% mean confidence interval for instructions value: -4.01 -2.99
95% mean confidence interval for instructions %-change: -2.29% -2.11%
Instructions are helped.

total cycles in shared programs: 360871298 -> 360755040 (-0.03%)
cycles in affected programs: 9931334 -> 9815076 (-1.17%)
helped: 2388
HURT: 1569
helped stats (abs) min: 1 max: 10228 x̄: 93.54 x̃: 18
helped stats (rel) min: <.01% max: 74.11% x̄: 3.36% x̃: 1.07%
HURT stats (abs)   min: 1 max: 1917 x̄: 68.27 x̃: 22
HURT stats (rel)   min: <.01% max: 44.90% x̄: 3.44% x̃: 1.72%
95% mean confidence interval for cycles value: -39.48 -19.28
95% mean confidence interval for cycles %-change: -0.86% -0.46%
Cycles are helped.

total spills in shared programs: 12355 -> 12159 (-1.59%)
spills in affected programs: 295 -> 99 (-66.44%)
helped: 2
HURT: 1

total fills in shared programs: 25398 -> 25207 (-0.75%)
fills in affected programs: 288 -> 97 (-66.32%)
helped: 2
HURT: 1

LOST:   3
GAINED: 44

Iron Lake
total instructions in shared programs: 8169225 -> 8159729 (-0.12%)
instructions in affected programs: 1025712 -> 1016216 (-0.93%)
helped: 3352
HURT: 0
helped stats (abs) min: 1 max: 6 x̄: 2.83 x̃: 3
helped stats (rel) min: 0.15% max: 12.00% x̄: 1.51% x̃: 1.05%
95% mean confidence interval for instructions value: -2.86 -2.80
95% mean confidence interval for instructions %-change: -1.56% -1.46%
Instructions are helped.

total cycles in shared programs: 188656796 -> 188612280 (-0.02%)
cycles in affected programs: 18633584 -> 18589068 (-0.24%)
helped: 3085
HURT: 14
helped stats (abs) min: 2 max: 72 x̄: 14.45 x̃: 12
helped stats (rel) min: 0.02% max: 5.73% x̄: 0.73% x̃: 0.31%
HURT stats (abs)   min: 2 max: 4 x̄: 3.71 x̃: 4
HURT stats (rel)   min: <.01% max: <.01% x̄: <.01% x̃: <.01%
95% mean confidence interval for cycles value: -14.55 -14.18
95% mean confidence interval for cycles %-change: -0.76% -0.69%
Cycles are helped.

GM45
total instructions in shared programs: 5026905 -> 5021856 (-0.10%)
instructions in affected programs: 584169 -> 579120 (-0.86%)
helped: 1776
HURT: 0
helped stats (abs) min: 1 max: 6 x̄: 2.84 x̃: 3
helped stats (rel) min: 0.15% max: 11.11% x̄: 1.43% x̃: 0.98%
95% mean confidence interval for instructions value: -2.88 -2.80
95% mean confidence interval for instructions %-change: -1.50% -1.37%
Instructions are helped.

total cycles in shared programs: 129047376 -> 129018918 (-0.02%)
cycles in affected programs: 12941924 -> 12913466 (-0.22%)
helped: 1722
HURT: 14
helped stats (abs) min: 4 max: 72 x̄: 16.56 x̃: 18
helped stats (rel) min: 0.02% max: 5.73% x̄: 0.72% x̃: 0.30%
HURT stats (abs)   min: 2 max: 4 x̄: 3.71 x̃: 4
HURT stats (rel)   min: <.01% max: <.01% x̄: <.01% x̃: <.01%
95% mean confidence interval for cycles value: -16.65 -16.13
95% mean confidence interval for cycles %-change: -0.76% -0.66%
Cycles are helped.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-05-08 07:41:54 -07:00
Ian Romanick
ba203a3cd7 nir/flrp: Fix typo on the flrp(±1, b, c) path
After Samuel reported the bisect, I was able to find the bug by
inspection.  Good thing for well-named varibles. :)

Unfortunately, this undoes almost all of the benefit of the original
patch.

Ice Lake
total instructions in shared programs: 17183159 -> 17218166 (0.20%)
instructions in affected programs: 1308722 -> 1343729 (2.67%)
helped: 98
HURT: 4746
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 0.47% max: 2.70% x̄: 0.60% x̃: 0.57%
HURT stats (abs)   min: 1 max: 691 x̄: 7.40 x̃: 8
HURT stats (rel)   min: 0.10% max: 700.00% x̄: 5.82% x̃: 2.83%
95% mean confidence interval for instructions value: 6.82 7.64
95% mean confidence interval for instructions %-change: 5.22% 6.15%
Instructions are HURT.

total cycles in shared programs: 360705959 -> 360853522 (0.04%)
cycles in affected programs: 10754380 -> 10901943 (1.37%)
helped: 1594
HURT: 3331
helped stats (abs) min: 1 max: 1896 x̄: 119.81 x̃: 60
helped stats (rel) min: <.01% max: 35.48% x̄: 5.06% x̃: 3.64%
HURT stats (abs)   min: 1 max: 10208 x̄: 101.63 x̃: 38
HURT stats (rel)   min: 0.01% max: 878.95% x̄: 9.01% x̃: 2.78%
95% mean confidence interval for cycles value: 21.11 38.81
95% mean confidence interval for cycles %-change: 3.76% 5.15%
Cycles are HURT.

total spills in shared programs: 12158 -> 12355 (1.62%)
spills in affected programs: 98 -> 295 (201.02%)
helped: 1
HURT: 2

total fills in shared programs: 25204 -> 25398 (0.77%)
fills in affected programs: 94 -> 288 (206.38%)
helped: 0
HURT: 3

LOST:   15
GAINED: 8

Iron Lake
total instructions in shared programs: 8121430 -> 8166733 (0.56%)
instructions in affected programs: 1148353 -> 1193656 (3.95%)
helped: 2
HURT: 4046
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 1.85% max: 1.92% x̄: 1.89% x̃: 1.89%
HURT stats (abs)   min: 1 max: 43 x̄: 11.20 x̃: 11
HURT stats (rel)   min: 0.20% max: 716.67% x̄: 7.40% x̃: 3.87%
95% mean confidence interval for instructions value: 11.02 11.37
95% mean confidence interval for instructions %-change: 6.84% 7.94%
Instructions are HURT.

total cycles in shared programs: 188376326 -> 188601568 (0.12%)
cycles in affected programs: 27416674 -> 27641916 (0.82%)
helped: 68
HURT: 3947
helped stats (abs) min: 2 max: 222 x̄: 13.88 x̃: 6
helped stats (rel) min: <.01% max: 1.28% x̄: 0.15% x̃: 0.01%
HURT stats (abs)   min: 2 max: 670 x̄: 57.31 x̃: 64
HURT stats (rel)   min: <.01% max: 1811.11% x̄: 4.11% x̃: 1.09%
95% mean confidence interval for cycles value: 55.01 57.20
95% mean confidence interval for cycles %-change: 2.88% 5.19%
Cycles are HURT.

LOST:   35
GAINED: 3

GM45
total instructions in shared programs: 4979794 -> 5003551 (0.48%)
instructions in affected programs: 635174 -> 658931 (3.74%)
helped: 1
HURT: 2142
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 1.85% max: 1.85% x̄: 1.85% x̃: 1.85%
HURT stats (abs)   min: 1 max: 43 x̄: 11.09 x̃: 11
HURT stats (rel)   min: 0.20% max: 716.67% x̄: 7.00% x̃: 3.53%
95% mean confidence interval for instructions value: 10.85 11.33
95% mean confidence interval for instructions %-change: 6.25% 7.74%
Instructions are HURT.

total cycles in shared programs: 128519586 -> 128654990 (0.11%)
cycles in affected programs: 17635304 -> 17770708 (0.77%)
helped: 46
HURT: 2088
helped stats (abs) min: 4 max: 220 x̄: 18.13 x̃: 6
helped stats (rel) min: <.01% max: 1.28% x̄: 0.15% x̃: 0.01%
HURT stats (abs)   min: 2 max: 670 x̄: 65.25 x̃: 66
HURT stats (rel)   min: <.01% max: 1464.29% x̄: 4.05% x̃: 0.99%
95% mean confidence interval for cycles value: 61.75 65.15
95% mean confidence interval for cycles %-change: 2.58% 5.34%
Cycles are HURT.

LOST:   38
GAINED: 38

Fixes: 5b908db604 ("nir/flrp: Lower flrp(±1, b, c) and flrp(a, ±1, c) differently")
Reported-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-05-08 07:41:26 -07:00
Lionel Landwerlin
43596e5f34 anv: fix use after free
Once mem->bo is removed from the cache, it is likely to be freed.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: b80930a6fe ("anv: add support for VK_EXT_memory_budget")
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
2019-05-08 12:02:13 +01:00
Lionel Landwerlin
a07d06f103 anv: rework queries writes to ensure ordering memory writes
We use a mix of MI & PIPE_CONTROL commands to write our queries' data
(results & availability). Those commands' memory write order is not
guaranteed with regard to their order in the command stream, unless CS
stalls are inserted between them. This is problematic for 2 reasons :

   1. We copy results from the device using MI commands even though
      the values are generated from PIPE_CONTROL, meaning we could
      copy unlanded values into the results and then copy the
      availability that is inconsistent with the values.

   2. We allow the user to poll on the availability values of the
      query pool from the CPU. If the availability lands in memory
      before the values then we could return invalid values.

This change does 2 things to address this problem :

      - We use either PIPE_CONTROL or MI commands to write both
        queries values and availability, so that the ordering of the
        memory writes guarantees that if availability is visible,
        results are also visible.

      - For the occlusion & timestamp queries we apply a CS stall
        before copying the results on the device, to ensure copying
        with MI commands see the correct values of previous
        PIPE_CONTROL writes of availability (required by the Vulkan
        spec).

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reported-by: Iago Toral Quiroga <itoral@igalia.com>
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-05-08 09:49:09 +00:00
Timothy Arceri
e19a8fe033 radv: call constant folding before opt algebraic
The pattern of calling opt algebraic first seems to have originated
in i965. The order in OpenGL drivers generally doesn't matter
because the GLSL IR optimisations do constant folding before
opt algebraic.

However in Vulkan drivers calling opt algebraic first can result
in missed constant folding opportunities.

vkpipeline-db results (VEGA64):

Totals from affected shaders:
SGPRS: 3160 -> 3176 (0.51 %)
VGPRS: 3588 -> 3580 (-0.22 %)
Spilled SGPRs: 52 -> 44 (-15.38 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 12 -> 12 (0.00 %) dwords per thread
Code Size: 261812 -> 261036 (-0.30 %) bytes
LDS: 7 -> 7 (0.00 %) blocks
Max Waves: 346 -> 348 (0.58 %)
Wait states: 0 -> 0 (0.00 %)

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-05-08 19:45:01 +10:00
Timothy Arceri
4fd8161773 glsl_to_nir: remove unused type_is_int()
This was missed in e00fa99b08.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-05-08 14:11:38 +10:00
Timothy Arceri
a01b393c39 Revert "glx: Fix synthetic error generation in __glXSendError"
This reverts commit e91ee763c3.

This seems to have broken a number of wine games. Lets revert
everything for now and try again later.

Acked-by: Adam Jackson <ajax@redhat.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110632
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110590
2019-05-08 13:16:44 +10:00
Timothy Arceri
024232b26c radeonsi: add an AMD_TEX_ANISO environment variable
This brings it inline with the recently added AMD_DEBUG.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109619
2019-05-08 09:32:25 +10:00
Kenneth Graunke
d568fcd0a0 i965: leave the top 4Gb of the high heap VMA unused
This ports commit 9e7b0988d6 from anv
to i965.  Thanks to Lionel for noticing that it was missing!

Fixes: 01058a5522 i965: Add virtual memory allocator infrastructure to brw_bufmgr.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-05-07 15:45:56 -07:00
Kenneth Graunke
17210c63a9 i965: Force VMA alignment to be a multiple of the page size.
This should happen regardless, but let's be paranoid.

Fixes: 01058a5522 i965: Add virtual memory allocator infrastructure to brw_bufmgr.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-05-07 15:45:56 -07:00
Kenneth Graunke
15f134c628 i965: Fix BRW_MEMZONE_LOW_4G heap size.
The STATE_BASE_ADDRESS "Size" fields can only hold 0xfffff in pages,
and 0xfffff * 4096 = 4294963200, which is 1 page shy of 4GB.

So we can't use the top page.

Fixes: 01058a5522 i965: Add virtual memory allocator infrastructure to brw_bufmgr.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-05-07 15:45:56 -07:00
Matt Turner
e8c74a1e16 intel/compiler: Unset flag reg when FB write is not predicated
In the FS IR we pretend that the instruction is predicated with (+f0.1)
just for flag dependency tracking purposes. Since the instruction
doesn't support predication before Haswell, we unset the predicate so we
should also unset the flag register so that we can round-trip the
disassembly.

Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-05-07 14:33:48 -07:00
Sagar Ghuge
5d7a9e0811 intel/disasm: Disassemble immediate value properly for dim
On haswell, for dim instruction we encode immediate float value operand
into double float,

v2: Fix comment (Matt Turner)

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-05-07 14:33:48 -07:00
Sagar Ghuge
6c83a68ebc intel/disasm: Disassemble JIP offset for while
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-05-07 14:33:48 -07:00
Sagar Ghuge
9db616e8a2 intel/compiler: Replicate 16 bit immediate value correctly
For the W or UW (signed or unsigned word) source types, the 16-bit value
must be replicated in both the low and high words of the 32-bit
immediate value.

v2: Fix replication in other places as well
V3: fix a few nits (Matt Turner)

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-05-07 14:33:48 -07:00
Sagar Ghuge
5211159b5b intel/compiler: Print quad value in hex format
Print quad value same as unsigned quad so that we can distinguish in
between quater control disassembled values for e.g 1/2/3[Q] and
immediate quad value for e.g 1Q. This allows round-tripping through the
assembler/disassembler.

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-05-07 14:33:48 -07:00
Sagar Ghuge
4e828bb48a intel/tools: Add unit tests for assembler
v1: Pass executable object from meson to test(Dylan Baker)
v2: Ignore generated output files from git status(Matt Turner)

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
2019-05-07 14:33:48 -07:00
Mika Kuoppala
1fb5ce0a11 intel/tools: Initialize offset correctly for i965_asm
If we leave offset uninitialized, access to store
will be random depending on stack value and can
segfault.

Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-05-07 14:33:48 -07:00
Mika Kuoppala
85da1194ec intel/tools: Add meson pthread dependancy for i965_asm
Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-05-07 14:33:48 -07:00
Sagar Ghuge
70308a5a8a intel/tools: New i965 instruction assembler tool
Tool is inspired from igt's assembler tool. Thanks to Matt Turner, who
mentored me through out this project.

v2: Fix memory leaks and naming convention (Caio)
v3: Fix meson changes (Dylan Baker)
v4: Fix usage options (Matt Turner)

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Closes: https://gitlab.freedesktop.org/mesa/mesa/merge_requests/141
2019-05-07 14:33:38 -07:00
Kenneth Graunke
a232aa5c50 iris: Also handle res->offset for buffer sampler/image views 2019-05-07 13:36:18 -07:00
Mike Blumenkrantz
ddd716e746 iris: support dmabuf imports with offsets
this adds support for imports where the image data begins at an offset
from the start of the buffer, as used in h/x264

fixes kwg/mesa#47

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-05-07 13:36:08 -07:00
Roland Scheidegger
748f603390 gallivm: fix broken 8-wide s3tc decoding
Brian noticed there was an uninitialized var for the 8-wide case and 128
bit blocks, which made it always crash. Likewise, the 64bit block case
had another crash bug due to type mismatch.
Color decode (used for all s3tc formats) also had a bogus shuffle for
this case, leading to decode artifacts.
Fix these all up, which makes the code actually work 8-wide. Note that
it's still not used - I've verified it works, and the generated assembly
does look quite a bit simpler actually (20-30% less instructions for the
s3tc decode part with avx2), however in practice it still seems to be
sligthly slower for some unknown reason (tested with openarena) on my
haswell box, so for now continue to split things into 4-wide vectors
before decoding.

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2019-05-07 18:59:38 +02:00
Vasily Khoruzhick
6b46399e2f lima: enable sin and cos lowering for GP
GP doesn't support sin/cos natively, so we have to lower them.

Reviewed-by: Qiang Yu <yuq825@gmail.com>
Tested-by: Qiang Yu <yuq825@gmail.com>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
2019-05-07 15:25:21 +00:00
Vasily Khoruzhick
e67e4e90b2 nir: implement lowering for fsin and fcos
Lower sin and cos using Nick's fast sin/cos approximation from
https://web.archive.org/web/20180105155939/http://forum.devmaster.net/t/fast-and-accurate-sine-cosine/9648

It's suitable for GLES2, but it throws warnings in dEQP GLES3 precision tests.

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Tested-by: Qiang Yu <yuq825@gmail.com>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
2019-05-07 15:25:21 +00:00
Rob Clark
b15c46e6bf freedreno/ir3: move const_state to ir3_shader
For a6xx, we construct/emit a single VS const state used for both
binning pass and draw pass.  So far we were mostly getting lucky that
there were not (obvious) mismatches between the const_state (like
different lowered immediates) between the binning and draw pass
VS ir3_shader_variant.

And I guess this situation will come up more as GS and tess is added
into the equation.

Since really everything about the const state is not specific to the
variant, move this.  The main exception is lowered immediates, but these
are the last to appear in the layout, and it doesn't hurt for each new
shader variant to just append any immed's it lowers to the end of the
immediate state.

Signed-off-by: Rob Clark <robdclark@chromium.org>
2019-05-07 07:26:00 -07:00
Rob Clark
5690f83bb5 freedreno/ir3: split out const_state setup
Next patch moves const_state to ir3_shader, before the compile context
is created.  So move the code around in prep to call it earlier.

Signed-off-by: Rob Clark <robdclark@chromium.org>
2019-05-07 07:26:00 -07:00
Rob Clark
9403184ddd freedreno/ir3: move immediates to const_state
They are really part of the constant state, and it will moving things
from ir3_shader_variant to ir3_shader if we combine them.

Signed-off-by: Rob Clark <robdclark@chromium.org>
2019-05-07 07:26:00 -07:00
Rob Clark
23e7a34466 freedreno/ir3: consolidate const state
Combine the offsets of differenet parts of the constant space with (what
was formerly known as) ir3_driver_const_layout.  Bunch of churn, but no
functional change.

Signed-off-by: Rob Clark <robdclark@chromium.org>
2019-05-07 07:26:00 -07:00
Rob Clark
ef3eecd66b freedreno/ir3: move ir3_pointer_size()
Move to ir3_compiler so it doesn't depend on the compile context.  Prep
work for moving constant state from variant (where we have compile
context) to shader (where we do not).

Signed-off-by: Rob Clark <robdclark@chromium.org>
2019-05-07 07:26:00 -07:00
Lionel Landwerlin
2d2927938f vulkan/overlay-layer: fix cast errors
Not quite sure what version of GCC/Clang produces errors (8.3.0
locally was fine).

v2: also fix an integer literal issue (Karol)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com> (v1)
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
2019-05-07 10:45:45 +01:00
Samuel Iglesias Gonsálvez
bc66cebc0d anv: fix alphaToCoverage when there is no color attachment
There are tests in CTS for alpha to coverage without a color attachment
that are failing. This happens because we remove the shader color
outputs when we don't have a valid color attachment for them, but when
alpha to coverage is enabled we still want to preserve the the output
at location 0 since we need the alpha component. In that case we will
also need to create a null render target for RT 0.

v2:
  - We already create a null rt when we don't have any, so reuse that
    for this case (Jason)
  - Simplify the code a bit (Iago)

v3:
  - Take alpha to coverage from the key and don't tie this to depth-only
    rendering only, we want the same behavior if we have multiple render
    targets but the one at location 0 is not used. (Jason).
  - Rewrite commit message (Iago)

v4:
  - Make sure we take into account the array length of the shader outputs,
    which we were no handling correctly either and make sure we also
    create null render targets for any invalid array entries too.

v5:
  - Simplify removal of unused outputs by using rt_used[] so we don't have
    to special case alpha to coverage there too.

Fixes the following CTS tests:
dEQP-VK.pipeline.multisample.alpha_to_coverage_no_color_attachment.*

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Signed-off-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-05-07 09:35:47 +02:00
Ian Romanick
c866500525 intel/compiler: Don't always require precise lowering of flrp
No changes on any other Intel platforms.

Iron Lake and GM45 had similar results. (Iron Lake shown)
total instructions in shared programs: 8164367 -> 8135551 (-0.35%)
instructions in affected programs: 3271235 -> 3242419 (-0.88%)
helped: 13636
HURT: 90
helped stats (abs) min: 1 max: 30 x̄: 2.13 x̃: 1
helped stats (rel) min: 0.04% max: 10.77% x̄: 1.16% x̃: 0.97%
HURT stats (abs)   min: 1 max: 4 x̄: 1.80 x̃: 2
HURT stats (rel)   min: 0.26% max: 11.11% x̄: 1.76% x̃: 0.78%
95% mean confidence interval for instructions value: -2.13 -2.07
95% mean confidence interval for instructions %-change: -1.16% -1.13%
Instructions are helped.

total cycles in shared programs: 188719974 -> 188586222 (-0.07%)
cycles in affected programs: 70415766 -> 70282014 (-0.19%)
helped: 12563
HURT: 515
helped stats (abs) min: 2 max: 600 x̄: 10.90 x̃: 6
helped stats (rel) min: <.01% max: 5.48% x̄: 0.48% x̃: 0.27%
HURT stats (abs)   min: 2 max: 54 x̄: 6.07 x̃: 4
HURT stats (rel)   min: 0.01% max: 4.48% x̄: 0.24% x̃: 0.08%
95% mean confidence interval for cycles value: -10.56 -9.90
95% mean confidence interval for cycles %-change: -0.47% -0.45%
Cycles are helped.

LOST:   0
GAINED: 13

Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-05-06 22:52:29 -07:00
Ian Romanick
ab86926156 nir/algebraic: Reassociate open-coded flrp(1, b, c)
In a previous verion of this patch, Jason commented,

   "Re-associating based on whether or not something has a constant
   value of 1.0 seems a bit sneaky.  I think it's well within the rules
   but it seems like something that could bite you."

That is possibly true.  The reassociation will generate different
results if fabs(b) >= 2**24 and fabs(c) < 0.5.  The delta increases as
fabs(c) approaches 0.

However, i965 has done this same reassociation indirectly for years.
We would previously allow nir_op_flrp on all pre-Gen11 hardware even
though Gen4 and Gen5 do not have a LRP instruction.  Optimizations in
nir_opt_algebraic would convert expressions like a+c(b-a) into flrp(a,
b, c).  On Gen7+, the hardware performs the same arithmetic as
a(1-c)+bc.  Gen6 seems to implement LRP as a+c(b-a).  On Gen4 and
Gen5, we would lower LRP to a sequence of instructions that implement
a(1-c)+bc.  The lowering happens after all constant folding, so we
would litterally generate a 1+(-1) instruction sequence in this
scenario: one instruction to load either 1 or -1 in a register, and
another instruction to add either -1 or 1 to it.

This patch just cuts out the middle man.  Do the reassociation that
we've always done, but do it explicitly at a time when we can benefit
from other optimizations.

A few cases that were hurt by "nir: Lower flrp(±1, b, c) and flrp(a,
±1, c) differently" are restored by this patch.  This includes a few
shaders in ET:QW.

I tried a similar thing for open-coded flrp(-1, b, c), and it hurt
instructions on 35 shaders for ILK without helping any.  The helped /
hurt cycles was about even.

No changes on any other Intel platforms.

Iron Lake and GM45 had similar results. (Iron Lake shown)
total instructions in shared programs: 8172020 -> 8164367 (-0.09%)
instructions in affected programs: 1089851 -> 1082198 (-0.70%)
helped: 3285
HURT: 64
helped stats (abs) min: 1 max: 6 x̄: 2.35 x̃: 2
helped stats (rel) min: 0.13% max: 12.00% x̄: 1.15% x̃: 0.83%
HURT stats (abs)   min: 1 max: 1 x̄: 1.00 x̃: 1
HURT stats (rel)   min: 0.24% max: 0.64% x̄: 0.39% x̃: 0.38%
95% mean confidence interval for instructions value: -2.32 -2.25
95% mean confidence interval for instructions %-change: -1.16% -1.09%
Instructions are helped.

total cycles in shared programs: 188758338 -> 188719974 (-0.02%)
cycles in affected programs: 20004922 -> 19966558 (-0.19%)
helped: 3012
HURT: 477
helped stats (abs) min: 2 max: 142 x̄: 13.41 x̃: 12
helped stats (rel) min: 0.01% max: 6.37% x̄: 0.52% x̃: 0.24%
HURT stats (abs)   min: 2 max: 328 x̄: 4.27 x̃: 4
HURT stats (rel)   min: <.01% max: 1.55% x̄: 0.14% x̃: 0.11%
95% mean confidence interval for cycles value: -11.38 -10.62
95% mean confidence interval for cycles %-change: -0.46% -0.41%
Cycles are helped.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-05-06 22:52:29 -07:00
Ian Romanick
c995d1ca3a nir/flrp: Lower flrp(a, b, #c) differently
This doesn't help on Intel GPUs now because we always take the
"always_precise" path first.  It may help on other GPUs, and it does
prevent a bunch of regressions in "intel/compiler: Don't always require
precise lowering of flrp".

Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-05-06 22:52:29 -07:00
Ian Romanick
ae02622d8f nir/flrp: Lower flrp(a, b, c) differently if another flrp(_, b, c) exists
There is little effect on Intel GPUs now because we almost always take
the "always_precise" path first.  It may help on other GPUs, and it does
prevent a bunch of regressions in "intel/compiler: Don't always require
precise lowering of flrp".

No changes on any other Intel platforms.

GM45 and Iron Lake had similar results. (Iron Lake shown)
total cycles in shared programs: 188852500 -> 188852484 (<.01%)
cycles in affected programs: 14612 -> 14596 (-0.11%)
helped: 4
HURT: 0
helped stats (abs) min: 4 max: 4 x̄: 4.00 x̃: 4
helped stats (rel) min: 0.09% max: 0.13% x̄: 0.11% x̃: 0.11%
95% mean confidence interval for cycles value: -4.00 -4.00
95% mean confidence interval for cycles %-change: -0.13% -0.09%
Cycles are helped.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-05-06 22:52:29 -07:00
Ian Romanick
6698d861a5 nir/flrp: Lower flrp(a, b, c) differently if another flrp(a, _, c) exists
This doesn't help on Intel GPUs now because we always take the
"always_precise" path first.  It may help on other GPUs, and it does
prevent a bunch of regressions in "intel/compiler: Don't always require
precise lowering of flrp".

No changes on any Intel platform.  Before a number of large rebases this
helped cycles in a couple shaders on Iron Lake and GM45.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-05-06 22:52:29 -07:00