Ivybridge and Baytrail can't use mach with 2Q quarter control, so just
do it without the accumulator. Stupid accumulator.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
The next commit uses an add(16) with a UW destination with a stride of
2, which needs compression control since it's writing two registers. The
old code would have failed to set compression control correctly.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Gen8+'s MUL instruction doesn't ignore the high 16-bits of one source
like on earlier platforms, so we can constant propagate into it without
worry. Integer multiplies (not into the accumulator, which is done for
imul_high) are lowered in lower_integer_multiplication(), so it's safe
there as well.
On Broadwell, fragment shaders only:
total instructions in shared programs: 4377769 -> 4377451 (-0.01%)
instructions in affected programs: 48064 -> 47746 (-0.66%)
helped: 156
On Broadwell, vertex shaders only:
total instructions in shared programs: 2858885 -> 2856313 (-0.09%)
instructions in affected programs: 26380 -> 23808 (-9.75%)
helped: 134
On Broadwell, vertex shaders only (with INTEL_USE_NIR=1):
total instructions in shared programs: 2911688 -> 2865984 (-1.57%)
instructions in affected programs: 1421715 -> 1376011 (-3.21%)
helped: 6186
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
32-bit x 32-bit integer multiplication requires multiple instructions
until Broadwell. This patch just lets us treat the MUL instruction in
the FS backend like it operates on Broadwell, and after optimizations
we lower it into a sequence of instructions on older platforms.
Doing this will allow us to some extra optimization on integer
multiplies.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Currently, when the MinFilter is GL_LINEAR or GL_NEAREST we hide the
actual miplevel count from the hardware (and we avoid re-creating
the miptree structure with all the levels), since we don't expect
levels other than the base level to be needed. Unfortunately,
GLSL's textureSize() function is an exception to this rule. This
function takes a lod parameter that we need to use to return the
size of the appropriate miplevel (if it exists). The spec only
requires that the miplevel exists, so even if the sampler is
configured with a linear or nearest MinFilter, as far as the user
has uploaded miplevels for the texture, textureSize() should return
the appropriate sizes.
This patch fixes this by exposing the actual miplevel count for all
sampling engine textures while keeping the original implementation
for render targets (for render targets textures we do not provide
the miplevel count but the actual LOD we are wrting to, so we
want to make sure that we make this the base level).
Fixes 28 dEQP tests in the following category:
dEQP-GLES3.functional.shaders.texture_functions.texturesize.*
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
I've been promissed in a bug that this will be fixed in a future version of
the header. However, in the interest of my branch building, I'm adding
these changes in myself for the moment.
We don't actually use it to create all the instructions but we do use it
for insertion always. This should make things far more consistent for
implementing extended instructions.
We do control-flow handling as a two-step process. The first step is to
walk the instructions list and record various information about blocks and
functions. This is where the acutal nir_function_overload objects get
created. We also record the start/stop instruction for each block. Then
a second pass walks over each of the functions and over the blocks in each
function in a way that's NIR-friendly and actually parses the instructions.