...with a substitution. This function is largely a copy-and-paste of
try_fold_alu (nir_opt_constant_folding.c), and an argument could be made
that this function belongs in that file.
v2: Some changes were mistakenly squashed into "nir/loop_analyze: Use
try_eval_const_alu and induction variable basis info" that should have
been here.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3445>
Loop analysis doesn't currently treat values updated by shifts as
induction variables. Future patches will change this.
v2: Don't use the contradiction ilt(x, INT_MIN).
v3: Delete some errant code in UNKNOWN_COUNT_TEST. Noticed by Tim.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3445>
The initial bitstream size was set to width * height * 2, which is
larger than the YUV size. Set the initial bitstream size to
approximately the encoded bitstream size to reduce memory consumption.
This is only an initial size; the buffer is resized later if it is
not big enough. As a result of this change, we no longer need to
allocate a very large buffer at the very beginning, and only allocate
a large one when needed, which saves some memory.
Signed-off-by: Sajeesh Sidharthan <sajeesh.sidharthan@amd.com>
Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com>
Acked-by: Veerabadhran Gopalakrishnan <Veerabadhran.Gopalakrishnan@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21918>
If the app passes us unaligned buffer offsets, we need to align them
down to the nearest aligned offset, and then put the difference into
the descriptor set buffer.
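
As an illustration only, the align-down-and-keep-the-remainder step could
be sketched as below. The struct and function names are hypothetical and
are not taken from dzn, and the alignment is assumed to be a power of two.

```c
#include <stdint.h>

/* Hypothetical result type: an aligned base plus the leftover bytes. */
struct split_offset {
   uint64_t aligned_base; /* offset rounded down to the alignment */
   uint32_t remainder;    /* difference to store alongside the descriptor */
};

/* Split an arbitrary offset into an aligned base and a remainder.
 * Assumes alignment is a non-zero power of two.
 */
static struct split_offset
split_buffer_offset(uint64_t offset, uint32_t alignment)
{
   struct split_offset s;

   s.aligned_base = offset & ~((uint64_t)alignment - 1);
   s.remainder = (uint32_t)(offset - s.aligned_base);
   return s;
}
```

With a 256-byte alignment, an offset of 260 splits into a base of 256 and
a remainder of 4.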
Fixes: 8bd5fbf8 ("dzn: Bind buffers for bindless descriptor sets")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22225>
Cache coherent UMA implies that the GPU is reading data through the
CPU caches. Using write-combined CPU pages for such a system would
be bad, since the GPU would then be reading uncached data. One
example of such a system is WARP. This significantly improves WARP's
performance for some apps (including the CTS).
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22225>
Unlike DXBC, DXIL's shift instructions don't have the implicit behavior
that they only use the low 5 bits of the shift amount. This is
observable if you try to have DXC do a shift of a dynamic value, e.g. a
constant buffer value, where the compiler inserts the appropriate 'and'
op. We need to do the same.
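
A minimal sketch of the masking, written in plain C rather than as the
actual NIR-to-DXIL lowering, and assuming a 32-bit shift. The function
name is hypothetical.

```c
#include <stdint.h>

/* Shift left using only the low 5 bits of the shift amount, which is
 * the behavior DXBC provides implicitly for 32-bit shifts. Without the
 * explicit 'and', a shift amount of 32 or more is undefined in C.
 */
static uint32_t
shl_masked(uint32_t value, uint32_t amount)
{
   return value << (amount & 31u);
}
```

For example, a shift amount of 33 is treated as a shift by 1.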
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22225>
Shifts should always use 32-bit shift values, and when lowering to
masked, we need to use 32-bit atomics. That means that we should also
treat 24-bit stores as a single masked op rather than one 16-bit
unmasked and one 8-bit masked.
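
To illustrate what a single masked op means for a 24-bit store, here is
a non-atomic sketch in plain C. It only shows the mask and shift
arithmetic on a 32-bit word; the helper name and parameters are
hypothetical, and the real lowering would use atomic operations instead
of a plain read-modify-write.

```c
#include <stdint.h>

/* Merge the low num_bits of value into word at the given byte offset.
 * Assumes num_bits is 8, 16 or 24 and that the stored bits fit inside
 * the 32-bit word (byte_offset * 8 + num_bits <= 32).
 */
static uint32_t
masked_store(uint32_t word, uint32_t value, unsigned num_bits,
             unsigned byte_offset)
{
   const unsigned shift = byte_offset * 8;
   const uint32_t mask = ((1u << num_bits) - 1u) << shift;

   /* Clear the destination bits, then insert the shifted value. */
   return (word & ~mask) | ((value << shift) & mask);
}
```

A 24-bit store at byte offset 0 therefore uses the single mask
0x00ffffff instead of separate 16-bit and 8-bit pieces.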
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22225>
Unlike the legacy CreateContext path, we would try to send the
GLXCreateContextAttribs request regardless of whether we'd successfully
created the client context state. And there's not a lot on the server
side to go wrong besides BadAlloc, so if the request succeeded but
the client side didn't, we'd need to destroy the server context and
synthesize an X error. Since that itself involves more X protocol it's
tricky to get the request number right in the error, and tests and apps
can notice when you get it wrong.
Since we have now fixed client-side validation to generate the right
errors at the right times, this patch does something simpler: we match
CreateContext and fail early if the client-side setup fails. Now there's
no question of what request number to use. Because we haven't sent any
protocol, the error is reported for the request as if it had been sent.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/4763
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12006>
There are two kinds of "bad version" you might encounter here: either the
combination does not name a defined version (like 1.7) or it names
something the driver can't do (like asking r300 to do 4.0). EGL does not
distinguish these cases, but GLX calls them BadMatch and GLXBadFBConfig
respectively.
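
The distinction can be sketched as follows. This is an illustration for
desktop GL only, not the code in validate_context_version; the enum and
function names are hypothetical, and the driver limit is passed in as a
plain maximum version.

```c
#include <stdbool.h>

/* Hypothetical result codes for the two failure kinds described above. */
enum version_check {
   VERSION_OK,
   VERSION_UNDEFINED,   /* no such version, e.g. 1.7 (GLX: BadMatch) */
   VERSION_UNSUPPORTED, /* defined but beyond the driver (GLX: GLXBadFBConfig) */
};

/* Desktop GL versions that exist: 1.0-1.5, 2.0-2.1, 3.0-3.3, 4.0-4.6. */
static bool
gl_version_is_defined(int major, int minor)
{
   switch (major) {
   case 1: return minor >= 0 && minor <= 5;
   case 2: return minor >= 0 && minor <= 1;
   case 3: return minor >= 0 && minor <= 3;
   case 4: return minor >= 0 && minor <= 6;
   default: return false;
   }
}

static enum version_check
check_version(int major, int minor, int max_major, int max_minor)
{
   if (!gl_version_is_defined(major, minor))
      return VERSION_UNDEFINED;

   if (major > max_major || (major == max_major && minor > max_minor))
      return VERSION_UNSUPPORTED;

   return VERSION_OK;
}
```

Here a request for 1.7 reports VERSION_UNDEFINED on any driver, while a
request for 4.0 on a driver limited to 2.1 reports VERSION_UNSUPPORTED.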
Since api_mask is the set of driver supported APIs, and we can only
support defined APIs, don't check it early in driCreateContextAttribs,
just let it fall out from validate_context_version.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12006>
This has no functional change because everyone calling this is
discarding the error code, because we're relying on the server to
generate the right thing for us. But we create the direct context first
and the server isn't going to enforce everything we want it to
(supported GL versions for example). Convert out from DRI error codes to
X/GLX error codes so we can fail the right way on the client side. We're
still throwing the error away in all of the callers but that'll change
shortly.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12006>
This pass rarely makes any changes, so work a little harder to preserve
more metadata.
On my Ice Lake laptop (using a locked CPU speed and other measures to
prevent thermal throttling, etc.) using a debugoptimized build, improves
performance of Vulkan CTS "deqp-vk --deqp-case='dEQP-VK.*spir*'" by
-0.2% ± 0.1% (n = 5, pooled s = 0.431885).
v2: Add some parentheses. Suggested by Lionel.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22299>
Two linked list management changes:
- Use the list head sentinel as the initial cursor. It is, after all, a
proper node in the list.
- Iterate the list of blocks starting with the second block instead of
skipping the first block in the loop.
On my Ice Lake laptop (using a locked CPU speed and other measures to
prevent thermal throttling, etc.) using a release build, improves
performance of compiling shaders from batman_arkham_city_goty.foz by
-0.24% ± 0.09% (n = 5, pooled s = 0.324106).
v2: Use nir_cursor instead of direct list manipulation. Suggested by
Lionel.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22299>
On my Ice Lake laptop (using a locked CPU speed and other measures to
prevent thermal throttling, etc.) using a release build, improves
performance of compiling shaders from batman_arkham_city_goty.foz by
-1.09% ± 0.084% (n = 5, pooled s = 0.354471)
Reduces the size of a release build by 26k.
    text    data     bss      dec     hex filename
23163641  400720  231360 23795721 16b1809 before/lib64/dri/iris_dri.so
23137264  400720  231360 23769344 16ab100 after/lib64/dri/iris_dri.so
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22299>
Since one of the registers must always be either VGRF or FIXED_GRF, much
of regions_overlap and reg_offset can be elided.
On my Ice Lake laptop (using a locked CPU speed and other measures to
prevent thermal throttling, etc.) using a debugoptimized build, improves
performance of Vulkan CTS "deqp-vk --deqp-case='dEQP-VK.*spir*'" by
-0.29% ± 0.097% (n = 5, pooled s = 0.361697).
Using a release build, improves performance of compiling shaders from
batman_arkham_city_goty.foz by -3.3% ± 0.04% (n = 5, pooled s =
0.178312).
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22299>
This function only exists in builds with assertions, so it only matters
there.
On my Ice Lake laptop (using a locked CPU speed and other measures to
prevent thermal throttling, etc.) using a debugoptimized build, improves
performance of Vulkan CTS "deqp-vk --deqp-case='dEQP-VK.*spir*'" by
-5.2% ± 0.16% (n = 5, pooled s = 0.657887).
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22299>