80923e8d58
In a profile of dEQP, GLES3 was spending 5-10% of its time in ReadPixels, and almost all of that was in the b8g8r8a8_unorm8 unpack. It's really slow because the code-generated unpack does uncached reads 32 bits at a time, which gets us only about 47MB/s. If we use NEON to generate larger bus transactions, we can speed things up to 136MB/s. In comparison, raw ldr/str reads and writes with no byte swapping can hit a max of 216MB/s.

Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10014>
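
For illustration only, here is a minimal sketch of the kind of unpack described above. It is not the code from this change: the function name unpack_bgra8_to_rgba8 is hypothetical, and the choice of vld4q_u8/vst4q_u8 is an assumption about one reasonable way to get wide loads with a channel swap. It loads 16 pixels (64 bytes) per iteration instead of one 32-bit word, and falls back to a scalar loop for the tail and for builds without NEON. It does not model the uncached mapping, so it says nothing about the throughput numbers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Hypothetical helper (not Mesa's actual function): converts `count`
 * pixels from B,G,R,A byte order to R,G,B,A byte order. */
static void
unpack_bgra8_to_rgba8(uint8_t *dst, const uint8_t *src, size_t count)
{
   assert(count == 0 || (dst != NULL && src != NULL));
   size_t i = 0;

#if defined(__ARM_NEON)
   /* Wide path: one de-interleaving load pulls in 16 pixels (64 bytes),
    * splitting them into four per-channel vectors. */
   for (; i + 16 <= count; i += 16) {
      uint8x16x4_t px = vld4q_u8(src + i * 4);

      /* Swap the B and R channel vectors; G and A stay where they are. */
      uint8x16_t b = px.val[0];
      px.val[0] = px.val[2];
      px.val[2] = b;

      /* Re-interleave and store all 16 pixels. */
      vst4q_u8(dst + i * 4, px);
   }
#endif

   /* Scalar path: handles the remaining pixels, or every pixel when
    * NEON is not available at compile time. */
   for (; i < count; i++) {
      const uint8_t *s = src + i * 4;
      uint8_t *d = dst + i * 4;

      d[0] = s[2];
      d[1] = s[1];
      d[2] = s[0];
      d[3] = s[3];
   }
}
```

A caller would pass a row of `count` source pixels and a destination buffer of at least `count * 4` bytes. The scalar tail keeps the sketch correct for widths that are not a multiple of 16.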