intel/ds: Skip expensive timestamp query until necessary

The Xe ioctl DRM_XE_DEVICE_QUERY_ENGINE_CYCLES provides accurate
timestamps correlated between the CPU and GPU. However, it is slow and
impacts performance while collecting Perfetto traces.

Instead, use Perfetto's GetBootTimeNs() to decide when the
BUILTIN_CLOCK_BOOTTIME clock sync event is due, so the ioctl is issued
and the event emitted at most once per second. This reduces the impact
of recording gpu.renderstages on FPS from about -21% to about -11%.
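
The throttling described above can be sketched as follows. This is a minimal, standalone illustration and not the Mesa code: `ClockSyncThrottle`, `should_sync`, and `kSyncIntervalNs` are hypothetical names, and the current boot time is passed in as a plain integer instead of being read from Perfetto's GetBootTimeNs(). As a simplification, the sketch schedules the next sync as soon as the check passes, whereas the change itself updates next_clock_sync_ns only after the correlated timestamp has been read successfully.

```cpp
#include <cstdint>

// Hypothetical helper, for illustration only: gates an expensive
// operation so that it runs at most once per interval.
struct ClockSyncThrottle {
   static constexpr uint64_t kSyncIntervalNs = 1000000000ull; // 1 second

   uint64_t next_clock_sync_ns = 0;

   // now_ns stands in for a cheap CPU-side boot-time reading.
   // Returns true when the expensive correlated CPU/GPU query should
   // run, and schedules the next one a full interval later.
   bool should_sync(uint64_t now_ns) {
      if (now_ns < next_clock_sync_ns)
         return false; // too early: skip the expensive query entirely
      next_clock_sync_ns = now_ns + kSyncIntervalNs;
      return true;
   }
};
```

The point of the design is that the cheap clock read happens before the expensive query, so the common path returns without touching the kernel.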

More concretely, FPS measurements when tracing the Unity BoatAttack demo on
an Intel ADL device:

* gpu.renderstages disabled:            48.044293667
* gpu.renderstages enabled:             38.119778333 (-20.66%)
* gpu.renderstages enabled + this fix:  42.641818333 (-11.24%)

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37095>
Tim Van Patten
2025-08-29 14:25:44 -06:00
committed by Marge Bot
parent 352ca665cb
commit c585341552
+5 -4
@@ -127,6 +127,10 @@ sync_timestamp(IntelRenderpassDataSource::TraceContext &ctx,
struct intel_ds_device *device)
{
uint64_t cpu_ts, gpu_ts;
+uint64_t boottime = perfetto::base::GetBootTimeNs().count();
+if (boottime < device->next_clock_sync_ns)
+return;
if (!intel_gem_read_correlate_cpu_gpu_timestamp(device->fd,
device->info.kmd_type,
@@ -141,13 +145,10 @@ sync_timestamp(IntelRenderpassDataSource::TraceContext &ctx,
uint32_t cpu_clock_id = perfetto::protos::pbzero::BUILTIN_CLOCK_BOOTTIME;
gpu_ts = intel_device_info_timebase_scale(&device->info, gpu_ts);
-if (cpu_ts < device->next_clock_sync_ns)
-return;
PERFETTO_LOG("sending clocks gpu=0x%08x", device->gpu_clock_id);
device->sync_gpu_ts = gpu_ts;
-device->next_clock_sync_ns = cpu_ts + 1000000000ull;
+device->next_clock_sync_ns = boottime + 1000000000ull;
MesaRenderpassDataSource<IntelRenderpassDataSource, IntelRenderpassTraits>::EmitClockSync(ctx,
cpu_ts, gpu_ts, cpu_clock_id, device->gpu_clock_id);