CUPTI - The CUDA Profiling Tools Interface

Posted WhateverYoung

Usage

The CUDA Profiling Tools Interface (CUPTI) enables the creation of profiling and tracing tools that target CUDA applications. CUPTI provides four APIs: the Activity API, the Callback API, the Event API, and the Metric API. Using these APIs, you can develop profiling tools that give insight into the CPU and GPU behavior of CUDA applications. CUPTI is delivered as a dynamic library on all platforms supported by CUDA.
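In short, CUPTI offers four APIs for building custom profiling tools and ships as header files plus a dynamic library. A newer driver is compatible with an older CUPTI, but a newer CUPTI cannot run on an older driver. The library initializes itself on the first call to any CUPTI function, so no explicit setup is needed.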

Activity API

The CUPTI Activity API allows you to asynchronously collect a trace of an application’s CPU and GPU CUDA activity.
Activity Record: each kind of activity has its own record type, for example CUpti_ActivityMemcpy; the generic type is CUpti_Activity.
Activity Buffer: an activity buffer is used to transfer one or more activity records from CUPTI to the client.
API: cuptiActivityRegisterCallbacks and cuptiActivityFlushAll.

The client must register two callbacks. One is invoked whenever CUPTI needs an empty activity buffer. The other delivers a buffer containing one or more activity records to the client, which processes the records and can then release the buffer.
cuptiActivityFlushAll forces the buffered records to be flushed to the client.
cuptiActivityGetAttribute and cuptiActivitySetAttribute query and set the attributes that control flushing.
General sample: activity_trace_async
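
The buffer handshake described above can be modeled in a few lines. The sketch below is not CUPTI code: CUPTI is a C API, and ToyTracer, its methods, and the record tuples are hypothetical stand-ins that only show the control flow (request an empty buffer, fill it, deliver it, flush on demand). The real entry points are cuptiActivityRegisterCallbacks and cuptiActivityFlushAll.

```python
# Toy model of the two-callback buffer protocol. Not the CUPTI API:
# ToyTracer and its methods are hypothetical names for illustration.

class ToyTracer:
    def __init__(self, capacity):
        self.capacity = capacity      # records per buffer (assumed fixed)
        self.request_cb = None
        self.complete_cb = None
        self.current = None           # buffer currently being filled

    def register_callbacks(self, request_cb, complete_cb):
        # One call registers both the "buffer requested" and the
        # "buffer completed" callback.
        self.request_cb = request_cb
        self.complete_cb = complete_cb

    def emit(self, record):
        # The tracer asks the client for an empty buffer when it needs one.
        if self.current is None:
            self.current = self.request_cb()
        self.current.append(record)
        # A full buffer is handed back to the client.
        if len(self.current) >= self.capacity:
            self.flush_all()

    def flush_all(self):
        # Deliver whatever has been collected so far, even if not full.
        if self.current:
            buf, self.current = self.current, None
            self.complete_cb(buf)


delivered = []   # batches of records received by the client
requests = []    # one entry per buffer request

def buffer_requested():
    requests.append(1)
    return []                      # the client allocates and owns the buffer

def buffer_completed(buf):
    delivered.append(list(buf))    # the client consumes the records

tracer = ToyTracer(capacity=2)
tracer.register_callbacks(buffer_requested, buffer_completed)
for rec in [("memcpy", 10), ("kernel", 20), ("memcpy", 30)]:
    tracer.emit(rec)
tracer.flush_all()                 # force delivery of the partial buffer
print(delivered)
```

The last record only reaches the client because of the explicit flush, which is why a tool typically flushes before it exits.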
The following are more specific uses of the Activity API:

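  • SASS Source Correlation: maps SASS assembly instructions to CUDA C source lines and reports some runtime information, which helps locate performance problems at the assembly level and guides optimization of the CUDA C program (although I did not fully understand all of this information). The correlation works in two steps: first, each PC is associated with a SASS instruction; second, the SASS instruction is associated with a CUDA source line. The sample shows how CUDA C source maps to program counters, that is, to the SASS instructions dumped from the cubin. The mapping is fairly coarse, though, so the feature is of limited value to someone already fluent in the assembly. Sample: sass_source_map
  • PC Sampling: gives the number of samples for each source and assembly line with various stall reasons, that is, the stall reasons for each assembly instruction. Sample: pc_sampling
  • NVLink: measures the bandwidth of the NVLink connections between CPU and GPU and between GPUs. Sample: nvlink_bandwidth
  • OpenACC: TODO
  • External Correlation: since CUDA 8.0, more external APIs are supported, including OpenMP, OpenACC, and MPI.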
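
The two-step correlation in the SASS Source Correlation item amounts to joining two lookup tables. The sketch below is a toy model, not CUPTI code: the PC offsets, SASS strings, file name, and line numbers are invented values, and correlate is a hypothetical helper. It also hints at why the mapping is coarse, since several instructions map back to the same source line.

```python
# Toy model of PC -> SASS instruction -> source line correlation.
# All values below are invented for illustration.
from collections import defaultdict

# Step 1: program counter offset -> SASS instruction (as from a cubin dump).
pc_to_sass = {
    0x00: "MOV R1, c[0x0][0x20]",
    0x08: "S2R R0, SR_TID.X",
    0x10: "LDG.E R2, [R4]",
    0x18: "FADD R2, R2, R3",
    0x20: "STG.E [R6], R2",
}

# Step 2: the instruction at each PC -> (file, line) in the CUDA C source.
pc_to_line = {
    0x00: ("vec_add.cu", 3),
    0x08: ("vec_add.cu", 3),
    0x10: ("vec_add.cu", 5),
    0x18: ("vec_add.cu", 5),
    0x20: ("vec_add.cu", 5),
}

def correlate(pc_to_sass, pc_to_line):
    """Group SASS instructions by the source line they were generated from."""
    by_line = defaultdict(list)
    for pc in sorted(pc_to_sass):
        by_line[pc_to_line[pc]].append(pc_to_sass[pc])
    return dict(by_line)

source_map = correlate(pc_to_sass, pc_to_line)
for (path, line), instrs in sorted(source_map.items()):
    print(f"{path}:{line} -> {len(instrs)} SASS instructions")
```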

Callback API

The CUPTI Callback API allows you to register a callback into your own code. Your callback will be invoked when the application being profiled calls a CUDA runtime or driver function, or when certain events occur in the CUDA driver.
In other words, callbacks can be registered for the runtime API, the driver API, resource tracking, and synchronization notification. A tool can then run user-defined code on every API call, for example to record timestamps.

  • Driver and Runtime API Callbacks: see the callback_event and callback_timestamp samples.
  • Resource Callbacks: invoked when resources are created or destroyed. For example, when a CUDA context is created, your callback function will be invoked with a callback ID equal to CUPTI_CBID_RESOURCE_CONTEXT_CREATED.
  • Synchronization Callbacks: associate a callback function with CUDA context and stream synchronizations.
  • NVIDIA Tools Extension Callbacks
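
The timestamping use case mentioned above follows an enter/exit pattern: the callback fires once before and once after each API call. The following toy model is not CUPTI code. The names subscribe, traced, and fake_memcpy are hypothetical, and time.perf_counter_ns stands in for whatever clock a real tool would use.

```python
# Toy model of API enter/exit callbacks used to time calls.
import functools
import time

subscribers = []          # callbacks invoked on every traced call

def subscribe(callback):
    subscribers.append(callback)

def traced(func):
    """Wrap func so every subscriber sees an 'enter' and an 'exit' event."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        for cb in subscribers:
            cb("enter", func.__name__)
        try:
            return func(*args, **kwargs)
        finally:
            # Runs even if func raises, so every enter has a matching exit.
            for cb in subscribers:
                cb("exit", func.__name__)
    return wrapper

trace = []                # (site, API name, timestamp in ns)

def timestamp_callback(site, name):
    trace.append((site, name, time.perf_counter_ns()))

@traced
def fake_memcpy(n):       # stands in for a runtime API function
    return bytes(n)

subscribe(timestamp_callback)
result = fake_memcpy(16)

enter, exit_ = trace
print(f"{enter[1]} took {exit_[2] - enter[2]} ns")
```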

Event API

The CUPTI Event API allows you to query, configure, start, stop, and read the event counters on a CUDA-enabled device.

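  • cupti_query: this sample queries all supported event domains, events, and metrics.
  • Collecting Kernel Execution Events (callback_event, callback_metric): gathers performance information while a kernel executes, combining the Callback API with the Event API to read events and metrics around the kernel run.
  • Sampling Events: one thread does the work while another thread reads the events.
  • event_multi_gpu: collects events on a system with multiple GPUs.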
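
The sampling arrangement, where one thread does the work and another reads the counters, can be sketched with two host threads. This is a toy model rather than CUPTI code: the counter is a plain Python integer, and Counter, worker, sampler, and the 1 ms period are assumptions made for illustration.

```python
# Toy model of event sampling: a worker thread increments a counter while
# a sampler thread reads it periodically.
import threading

class Counter:
    def __init__(self):
        self._value = 0
        self._lock = threading.Lock()

    def add(self, n):
        with self._lock:
            self._value += n

    def read(self):
        with self._lock:
            return self._value

def worker(counter, iterations):
    for _ in range(iterations):
        counter.add(1)               # stands in for work that triggers an event

def sampler(counter, stop, samples, period_s=0.001):
    while not stop.is_set():
        samples.append(counter.read())
        stop.wait(period_s)          # sleep, but wake early when asked to stop
    samples.append(counter.read())   # final read after the work has finished

counter = Counter()
stop = threading.Event()
samples = []

s = threading.Thread(target=sampler, args=(counter, stop, samples))
w = threading.Thread(target=worker, args=(counter, 50_000))
s.start()
w.start()
w.join()        # wait for the work to finish ...
stop.set()      # ... then tell the sampler to stop
s.join()

print(f"{len(samples)} samples, final value {samples[-1]}")
```

The number of intermediate samples depends on scheduling, but the final read always happens after the worker has finished, so the last sample equals the total.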

Metric API

  • callback_metric: Collecting Kernel Execution Events. Uses the Callback API to gather the events a metric depends on while a kernel executes, then computes the metric value from those events.
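
A metric is computed from one or more event values gathered during the kernel run. The sketch below is a toy model, not CUPTI code: the event names, the numbers, and both formulas are illustrative assumptions rather than definitions taken from the CUPTI documentation, and compute_metric is a hypothetical helper.

```python
# Toy model: derive metric values from raw event counts.

# Event totals as they might look after a kernel run (made-up numbers,
# illustrative event names).
events = {
    "inst_executed": 1_200_000,
    "active_cycles": 800_000,
    "global_load_requests": 50_000,
    "global_load_transactions": 200_000,
}

# Each metric lists the events it needs and a function that combines them.
metrics = {
    "ipc": (
        ("inst_executed", "active_cycles"),
        lambda inst, cycles: inst / cycles,
    ),
    "transactions_per_request": (
        ("global_load_transactions", "global_load_requests"),
        lambda trans, req: trans / req,
    ),
}

def compute_metric(name, events):
    """Look up the events a metric requires and apply its formula."""
    required, formula = metrics[name]
    missing = [e for e in required if e not in events]
    if missing:
        raise KeyError(f"metric {name!r} needs events {missing}")
    return formula(*(events[e] for e in required))

for name in metrics:
    print(name, compute_metric(name, events))
```

Keeping the list of required events next to each formula mirrors the workflow of the callback_metric sample: first collect the events the metric needs, then compute the value.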

Samples

  • activity_trace_async
    This sample shows how to collect a trace of CPU and GPU activity using the new asynchronous activity buffer APIs.
  • callback_event
    This sample shows how to use both the callback and event APIs to record the events that occur during the execution of a simple kernel. The sample shows the required ordering for synchronization, and for event group enabling, disabling and reading.
  • callback_metric
    This sample shows how to use both the callback and metric APIs to record the metric’s events during the execution of a simple kernel, and then use those events to calculate the metric value.
  • callback_timestamp
    This sample shows how to use the callback API to record a trace of API start and stop times.
  • cupti_query
    This sample shows how to query CUDA-enabled devices for their event domains, events, and metrics.
  • event_sampling
    This sample shows how to use the event APIs to sample events using a separate host thread.
  • event_multi_gpu
    This sample shows how to use the CUPTI event and CUDA APIs to sample events on a setup with multiple GPUs. The sample shows the required ordering for synchronization, and for event group enabling, disabling and reading.
  • sass_source_map
    This sample shows how to generate CUpti_ActivityInstructionExecution records and how to map SASS assembly instructions to CUDA C source.
  • unified_memory
    This sample shows how to collect information about page transfers for unified memory.
  • pc_sampling
    This sample shows how to collect PC Sampling profiling information for a kernel.
  • nvlink_bandwidth
    This sample shows how to collect NVLink topology and NVLink throughput metrics in continuous mode.
  • openacc_trace
    This sample shows how to use CUPTI APIs for OpenACC data collection.

Modules

Documents each API in detail.

Data Structures

Documents the data structures and header files.

Reference

https://docs.nvidia.com/cuda/cupti/r_main.html#r_main
https://github.com/srvm/cupti_profiler/blob/master/examples/demo.cu
