nvprof --query-events

Posted 2021-12-22 ShaderJoy

Available Events:

Name Description

Device 0 (GeForce GTX 970M):

Domain domain_a:

elapsed_cycles_sm: Elapsed clocks

Domain domain_b:

fb_subp0_read_sectors: Number of DRAM read requests to sub partition 0, increments by 1 for 32 byte access.
fb_subp1_read_sectors: Number of DRAM read requests to sub partition 1, increments by 1 for 32 byte access.
fb_subp0_write_sectors: Number of DRAM write requests to sub partition 0, increments by 1 for 32 byte access.
fb_subp1_write_sectors: Number of DRAM write requests to sub partition 1, increments by 1 for 32 byte access.
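Because each of the four fb_subp* counters increments by 1 per 32-byte access, the raw counts can be converted to bytes. The following is a minimal sketch in Python, assuming the counter values have already been collected (for example with nvprof --events) into a dictionary. The function name and the sample numbers are hypothetical and not part of nvprof.

```python
SECTOR_BYTES = 32  # per the listing: each fb_subp* event counts one 32-byte access


def dram_traffic_bytes(events):
    """Return (read_bytes, write_bytes) summed over both sub partitions.

    `events` maps event names to raw counts; missing events count as 0.
    """
    read_sectors = (events.get("fb_subp0_read_sectors", 0)
                    + events.get("fb_subp1_read_sectors", 0))
    write_sectors = (events.get("fb_subp0_write_sectors", 0)
                     + events.get("fb_subp1_write_sectors", 0))
    return read_sectors * SECTOR_BYTES, write_sectors * SECTOR_BYTES


# Hypothetical counter values, for illustration only.
sample = {
    "fb_subp0_read_sectors": 1000,
    "fb_subp1_read_sectors": 1048,
    "fb_subp0_write_sectors": 500,
    "fb_subp1_write_sectors": 524,
}
read_bytes, write_bytes = dram_traffic_bytes(sample)
print(read_bytes, write_bytes)
```

Dividing these byte totals by the kernel's run time would give an approximate DRAM throughput, provided the counts cover the same interval.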

Domain domain_c:

gld_inst_8bit: Total number of 8-bit global load instructions that are executed by all the threads across all thread blocks.
gld_inst_16bit: Total number of 16-bit global load instructions that are executed by all the threads across all thread blocks.
gld_inst_32bit: Total number of 32-bit global load instructions that are executed by all the threads across all thread blocks.
gld_inst_64bit: Total number of 64-bit global load instructions that are executed by all the threads across all thread blocks.
gld_inst_128bit: Total number of 128-bit global load instructions that are executed by all the threads across all thread blocks.
gst_inst_8bit: Total number of 8-bit global store instructions that are executed by all the threads across all thread blocks.
gst_inst_16bit: Total number of 16-bit global store instructions that are executed by all the threads across all thread blocks.
gst_inst_32bit: Total number of 32-bit global store instructions that are executed by all the threads across all thread blocks.
gst_inst_64bit: Total number of 64-bit global store instructions that are executed by all the threads across all thread blocks.
gst_inst_128bit: Total number of 128-bit global store instructions that are executed by all the threads across all thread blocks.
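The gld_inst_* and gst_inst_* counters are split by access width, so a weighted sum gives the number of bytes the threads asked to load or store. This is a sketch under the assumption that the counts are already available as a dictionary. The helper name and the sample values are hypothetical, and the result is the amount requested by instructions, which is not necessarily the traffic that reaches DRAM.

```python
WIDTHS_BITS = (8, 16, 32, 64, 128)  # widths that appear in the event names


def requested_bytes(events, prefix):
    """Sum bytes requested by global loads ('gld') or global stores ('gst').

    Each <prefix>_inst_<N>bit count is weighted by N / 8 bytes.
    Missing events count as 0.
    """
    return sum(events.get(f"{prefix}_inst_{bits}bit", 0) * (bits // 8)
               for bits in WIDTHS_BITS)


# Hypothetical counter values, for illustration only.
sample = {
    "gld_inst_32bit": 1024,
    "gld_inst_128bit": 256,
    "gst_inst_32bit": 1024,
}
print(requested_bytes(sample, "gld"), requested_bytes(sample, "gst"))
```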

Domain domain_d:

warps_launched: Number of warps launched.
inst_issued0: Number of cycles that did not issue any instruction, increments per warp.
inst_issued1: Number of cycles that issued single instruction, increments per warp.
inst_issued2: Number of cycles that issued dual instructions, increments per warp.
inst_executed: Number of instructions executed per warp.
local_store: Number of executed store instructions where state space is specified as local, increments per warp on a multiprocessor.
local_load: Number of executed load instructions where state space is specified as local, increments per warp on a multiprocessor.
shared_load: Number of executed load instructions where state space is specified as shared, increments per warp on a multiprocessor.
shared_store: Number of executed store instructions where state space is specified as shared, increments per warp on a multiprocessor.
shared_atom_cas: Number of ATOMS.CAS instructions executed per warp.
shared_atom: Number of ATOMS instructions executed per warp.
global_atom_cas: Number of ATOM.CAS instructions executed per warp.
atom_count: Number of ATOM instructions executed per warp.
global_load: Number of executed load instructions where state space is specified as global, increments per warp on a multiprocessor.
global_store: Number of executed store instructions where state space is specified as global, increments per warp on a multiprocessor.
gred_count: Number of reduction operations performed per warp.
branch: Number of branch instructions executed per warp on a multiprocessor.
active_cycles: Number of cycles a multiprocessor has at least one active warp.
sm_cta_launched: Number of blocks launched
shared_ld_bank_conflict: Number of shared load bank conflict generated when the addresses for two or more shared memory load requests fall in the same memory bank.
shared_st_bank_conflict: Number of shared store bank conflict generated when the addresses for two or more shared memory store requests fall in the same memory bank.
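Several of the events in this domain are more useful in combination than alone. The sketch below takes the descriptions above at face value: it treats inst_issued0, inst_issued1 and inst_issued2 as counts of cycles with zero, one and two instructions issued, and divides warps_launched by sm_cta_launched to get the average number of warps per block. The function names and sample values are hypothetical, and how the hardware aggregates these counters across multiprocessors is an assumption that should be checked against the profiler documentation.

```python
def issue_breakdown(events):
    """Return (instructions_issued, idle_frac, single_frac, dual_frac)."""
    idle = events.get("inst_issued0", 0)
    single = events.get("inst_issued1", 0)
    dual = events.get("inst_issued2", 0)
    total_cycles = idle + single + dual
    issued = single + 2 * dual  # a dual-issue cycle issues two instructions
    if total_cycles == 0:
        return issued, 0.0, 0.0, 0.0
    return (issued,
            idle / total_cycles,
            single / total_cycles,
            dual / total_cycles)


def warps_per_block(events):
    """Average warps per launched block, or 0.0 if no blocks were launched."""
    blocks = events.get("sm_cta_launched", 0)
    if blocks == 0:
        return 0.0
    return events.get("warps_launched", 0) / blocks


# Hypothetical counter values, for illustration only.
sample = {
    "inst_issued0": 512,
    "inst_issued1": 256,
    "inst_issued2": 256,
    "warps_launched": 2048,
    "sm_cta_launched": 256,
}
print(issue_breakdown(sample))
print(warps_per_block(sample))
```

A large idle fraction suggests the schedulers often had nothing ready to issue, which is usually a prompt to look at memory latency or occupancy next.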

Domain domain_e:
