如何结合使用thrust和valgrind来检测内存泄漏?

Posted

技术标签:

【中文标题】如何结合使用thrust和valgrind来检测内存泄漏?【英文标题】:How to use thrust and valgrind together to detect memory leaks? 【发布时间】:2021-05-01 20:06:30 【问题描述】:

有没有办法将 CUDA 推力库与 Valgrind 内存泄漏检查器一起使用?

我之所以问是因为这个简单的程序:

#include <thrust/device_vector.h>

int main()
    thrust::device_vector<int> D(5);
    assert( D.size() == 5 );

编译:

$ /usr/local/cuda-11.1/bin/nvcc device_vector.cu -o device_vector.cu.x

使 Valgrind 相信存在多个可能的内存泄漏。

我知道它们一定是误报,并且 valgrind 不是用来检测 GPU 内存泄漏的,但我想知道是否有标志或标准方法可以使这两种工具一起工作(例如检测 CPU 内存泄漏) .

如果周围有一组标准的 Valgrind 例外,我会很乐意使用它们,但我想在玩 wack-a-mole 之前先问一下。

$ valgrind ./device_vector.cu.x 
==765561== Memcheck, a memory error detector
==765561== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==765561== Using Valgrind-3.15.0 and LibVEX; rerun with -h for copyright info
==765561== Command: ./device_vector.cu.x
==765561== 
==765561== Warning: noted but unhandled ioctl 0x30000001 with no size/direction hints.
==765561==    This could cause spurious value errors to appear.
==765561==    See README_MISSING_SYSCALL_OR_IOCTL for guidance on writing a proper wrapper.
==765561== Warning: noted but unhandled ioctl 0x27 with no size/direction hints.
==765561==    This could cause spurious value errors to appear.
==765561==    See README_MISSING_SYSCALL_OR_IOCTL for guidance on writing a proper wrapper.
==765561== Warning: noted but unhandled ioctl 0x25 with no size/direction hints.
==765561==    This could cause spurious value errors to appear.
==765561==    See README_MISSING_SYSCALL_OR_IOCTL for guidance on writing a proper wrapper.
==765561== Warning: noted but unhandled ioctl 0x37 with no size/direction hints.
==765561==    This could cause spurious value errors to appear.
==765561==    See README_MISSING_SYSCALL_OR_IOCTL for guidance on writing a proper wrapper.
==765561== Warning: noted but unhandled ioctl 0x17 with no size/direction hints.
==765561==    This could cause spurious value errors to appear.
==765561==    See README_MISSING_SYSCALL_OR_IOCTL for guidance on writing a proper wrapper.
==765561== Warning: set address range perms: large range [0x200000000, 0x300200000) (noaccess)
==765561== Warning: set address range perms: large range [0x681f000, 0x2681e000) (noaccess)
==765561== Warning: noted but unhandled ioctl 0x19 with no size/direction hints.
==765561==    This could cause spurious value errors to appear.
==765561==    See README_MISSING_SYSCALL_OR_IOCTL for guidance on writing a proper wrapper.
==765561== Warning: set address range perms: large range [0x10006000000, 0x10106000000) (noaccess)
==765561== Warning: noted but unhandled ioctl 0x49 with no size/direction hints.
==765561==    This could cause spurious value errors to appear.
==765561==    See README_MISSING_SYSCALL_OR_IOCTL for guidance on writing a proper wrapper.
==765561== Warning: noted but unhandled ioctl 0x21 with no size/direction hints.
==765561==    This could cause spurious value errors to appear.
==765561==    See README_MISSING_SYSCALL_OR_IOCTL for guidance on writing a proper wrapper.
==765561== Warning: noted but unhandled ioctl 0x1b with no size/direction hints.
==765561==    This could cause spurious value errors to appear.
==765561==    See README_MISSING_SYSCALL_OR_IOCTL for guidance on writing a proper wrapper.
==765561== Warning: noted but unhandled ioctl 0x44 with no size/direction hints.
==765561==    This could cause spurious value errors to appear.
==765561==    See README_MISSING_SYSCALL_OR_IOCTL for guidance on writing a proper wrapper.
==765561== 
==765561== HEAP SUMMARY:
==765561==     in use at exit: 6,678,624 bytes in 8,647 blocks
==765561==   total heap usage: 11,448 allocs, 2,801 frees, 40,718,174 bytes allocated
==765561== 
==765561== LEAK SUMMARY:
==765561==    definitely lost: 0 bytes in 0 blocks
==765561==    indirectly lost: 0 bytes in 0 blocks
==765561==      possibly lost: 22,216 bytes in 187 blocks
==765561==    still reachable: 6,656,408 bytes in 8,460 blocks
==765561==         suppressed: 0 bytes in 0 blocks
==765561== Rerun with --leak-check=full to see details of leaked memory
==765561== 
==765561== For lists of detected and suppressed errors, rerun with: -s
==765561== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

提到的自述文件README_MISSING_SYSCALL_OR_IOCTL 对我帮助不大。


补充说明:CUDA 带有一个名为 cuda-memcheck 的 memchecker,它不会在上面的程序中报告内存泄漏,但它似乎不能替代 valgrind,因为它没有'不在一个简单的 cpu 程序中检测实际的内存泄漏:

#include <thrust/device_vector.h>

int main()
//  thrust::device_vector<int> D(5);
//  assert( D.size() == 5 );
    
//  cudaDeviceSynchronize();
    std::allocator<int> alloc;
    int* p = alloc.allocate(10);
    p[0] = 2;
    return p[0];

【问题讨论】:

cuda-memcheck 泄漏检查工具适用于设备代码,而不是主机代码。因此,它不能识别主机代码泄漏也就不足为奇了。它当然不能替代 valgrind。 valgrind 适用于主机代码。 cuda-memcheck 适用于设备代码。我并不是说这是唯一的区别,或者它们在其他方面功能相同。 你能找出这些 ioctl 是什么以及它们是否需要任何特殊处理吗? 如果你想为这些 ioctl 添加 Valgrind 支持,你需要知道 ioctl 做了什么。如果您不知道他们在做什么,那么您能做的最好的事情就是将他们视为无操作,并希望这就足够了。这些不是误报。 Valgrind 并不是说​​有错误,只是它无法识别这些 ioctl。 这与nvcc无关,与推力无关。它与 CUDA 运行时 API(和/或 CUDA 驱动程序 API)库有关,这些库直接与主机操作系统以及 GPU 驱动程序交互。在这种情况下,ioctl 和系统调用是预期的并且是“正常的”。我并不是说这些调用或库是原始的。错误总是可能的。如上所述,valgrind 认为诸如“未处理”或“无法识别”(由 valgrind 提供,除非提供了合适的包装器)。所以输出至少在某种程度上是“正常的”。同样,没有关于缺陷的索赔。 我不知道任何规范的东西。正如我所提到的,我认为其中大部分来自对 CUDA 运行时 API 库的使用。您可以在 developer.nvidia.com 上 file a bug 请求采取某种行动来解决这些问题,但不知道具体是什么。对于短期方法,可能是this。 【参考方案1】:

目前我在我的项目的根目录中使用这个抑制文件.valgrind-supressions


   <suppression_for_thrust_allocations>
   Memcheck:Leak
   match-leak-kinds: possible
   fun:*alloc
   ...
   obj:*libcuda.so.*
   ...
   obj:*libcuda.so.*
   fun:__cudart*
   ...
   fun:__cudart*
   fun:cudaMalloc
   fun:_ZN6thrust6system4cuda6detail20cuda_memory_resourceIXadL_Z10cudaMallocEEXadL_Z8cudaFreeEENS_8cuda_cub7pointerIvEEE11do_allocateEmm
   ...

(三个点是实际代码)

通过删除_ZN6thrust 行可能会更通用,但我不想过早地概括抑制。

请务必注意,这不是检查 GPU 中的泄漏,因为需要 cuda-memcheck


更新:我将抑制范围扩大到 1) 包括从 cudaMallocManaged 生成的案例 2) 由 CUDA 运行时引起,没有推力分配器的参与(如 @RobertCrovella 所述)。


   <suppression_for_cudaMalloc_and_cudaMallocManaged_allocations>
   Memcheck:Leak
   match-leak-kinds: possible
   fun:*alloc
   ...
   obj:*libcuda.so.*
   ...
   obj:*libcuda.so.*
   fun:__cudart*
   ...
   fun:__cudart*
   fun:cudaMalloc*
   ...


CMakeLists.txt我使用这些选项来实际使用上面列出的抑制文件

  ...
set(MEMORYCHECK_COMMAND_OPTIONS "-q --tool=memcheck --leak-check=yes --num-callers=52 --trace-children=yes --leak-check=full --track-origins=yes --gen-suppressions=all") # must go before `include(CTest)`
set(MEMORYCHECK_SUPPRESSIONS_FILE "$PROJECT_SOURCE_DIR/.valgrind-suppressions") # must go before `include(CTest)`

include(CTest)
  ...

(这里的三个点代表文件的其余部分)

欢迎改进这种抑制模式。


作为参考,valgrind 自动生成的抑制如下所示:


   <insert_a_suppression_name_here>
   Memcheck:Leak
   match-leak-kinds: possible
   fun:calloc
   obj:/usr/lib/x86_64-linux-gnu/libcuda.so.470.63.01
   obj:/usr/lib/x86_64-linux-gnu/libcuda.so.470.63.01
   obj:/usr/lib/x86_64-linux-gnu/libcuda.so.470.63.01
   obj:/usr/lib/x86_64-linux-gnu/libcuda.so.470.63.01
   obj:/usr/lib/x86_64-linux-gnu/libcuda.so.470.63.01
   obj:/usr/lib/x86_64-linux-gnu/libcuda.so.470.63.01
   obj:/usr/lib/x86_64-linux-gnu/libcuda.so.470.63.01
   obj:/usr/lib/x86_64-linux-gnu/libcuda.so.470.63.01
   obj:/usr/lib/x86_64-linux-gnu/libcuda.so.470.63.01
   obj:/usr/lib/x86_64-linux-gnu/libcuda.so.470.63.01
   obj:/usr/lib/x86_64-linux-gnu/libcuda.so.470.63.01
   obj:/usr/lib/x86_64-linux-gnu/libcuda.so.470.63.01
   obj:/usr/lib/x86_64-linux-gnu/libcuda.so.470.63.01
   obj:/usr/lib/x86_64-linux-gnu/libcuda.so.470.63.01
   fun:__cudart764
   fun:__cudart763
   fun:__cudart768
   fun:__cudart941
   fun:__cudart607
   fun:cudaMalloc
   fun:_ZN6thrust6system4cuda6detail20cuda_memory_resourceIXadL_Z10cudaMallocEEXadL_Z8cudaFreeEENS_8cuda_cub7pointerIvEEE11do_allocateEmm
   fun:_ZN6thrust26device_ptr_memory_resourceINS_6system4cuda6detail20cuda_memory_resourceIXadL_Z10cudaMallocEEXadL_Z8cudaFreeEENS_8cuda_cub7pointerIvEEEEE11do_allocateEmm
   fun:_ZN6thrust2mr9allocatorIiNS_26device_ptr_memory_resourceINS_6system4cuda6detail20cuda_memory_resourceIXadL_Z10cudaMallocEEXadL_Z8cudaFreeEENS_8cuda_cub7pointerIvEEEEEEE8allocateEm
   fun:_ZZN6thrust6detail16allocator_traitsINS_16device_allocatorIiEEE8allocateERS3_mEN19workaround_warnings8allocateES5_m
   fun:_ZN6thrust6detail16allocator_traitsINS_16device_allocatorIiEEE8allocateERS3_m
   fun:_ZN6thrust6detail18contiguous_storageIiNS_16device_allocatorIiEEE8allocateEm
   fun:_ZN6thrust6detail11vector_baseIiNS_16device_allocatorIiEEE17allocate_and_copyINS0_15normal_iteratorIPKiEEEEvmT_SA_RNS0_18contiguous_storageIiS3_EE
   fun:_ZN6thrust6detail11vector_baseIiNS_16device_allocatorIiEEE10range_initINS0_15normal_iteratorIPKiEEEEvT_SA_NS_27random_access_traversal_tagE
   fun:_ZN6thrust6detail11vector_baseIiNS_16device_allocatorIiEEE10range_initINS0_15normal_iteratorIPKiEEEEvT_SA_
   fun:_ZN6thrust6detail11vector_baseIiNS_16device_allocatorIiEEEC1IiSaIiEEERKNS1_IT_T0_EE
   fun:_ZN6thrust13device_vectorIiNS_16device_allocatorIiEEEC1IiSaIiEEERKNS_11host_vectorIT_T0_EE
   fun:_ZN6vector11test_methodEv
   fun:_ZL14vector_invokerv
   fun:_ZN5boost6detail8function22void_function_invoker0IPFvvEvE6invokeERNS1_15function_bufferE
   obj:/usr/lib/x86_64-linux-gnu/libboost_unit_test_framework.so.1.74.0
   fun:_ZN5boost17execution_monitor13catch_signalsERKNS_8functionIFivEEE
   fun:_ZN5boost17execution_monitor7executeERKNS_8functionIFivEEE
   fun:_ZN5boost17execution_monitor8vexecuteERKNS_8functionIFvvEEE
   fun:_ZN5boost9unit_test19unit_test_monitor_t21execute_and_translateERKNS_8functionIFvvEEEm
   obj:/usr/lib/x86_64-linux-gnu/libboost_unit_test_framework.so.1.74.0
   obj:/usr/lib/x86_64-linux-gnu/libboost_unit_test_framework.so.1.74.0
   fun:_ZN5boost9unit_test9framework3runEmb
   fun:_ZN5boost9unit_test14unit_test_mainEPFbvEiPPc
   fun:main

【讨论】:

以上是关于如何结合使用thrust和valgrind来检测内存泄漏?的主要内容,如果未能解决你的问题,请参考以下文章

使用 Valgrind 工具如何检测尝试访问 0x0 地址的对象?

推力矢量化搜索:有效结合 lower_bound 和 binary_search 来找到位置和存在

Unix下C程序内存泄露检测工具:valgrind的安装使用

Valgrind C++ 内存泄漏检测

Valgrind C++ 内存泄漏检测

使用 Valgrind 检测 C++ 内存泄漏