markdown: How to correct the compute capability

Posted 2021-05-05
# How to fix the first init time issue with TF
As NVIDIA GPUs evolve to support new features, the instruction set architecture naturally changes. Because applications must run on multiple generations of GPUs, the NVIDIA compiler tool chain supports compiling for multiple architectures in the same application executable or library. CUDA also relies on the PTX virtual GPU ISA to provide forward compatibility, so that already deployed applications can run on future GPU architectures.

If the library is not compiled for the required architecture, CUDA JIT-compiles the embedded PTX for the installed GPU and stores the result in a cache. In some cases the cache may be disabled, and it has to be enabled to avoid waiting for the compilation every time the application runs.
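As a minimal sketch, the JIT cache can be controlled through environment variables described in the CUDA documentation and in the NVIDIA blog post listed under Sources. The 2 GiB size below is an arbitrary illustrative value, not a recommendation, and the variables have to be set before the application starts.

```shell
# Make sure the JIT cache is enabled (1 would disable it).
export CUDA_CACHE_DISABLE=0

# Cache size in bytes; 2 GiB is an arbitrary example value.
export CUDA_CACHE_MAXSIZE=2147483648
```

A small cache can evict the compiled kernels of a large library such as TensorFlow, so the wait may come back even with caching enabled.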

## How to add needed architectures
The CUDA compiler _nvcc_ has parameters to define the target architecture:

```
... -gencode arch=compute_CC,code=[sm_CC,compute_CC] ...
```
where **CC** is the Compute Capability of the NVIDIA GPU that you are going to use, written without the dot. You can find the correct **CC** value by looking up the target GPU on the [CUDA GPUs web page](https://developer.nvidia.com/cuda-gpus).

If the Compute Capability supported by the GPU is (for example) **6.1**, the correct option is:

```
... -gencode arch=compute_61,code=[sm_61,compute_61] ...
```

If you are compiling software that will run on different machines and you do not know exactly which GPU is installed on them, you can build an application that supports more than one NVIDIA GPU architecture:

```
ARCH= -gencode arch=compute_30,code=[sm_30,compute_30] \
      -gencode arch=compute_35,code=[sm_35,compute_35] \
      -gencode arch=compute_50,code=[sm_50,compute_50] \
      -gencode arch=compute_52,code=[sm_52,compute_52] \
      -gencode arch=compute_61,code=[sm_61,compute_61]
 
nvcc $(ARCH) [other nvcc options]
```
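The flag list above could also be generated from a list of Compute Capabilities rather than typed by hand. The following is only a sketch: `gencode_flags` is a hypothetical helper name, not part of nvcc or the CUDA toolkit.

```shell
# gencode_flags (hypothetical helper): print one -gencode option per
# Compute Capability given as an argument, e.g. "6.1" -> compute_61/sm_61.
gencode_flags() {
    flags=""
    for cc in "$@"; do
        # Drop the dot: 6.1 -> 61
        n=$(printf '%s' "$cc" | tr -d '.')
        flags="${flags:+$flags }-gencode arch=compute_${n},code=[sm_${n},compute_${n}]"
    done
    printf '%s\n' "$flags"
}

# Example: print the flags for two architectures.
gencode_flags 5.2 6.1
```

When the output is expanded unquoted on an nvcc command line, the shell still applies word splitting and filename expansion, so the bracketed part is passed through unchanged only as long as no file name matches it.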

## How to compile TensorFlow with the desired Compute Capabilities (CMake version)
Up to TensorFlow v1.9 you can use the CMake build and define the desired Compute Capabilities in the CMake script.
Open the file 'tensorflow/tensorflow/contrib/cmake/CMakeLists.txt' and edit '_CUDA_NVCC_FLAGS_' and '_TF_EXTRA_CUDA_CAPABILITIES_' to include the desired versions. You also need to edit the definition of '_TF_CUDA_CAPABILITIES_' written to the generated _cuda_config.h_ file, so that all three places list the same capabilities:

```
...

  # by default we assume compute capability 3.5 and 5.2. If you change this change it in
  # CUDA_NVCC_FLAGS and cuda_config.h below
  set(CUDA_NVCC_FLAGS ${CUDA_NVCC_FLAGS};-gencode arch=compute_30,code=\"sm_30,compute_30\";-gencode arch=compute_35,code=\"sm_35,compute_35\";-gencode arch=compute_52,code=\"sm_52,compute_52\")
  set(CUDA_NVCC_FLAGS ${CUDA_NVCC_FLAGS};--include-path ${PROJECT_BINARY_DIR}/$\{build_configuration\};--expt-relaxed-constexpr)
  set(CUDA_NVCC_FLAGS ${CUDA_NVCC_FLAGS};-ftz=true)  # Flush denormals to zero
  set(CUDA_INCLUDE ${CUDA_TOOLKIT_TARGET_DIR} ${CUDA_TOOLKIT_TARGET_DIR}/extras/CUPTI/include)
  include_directories(${CUDA_INCLUDE})
  if (WIN32)
    add_definitions(-DGOOGLE_CUDA=1 -DTF_EXTRA_CUDA_CAPABILITIES=3.0,3.5,5.2)
  else (WIN32)
    # Without these double quotes, cmake in Linux makes it "-DTF_EXTRA_CUDA_CAPABILITIES=3.0, -D3.5, -D5.2" for cc, which incurs build breaks
    add_definitions(-DGOOGLE_CUDA=1 -D"TF_EXTRA_CUDA_CAPABILITIES=3.0,3.5,5.2")
  endif (WIN32)
  
...

  # create cuda_config.h
  FILE(WRITE ${tensorflow_source_dir}/third_party/gpus/cuda/cuda_config.h
    "#ifndef CUDA_CUDA_CONFIG_H_\n"
    "#define CUDA_CUDA_CONFIG_H_\n"
    "#define TF_CUDA_CAPABILITIES CudaVersion(\"3.0\"),CudaVersion(\"3.5\"),CudaVersion(\"5.2\")\n"
    "#define TF_CUDA_VERSION \"64_${short_CUDA_VER}\"\n"
    "#define TF_CUDNN_VERSION \"64_${tensorflow_CUDNN_VERSION}\"\n"
    "#define TF_CUDA_TOOLKIT_PATH \"${CUDA_TOOLKIT_ROOT_DIR}\"\n"
    "#endif  // CUDA_CUDA_CONFIG_H_\n"
  )
  
...
```
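Since the same capability list appears in three places in this script, it may help to derive the strings from one list before pasting them in. This is a sketch under assumptions: `tf_extra_caps` and `tf_cuda_caps` are hypothetical helper names and are not part of the TensorFlow build.

```shell
# tf_extra_caps (hypothetical): comma-separated list for
# -DTF_EXTRA_CUDA_CAPABILITIES, e.g. "3.5,5.2,6.1".
tf_extra_caps() {
    out=""
    for cc in "$@"; do
        out="${out:+$out,}$cc"
    done
    printf '%s\n' "$out"
}

# tf_cuda_caps (hypothetical): CudaVersion("...") list for the
# TF_CUDA_CAPABILITIES define in cuda_config.h. The double quotes still
# need backslash escapes when pasted into the CMake string.
tf_cuda_caps() {
    out=""
    for cc in "$@"; do
        out="${out:+$out,}CudaVersion(\"$cc\")"
    done
    printf '%s\n' "$out"
}

# Example: print both strings for the same capability list.
tf_extra_caps 3.5 5.2 6.1
tf_cuda_caps 3.5 5.2 6.1
```

The matching `-gencode` entries in '_CUDA_NVCC_FLAGS_' still have to be updated for the same list.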

## Sources
* [CUDA Pro Tip: Understand Fat Binaries and JIT Caching](https://devblogs.nvidia.com/cuda-pro-tip-understand-fat-binaries-jit-caching/)
* [\[Tutorial CUDA\] Nvidia GPU: CUDA Compute Capability](https://www.myzhar.com/blog/tutorials/tutorial-nvidia-gpu-cuda-compute-capability/)
* [(SO) Tensorflow takes >1 min on first run on video card with 5.0 compute capability](https://stackoverflow.com/a/36871892/1407528)
* [How to enable the support of cuda compute capability 3.0 GPU in windows](https://github.com/tensorflow/tensorflow/issues/6001)