DALI Cookbook by Eric

Posted songyuc


Installation

Installation — NVIDIA DALI documentation

Basic knowledge

Link format: a shared library (.so file)

For example: customdummy/build/libcustomdummy.so
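As a minimal sketch of how such a .so file could be loaded from Python: the helper name load_custom_plugin is hypothetical, and the sketch assumes the library is a DALI plugin that is loaded through nvidia.dali.plugin_manager.load_library. The path is checked first so that a wrong path fails with a clear message.

```python
import os


def load_custom_plugin(library_path):
    """Load a DALI plugin library after checking that the .so file exists."""
    if not os.path.isfile(library_path):
        raise FileNotFoundError(f"plugin library not found: {library_path}")
    # Imported lazily so the path check above runs before DALI is needed.
    import nvidia.dali.plugin_manager as plugin_manager
    plugin_manager.load_library(library_path)
    return library_path
```

Usage would look like load_custom_plugin("customdummy/build/libcustomdummy.so"), called before the pipeline that uses the custom operator is built.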

1. Defining the pipeline: @pipeline_def

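For defining a pipeline, we distilled the following rules from the examples in the DALI documentation:

  • Inside a pipeline definition, use only dali.fn operators, or functions built from dali.fn operators;
  • In addition to the first rule, simple arithmetic operators can also be used, including +, -, *, /
  • For control flow, if and for statements cannot be used directly; an equivalent has to be implemented by other means [DALI-doc/Conditional-Like_Execution_and_Masking]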
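The masking idea behind the third rule can be sketched with the arithmetic operators allowed by the second rule. The helper name select is hypothetical, and plain Python numbers stand in for pipeline data here; inside a real pipeline the mask would presumably come from an operator such as fn.random.coin_flip, which is an assumption and not something this post shows.

```python
def select(mask, if_true, if_false):
    """Blend two values with a 0/1 mask using only *, + and -."""
    return mask * if_true + (1 - mask) * if_false


# Plain numbers stand in for pipeline data in this sketch.
print(select(1, 10.0, 2.0))  # mask == 1 keeps the first value
print(select(0, 10.0, 2.0))  # mask == 0 keeps the second value
```

Because both branches are always computed and then blended, this replaces an if statement only when evaluating both branches is acceptable.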

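1.1 fn.readers.coco: reading COCO data

Parameters:

  • file_root: root directory of the COCO images, i.e. the directory that contains the .jpg files;
  • annotations_file: path to the JSON annotation file;

Return values:

DALI documentation: nvidia.dali.fn.readers.coco (NVIDIA DALI 1.18.0 documentation)
fn.readers.coco returns the following: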

images, bounding_boxes, labels, ((polygons, vertices) | (pixelwise_masks)), (image_ids)

Example:

images, bboxes, labels = fn.readers.coco(
	file_root="coco_root/train2017",
	annotations_file="coco_root/annotations/instances_train2017.json",
	skip_empty=True, # skip samples that contain no object instances
	ratio=True,
	ltrb=True,
	random_shuffle=False,
	shuffle_after_epoch=True,  # these two parameters together implement data shuffling
	name="Reader")
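To make the effect of ratio=True and ltrb=True concrete, here is a small plain-Python sketch of the conversion they imply: COCO stores boxes as absolute [x, y, width, height], ltrb=True returns [left, top, right, bottom], and ratio=True makes the coordinates relative to the image size. The function name and the sample numbers are hypothetical; DALI does this internally.

```python
def coco_box_to_ltrb_ratio(box_xywh, image_width, image_height):
    """Convert an absolute [x, y, w, h] box to relative [left, top, right, bottom]."""
    x, y, w, h = box_xywh
    left, top, right, bottom = x, y, x + w, y + h
    return [left / image_width, top / image_height,
            right / image_width, bottom / image_height]


# A 128x96 box at (64, 48) in a 640x480 image.
print(coco_box_to_ltrb_ratio([64, 48, 128, 96], 640, 480))
```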

Note
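When random_shuffle=False and shuffle_after_epoch=True are used to randomize the data, readers.coco shuffles at the end of each epoch, that is, only after train_loader has been traversed once. The random seed is also fixed, so the image sequence is identical from one run of the program to the next.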
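The behavior described in this note can be mimicked with a toy model. This is not DALI's actual shuffling code; the function name, the seed value, and the use of Python's random module are all assumptions chosen only to illustrate "shuffle after each epoch with a fixed seed".

```python
import random


def epoch_orders(num_samples, num_epochs, seed=0):
    """Return the sample order of each epoch when shuffling happens after an epoch."""
    rng = random.Random(seed)        # fixed seed: identical across runs
    order = list(range(num_samples))
    epochs = []
    for _ in range(num_epochs):
        epochs.append(list(order))   # the epoch is read in the current order
        rng.shuffle(order)           # reshuffle only once the epoch is finished
    return epochs


for epoch, order in enumerate(epoch_orders(num_samples=6, num_epochs=3)):
    print(epoch, order)
```

In this model the first epoch comes out in the original order, later epochs are permutations, and rerunning the script prints the same sequences again.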

1.2 fn.decoders.image: decoding image data

images = fn.decoders.image(images, device="mixed")

Note
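The TensorFlow_YOLOv4 code uses images = dali.fn.decoders.image(inputs, device=device, output_type=dali.types.RGB), which sets the output_type parameter explicitly. According to the documentation, the default value of output_type is DALIImageType.RGB.
A quick test, assert types.RGB == DALIImageType.RGB and types.RGB is DALIImageType.RGB, showed that the two are in fact the same object, so the output_type parameter is omitted here.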

2. Customizing operator

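Writing a custom DALI operator requires CUDA (Compute Unified Device Architecture) and C++.
Build tool: CMake
Steps for writing a custom operator:

  1. Declare the operator in a header file;
  2. Implement the interface functions;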
2.1 Operator Definition (header)

#ifndef EXAMPLE_DUMMY_H_
#define EXAMPLE_DUMMY_H_

#include <vector>

#include "dali/pipeline/operator/operator.h"  // DALI operator base class

namespace other_ns {

template <typename Backend>
class Dummy : public ::dali::Operator<Backend> {
 public:
  inline explicit Dummy(const ::dali::OpSpec &spec) :
    ::dali::Operator<Backend>(spec) {}

  virtual inline ~Dummy() = default;

  // The operator is neither copyable nor movable.
  Dummy(const Dummy&) = delete;
  Dummy& operator=(const Dummy&) = delete;
  Dummy(Dummy&&) = delete;
  Dummy& operator=(Dummy&&) = delete;

 protected:
  bool CanInferOutputs() const override {
    return true;
  }

  // The single output has the same shape and type as the first input.
  bool SetupImpl(std::vector<::dali::OutputDesc> &output_desc,
                 const ::dali::workspace_t<Backend> &ws) override {
    const auto &input = ws.template Input<Backend>(0);
    output_desc.resize(1);
    output_desc[0] = {input.shape(), input.type()};
    return true;
  }

  void RunImpl(::dali::workspace_t<Backend> &ws) override;
};

}  // namespace other_ns

#endif  // EXAMPLE_DUMMY_H_

3. Debugging DALI

Iterating over a TensorList

Non-dense TensorList: tensor_list.at()

When a TensorList is not dense, use tensor_list.at(idx) to visit each tensor in turn;
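A minimal sketch of that loop, assuming the object is a CPU TensorList that supports len() and at(idx). The FakeTensorList class is a hypothetical stand-in so the sketch runs without DALI or a GPU; with a real pipeline output, pass the TensorList itself (a GPU TensorList would presumably need as_cpu() first).

```python
def tensor_list_to_samples(tensor_list):
    """Collect each sample of a TensorList one by one via at(idx)."""
    return [tensor_list.at(idx) for idx in range(len(tensor_list))]


class FakeTensorList:
    """Hypothetical stand-in for a non-dense TensorList (not part of DALI)."""

    def __init__(self, samples):
        self._samples = samples

    def __len__(self):
        return len(self._samples)

    def at(self, idx):
        return self._samples[idx]


ragged = FakeTensorList([[1, 2, 3], [4, 5]])  # samples of different lengths
for idx, sample in enumerate(tensor_list_to_samples(ragged)):
    print(idx, len(sample))
```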

4. Troubleshooting

4.1 Error: [/opt/dali/dali/util/mmaped_file.cc:105] File mapping failed: /train2017/000000000285.jpg

While learning DALI, I ran into the following error:

Traceback (most recent call last):
  File "/xxx/test/dali/validate_random_shuffle2.py", line 63, in <module>
    main()
  File "/xxx/test/dali/validate_random_shuffle2.py", line 34, in main
    train_loader = DALIGenericIterator(
  File "/xxx/software/python/anaconda/anaconda3/envs/conda-general/lib/python3.10/site-packages/nvidia/dali/plugin/pytorch.py", line 196, in __init__
    self._first_batch = DALIGenericIterator.__next__(self)
  File "/xxx/software/python/anaconda/anaconda3/envs/conda-general/lib/python3.10/site-packages/nvidia/dali/plugin/pytorch.py", line 213, in __next__
    outputs = self._get_outputs()
  File "/xxx/software/python/anaconda/anaconda3/envs/conda-general/lib/python3.10/site-packages/nvidia/dali/plugin/base_iterator.py", line 297, in _get_outputs
    outputs.append(p.share_outputs())
  File "/xxx/software/python/anaconda/anaconda3/envs/conda-general/lib/python3.10/site-packages/nvidia/dali/pipeline.py", line 1002, in share_outputs
    return self._pipe.ShareOutputs()
RuntimeError: Critical error in pipeline:
Error when executing CPU operator readers__COCO encountered:
[/opt/dali/dali/util/mmaped_file.cc:105] File mapping failed: /train2017/000000000285.jpg
Stacktrace (10 entries):
[frame 0]: /xxx/software/python/anaconda/anaconda3/envs/conda-general/lib/python3.10/site-packages/nvidia/dali/libdali.so(+0x847ff) [0x7f09562857ff]
[frame 1]: /xxx/software/python/anaconda/anaconda3/envs/conda-general/lib/python3.10/site-packages/nvidia/dali/libdali.so(+0x1b0c27) [0x7f09563b1c27]
[frame 2]: /xxx/software/python/anaconda/anaconda3/envs/conda-general/lib/python3.10/site-packages/nvidia/dali/libdali.so(dali::FileStream::Open(std::string const&, bool, bool)+0x110) [0x7f09563a2800]
[frame 3]: /xxx/software/python/anaconda/anaconda3/envs/conda-general/lib/python3.10/site-packages/nvidia/dali/libdali_operators.so(dali::FileLabelLoader::ReadSample(dali::ImageLabelWrapper&)+0x26a) [0x7f0931c718ea]
[frame 4]: /xxx/software/python/anaconda/anaconda3/envs/conda-general/lib/python3.10/site-packages/nvidia/dali/libdali_operators.so(+0x31ccc41) [0x7f0931ccec41]
...

Current pipeline object is no longer valid.

The key lines to look at are:

  1. [/opt/dali/dali/util/mmaped_file.cc:105] File mapping failed: /train2017/00000000xxxx.jpg
  2. [frame 3]: /xxx/software/python/anaconda/anaconda3/envs/conda-general/lib/python3.10/site-packages/nvidia/dali/libdali_operators.so(dali::FileLabelLoader::ReadSample(dali::ImageLabelWrapper&)+0x26a) [0x7f0931c718ea]

These lines suggest that reading the dataset failed. In this case the cause was a wrong path passed as file_root to fn.readers.coco.
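One way to catch this kind of mistake before building the pipeline is to compare the annotation file against the directory. The sketch below is an assumption-laden helper, not part of DALI: the name find_missing_images is hypothetical, and it relies on the standard COCO layout in which the JSON has an "images" list whose entries carry a "file_name".

```python
import json
import os


def find_missing_images(file_root, annotations_file, limit=5):
    """Return up to `limit` image paths listed in the annotations but absent on disk."""
    if not os.path.isdir(file_root):
        raise NotADirectoryError(f"file_root is not a directory: {file_root}")
    with open(annotations_file, "r", encoding="utf-8") as f:
        annotations = json.load(f)
    missing = []
    for image in annotations.get("images", []):
        path = os.path.join(file_root, image["file_name"])
        if not os.path.isfile(path):
            missing.append(path)
            if len(missing) >= limit:
                break
    return missing
```

An empty result means every listed image was found under file_root; a non-empty result or a NotADirectoryError points to the same path problem as the traceback above.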
