DALI Cookbook by Eric
Posted songyuc
Installation
Installation — NVIDIA DALI documentation
Basic knowledge
Link format: a .so file (shared library), for example customdummy/build/libcustomdummy.so.
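As a rough sketch of how such a .so file is typically consumed from Python: DALI's plugin_manager module provides load_library for loading a compiled plugin. The helper name load_dali_plugin and the early existence check are our own additions for illustration and are not part of DALI.

```python
import os


def load_dali_plugin(so_path):
    """Load a compiled DALI plugin (.so) after checking that the file exists.

    load_dali_plugin is a hypothetical helper; only plugin_manager.load_library
    comes from DALI itself.
    """
    if not os.path.isfile(so_path):
        # Fail early with a readable message instead of a dynamic-loader error.
        raise FileNotFoundError(f"DALI plugin not found: {so_path}")
    # Imported lazily so the path check above also works without DALI installed.
    import nvidia.dali.plugin_manager as plugin_manager
    plugin_manager.load_library(so_path)


# Typical call, assuming the plugin has already been built with CMake:
# load_dali_plugin("customdummy/build/libcustomdummy.so")
```

Checking the path first is a design choice: a missing build artifact is the most common failure here, and it is easier to read as a FileNotFoundError.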
1. Defining the pipeline: @pipeline_def
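From the examples in the DALI documentation, we summarized the following rules for defining a pipeline:
- Inside a pipeline definition, it is recommended to use only dali.fn operators, or functions built from dali.fn operators;
- In addition to rule 1, simple arithmetic operators can also be used, including +, -, *, /;
- For control flow, if and for statements cannot be used directly; an equivalent has to be implemented another way [DALI-doc/Conditional-Like_Execution_and_Masking].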
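1.1 fn.readers.coco: reading COCO data
Parameters:
- file_root: root directory of the COCO images, i.e. the directory that contains the .jpg files;
- annotations_file: path to the JSON annotation file.
Return values:
DALI documentation: nvidia.dali.fn.readers.coco (NVIDIA DALI 1.18.0 documentation)
fn.readers.coco returns the following outputs:
images, bounding_boxes, labels, ((polygons, vertices) | (pixelwise_masks)), (image_ids)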
Example:
images, bboxes, labels = fn.readers.coco(
    file_root="coco_root/train2017",
    annotations_file="coco_root/annotations/instances_train2017.json",
    skip_empty=True,  # skip samples that contain no object instances
    ratio=True,
    ltrb=True,
    random_shuffle=False,
    shuffle_after_epoch=True,  # these two parameters together implement data shuffling
    name="Reader")
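To make the effect of ratio=True and ltrb=True concrete, here is a small plain-Python sketch of the conversion they imply. It assumes the annotation stores a box as [x, y, width, height] in pixels, as COCO does, and returns [left, top, right, bottom] relative to the image size. The function name and the numbers are illustrative only; this is not DALI's implementation.

```python
def coco_box_to_ltrb_ratio(box_xywh, image_width, image_height):
    """Convert a COCO [x, y, w, h] pixel box to relative [l, t, r, b]."""
    x, y, w, h = box_xywh
    return [
        x / image_width,          # left
        y / image_height,         # top
        (x + w) / image_width,    # right
        (y + h) / image_height,   # bottom
    ]


# Hypothetical 640x480 image with one box.
box = coco_box_to_ltrb_ratio([64, 48, 128, 96], image_width=640, image_height=480)
print(box)  # [0.1, 0.1, 0.3, 0.3]
```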
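Note
When random_shuffle=False, shuffle_after_epoch=True is used to randomize the data, readers.coco shuffles at the end of each epoch, that is, only after train_loader has been traversed once. The random seed is fixed on every run, so the image sequence is identical across different runs.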
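The behaviour described in this note can be modelled with a toy example. The sketch below is not DALI code and makes two assumptions for illustration: the seed value BASE_SEED is made up, and each epoch's order is derived from that fixed seed plus the epoch index. It only demonstrates the two observable properties: the first pass is unshuffled, and every run reproduces the same sequence.

```python
import random

BASE_SEED = 524287  # hypothetical fixed seed, not taken from DALI


def epoch_order(samples, epoch):
    """Toy model: epoch 0 keeps file order, later epochs use a fixed-seed shuffle."""
    order = list(samples)
    if epoch > 0:
        # A fixed seed means the same order in every run of the program.
        random.Random(BASE_SEED + epoch).shuffle(order)
    return order


image_ids = list(range(8))
for epoch in range(3):
    print(epoch, epoch_order(image_ids, epoch))
```

If run-to-run variation is needed, the order in this model would have to depend on something other than a fixed seed.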
1.2 fn.decoders.image: decoding image data
images = fn.decoders.image(images, device="mixed")
Note
The TensorFlow_YOLOv4 code uses images = dali.fn.decoders.image(inputs, device=device, output_type=dali.types.RGB), which sets output_type explicitly. According to the documentation, the default value of output_type is DALIImageType.RGB.
A quick test, assert types.RGB == DALIImageType.RGB and types.RGB is DALIImageType.RGB, shows that the two are in fact the same object, so we omit the output_type parameter here.
2. Customizing operator
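Writing a custom DALI operator requires CUDA (Compute Unified Device Architecture) and C++.
Build tool: CMake
Steps for a custom operator:
- Declare the operator in a header file;
- Implement the interface functions.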
2.1 Operator Definition (header)
#ifndef EXAMPLE_DUMMY_H_
#define EXAMPLE_DUMMY_H_

#include <vector>

#include "dali/pipeline/operator/operator.h"  // DALI operator base class

namespace other_ns {

template <typename Backend>
class Dummy : public ::dali::Operator<Backend> {
 public:
  inline explicit Dummy(const ::dali::OpSpec &spec) :
    ::dali::Operator<Backend>(spec) {}

  virtual inline ~Dummy() = default;

  // The operator is neither copyable nor movable.
  Dummy(const Dummy&) = delete;
  Dummy& operator=(const Dummy&) = delete;
  Dummy(Dummy&&) = delete;
  Dummy& operator=(Dummy&&) = delete;

 protected:
  bool CanInferOutputs() const override {
    return true;
  }

  // The single output has the same shape and type as input 0.
  bool SetupImpl(std::vector<::dali::OutputDesc> &output_desc,
                 const ::dali::workspace_t<Backend> &ws) override {
    const auto &input = ws.template Input<Backend>(0);
    output_desc.resize(1);
    output_desc[0] = {input.shape(), input.type()};
    return true;
  }

  void RunImpl(::dali::workspace_t<Backend> &ws) override;
};

}  // namespace other_ns

#endif  // EXAMPLE_DUMMY_H_
3. Debugging DALI
Iterating over a TensorList
Non-dense TensorList: tensor_list.at()
When a TensorList is non-dense, use tensor_list.at(idx) to access each tensor in turn.
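A minimal sketch of that loop is shown below. It assumes the object supports len() and at(idx); FakeTensorList is a hypothetical stand-in so the example runs without DALI, and iter_tensor_list is our own helper name. With a real pipeline output on the GPU, one would presumably first copy it to the host (for example with as_cpu()) before calling at.

```python
def iter_tensor_list(tensor_list):
    """Yield each sample of a TensorList-like object via at(idx)."""
    for idx in range(len(tensor_list)):
        yield tensor_list.at(idx)


class FakeTensorList:
    """Hypothetical stand-in for a CPU TensorList with samples of different sizes."""

    def __init__(self, samples):
        self._samples = samples

    def __len__(self):
        return len(self._samples)

    def at(self, idx):
        return self._samples[idx]


batch = FakeTensorList([[1, 2, 3], [4, 5]])
sizes = [len(sample) for sample in iter_tensor_list(batch)]
print(sizes)  # [3, 2]
```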
4. Troubleshooting
4.1 Error: [/opt/dali/dali/util/mmaped_file.cc:105] File mapping failed: /train2017/000000000285.jpg
While learning DALI, we ran into the following error:
Traceback (most recent call last):
File "/xxx/test/dali/validate_random_shuffle2.py", line 63, in <module>
main()
File "/xxx/test/dali/validate_random_shuffle2.py", line 34, in main
train_loader = DALIGenericIterator(
File "/xxx/software/python/anaconda/anaconda3/envs/conda-general/lib/python3.10/site-packages/nvidia/dali/plugin/pytorch.py", line 196, in __init__
self._first_batch = DALIGenericIterator.__next__(self)
File "/xxx/software/python/anaconda/anaconda3/envs/conda-general/lib/python3.10/site-packages/nvidia/dali/plugin/pytorch.py", line 213, in __next__
outputs = self._get_outputs()
File "/xxx/software/python/anaconda/anaconda3/envs/conda-general/lib/python3.10/site-packages/nvidia/dali/plugin/base_iterator.py", line 297, in _get_outputs
outputs.append(p.share_outputs())
File "/xxx/software/python/anaconda/anaconda3/envs/conda-general/lib/python3.10/site-packages/nvidia/dali/pipeline.py", line 1002, in share_outputs
return self._pipe.ShareOutputs()
RuntimeError: Critical error in pipeline:
Error when executing CPU operator readers__COCO encountered:
[/opt/dali/dali/util/mmaped_file.cc:105] File mapping failed: /train2017/000000000285.jpg
Stacktrace (10 entries):
[frame 0]: /xxx/software/python/anaconda/anaconda3/envs/conda-general/lib/python3.10/site-packages/nvidia/dali/libdali.so(+0x847ff) [0x7f09562857ff]
[frame 1]: /xxx/software/python/anaconda/anaconda3/envs/conda-general/lib/python3.10/site-packages/nvidia/dali/libdali.so(+0x1b0c27) [0x7f09563b1c27]
[frame 2]: /xxx/software/python/anaconda/anaconda3/envs/conda-general/lib/python3.10/site-packages/nvidia/dali/libdali.so(dali::FileStream::Open(std::string const&, bool, bool)+0x110) [0x7f09563a2800]
[frame 3]: /xxx/software/python/anaconda/anaconda3/envs/conda-general/lib/python3.10/site-packages/nvidia/dali/libdali_operators.so(dali::FileLabelLoader::ReadSample(dali::ImageLabelWrapper&)+0x26a) [0x7f0931c718ea]
[frame 4]: /xxx/software/python/anaconda/anaconda3/envs/conda-general/lib/python3.10/site-packages/nvidia/dali/libdali_operators.so(+0x31ccc41) [0x7f0931ccec41]
...
Current pipeline object is no longer valid.
The lines to focus on are:
[/opt/dali/dali/util/mmaped_file.cc:105] File mapping failed: /train2017/00000000xxxx.jpg
[frame 3]: /xxx/software/python/anaconda/anaconda3/envs/conda-general/lib/python3.10/site-packages/nvidia/dali/libdali_operators.so(dali::FileLabelLoader::ReadSample(dali::ImageLabelWrapper&)+0x26a) [0x7f0931c718ea]
As these lines show, the problem most likely lies in reading the dataset. In our case, the path passed as file_root to fn.readers.coco was wrong.
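One way to catch this kind of mistake before the pipeline is built is to check the paths up front. The sketch below is an illustration rather than part of DALI: it assumes the standard COCO annotation layout, where the top-level "images" list holds entries with a "file_name" field, and the helper name find_missing_images is hypothetical.

```python
import json
import os


def find_missing_images(file_root, annotations_file, limit=5):
    """Return up to `limit` image paths listed in the annotations but absent on disk."""
    with open(annotations_file, "r", encoding="utf-8") as f:
        annotations = json.load(f)
    missing = []
    for image in annotations.get("images", []):
        path = os.path.join(file_root, image["file_name"])
        if not os.path.isfile(path):
            missing.append(path)
            if len(missing) >= limit:
                break  # a few examples are enough to spot a wrong root
    return missing


# Typical call with the paths used earlier in this cookbook:
# print(find_missing_images("coco_root/train2017",
#                           "coco_root/annotations/instances_train2017.json"))
```

An empty result means every checked image was found; a list full of paths usually points to a wrong file_root.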