Augmenting tf.data.Dataset elements with any external library (albumentations in my case)



Posted: 2020-12-16 12:41:26

Question:

I am writing a mapping function that augments each image in the dataset. It is wrapped with tf.numpy_function to turn it into a TensorFlow op, and that op is then passed to tf.data.Dataset.map.

I'm seeing strange behavior: the code sometimes runs and sometimes throws an error.


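import tensorflow as tf
from albumentations import (
    Compose, Rotate, RandomBrightness, JpegCompression,
    HueSaturationValue, RandomContrast, HorizontalFlip,
)

AUTOTUNE = tf.data.experimental.AUTOTUNE

# Albumentations pipeline applied on every call of aug()
transformations = Compose([
    Rotate(limit=40),
    RandomBrightness(limit=0.1),
    JpegCompression(quality_lower=85, quality_upper=100, p=0.5),
    HueSaturationValue(hue_shift_limit=20, sat_shift_limit=30, val_shift_limit=20, p=0.5),
    RandomContrast(limit=0.2, p=0.5),
    HorizontalFlip(),
])

def aug(image):
    # Runs as plain Python/NumPy inside tf.numpy_function
    aug_img = transformations(image=image)['image']
    aug_img = tf.image.convert_image_dtype(aug_img, 'float32')
    return aug_img


def tf_augment(image, label):
    # Wrap the NumPy-level aug() so tf.data.Dataset.map can call it
    aug_img = tf.numpy_function(func=aug, inp=[image], Tout=tf.float32)
    return aug_img, label


# data: tf.data.Dataset of (image, label) pairs; show_image: display helper (both defined elsewhere)
augmented_ds = data.batch(10).map(tf_augment, num_parallel_calls=AUTOTUNE)
it = iter(augmented_ds)
batch = next(it)
images, labels = batch
for image, label in zip(images, labels):
    show_image(image, label)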

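The tf.data.Dataset object `data` contains (image, label) tuples. I have already preprocessed `data` so that every image has the same size, which ensures `data` can be batched. `aug` is an augmentation function that takes an `image` from `data`.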
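The preprocessing described above (giving every image the same size so that `batch()` can stack them) could look roughly like the following. This is a minimal sketch: the question does not show its own preprocessing, so the synthetic dataset, `IMG_SIZE`, and the `make_example`/`resize` helpers are hypothetical stand-ins.

```python
import tensorflow as tf

IMG_SIZE = (128, 128)  # hypothetical target size

def make_example(i):
    # Hypothetical data: image i has spatial size (64 + 32*i, 64 + 16*i)
    i32 = tf.cast(i, tf.int32)
    shape = tf.stack([64 + 32 * i32, 64 + 16 * i32, 3])
    image = tf.cast(tf.random.uniform(shape, maxval=256, dtype=tf.int32), tf.uint8)
    return image, i32

def resize(image, label):
    # tf.image.resize returns float32; cast back to uint8 for albumentations
    image = tf.cast(tf.image.resize(image, IMG_SIZE), tf.uint8)
    return image, label

raw_ds = tf.data.Dataset.range(3).map(make_example)  # images of different sizes
resized_ds = raw_ds.map(resize)                       # now all 128x128x3

# Batching only works because every element now has the same shape
images, labels = next(iter(resized_ds.batch(3)))
print(images.shape)
```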

So, in my case, `aug` is the mapping function. `aug` has to be converted into a TensorFlow op before the mapping can be applied to `data`.

`tf_augment` is the TensorFlow op that is passed to the `data.map` method.
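A minimal sketch of wrapping a NumPy-level function with `tf.numpy_function` so it can be used in `Dataset.map`. Here `np_augment` is a hypothetical NumPy-only stand-in for `transformations(image=image)['image']` (a random horizontal flip plus uint8 to float32 scaling). Because `tf.numpy_function` cannot infer the output shape, the sketch restores it with `set_shape`; doing the dtype conversion in NumPy also keeps the wrapped function free of TensorFlow calls.

```python
import numpy as np
import tensorflow as tf

def np_augment(image):
    # Hypothetical stand-in for the albumentations pipeline:
    # random horizontal flip, then scale uint8 [0, 255] -> float32 [0, 1]
    if np.random.rand() < 0.5:
        image = image[:, ::-1, :]
    return image.astype(np.float32) / 255.0

def tf_augment(image, label):
    aug_img = tf.numpy_function(func=np_augment, inp=[image], Tout=tf.float32)
    # numpy_function returns a tensor of unknown shape; restore the static shape
    aug_img.set_shape(image.shape)
    return aug_img, label

ds = tf.data.Dataset.from_tensor_slices(
    (np.zeros((4, 32, 32, 3), dtype=np.uint8), np.arange(4))
).map(tf_augment)

print(ds.element_spec)
```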

To vectorize the mapping, I want to batch `data` first and then apply the mapping. So I call `data.batch` first and then `map`.
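One thing worth checking with batch-then-map is what the wrapped NumPy function actually receives. The sketch below (with a hypothetical dataset of 20 blank 64x64 images and a hypothetical `record_shape` helper) records the array shape passed in on each call. After `batch(10)`, each call gets a single 4-D array of shape (10, 64, 64, 3) rather than one (H, W, C) image. Albumentations transforms such as `Rotate` are designed for a single image, so handing them a whole batch is a plausible cause of the `cv::warpAffine` assertion below.

```python
import numpy as np
import tensorflow as tf

seen_shapes = []

def record_shape(image):
    # Called by tf.numpy_function with a NumPy array; remember its shape
    seen_shapes.append(image.shape)
    return image

def tf_record(image, label):
    out = tf.numpy_function(func=record_shape, inp=[image], Tout=tf.uint8)
    return out, label

data = tf.data.Dataset.from_tensor_slices(
    (np.zeros((20, 64, 64, 3), dtype=np.uint8), np.arange(20))
)

# Same order as in the question: batch first, then map
for _ in data.batch(10).map(tf_record):
    pass

print(seen_shapes)
```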

Now, here is the final piece of code:

augmented_ds = data.batch(10).map(tf_augment,num_parallel_calls=AUTOTUNE)
it = iter(augmented_ds)
batch = next(it)
images,labels = batch
for image,label in zip(images,labels):
    show_image(image,label)

It sometimes runs and sometimes throws an error.

The error it throws is:

UnknownError                              Traceback (most recent call last)
~\anaconda3\envs\tf23\lib\site-packages\tensorflow\python\eager\context.py in execution_mode(mode)
   2101       ctx.executor = executor_new
-> 2102       yield
   2103     finally:

~\anaconda3\envs\tf23\lib\site-packages\tensorflow\python\data\ops\iterator_ops.py in _next_internal(self)
    754         # handles execute on the same device as where the resource is placed.
--> 755         ret = gen_dataset_ops.iterator_get_next(
    756             self._iterator_resource,

~\anaconda3\envs\tf23\lib\site-packages\tensorflow\python\ops\gen_dataset_ops.py in iterator_get_next(iterator, output_types, output_shapes, name)
   2609     except _core._NotOkStatusException as e:
-> 2610       _ops.raise_from_not_ok_status(e, name)
   2611     except _core._FallbackException:

~\anaconda3\envs\tf23\lib\site-packages\tensorflow\python\framework\ops.py in raise_from_not_ok_status(e, name)
   6842   # pylint: disable=protected-access
-> 6843   six.raise_from(core._status_to_exception(e.code, message), None)
   6844   # pylint: enable=protected-access

~\anaconda3\envs\tf23\lib\site-packages\six.py in raise_from(value, from_value)

UnknownError: error: OpenCV(4.4.0) C:\Users\appveyor\AppData\Local\Temp\1\pip-req-build-k8sx3e60\opencv\modules\imgproc\src\imgwarp.cpp:2594: error: (-215:Assertion failed) src.cols > 0 && src.rows > 0 in function 'cv::warpAffine'

Traceback (most recent call last):

  File "C:\Users\aksha\anaconda3\envs\tf23\lib\site-packages\tensorflow\python\ops\script_ops.py", line 244, in __call__
    ret = func(*args)

  File "C:\Users\aksha\anaconda3\envs\tf23\lib\site-packages\tensorflow\python\autograph\impl\api.py", line 302, in wrapper
    return func(*args, **kwargs)

  File "<ipython-input-68-80185d06bd35>", line 2, in aug
    aug_img = transformations(image=image)['image']

  File "C:\Users\aksha\anaconda3\envs\tf23\lib\site-packages\albumentations\core\composition.py", line 176, in __call__
    data = t(force_apply=force_apply, **data)

  File "C:\Users\aksha\anaconda3\envs\tf23\lib\site-packages\albumentations\core\transforms_interface.py", line 87, in __call__
    return self.apply_with_params(params, **kwargs)

  File "C:\Users\aksha\anaconda3\envs\tf23\lib\site-packages\albumentations\core\transforms_interface.py", line 100, in apply_with_params
    res[key] = target_function(arg, **dict(params, **target_dependencies))

  File "C:\Users\aksha\anaconda3\envs\tf23\lib\site-packages\albumentations\augmentations\transforms.py", line 526, in apply
    return F.rotate(img, angle, interpolation, self.border_mode, self.value)

  File "C:\Users\aksha\anaconda3\envs\tf23\lib\site-packages\albumentations\augmentations\functional.py", line 70, in wrapped_function
    result = func(img, *args, **kwargs)

  File "C:\Users\aksha\anaconda3\envs\tf23\lib\site-packages\albumentations\augmentations\functional.py", line 202, in rotate
    return warp_fn(img)

  File "C:\Users\aksha\anaconda3\envs\tf23\lib\site-packages\albumentations\augmentations\functional.py", line 188, in __process_fn
    img = process_fn(img, **kwargs)

cv2.error: OpenCV(4.4.0) C:\Users\appveyor\AppData\Local\Temp\1\pip-req-build-k8sx3e60\opencv\modules\imgproc\src\imgwarp.cpp:2594: error: (-215:Assertion failed) src.cols > 0 && src.rows > 0 in function 'cv::warpAffine'



     [[node PyFunc]] [Op:IteratorGetNext]

During handling of the above exception, another exception occurred:

UnknownError                              Traceback (most recent call last)
<ipython-input-73-82392f6b5110> in <module>
      1 augmented_ds = resized_ds.batch(10).map(tf_augment,num_parallel_calls=AUTOTUNE)
      2 it = iter(augmented_ds)
----> 3 batch = next(it)
      4 images,labels = batch
      5 images.shape

~\anaconda3\envs\tf23\lib\site-packages\tensorflow\python\data\ops\iterator_ops.py in __next__(self)
    734 
    735   def __next__(self):  # For Python 3 compatibility
--> 736     return self.next()
    737 
    738   def _next_internal(self):

~\anaconda3\envs\tf23\lib\site-packages\tensorflow\python\data\ops\iterator_ops.py in next(self)
    770   def next(self):
    771     try:
--> 772       return self._next_internal()
    773     except errors.OutOfRangeError:
    774       raise StopIteration

~\anaconda3\envs\tf23\lib\site-packages\tensorflow\python\data\ops\iterator_ops.py in _next_internal(self)
    762         return self._element_spec._from_compatible_tensor_list(ret)  # pylint: disable=protected-access
    763       except AttributeError:
--> 764         return structure.from_compatible_tensor_list(self._element_spec, ret)
    765 
    766   @property

~\anaconda3\envs\tf23\lib\contextlib.py in __exit__(self, type, value, traceback)
    129                 value = type()
    130             try:
--> 131                 self.gen.throw(type, value, traceback)
    132             except StopIteration as exc:
    133                 # Suppress StopIteration *unless* it's the same exception that

~\anaconda3\envs\tf23\lib\site-packages\tensorflow\python\eager\context.py in execution_mode(mode)
   2103     finally:
   2104       ctx.executor = executor_old
-> 2105       executor_new.wait()
   2106 
   2107 

~\anaconda3\envs\tf23\lib\site-packages\tensorflow\python\eager\executor.py in wait(self)
     65   def wait(self):
     66     """Waits for ops dispatched in this executor to finish."""
---> 67     pywrap_tfe.TFE_ExecutorWaitForAllPendingNodes(self._handle)
     68 
     69   def clear_error(self):

UnknownError: error: OpenCV(4.4.0) C:\Users\appveyor\AppData\Local\Temp\1\pip-req-build-k8sx3e60\opencv\modules\imgproc\src\imgwarp.cpp:2594: error: (-215:Assertion failed) src.cols > 0 && src.rows > 0 in function 'cv::warpAffine'

Traceback (most recent call last):

  File "C:\Users\aksha\anaconda3\envs\tf23\lib\site-packages\tensorflow\python\ops\script_ops.py", line 244, in __call__
    ret = func(*args)

  File "C:\Users\aksha\anaconda3\envs\tf23\lib\site-packages\tensorflow\python\autograph\impl\api.py", line 302, in wrapper
    return func(*args, **kwargs)

  File "<ipython-input-68-80185d06bd35>", line 2, in aug
    aug_img = transformations(image=image)['image']

  File "C:\Users\aksha\anaconda3\envs\tf23\lib\site-packages\albumentations\core\composition.py", line 176, in __call__
    data = t(force_apply=force_apply, **data)

  File "C:\Users\aksha\anaconda3\envs\tf23\lib\site-packages\albumentations\core\transforms_interface.py", line 87, in __call__
    return self.apply_with_params(params, **kwargs)

  File "C:\Users\aksha\anaconda3\envs\tf23\lib\site-packages\albumentations\core\transforms_interface.py", line 100, in apply_with_params
    res[key] = target_function(arg, **dict(params, **target_dependencies))

  File "C:\Users\aksha\anaconda3\envs\tf23\lib\site-packages\albumentations\augmentations\transforms.py", line 526, in apply
    return F.rotate(img, angle, interpolation, self.border_mode, self.value)

  File "C:\Users\aksha\anaconda3\envs\tf23\lib\site-packages\albumentations\augmentations\functional.py", line 70, in wrapped_function
    result = func(img, *args, **kwargs)

  File "C:\Users\aksha\anaconda3\envs\tf23\lib\site-packages\albumentations\augmentations\functional.py", line 202, in rotate
    return warp_fn(img)

  File "C:\Users\aksha\anaconda3\envs\tf23\lib\site-packages\albumentations\augmentations\functional.py", line 188, in __process_fn
    img = process_fn(img, **kwargs)

cv2.error: OpenCV(4.4.0) C:\Users\appveyor\AppData\Local\Temp\1\pip-req-build-k8sx3e60\opencv\modules\imgproc\src\imgwarp.cpp:2594: error: (-215:Assertion failed) src.cols > 0 && src.rows > 0 in function 'cv::warpAffine'



     [[node PyFunc]]

I have also attached a link to the Colab notebook; please make a copy of it. colab


Answer 1:

You can try batching after mapping. It worked for me:

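from functools import partial
import tensorflow as tf

AUTOTUNE = tf.data.experimental.AUTOTUNE

# process_data(image, label, img_size) is the answerer's own augmentation function (not shown).
# map() augments each image individually; batch(32) groups the results afterwards.
ds_alb = resized_ds.map(partial(process_data, img_size=120),
                        num_parallel_calls=AUTOTUNE, deterministic=False).batch(32)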
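Applied to the question's setup, map-then-batch could look like the sketch below. `np_augment` is a hypothetical NumPy-only stand-in for `transformations(image=image)['image']` (so the example runs without albumentations), and `resized_ds` is replaced by a hypothetical dataset of 25 random 64x64 images. Because `map` now runs before `batch`, the NumPy function receives one (H, W, C) image per call, which is the input shape albumentations transforms are designed for.

```python
import numpy as np
import tensorflow as tf

AUTOTUNE = tf.data.experimental.AUTOTUNE
IMG_SIZE = 64  # hypothetical; images assumed already resized to this

def np_augment(image):
    # Hypothetical stand-in for transformations(image=image)['image'];
    # receives ONE image of shape (H, W, C) because map runs before batch
    assert image.ndim == 3
    if np.random.rand() < 0.5:
        image = image[:, ::-1, :]
    return image.astype(np.float32) / 255.0

def tf_augment(image, label):
    aug_img = tf.numpy_function(func=np_augment, inp=[image], Tout=tf.float32)
    aug_img.set_shape((IMG_SIZE, IMG_SIZE, 3))  # restore static shape for batching
    return aug_img, label

resized_ds = tf.data.Dataset.from_tensor_slices(
    (np.random.randint(0, 256, size=(25, IMG_SIZE, IMG_SIZE, 3), dtype=np.uint8),
     np.arange(25))
)

augmented_ds = (
    resized_ds
    .map(tf_augment, num_parallel_calls=AUTOTUNE)  # augment each image individually
    .batch(10)                                      # then group into batches
    .prefetch(AUTOTUNE)
)

images, labels = next(iter(augmented_ds))
print(images.shape, images.dtype)
```

If you want to keep batch-before-map, the NumPy function would have to iterate over the first axis itself (for example `np.stack([transformations(image=img)['image'] for img in batch])`). Albumentations still processes the images one at a time in that case, so the gain is mainly fewer calls across the TensorFlow/Python boundary.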

