kubeflow ParallelFor using the previous containerop output

Posted 2023-03-29


Question (posted 2020-04-02 13:52:37):

I can create a static for loop with:

with dsl.ParallelFor([1,2,3]) as item:
   ....

How can I use container_op.output as the input to ParallelFor? Suppose the first container outputs an integer n, and I then want to run ParallelFor n times.

An attempt like this does not work:

container_op = ContainerOp(...)
with dsl.ParallelFor(container_op.output) as item:
   ....

I am trying to simulate a parallel version of Python's range(n).


Answer 1:

Support for withItems (static loops) and withParam (dynamic loops) was added across several changes, but both are now available. See the PR.
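To make the static/dynamic distinction concrete, here is a plain-Python analogy; it does not use KFP or Argo, and the names `work`, `with_items` and `with_param` are hypothetical. `with_items` fans out over a list that is fixed when the pipeline is defined, while `with_param` fans out over a JSON string that only exists at run time, with threads standing in for the parallel pods Argo would create. The type check mirrors the fact that withParam expects a JSON array, which is why a first step that writes only a bare integer n cannot drive the loop directly.

```python
import json
from concurrent.futures import ThreadPoolExecutor


def work(item):
    # Stand-in for the body of the ParallelFor block (one task per item)
    return f"processed {item}"


def with_items(items):
    # Analogy for a static loop: the list is known when the pipeline is defined
    with ThreadPoolExecutor() as pool:
        return list(pool.map(work, items))


def with_param(upstream_output):
    # Analogy for a dynamic loop: the list arrives at run time as a JSON
    # string produced by an upstream step, so it has to be parsed first
    items = json.loads(upstream_output)
    if not isinstance(items, list):
        raise ValueError("withParam-style loops need a JSON array, got %r" % (items,))
    with ThreadPoolExecutor() as pool:
        return list(pool.map(work, items))


print(with_items([1, 2, 3]))
print(with_param("[1,2,3]"))
```

Both calls print `['processed 1', 'processed 2', 'processed 3']`, while `with_param("3")` raises `ValueError` because `3` is not a list.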

Make sure your KFP version is 0.1.31 or later.

You can loop over the output of a previous container_op like this:

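from kfp import dsl

@dsl.pipeline(name='parallelfor-from-output')
def loop_pipeline():
    # Step 1: write a JSON list to a file; KFP exposes it as echo_op.output
    echo_op = dsl.ContainerOp(
        name='echo',
        image='library/bash:4.4.23',
        command=['sh', '-c'],
        arguments=['echo "[1,2,3]" > /tmp/output.txt'],
        file_outputs={'output': '/tmp/output.txt'})

    # Step 2: fan out over that list at run time (compiled to Argo withParam)
    with dsl.ParallelFor(echo_op.output) as item:
        iterate_op = dsl.ContainerOp(
            name='iterate',
            image='library/bash:4.4.23',
            command=['sh', '-c'],
            # {item} is substituted with the current loop value at run time
            arguments=[f'echo {item} > /tmp/output.txt'],
            file_outputs={'output': '/tmp/output.txt'})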

Check that the compiled pipeline YAML contains a loop task like this:

    name: for-loop-for-loop-3c29048d-1
    template: for-loop-for-loop-3c29048d-1
    withParam: '{{tasks.echo.outputs.parameters.echo-output}}'
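The echo step in the answer hard-codes `[1,2,3]`. To mimic range(n) as the question asks, the first step has to write a JSON array rather than the bare integer n. A minimal sketch of a script the first container could run, assuming an image with Python available; `range_as_json` and `write_range` are hypothetical names, and the default path simply matches the `/tmp/output.txt` used in the answer's `file_outputs`.

```python
import json


def range_as_json(n):
    # Same values as Python's range(n), serialized as a JSON array,
    # which is the shape a withParam loop can iterate over
    return json.dumps(list(range(n)))


def write_range(n, out_path="/tmp/output.txt"):
    # Write the list where the step's file_outputs entry would pick it up
    with open(out_path, "w") as f:
        f.write(range_as_json(n))


print(range_as_json(3))  # [0, 1, 2]
```

Passing the resulting `.output` to `dsl.ParallelFor` would then run the loop body n times, once per value 0 to n-1.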

Comments:

Thanks. Is it possible to zip two outputs in ParallelFor and feed them to a second component?
