Unable to delete multiple blobs from a container in Python

【Title】: Unable to delete Multiple Blobs from Container in Python 【Posted】: 2021-02-19 08:18:14 【Question】:

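I am trying to delete blobs from containers. Each container holds at least 1500-2000 blobs: JPEG files and one MP4 file. If the MP4 file exists, I delete the blobs in that particular container. Every time I call content.delete_blobs(*blobsToDelete), I get the following exception: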

Exception in Non AI : Traceback (most recent call last):
  File "deleteblobs.py", line 278, in BlobsToDeleteeNonAI
    content.delete_blobs(*blobsToDelete)
  File "C:\Users\hp\AppData\Local\Programs\Python\Python38-32\lib\site-packages\azure\core\tracing\decorator.py", line 83, in wrapper_use_tracer
    return func(*args, **kwargs)
  File "C:\Users\hp\AppData\Local\Programs\Python\Python38-32\lib\site-packages\azure\storage\blob\_container_client.py", line 1194, in delete_blobs
    return self._batch_send(*reqs, **options)
  File "C:\Users\hp\AppData\Local\Programs\Python\Python38-32\lib\site-packages\azure\storage\blob\_shared\base_client.py", line 304, in _batch_send
    raise error
azure.storage.blob._shared.response_handlers.PartialBatchErrorException: There is a partial failure in the batch operation.

This is what my code looks like:

def BlobsToDeleteeNonAI():
        blobsToDelete = []
        #Will delete all the photos except the video.
        try:
                for containerName in NonAICandidates:
                        try:
                                mp4Found = 0
                                content = blob_service_client.get_container_client(str(containerName))
                                for blobs in content.list_blobs():
                                        print("\n"+blobs.name)
                                        #file.write("\n" +blobs.name)
                                        if(blobs.name.endswith(".jpeg")):   #str(blobs.size)
                                                blobsToDelete.append(blobs.name)

                                        if(blobs.name.endswith(".mp4")):
                                                mp4Found = 1
                                                file.write("\nMP4 File Name : " +str(blobs.name))
                                #Will only Delete if and only if the Video File is Present
                                if(mp4Found == 1):
                                        #DeleteCodeHere
                                        
                                        file.write("\n Mp4 Found : " +str(mp4Found) + " for " +str(containerName))
                                        #file.write("\n Blobs to Delete : "+str(blobsToDelete))
                                        
                                        content.delete_blobs(*blobsToDelete)
                                        blobsToDelete.clear()
                                        file.write("\n Blobs Deleted for : " +str(containerName))
                                else:
                                        file.write("\nMp4 File Not found for Non AI Candidate : " +str(containerName) + ". Cannot Perform Deletion Operation.");
                                                
                                           
                        except:
                                file.write("\nException in Non AI : " +str(traceback.format_exc()))
                                blobsToDelete.clear()
        except:
                 file.write("\nException : " +str(traceback.format_exc()))




if __name__ == "__main__":

        NonAICandidates = ['container1', 'container2', 'container3', 'container4', 'container5', 'container6', ....]


        BlobsToDeleteeNonAI()

Is something wrong with my implementation, or is something else preventing me from deleting the blobs?

【Comments】:

【Answer 1】:

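The cause is that a single batch supports at most 256 sub-requests. Each of your containers holds at least 1500-2000 blobs to delete, so passing all of them to one delete_blobs call exceeds the limit of 256.

Change your code so that each delete_blobs call deletes between 1 and 256 blobs. Here is an example: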

# Delete only if the video file is present
if mp4Found == 1:
    blobs_length = len(blobsToDelete)

    start = 0
    end = 256

    while end <= blobs_length:
        # each time, delete 256 blobs at most
        container_client.delete_blobs(*blobsToDelete[start:end])
        start = start + 256
        end = end + 256

        # send the final, shorter slice once fewer than 256 names remain
        if start < blobs_length and end > blobs_length:
            container_client.delete_blobs(*blobsToDelete[start:blobs_length])
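
To see which slices the loop above actually sends, the sketch below replays its index arithmetic without calling Azure at all. `slices_sent` is a hypothetical helper written only for this illustration; it is not part of the SDK.

```python
def slices_sent(total, batch_size=256):
    """Return the (start, end) slices the loop above would pass to delete_blobs."""
    sent = []
    start, end = 0, batch_size
    while end <= total:
        sent.append((start, end))          # one full batch
        start += batch_size
        end += batch_size
        if start < total and end > total:
            sent.append((start, total))    # final, shorter batch
    return sent


for total in (100, 256, 600):
    print(total, slices_sent(total))
# 100 []
# 256 [(0, 256)]
# 600 [(0, 256), (256, 512), (512, 600)]
```

Every slice stays within 256 names, but a list shorter than 256 names produces no slice at all, so in that case nothing would be deleted.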

【Comments】:

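Thanks for your help. After making the relevant changes, I can delete blobs from containers that hold more than 256 blobs. Thanks again.

FYI - if you use the code above, if the length ...

If the list contains some blob names that do not actually exist in the container, how can we avoid the exception? We get "There is a partial failure in the batch operation."

【Answer 2】:
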
if mp4Found == 1:
    blobs_length = len(blobsToDelete)

    if blobs_length <= 256:
        # the whole list fits in a single batch
        container_client.delete_blobs(*blobsToDelete)

    else:
        start = 0
        end = 256

        while end <= blobs_length:
            # each time, delete 256 blobs at most
            container_client.delete_blobs(*blobsToDelete[start:end])
            start = start + 256
            end = end + 256

            # send the final, shorter slice once fewer than 256 names remain
            if start < blobs_length and end > blobs_length:
                container_client.delete_blobs(*blobsToDelete[start:blobs_length])

【Comments】:

This also handles the case where the number of blobs is less than 256.

【Answer 3】:

I found that this example can be written more simply, as follows. It works fine.

if(mp4Found == 1):
    blobs_length=len(blobsToDelete)

    for i in range(0, blobs_length, 256):
        container_client.delete_blobs(*blobsToDelete[i: i+256])
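
The `range`-step loop above can be wrapped in a small helper that also covers the earlier comment about blob names that no longer exist in the container. This is a minimal sketch, not a definitive implementation. It assumes that `ContainerClient.delete_blobs` in azure-storage-blob v12 accepts the `raise_on_any_failure` keyword and returns one response per blob in request order, each with a `status_code`; check this against your SDK version. `delete_in_batches` and `BATCH_LIMIT` are hypothetical names introduced here.

```python
from typing import Iterable, List, Tuple

BATCH_LIMIT = 256  # per-batch sub-request limit quoted in Answer 1


def delete_in_batches(container_client, blob_names: Iterable[str],
                      batch_size: int = BATCH_LIMIT) -> List[Tuple[str, int]]:
    """Delete blobs in chunks; return (name, status_code) for each failed sub-request."""
    names = list(blob_names)
    failures = []
    for i in range(0, len(names), batch_size):
        chunk = names[i:i + batch_size]
        # Assumption: with raise_on_any_failure=False the SDK returns the
        # per-blob responses instead of raising PartialBatchErrorException.
        responses = container_client.delete_blobs(
            *chunk, raise_on_any_failure=False)
        # Assumption: responses arrive in the same order as the names sent.
        for name, response in zip(chunk, responses):
            if not 200 <= response.status_code < 300:
                failures.append((name, response.status_code))
    return failures
```

In the question's code this would replace the direct call, for example `failed = delete_in_batches(content, blobsToDelete)`, after which `failed` can be written to the log file. Returning the failures rather than raising lets the remaining batches run even when one name is missing.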

【Comments】:
