python: how to compute the difference of two arrays


Answer A: Convert the lists to sets, take the set difference, and then convert the result back to a list.

Answer B: The elements of two Python lists cannot be added or subtracted directly. The best approach is to convert the lists to the array type from numpy, Python's scientific computing package, and then do the element-wise arithmetic.

import numpy as np

a = np.array([1,2,3,4])
b = np.array([7,8,9,10])
s = a + b
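
Since the question asks for the difference rather than the sum, here is a minimal sketch of both suggestions above (the sample values are made up):

import numpy as np

a = [1, 2, 3, 4]
b = [3, 4, 5, 6]

#answer A: set difference (elements of a that do not appear in b; order is not preserved)
print(list(set(a) - set(b)))        # [1, 2]

#answer B: element-wise subtraction via numpy (both arrays must have the same shape)
print(np.array(a) - np.array(b))    # [-2 -2 -2 -2]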

How to use multiprocessing for a big 3d image stack? python

Posted: 2021-09-02 08:54:04

Question:

I have a 3d image stack (4000×2048×2048), and I want to do some operations on each 2d array (2048×2048), e.g. Gaussian filtering, image enhancement, resizing the image...

import numpy as np
from tifffile import imread,imwrite
import multiprocessing as mp
import cv2

def gaussian_blur_2d(img):
    blur = cv2.GaussianBlur(img,(5,5),0) 
    return blur

file_path = "F:\\Ctest\\123.tif"
img = imread(file_path)
for i in range(0,img.shape[0]):
    img[i,:,:] = gaussian_blur_2d(img[i,:,:])


How can I use multiprocessing to speed up this for loop? My idea is to split the original image stack into four or eight parts and use pool.map on the split stacks. But then how do I combine the processed pieces to get the final complete stack? I don't want to write the split stacks to disk, because that adds extra IO time. In my experience, when the split stacks are too large, pool.map also raises errors on return.
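
For reference, a rough sketch of the splitting idea described above (the chunk count and the small stand-in array are arbitrary). Note that pool.map pickles each chunk into the worker and back, which is exactly the copying overhead the shared-memory answer below avoids:

import numpy as np
import cv2
from multiprocessing import Pool

def blur_chunk(chunk):
    #blur every 2d slice of this sub-stack and return the processed copy
    return np.stack([cv2.GaussianBlur(sl, (5, 5), 0) for sl in chunk])

if __name__ == "__main__":
    img = np.zeros((40, 256, 256), dtype="uint8")           #small stand-in for imread(file_path)
    chunks = np.array_split(img, 8)                         #eight sub-stacks along the first axis
    with Pool(8) as pool:
        img = np.concatenate(pool.map(blur_chunk, chunks))  #reassemble the full stack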

On the other hand, I tried to stuff the multidimensional array into mp.Array, which gives me TypeError: only size-1 arrays can be converted to Python scalars.
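
That TypeError comes from mp.Array refusing a multidimensional numpy array: it only accepts a flat length (or a 1-D sequence of scalars) as its second argument. A minimal sketch of the usual workaround, with a made-up small shape, is to allocate a flat buffer and wrap it in a numpy view:

import numpy as np
import multiprocessing as mp

shape = (40, 256, 256)                                    #small stand-in shape
flat = mp.Array("B", int(np.prod(shape)), lock=False)     #flat uint8 buffer ("B" = unsigned char)
view = np.frombuffer(flat, dtype="uint8").reshape(shape)  #numpy view onto the shared buffer
view[:] = 0                                               #fill and use it like any numpy array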

Comments:

Does it all really fit in memory? (roughly 16 gigs at 8-bit grayscale) If so, I would construct the numpy arrays from a multiprocessing.shared_memory object in each process. Your big problem is almost certainly getting the data into and out of the child processes efficiently.

Answer 1:

As I mentioned in the comments, making all the data accessible to several worker processes is the biggest challenge here, because one of the key tenets of multiprocessing is that memory is generally not shared between processes. We therefore have to tell the OS explicitly that we want a block of memory that is "shared" between processes, and create our numpy array on top of that block. Beyond that, it is fairly standard multiprocessing housekeeping that is well covered in other tutorials and examples.

import numpy as np
from multiprocessing import Process, shared_memory, Queue, cpu_count
from queue import Empty
import cv2

class STOPFLAG: pass #a simple flag to tell the worker to stop

def worker_process(in_q, shm_name):
    shm = shared_memory.SharedMemory(name=shm_name) #create from the existing one made by the parent process
    img_stack = np.ndarray([4000, 2048, 2048], dtype="uint8", buffer=shm.buf) #attach a numpy array to the memory object
    while True: #until the worker runs out of work
        try:
            task = in_q.get(timeout=1) #don't wait forever on anything if you can help it (timeout must be passed by keyword; a bare positional 1 is taken as the block flag)
        except Empty: #multiprocessing.Queue uses an exception template from the queue library
            print("assuming all tasks are done. worker exiting...") #assume waiting for a while means no more tasks (we shouldn't hit this, but it could prevent problems in the child if a crash happens elsewhere)
            break
        if isinstance(task, STOPFLAG):
            print("got stop flag. worker exiting...")
            break
        
        #process the image slice (no mutexes are needed because no two workers will ever get the same index to work on at the same time)
        img_stack[task] = cv2.GaussianBlur(img_stack[task],(5,5),0) 
        
    shm.close() #cleanup after yourself (close the local copy. This does not close the copy in the other processes)

if __name__ == "__main__": #this is needed with multiprocessing

    #create shared memory space where numpy will work from
    shm = shared_memory.SharedMemory(create=True, size=4000*2048*2048) #OS may have a hard time allocating this memory block because it's so big...
    #create the numpy array from the allocated memory
    img_stack = np.ndarray([4000, 2048, 2048], dtype="uint8", buffer=shm.buf)
    
    #Here is where you would load the image data onto the img_stack array. It will start out with whatever random data was previously in ram similar to numpy.empty.
    
    #create a queue to send workers tasks (image index to work on)
    in_q = Queue()
    
    #create a couple worker processes
    processes = [Process(target=worker_process, args = (in_q, shm.name)) for _ in range(cpu_count())]
    for p in processes:
        p.start()
    
    #fill up the task queue with image indices that need computation
    for i in range(4000):
        in_q.put(i)
        
    #send a stop signal for each worker
    for _ in processes:
        in_q.put(STOPFLAG())
        
    #wait for all children to finish
    for p in processes:
        p.join()
        
    #do something (save?) with the img_stack
    np.save("processed_images.npy", img_stack)
    
    shm.close() #cleanup
    shm.unlink() #unlink is called only once after the last instance has been "close()"d
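
The loading step is left as a comment in the code above; a minimal sketch of it, assuming the tifffile reader and file path from the question (and an 8-bit tif to match the uint8 buffer), would be:

from tifffile import imread
img_stack[:] = imread("F:\\Ctest\\123.tif")   #copy the tif data into the shared buffer in place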

Comments:

Your answer blew me away. It is amazing! As a newbie I learned a lot from it, and I sincerely thank you for your kindness.
