Python: inotify, concurrent.futures - how to add existing files

【Posted】: 2017-11-15 15:22:30 【Question】:

I have a simple script that uses the inotify module and multi-threading to process files:

import concurrent.futures

import inotify.adapters

def main():
    i = inotify.adapters.Inotify()

    i.add_watch(b'/data')

    with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
        try:
            for event in i.event_gen():
                if event is not None:
                    (header, type_names, watch_path, filename) = event
                    # inotify event: IN_CLOSE_WRITE
                    if header.mask == 8:
                        future = executor.submit(process, filename.decode('utf-8'))
                        future.add_done_callback(future_callback)
        finally:
            i.remove_watch(b'/data')

if __name__ == '__main__':
    main()

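The problem is that the watched directory may already contain many files before the script starts.

I considered something like the example below, but it does not start consuming the inotify generator until all existing files have been handled, and it would also miss new events created in the meantime: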

import concurrent.futures
import os

import inotify.adapters

def main():
    i = inotify.adapters.Inotify()

    i.add_watch(b'/data')

    with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
        files = os.listdir('/data')
        if files:
            for filename in files:
                future = executor.submit(run, filename)
                future.add_done_callback(future_callback)
        try:
            for event in i.event_gen():
                if event is not None:
                    (header, type_names, watch_path, filename) = event
                    # inotify event: IN_CLOSE_WRITE
                    if header.mask == 8:
                        future = executor.submit(process, filename.decode('utf-8'))
                        future.add_done_callback(future_callback)
        finally:
            i.remove_watch(b'/data')

if __name__ == '__main__':
    main()

Is there a way to send inotify events manually, or to add these files to the i.event_gen() generator?

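【Comments】:

Maybe you should have a separate function that handles the existing files, and submit that function to the pool?

Would creating the generator early help? event_gen = i.event_gen() ... process existing files ... for event in event_gen: ...

By the way, your code works fine for me. I created a file while the existing files were being processed and saw the inotify event for the new file.

According to this inotify module, PyInotify is no longer maintained. I used the other one.

Hmm, if there are a lot of files (possibly thousands in my case), it does not advance to for event in i.event_gen() until for filename in files: has finished.

【Answer 1】: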

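Here is an example that processes the old files in one of the workers, so that new events can be captured in parallel while the existing files are still being processed. For the record, I did not lose any events even with your linear code.

Also, the PyInotify module is "defunct and no longer available" according to this inotify module, which is the one I used.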

#!/usr/bin/env python3

import concurrent.futures
import inotify.adapters
import time
import os
from functools import partial


DIRECTORY = '.'


def run(filename, suffix=''):
    time.sleep(1)
    return 'run: ' + filename + suffix


def process(filename):
    return run(filename, suffix=' (inotify)')


def future_callback(fut):
    print('future_callback: ' + fut.result())


def do_directory(executor):
    fn = partial(run, suffix=' (dir list)')
    for filename in os.listdir(DIRECTORY):
        future = executor.submit(fn, filename)
        future.add_done_callback(future_callback)


def main():
    i = inotify.adapters.Inotify()

    i.add_watch(DIRECTORY.encode())

    with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
        # Process the directory in a thread or locally. Not sure if it
        # is safe to submit to the executor from within one of its workers.
        # Seems like it should be.
        executor.submit(do_directory, executor)
        # do_directory(executor)
        try:
            for event in i.event_gen():
                if event is not None:
                    (header, type_names, watch_path, filename) = event
                    # inotify event: IN_CLOSE_WRITE
                    if header.mask == 8:
                        future = executor.submit(process, filename.decode('utf-8'))
                        future.add_done_callback(future_callback)
                        print('Submitted inotify for', filename.decode())
        except KeyboardInterrupt:
            pass
        finally:
            i.remove_watch(DIRECTORY.encode())


if __name__ == '__main__':
    main()
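
A side note on the header.mask == 8 test used in the listings above: it only matches when IN_CLOSE_WRITE is the sole bit set in the mask. Below is a minimal sketch of a bitwise test instead. The helper name is_close_write is hypothetical, and the constant values are the ones from the Linux inotify header, copied here so the snippet runs without the inotify package. With the package itself, checking 'IN_CLOSE_WRITE' in type_names should achieve the same thing.

```python
# Constant values as defined in the Linux <sys/inotify.h> header.
IN_CLOSE_WRITE = 0x00000008
IN_CREATE = 0x00000100
IN_ISDIR = 0x40000000


def is_close_write(mask):
    """Return True if the IN_CLOSE_WRITE bit is set, whatever else is set.

    Hypothetical helper, not part of the inotify package.
    """
    return bool(mask & IN_CLOSE_WRITE)


print(is_close_write(IN_CLOSE_WRITE))             # plain close-after-write
print(is_close_write(IN_CLOSE_WRITE | IN_ISDIR))  # extra flag bit set
print(is_close_write(IN_CREATE))                  # a different event
```

The second call is the case where an equality test against 8 would give a different answer from the bitwise test.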

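Test:

Start with a directory containing 10 files. Start the program, wait 2 seconds, then create 5 new files. Look for the "Submitted" messages: they show that the events are received and queued while the initial files are still being processed, and that the new files are eventually processed as well.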

~/p/TEST $ touch A1 A2 A3 A4 A5 A6 A7 A8 A9 A10
~/p/TEST $ do_test() {
> rm B*
> ../inotify-test.py &
> sleep 2
> touch B1 B2 B3 B4 B5
> sleep 5
> pkill -f inotify-test.py
> }
~/p/TEST $ do_test
[1] 26663
future_callback: run: A10 (dir list)
future_callback: run: A4 (dir list)
future_callback: run: A5 (dir list)
future_callback: run: A9 (dir list)
future_callback: run: A2 (dir list)
Submitted inotify for B1
Submitted inotify for B2
Submitted inotify for B3
Submitted inotify for B4
Submitted inotify for B5
future_callback: run: A3 (dir list)
future_callback: run: A8 (dir list)
future_callback: run: A1 (dir list)
future_callback: run: A7 (dir list)
future_callback: run: A6 (dir list)
future_callback: run: B1 (inotify)
future_callback: run: B2 (inotify)
future_callback: run: B3 (inotify)
future_callback: run: B4 (inotify)
future_callback: run: B5 (inotify)
~/p/TEST $ 
[1]+  Terminated              ../inotify-test.py
~/p/TEST $ 
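
One thing the test above does not exercise: because the watch is added before the directory is listed, a file that is written between add_watch and os.listdir could be reported by both paths and be processed twice. A minimal sketch of one way to guard against that, assuming it is acceptable to process each filename at most once per run. OnceSubmitter is a hypothetical helper, not part of concurrent.futures or inotify, and the lambda stands in for run/process.

```python
import concurrent.futures
import threading


class OnceSubmitter:
    """Submit each filename to the executor at most once (hypothetical helper)."""

    def __init__(self, executor, fn):
        self._executor = executor
        self._fn = fn
        self._seen = set()
        self._lock = threading.Lock()

    def submit(self, filename):
        # The lock makes the check-and-add atomic, so the directory-listing
        # thread and the inotify loop cannot both claim the same name.
        with self._lock:
            if filename in self._seen:
                return None
            self._seen.add(filename)
        return self._executor.submit(self._fn, filename)


def demo():
    with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
        once = OnceSubmitter(executor, lambda name: 'run: ' + name)
        # 'A1' stands in for a file seen by both os.listdir() and inotify.
        futures = [once.submit(name) for name in ['A1', 'A2', 'A1']]
    return [f.result() if f is not None else None for f in futures]


print(demo())
```

In the answer's program, both do_directory and the event loop would call once.submit(filename) instead of executor.submit(...). The trade-off is that a file rewritten later in the same run is skipped too; removing the name from the set in the done callback would be one way around that.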

【Comments】:

Thanks. do_directory does not throw any exception when run in a thread, so I am not sure whether it is actually safe.
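
On the safety question: as far as I know, ThreadPoolExecutor.submit can be called from a worker thread, provided the worker does not block waiting on the futures it creates and the executor has not been shut down yet. Also note that not seeing an exception is weak evidence, because an exception raised inside a submitted function is stored on its future and is not printed. A minimal sketch of both points, using only the standard library; parent, child and failing are hypothetical stand-ins for do_directory and run.

```python
import concurrent.futures


def child(n):
    return n * 2


def parent(executor):
    # Runs inside a worker thread and submits more work to the same pool.
    # It returns the futures instead of waiting on them, so it never holds
    # a worker while waiting for its own children.
    return [executor.submit(child, n) for n in range(3)]


def failing():
    raise ValueError('boom')


def demo():
    with concurrent.futures.ThreadPoolExecutor(max_workers=2) as executor:
        child_futures = executor.submit(parent, executor).result()
        results = [f.result() for f in child_futures]
        failed = executor.submit(failing)
        # The exception is kept on the future; it only surfaces when
        # result() or exception() is called.
        error = failed.exception()
    return results, error


results, error = demo()
print(results, repr(error))
```

Applied to the answer's code, that would mean keeping the future returned by executor.submit(do_directory, executor) and attaching a done callback that calls fut.exception(), so that a failure in the directory scan does not go unnoticed.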
