Filtering os.walk() dirs and files

【Posted】2011-07-05 17:08:57 【Question】:

I'm looking for a way to include/exclude file patterns and exclude directories in an os.walk() call.

Here is what I'm doing now:

import fnmatch
import os

includes = ['*.doc', '*.odt']
excludes = ['/home/paulo-freitas/Documents']

def _filter(paths):
    for path in paths:
        if os.path.isdir(path) and not path in excludes:
            yield path

        for pattern in (includes + excludes):
            if not os.path.isdir(path) and fnmatch.fnmatch(path, pattern):
                yield path

for root, dirs, files in os.walk('/home/paulo-freitas'):
    dirs[:] = _filter(map(lambda d: os.path.join(root, d), dirs))
    files[:] = _filter(map(lambda f: os.path.join(root, f), files))

    for filename in files:
        filename = os.path.join(root, filename)

        print(filename)

Is there a better way to do this? If so, how?

【Answer 1】:

Why fnmatch?

import os
excludes=....
for ROOT,DIR,FILES in os.walk("/path"):
    for file in FILES:
       if file.endswith(('doc','odt')):
          print(file)
    for directory in DIR:
       if not directory in excludes :
          print(directory)

Not fully tested.

【Comments】:

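The suffixes should be .doc and .odt, because the code above would also return a file named mydoc [with no file extension]. Also, I think this only covers the specific case the OP posted: excludes might contain files too, and includes might contain directories, I guess.

If you must use glob patterns, you need fnmatch (though that is not the case for the example given in the question).

@Oben Sonne, glob (IMO) has more "features" than fnmatch, pathname expansion for example. You can do glob.glob("/path/*/*/*.txt").

Good point. For simple include/exclude patterns, glob.glob() may be the better solution.

For good practice and easier debugging, I try not to use variable names that match built-in types, such as the "file" you used, since it is a built-in type.

【Answer 2】: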

Here is one way to do it:

import fnmatch
import os

excludes = ['/home/paulo-freitas/Documents']
matches = []
for path, dirs, files in os.walk(os.getcwd()):
    for eachpath in excludes:
        if eachpath in path:
            continue
    else:
        for result in [os.path.abspath(os.path.join(path, filename)) for
                filename in files if fnmatch.fnmatch(filename,'*.doc') or fnmatch.fnmatch(filename,'*.odt')]:
            matches.append(result)
print(matches)
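
Note that the `continue` in the loop above only advances the inner loop over `excludes`, so the `else` branch still runs for every directory and nothing is actually excluded. A minimal sketch of one way to get the intended behaviour, using `any()` instead of `for`/`else`. The helper name `find_matches` and its parameters are made up for illustration, and it keeps the answer's plain substring test for excluded paths:

```python
import fnmatch
import os


def find_matches(top, excludes, patterns=('*.doc', '*.odt')):
    """Collect absolute paths under `top` whose names match `patterns`.

    Hypothetical helper: a directory is skipped when any entry of
    `excludes` occurs in its path (substring test, as in the answer).
    """
    matches = []
    for path, dirs, files in os.walk(top):
        # Skip this directory if any excluded path occurs in it.
        if any(excluded in path for excluded in excludes):
            continue
        for filename in files:
            if any(fnmatch.fnmatch(filename, pat) for pat in patterns):
                matches.append(os.path.abspath(os.path.join(path, filename)))
    return matches
```

This still visits excluded directories and merely ignores their files; pruning `dirs` in place would avoid descending into them at all.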

【Comments】:

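There is a typo: filename.odt should be `filename, '*.odt'`. This becomes impractical as the number of include patterns grows. It also does not allow glob patterns for the directory names to exclude.

Oben, typo fixed. I agree about the include patterns part. It could be coded in a more generic way.

Should the continue under "if eachpath in path" be a break?

【Answer 3】: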

From docs.python.org:

os.walk(top[, topdown=True[, onerror=None[, followlinks=False]]])

When topdown is True, the caller can modify the dirnames list in-place … this can be used to prune the search …

for root, dirs, files in os.walk('/home/paulo-freitas', topdown=True):
    # excludes can be done with fnmatch.filter and complementary set,
    # but it's more annoying to read.
    dirs[:] = [d for d in dirs if d not in excludes] 
    for pat in includes:
        for f in fnmatch.filter(files, pat):
            print(os.path.join(root, f))

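I should point out that the code above assumes excludes is a pattern, not a full path. To match the OP's case, you would need to adjust the list comprehension to filter on os.path.join(root, d) not in excludes.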
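
For a concrete picture of what `includes` and `excludes` might look like here, a small sketch assuming `excludes` holds bare directory names and `includes` holds glob patterns for file names. The function name `walk_filtered` and the sample values are illustrative and not part of the original answer:

```python
import fnmatch
import os


def walk_filtered(top, includes, excludes):
    """Yield paths of files matching `includes`, skipping `excludes` dirs.

    Assumption: `excludes` contains directory *names* (e.g. 'Documents'),
    not full paths; `includes` contains glob patterns for file names.
    """
    for root, dirs, files in os.walk(top, topdown=True):
        # In-place assignment prunes the walk: os.walk will not descend
        # into directories removed from `dirs`.
        dirs[:] = [d for d in dirs if d not in excludes]
        for pat in includes:
            for f in fnmatch.filter(files, pat):
                yield os.path.join(root, f)
```

It could be called as `walk_filtered('/home/paulo-freitas', ['*.doc', '*.odt'], ['Documents'])`. A file that matches more than one pattern would be yielded once per matching pattern, as in the loop above.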

【Comments】:

What do excludes and includes look like here? Is there an example for this answer?

【Answer 4】:

This solution uses fnmatch.translate to convert glob patterns to regular expressions (it assumes the includes are used for files only):

import fnmatch
import os
import os.path
import re

includes = ['*.doc', '*.odt'] # for files only
excludes = ['/home/paulo-freitas/Documents'] # for dirs and files

# transform glob patterns to regular expressions
includes = r'|'.join([fnmatch.translate(x) for x in includes])
excludes = r'|'.join([fnmatch.translate(x) for x in excludes]) or r'$.'

for root, dirs, files in os.walk('/home/paulo-freitas'):

    # exclude dirs
    dirs[:] = [os.path.join(root, d) for d in dirs]
    dirs[:] = [d for d in dirs if not re.match(excludes, d)]

    # exclude/include files
    files = [os.path.join(root, f) for f in files]
    files = [f for f in files if not re.match(excludes, f)]
    files = [f for f in files if re.match(includes, f)]

    for fname in files:
        print(fname)
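
The `dirs[:] = ...` form in the loop above matters: os.walk keeps using the very list object it yielded, so only an in-place change prunes the walk. A minimal sketch contrasting the two forms; the function names and the `skip` parameter are invented for this illustration:

```python
import os


def visited_with_slice_assignment(top, skip):
    """Walk `top`, pruning directories named `skip` in place."""
    seen = []
    for root, dirs, files in os.walk(top):
        seen.append(root)
        # Replaces the contents of the list object that os.walk holds.
        dirs[:] = [d for d in dirs if d != skip]
    return seen


def visited_with_rebinding(top, skip):
    """Same walk, but only rebinds the local name: nothing is pruned."""
    seen = []
    for root, dirs, files in os.walk(top):
        seen.append(root)
        # Creates a new list; os.walk never sees it.
        dirs = [d for d in dirs if d != skip]
    return seen
```

With a tree that contains a directory named by `skip`, the first function never visits it, while the second one does.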

【Comments】:

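Ermm, we need an if excludes check around re.match(excludes, ...), don't we? If excludes = [], it will match all entries. But I like your approach, it's much clearer. :)

@pf.me: You're right, I hadn't considered that case. So either you 1) wrap the exclude list comprehension in if excludes, 2) prefix not re.match(excludes, ...) with not excludes or, or 3) set excludes to a regular expression that never matches if the original excludes list is empty. I updated my answer using variant 3.

After some googling, it seems the point of the [:] syntax in dirs[:] = [os.path.join(root, d) for d in dirs] is to use mutating slice assignment, which changes the list in place instead of creating a new list. This caught me off guard: without the [:], it doesn't work.

I still don't get the mechanism: how does dirs[:] change the original list? All the manuals say slice[:] returns a new copy of the list, with members as pointers to the original list's values. Here is a discussion on Stack about this. So how does dirs[:] end up changing the original list?

@Daniel: Slices can be used not only to get the values of a list, but also to assign to the selected items. Since [:] denotes the complete list, assigning to that slice replaces the entire previous contents of the list. See docs.python.org/2/library/stdtypes.html#mutable-sequence-types.

【Answer 5】:
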
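import os

includes = ['*.doc', '*.odt']
excludes = ['/home/paulo-freitas/Documents']


def file_search(path, ext):
    # Print every file under `path` whose name ends with `ext` (e.g. '.doc').
    for root, dirs, files in os.walk(path):
        for name in files:
            if name.endswith(ext):
                print(os.path.join(root, name))


# As in the original, the search runs over excludes[0] once per pattern;
# the leading '*' is stripped so that '*.doc' is compared as '.doc'.
for pattern in includes:
    file_search(excludes[0], pattern.lstrip('*'))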

【Answer 6】:

dirtools is a good fit for your use case:

from dirtools import Dir

print(Dir('.', exclude_file='.gitignore').files())

【Answer 7】:

Here is an example of excluding directories and files with os.walk():

import os
import shutil

ignoreDirPatterns=[".git"]
ignoreFilePatterns=[".php"]
def copyTree(src, dest, onerror=None):
    src = os.path.abspath(src)
    src_prefix = len(src) + len(os.path.sep)
    for root, dirs, files in os.walk(src, onerror=onerror):
        for pattern in ignoreDirPatterns:
            if pattern in root:
                break
        else:
            # Reached only if no directory pattern matched (no break above).
            for file in files:
                for pattern in ignoreFilePatterns:
                    if pattern in file:
                        break
                else:
                    # Reached only if no file pattern matched (no break above).
                    dirpath = os.path.join(dest, root[src_prefix:])
                    try:
                        os.makedirs(dirpath,exist_ok=True)
                    except OSError as e:
                        if onerror is not None:
                            onerror(e)
                    filepath=os.path.join(root,file)
                    shutil.copy(filepath,dirpath)

Requires Python >= 3.2, because of exist_ok in makedirs.

【Answer 8】:

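The approaches above did not work for me.

So this is what I came up with, as an extension of my original answer to another question.

What worked for me was: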

if (not (str(root) + '/').startswith(tuple(exclude_foldr)))

It builds a path string and excludes anything under the tuple of folders I listed.

This gave me exactly the result I wanted.
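
str.startswith accepts a tuple of prefixes, which is what makes the one-line test above work. A small sketch of the same idea as a helper; the name `is_excluded` is hypothetical, and normalising every prefix to end with a separator is an added assumption, so that a sibling such as `GitHub_old` is not caught by the prefix `GitHub`:

```python
import os


def is_excluded(root, exclude_dirs):
    """Return True if `root` is one of `exclude_dirs` or lies below one."""
    # Make every prefix end with exactly one path separator.
    prefixes = tuple(os.path.join(os.path.normpath(d), '') for d in exclude_dirs)
    # Give `root` a trailing separator too, so a directory matches itself.
    return os.path.join(os.path.normpath(root), '').startswith(prefixes)
```

Inside the walk it would be used as `if not is_excluded(root, exclude_foldr):`.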

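My goal was to keep my Mac organized.

I can search any folder by path, locate & move specific file types, ignore subfolders, and preemptively prompt the user if they want to move the files.

Note: the prompt appears only once per run, not once per file.

The prompt defaults to NO when you just press Enter at [y/N], in which case only the potential files to be moved are listed.

This is just a snippet of my GitHub; please visit it for the full script.

Tip: read the script below, as I added a comment on nearly every line about what it does.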

#!/usr/bin/env python3
# =============================================================================
# Created On  : MAC OSX High Sierra 10.13.6 (17G65)
# Created On  : Python 3.7.0
# Created By  : Jeromie Kirchoff
# =============================================================================
"""THE MODULE HAS BEEN BUILT FOR KEEPING YOUR FILES ORGANIZED."""
# =============================================================================
from os import walk
from os import path
from shutil import move
import getpass
import click

mac_username = getpass.getuser()
includes_file_extensn = ([".jpg", ".gif", ".png", ".jpeg", ])
search_dir = path.dirname('/Users/' + mac_username + '/Documents/')
target_foldr = path.dirname('/Users/' + mac_username + '/Pictures/Archive/')
exclude_foldr = set([target_foldr,
                    path.dirname('/Users/' + mac_username +
                                 '/Documents/GitHub/'),
                     path.dirname('/Users/' + mac_username +
                                  '/Documents/Random/'),
                     path.dirname('/Users/' + mac_username +
                                  '/Documents/Stupid_Folder/'),
                     ])

if click.confirm("Would you like to move files?",
                 default=False):
    question_moving = True
else:
    question_moving = False


def organize_files():
    """THE MODULE HAS BEEN BUILT FOR KEEPING YOUR FILES ORGANIZED."""
    # topdown=True required for filtering.
    # "Root" had all info i needed to filter folders not dir...
    for root, dir, files in walk(search_dir, topdown=True):
        for file in files:
            # creating a directory to str and excluding folders that start with
            if (not (str(root) + '/').startswith(tuple(exclude_foldr))):
                # showcase only the file types looking for
                if (file.endswith(tuple(includes_file_extensn))):
                    # using path.normpath as i found an issue with double //
                    # in file paths.
                    filetomove = path.normpath(str(root) + '/' +
                                               str(file))
                    # forward slash required for both to split
                    movingfileto = path.normpath(str(target_foldr) + '/' +
                                                 str(file))
                    # Answering "NO" this only prints the files "TO BE Moved"
                    print('Files To Move: ' + str(filetomove))
                    # This is using the prompt you answered at the beginning
                    if question_moving is True:
                        print('Moving File: ' + str(filetomove) +
                              "\n To:" + str(movingfileto))
                        # This is the command that moves the file
                        move(filetomove, movingfileto)


if __name__ == '__main__':
    organize_files()

Example of running my script from the terminal:

$ python3 organize_files.py
Exclude list: '/Users/jkirchoff/Pictures/Archive', '/Users/jkirchoff/Documents/Stupid_Folder', '/Users/jkirchoff/Documents/Random', '/Users/jkirchoff/Documents/GitHub'
Files found will be moved to this folder:/Users/jkirchoff/Pictures/Archive
Would you like to move files?
No? This will just list the files.
Yes? This will Move your files to the target folder.
[y/N]: 

Example of listing files:

Files To Move: /Users/jkirchoff/Documents/Archive/JayWork/1.custom-award-768x512.jpg
Files To Move: /Users/jkirchoff/Documents/Archive/JayWork/10351458_318162838331056_9023492155204267542_n.jpg
...etc

Example of moving files:

Moving File: /Users/jkirchoff/Documents/Archive/JayWork/1.custom-award-768x512.jpg
To: /Users/jkirchoff/Pictures/Archive/1.custom-award-768x512.jpg
Moving File: /Users/jkirchoff/Documents/Archive/JayWork/10351458_318162838331056_9023492155204267542_n.jpg
To: /Users/jkirchoff/Pictures/Archive/10351458_318162838331056_9023492155204267542_n.jpg
...
