Filtering os.walk() dirs and files
【Title】: Filtering os.walk() dirs and files
【Posted】: 2011-07-05 17:08:57
【Question】: I'm looking for a way to include/exclude file patterns and to exclude directories in an os.walk() call.
Here's what I'm doing right now:
import fnmatch
import os

includes = ['*.doc', '*.odt']
excludes = ['/home/paulo-freitas/Documents']

def _filter(paths):
    for path in paths:
        if os.path.isdir(path) and not path in excludes:
            yield path

        for pattern in (includes + excludes):
            if not os.path.isdir(path) and fnmatch.fnmatch(path, pattern):
                yield path

for root, dirs, files in os.walk('/home/paulo-freitas'):
    dirs[:] = _filter(map(lambda d: os.path.join(root, d), dirs))
    files[:] = _filter(map(lambda f: os.path.join(root, f), files))

    for filename in files:
        filename = os.path.join(root, filename)

        print(filename)
Is there a better way to do this? How?
【Comments】:
【Answer 1】: Why fnmatch?
import os

excludes = [...]  # directory names to skip (left unspecified in the answer)

for ROOT, DIR, FILES in os.walk("/path"):
    for file in FILES:
        if file.endswith(('doc', 'odt')):
            print(file)
    for directory in DIR:
        if not directory in excludes:
            print(directory)

Not exhaustively tested.
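A minimal sketch of the suffix check, assuming the goal is to match real file extensions: `str.endswith` accepts a tuple of suffixes, and including the leading dot keeps a name such as `mydoc` from matching. The helper name `has_extension` and the sample file names are made up for illustration.

```python
def has_extension(name, extensions=('.doc', '.odt')):
    """Return True if name ends with one of the given dotted extensions."""
    # str.endswith accepts a tuple and tests each suffix in turn.
    return name.endswith(tuple(extensions))


# Hypothetical file names, for illustration only.
names = ['report.doc', 'notes.odt', 'mydoc', 'image.png']
matching = [n for n in names if has_extension(n)]
```

With the dot included, `matching` keeps only `report.doc` and `notes.odt`; with the bare suffixes `('doc', 'odt')`, `mydoc` would pass as well.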
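【Comments】:
The endings should be .doc and .odt, because the code above would also return a file named mydoc [with no file extension]. Also, I think this only covers the specific case the OP posted. The excludes might contain files as well, and the includes might contain directories, I guess.
If you have to use glob patterns, you need fnmatch (although that is not the case for the example given in the question).
@Oben Sonne, glob (IMO) has more "features" than fnmatch, for example pathname expansion. You can do glob.glob("/path/*/*/*.txt").
Good point. For simple include/exclude patterns, glob.glob() would probably be the better solution.
For good practice and easier debugging, I try not to use variable names that match built-in types, such as the "file" you used, since it is a built-in type.
【Answer 2】:
Here is one way to do it: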
import fnmatch
import os

excludes = ['/home/paulo-freitas/Documents']
matches = []
for path, dirs, files in os.walk(os.getcwd()):
    for eachpath in excludes:
        if eachpath in path:
            continue
    else:
        for result in [os.path.abspath(os.path.join(path, filename)) for
                       filename in files if fnmatch.fnmatch(filename, '*.doc') or fnmatch.fnmatch(filename, '*.odt')]:
            matches.append(result)
print(matches)
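One possible way to express the exclusion check from the loop above is with `any()`, so that a directory's files are skipped as soon as one excluded path matches. This is a sketch, not the answerer's code: the function name `find_matches` and its parameters are assumptions, and it keeps the same substring test (`exclude in path`).

```python
import fnmatch
import os


def find_matches(top, includes, excludes):
    """Collect absolute paths under top whose names match an include pattern.

    Files in directories whose path contains any string in excludes
    are skipped.
    """
    matches = []
    for path, dirs, files in os.walk(top):
        # Same substring test as above, collapsed into a single expression.
        if any(exclude in path for exclude in excludes):
            continue
        for filename in files:
            if any(fnmatch.fnmatch(filename, pat) for pat in includes):
                matches.append(os.path.abspath(os.path.join(path, filename)))
    return matches
```

Note that `continue` only skips the files of the current directory; os.walk() still descends into it. Passing the include patterns as a list also avoids one `fnmatch` call per hard-coded extension.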
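【Comments】:
There is a typo: filename.odt should be `filename, '*.odt'`.
This becomes impractical as the number of include patterns grows. It also does not allow glob patterns for the directory names to exclude.
Oben, I fixed the typo. I agree about the include patterns; that part could be coded more generically.
Shouldn't the continue under "if eachpath in path" be a break?
【Answer 3】:
From docs.python.org:
os.walk(top[, topdown=True[, onerror=None[, followlinks=False]]])
When topdown is True, the caller can modify the dirnames list in-place … this can be used to prune the search …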
import fnmatch
import os

for root, dirs, files in os.walk('/home/paulo-freitas', topdown=True):
    # excludes can be done with fnmatch.filter and complementary set,
    # but it's more annoying to read.
    dirs[:] = [d for d in dirs if d not in excludes]
    for pat in includes:
        for f in fnmatch.filter(files, pat):
            print(os.path.join(root, f))
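I should point out that the code above assumes excludes holds patterns (bare directory names), not full paths. To match the OP's case, you would need to adjust the list comprehension so that it filters on os.path.join(root, d) not in excludes.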
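As a sketch of that adjustment, assuming `excludes` holds full directory paths as in the question: each directory name is joined with `root` before the membership test, and the slice assignment `dirs[:] = ...` is what makes os.walk() skip the pruned directories. The function name `walk_filtered` is hypothetical.

```python
import fnmatch
import os


def walk_filtered(top, includes, excludes):
    """Yield paths of files matching includes, skipping excluded directories.

    excludes is assumed to contain full directory paths.
    """
    excluded = set(excludes)
    for root, dirs, files in os.walk(top, topdown=True):
        # Assign through the slice so the list object os.walk() holds is
        # modified; a plain "dirs = [...]" would only rebind the local name.
        dirs[:] = [d for d in dirs if os.path.join(root, d) not in excluded]
        for pat in includes:
            for f in fnmatch.filter(files, pat):
                yield os.path.join(root, f)
```

A call might look like `list(walk_filtered('/home/paulo-freitas', ['*.doc', '*.odt'], ['/home/paulo-freitas/Documents']))`. The excluded paths have to be spelled the same way os.walk() builds them from `top`, since this is a plain string comparison.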
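【Comments】:
What do excludes and includes look like here? Is there an example for this answer?
【Answer 4】:
This solution uses fnmatch.translate to convert glob patterns to regular expressions (it assumes that the includes are only used for files):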
import fnmatch
import os
import os.path
import re

includes = ['*.doc', '*.odt']  # for files only
excludes = ['/home/paulo-freitas/Documents']  # for dirs and files

# transform glob patterns to regular expressions
includes = r'|'.join([fnmatch.translate(x) for x in includes])
excludes = r'|'.join([fnmatch.translate(x) for x in excludes]) or r'$.'

for root, dirs, files in os.walk('/home/paulo-freitas'):

    # exclude dirs
    dirs[:] = [os.path.join(root, d) for d in dirs]
    dirs[:] = [d for d in dirs if not re.match(excludes, d)]

    # exclude/include files
    files = [os.path.join(root, f) for f in files]
    files = [f for f in files if not re.match(excludes, f)]
    files = [f for f in files if re.match(includes, f)]

    for fname in files:
        print(fname)
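To illustrate the pattern-combining step on its own, here is a small sketch that needs no filesystem. It joins the translated globs with `|` and falls back to `r'$.'`, a regex that cannot match anything, when the pattern list is empty. The helper name `compile_globs` and the sample paths are assumptions for illustration.

```python
import fnmatch
import re


def compile_globs(patterns):
    """Combine glob patterns into one compiled regex.

    An empty list yields a regex that never matches, so callers do not
    need a separate emptiness check.
    """
    combined = '|'.join(fnmatch.translate(p) for p in patterns)
    # '$.' demands a character after the end of the string: it never matches.
    return re.compile(combined or r'$.')


includes_re = compile_globs(['*.doc', '*.odt'])
no_excludes_re = compile_globs([])
```

Here `includes_re.match('/home/user/report.doc')` succeeds, while `no_excludes_re.match(...)` fails for any input. Without the fallback, an empty pattern string would match every path.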
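【Comments】:
Ermm, we'd need an `if excludes` check around re.match(excludes, ...), wouldn't we? If excludes = [], it will match all entries. But I like your approach, it's much clearer. :)
@pf.me: You're right, I didn't consider that case. So either you 1) wrap the exclude list comprehension in `if excludes`, 2) prefix `not re.match(excludes, ...)` with `not excludes or`, or 3) set excludes to a regex that never matches if the original excludes list is empty. I updated my answer using variant 3.
After some googling, it seems the point of the [:] syntax in dirs[:] = [os.path.join(root, d) for d in dirs] is slice assignment, which changes the existing list instead of creating a new one. This caught me by surprise: without the [:], it doesn't work.
I still don't get the mechanism. How does dirs[:] change the original list? All the manuals say that slice[:] returns a new copy of the list, with the members as pointers to the original list's values. Here is a discussion on Stack about this. So how does dirs[:] end up changing the original list?
@Daniel: Slicing can be used not only to get the values of a list but also to assign to the selected items. Since [:] denotes the complete list, assigning to this slice replaces the entire previous content of the list. See docs.python.org/2/library/stdtypes.html#mutable-sequence-types.
【Answer 5】: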
import os

includes = ['*.doc', '*.odt']
excludes = ['/home/paulo-freitas/Documents']

def file_search(path, exe):
    for x, y, z in os.walk(path):
        for a in z:
            # compare the last four characters with the extension,
            # dropping the leading '*' of the glob pattern
            if a[-4:] == exe[1:]:
                print(os.path.join(x, a))

for x in includes:
    file_search(excludes[0], x)
【Comments】:
【Answer 6】: dirtools is a good fit for your use case:
from dirtools import Dir
print(Dir('.', exclude_file='.gitignore').files())
【Comments】:
【Answer 7】: Here is an example of excluding directories and files with os.walk():
import os
import shutil

ignoreDirPatterns = [".git"]
ignoreFilePatterns = [".php"]

def copyTree(src, dest, onerror=None):
    src = os.path.abspath(src)
    src_prefix = len(src) + len(os.path.sep)
    for root, dirs, files in os.walk(src, onerror=onerror):
        for pattern in ignoreDirPatterns:
            if pattern in root:
                break
        else:
            # No break occurred above, so this directory is not ignored
            for file in files:
                for pattern in ignoreFilePatterns:
                    if pattern in file:
                        break
                else:
                    # No break occurred above, so this file is not ignored
                    dirpath = os.path.join(dest, root[src_prefix:])
                    try:
                        os.makedirs(dirpath, exist_ok=True)
                    except OSError as e:
                        if onerror is not None:
                            onerror(e)
                    filepath = os.path.join(root, file)
                    shutil.copy(filepath, dirpath)
                continue  # reached in either case; moves on to the next file
        continue  # reached in either case; moves on to the next directory
Requires Python >= 3.2 because of exist_ok in makedirs.
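A related sketch, assuming the same substring-style ignore patterns: pruning `dirs` in place keeps os.walk() from descending into ignored directories at all, rather than visiting them and skipping their contents. Unlike the code above, it tests each directory name rather than the whole `root` path, and it only yields paths instead of copying. The function name `iter_files` and its parameters are hypothetical.

```python
import os


def iter_files(src, ignore_dir_patterns=('.git',),
               ignore_file_patterns=('.php',)):
    """Yield file paths under src, skipping ignored directories and files.

    A name is ignored if it contains any of the given substrings.
    """
    for root, dirs, files in os.walk(src):
        # Prune in place so os.walk() never enters ignored directories.
        dirs[:] = [d for d in dirs
                   if not any(p in d for p in ignore_dir_patterns)]
        for name in files:
            if not any(p in name for p in ignore_file_patterns):
                yield os.path.join(root, name)
```

The yielded paths could then be passed to `shutil.copy` in the same way as above.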
【Comments】:
【Answer 8】: The methods above did not work for me.
So this is an expansion of my original answer to another question.
What worked for me was:
if (not (str(root) + '/').startswith(tuple(exclude_foldr)))
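It builds the directory path as a string and excludes it if it starts with any folder in the tuple I listed.
This gave me the exact result I was looking for.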
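One thing to watch with a plain `startswith` prefix test: an excluded folder such as `/Users/me/Documents/Random` is also a string prefix of `/Users/me/Documents/RandomNotes/`. A sketch of a stricter check that compares whole path components with `os.path.commonpath`, assuming all paths are absolute; the helper name `is_excluded` and the sample paths are made up for illustration.

```python
import os


def is_excluded(root, excluded_dirs):
    """Return True if root is one of excluded_dirs or lies inside one.

    All paths are assumed to be absolute.
    """
    for excluded in excluded_dirs:
        excluded = os.path.normpath(excluded)
        # commonpath works on whole components, so "Random" does not
        # count as a parent of "RandomNotes".
        if os.path.commonpath([root, excluded]) == excluded:
            return True
    return False
```

Inside the walk loop this could replace the prefix test, for example `if not is_excluded(root, exclude_foldr):`.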
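My goal was to keep my Mac organized.
I can search any folder by path, locate & move specific file types, ignore subfolders, and I preemptively prompt the user if they want to move the files.
NOTE:
The prompt appears only once per run, NOT once per file.
By default the prompt answers NO when you press Enter instead of [y/N], and the script will then only list the potential files to be moved.
This is only a snippet; the complete script is on my GitHub.
HINT: Read the script below, as I added a comment on nearly every line about what it does.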
#!/usr/bin/env python3
# =============================================================================
# Created On : MAC OSX High Sierra 10.13.6 (17G65)
# Created On : Python 3.7.0
# Created By : Jeromie Kirchoff
# =============================================================================
"""THE MODULE HAS BEEN BUILD FOR KEEPING YOUR FILES ORGANIZED."""
# =============================================================================
from os import walk
from os import path
from shutil import move
import getpass
import click
mac_username = getpass.getuser()
includes_file_extensn = ([".jpg", ".gif", ".png", ".jpeg", ])
search_dir = path.dirname('/Users/' + mac_username + '/Documents/')
target_foldr = path.dirname('/Users/' + mac_username + '/Pictures/Archive/')
exclude_foldr = set([target_foldr,
                     path.dirname('/Users/' + mac_username +
                                  '/Documents/GitHub/'),
                     path.dirname('/Users/' + mac_username +
                                  '/Documents/Random/'),
                     path.dirname('/Users/' + mac_username +
                                  '/Documents/Stupid_Folder/'),
                     ])

if click.confirm("Would you like to move files?",
                 default=False):
    question_moving = True
else:
    question_moving = False


def organize_files():
    """THE MODULE HAS BEEN BUILD FOR KEEPING YOUR FILES ORGANIZED."""
    # topdown=True required for filtering.
    # "Root" had all info i needed to filter folders not dir...
    for root, dir, files in walk(search_dir, topdown=True):
        for file in files:
            # creating a directory to str and excluding folders that start with
            if (not (str(root) + '/').startswith(tuple(exclude_foldr))):
                # showcase only the file types looking for
                if (file.endswith(tuple(includes_file_extensn))):
                    # using path.normpath as i found an issue with double //
                    # in file paths.
                    filetomove = path.normpath(str(root) + '/' +
                                               str(file))
                    # forward slash required for both to split
                    movingfileto = path.normpath(str(target_foldr) + '/' +
                                                 str(file))
                    # Answering "NO" this only prints the files "TO BE Moved"
                    print('Files To Move: ' + str(filetomove))
                    # This is using the prompt you answered at the beginning
                    if question_moving is True:
                        print('Moving File: ' + str(filetomove) +
                              "\n To:" + str(movingfileto))
                        # This is the command that moves the file
                        move(filetomove, movingfileto)
                        pass
                    # The rest is ignoring explicitly and continuing
                    else:
                        pass
                    pass
                else:
                    pass
            else:
                pass


if __name__ == '__main__':
    organize_files()
Example of running my script from the terminal:
$ python3 organize_files.py
Exclude list: '/Users/jkirchoff/Pictures/Archive', '/Users/jkirchoff/Documents/Stupid_Folder', '/Users/jkirchoff/Documents/Random', '/Users/jkirchoff/Documents/GitHub'
Files found will be moved to this folder:/Users/jkirchoff/Pictures/Archive
Would you like to move files?
No? This will just list the files.
Yes? This will Move your files to the target folder.
[y/N]:
Example of listing files:
Files To Move: /Users/jkirchoff/Documents/Archive/JayWork/1.custom-award-768x512.jpg
Files To Move: /Users/jkirchoff/Documents/Archive/JayWork/10351458_318162838331056_9023492155204267542_n.jpg
...etc
Example of moving files:
Moving File: /Users/jkirchoff/Documents/Archive/JayWork/1.custom-award-768x512.jpg
To: /Users/jkirchoff/Pictures/Archive/1.custom-award-768x512.jpg
Moving File: /Users/jkirchoff/Documents/Archive/JayWork/10351458_318162838331056_9023492155204267542_n.jpg
To: /Users/jkirchoff/Pictures/Archive/10351458_318162838331056_9023492155204267542_n.jpg
...
【Comments】: