python: a helper for finding duplicate files in a folder and its subfolders


#!/usr/bin/env python3
import os
import sys


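def getUniqueKey(fp):
    """
    Build a lookup key from a file's size and its last-modified time.
    A tuple keeps the two values separate, so unrelated size/time pairs
    that happen to share the same sum do not collide.
    :param fp: path to the file
    :return: tuple (size in bytes, modification time in whole seconds)
    """
    dt = os.path.getmtime(fp)
    sz = os.path.getsize(fp)
    return (sz, int(dt))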


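def findDupcateFiles(folder_name):
    """
    Return groups of probable duplicate files in a folder and its subfolders.
    :param folder_name: root folder to scan recursively
    :return: list of lists; each inner list holds the paths that share one key
    """
    list_of_all_files = {}
    # "root" is used instead of "dir" to avoid shadowing the built-in dir()
    for root, subDirs, files in os.walk(folder_name):
        for f in files:
            f_path = os.path.join(root, f)
            try:
                k = getUniqueKey(f_path)
            except OSError:
                # skip files that vanished or cannot be read (e.g. broken symlinks)
                continue
            list_of_all_files.setdefault(k, []).append(f_path)
    duplicates = [paths for paths in list_of_all_files.values() if len(paths) > 1]
    return duplicates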


def printDupcates(duplicates):
    """
    Pretty-print the groups of duplicate files that were found.
    :param duplicates: list of lists of file paths
    :return: None
    """
    if duplicates:
        for dup in duplicates:
            print('Duplicate files (by file size and last-modified time):')
            print('-----------')
            for d in dup:
                print(d)
            print('-----------')
    else:
        print("Duplicate files: not found.")


if __name__ == '__main__':
    if len(sys.argv) > 1:
        dups = findDupcateFiles(sys.argv[1])
        printDupcates(dups)
    else:
        print("Usage: python " + __file__ + " folder_name")
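
Matching on size and modification time is only a heuristic: two different files can share both values, and true copies can have different timestamps. As a possible follow-up step, the candidate groups could be confirmed by comparing file contents. The sketch below assumes SHA-256 from the standard hashlib module is acceptable for this; file_digest and confirm_duplicates are hypothetical helper names that are not part of the script above.

```python
import hashlib
from collections import defaultdict


def file_digest(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as fh:
        # read fixed-size chunks so large files are not loaded into memory at once
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def confirm_duplicates(candidate_groups):
    """Split each candidate group by content digest; keep groups of two or more."""
    confirmed = []
    for group in candidate_groups:
        by_digest = defaultdict(list)
        for path in group:
            by_digest[file_digest(path)].append(path)
        # only paths whose contents hash identically stay grouped together
        confirmed.extend(paths for paths in by_digest.values() if len(paths) > 1)
    return confirmed
```

It could be chained onto the existing functions as printDupcates(confirm_duplicates(findDupcateFiles(folder_name))), so that only the candidate groups are hashed rather than every file in the tree.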
