How to copy and paste files in Python

Posted 2023-03-25


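For example, I want to copy "c:\123\1.txt" into "d:\新建文件夹". Also, if drive D has no folder named "新建文件夹" (New Folder), can the folder be created automatically before 1.txt is copied into it? This is urgent, so any help is appreciated!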

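File objects do not provide a dedicated copy function, so with file objects alone, copying means reading the source and writing its contents to the destination. The built-in file() constructor exists only in Python 2, so the example below uses open(), which works in both versions. An example:
# Create a sample source file.
src = open("myfile.txt", "w")
temp = ["hello world! \n"]
src.writelines(temp)
src.close()

# Copy it by reading the source and writing its contents to the destination.
src = open("myfile.txt", "r")
des = open("myfile2.txt", "w")
des.write(src.read())
src.close()
des.close()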
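
Reading the whole file with read() keeps its entire content in memory, which can be costly for large files. The following is a minimal sketch, assuming Python 3, that copies in binary mode in fixed-size chunks instead. The function name copy_in_chunks and the 64 KiB chunk size are illustrative choices, not part of the original answer:

```python
import os
import tempfile


def copy_in_chunks(src_path, dst_path, chunk_size=64 * 1024):
    """Copy src_path to dst_path by reading and writing fixed-size blocks."""
    copied = 0
    # Binary mode avoids any newline or encoding translation.
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            chunk = src.read(chunk_size)
            if not chunk:  # an empty bytes object signals end of file
                break
            dst.write(chunk)
            copied += len(chunk)
    return copied


if __name__ == "__main__":
    # Demo in a temporary folder so no real files are touched.
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "myfile.txt")
        dst = os.path.join(tmp, "myfile2.txt")
        with open(src, "w") as f:
            f.write("hello world!\n")
        print(copy_in_chunks(src, dst), "bytes copied")
```

The with statement closes both files even if an error occurs partway through, which the explicit close() calls in the example above do not guarantee.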

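The shutil module is another interface for managing files and directories, and it provides functions for copying both. The copyfile() function copies a file; its signature is:
copyfile(src, dst)
Here dst must be a complete file name, not a directory.
A cut (move) operation can be done with the move() function; its signature is:
move(src, dst)
It moves a file or directory to the given location, and the dst argument can also give the moved file a new name.

Answer A: Use the shutil module.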
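#!/usr/bin/env python3
# -*- coding: utf-8 -*-

import os
from shutil import copy

# Raw strings keep the backslashes in Windows paths literal.
dest_dir = r'd:\新建文件夹'
# Create the destination folder if it does not exist yet.
if not os.path.isdir(dest_dir):
    os.makedirs(dest_dir)

file_path = r'c:\123\1.txt'
# copy() accepts a directory as the destination and keeps the file name.
copy(file_path, dest_dir)

This answer was accepted by the asker and other users.

Answer B: See the shutil module.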
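
The scenario in the question (copy a file into a folder, creating the folder if it is missing) can also be wrapped in small helpers. The following is a sketch assuming Python 3. The names copy_into_dir and move_into_dir are hypothetical, and the demo uses a temporary folder rather than the c:\ and d:\ paths from the question:

```python
import os
import shutil
import tempfile


def copy_into_dir(src_file, dest_dir):
    """Copy src_file into dest_dir, creating dest_dir first if needed."""
    os.makedirs(dest_dir, exist_ok=True)  # no error if it already exists
    # shutil.copy2 accepts a directory as target and returns the new path.
    return shutil.copy2(src_file, dest_dir)


def move_into_dir(src_file, dest_dir):
    """Cut-and-paste counterpart: move src_file into dest_dir."""
    os.makedirs(dest_dir, exist_ok=True)
    return shutil.move(src_file, dest_dir)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "1.txt")
        with open(src, "w") as f:
            f.write("hello\n")
        copied = copy_into_dir(src, os.path.join(tmp, "new", "folder"))
        print("copied to", copied)
        moved = move_into_dir(src, os.path.join(tmp, "moved"))
        print("moved to", moved, "| source still exists:", os.path.exists(src))
```

copy2 is used here because it also tries to preserve file metadata such as modification time; shutil.copy would work the same way for the file contents.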

Goodbye to copy and paste: converting PDF to text with Python

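For many people, converting a PDF into editable text is a real need, yet there is no simple way to do it. The project described in this article uses OCR (optical character recognition) to transcribe PDF slides automatically, with fairly good results.

The process breaks down into the following steps:

Convert the PDF into images;
Detect and recognize the text in the images;
Show sample output.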

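Transcribing a PDF to text with deep-learning-based OCR

Converting the PDF to images

The PDF slides Soares uses come from David Silver's reinforcement learning slides (see the URL below). The pdf2image package converts each slide to a PNG image.

Sample PDF slides: https://www.davidsilver.uk/wp-content/uploads/2020/03/intro_RL.pdf

The code is as follows: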

from pdf2image import convert_from_path
# Exceptions that convert_from_path can raise; imported for error handling.
from pdf2image.exceptions import (
    PDFInfoNotInstalledError,
    PDFPageCountError,
    PDFSyntaxError,
)

pdf_path = "path/to/file/intro_RL_Lecture1.pdf"
# convert_from_path returns one PIL image per page.
images = convert_from_path(pdf_path)
for i, image in enumerate(images):
    fname = "image" + str(i) + ".png"
    image.save(fname, "PNG")
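
One detail worth noting: names such as image10.png sort before image2.png when compared as strings, and the OCR loop later in the article iterates over sorted(image_files). The following is a small sketch, assuming you are free to choose the file names, that zero-pads the page index so string order matches page order. page_filename is a hypothetical helper, not part of pdf2image:

```python
def page_filename(index, total, prefix="image", ext=".png"):
    """Build a file name whose page index is zero-padded to a fixed width."""
    width = len(str(max(total - 1, 0)))  # digits needed for the largest index
    return f"{prefix}{index:0{width}d}{ext}"


names = [page_filename(i, 12) for i in range(12)]
# With padding, lexicographic order equals page order.
assert sorted(names) == names
print(names[:3], names[-1])
```

In the loop above, this would replace "image" + str(i) + ".png" with page_filename(i, len(images)). Note that padded names differ from the image7.png name used in the later snippets.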

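After this step, every PDF slide has been converted to a PNG image.

Detecting and recognizing the text in the images

To detect and recognize the text in the PNG images, Soares uses the text detector from the ocr.pytorch library. Follow the repository's instructions to download the models and save them in the checkpoints folder.

ocr.pytorch repository: https://github.com/courao/ocr.pytorch

The code is as follows: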

# Adapted from this source: https://github.com/courao/ocr.pytorch
# The next two lines are IPython/Jupyter magics; omit them in a plain script.
%load_ext autoreload
%autoreload 2
import os
from ocr import ocr
import time
import shutil
import numpy as np
import pathlib
from PIL import Image
from glob import glob
import matplotlib.pyplot as plt
import seaborn as sns
sns.set()
import pytesseract

def single_pic_proc(image_file):
    # Load the image as an RGB array and run detection plus recognition.
    image = np.array(Image.open(image_file).convert('RGB'))
    result, image_framed = ocr(image)
    return result, image_framed

image_files = glob('./input_images/*.*')
result_dir = './output_images_with_boxes/'

# If the output folder exists we will remove it and redo it.
if os.path.exists(result_dir):
    shutil.rmtree(result_dir)
os.mkdir(result_dir)

for image_file in sorted(image_files):
    result, image_framed = single_pic_proc(image_file)  # detect and recognize the text
    filename = pathlib.Path(image_file).name
    output_file = os.path.join(result_dir, filename)
    txt_file = os.path.join(result_dir, pathlib.Path(image_file).stem + '.txt')
    # Save the image with the detected boxes drawn on it.
    Image.fromarray(image_framed).save(output_file)
    # Write one recognized text line per detected box.
    with open(txt_file, 'w') as txt_f:
        for key in result:
            txt_f.write(result[key][1] + '\n')

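The code sets the input and output folders, iterates over all input images (the converted PDF slides), runs the detection and recognition models of the OCR module through the single_pic_proc() function, and finally saves the results to the output folder.

Detection is based on the PyTorch CTPN model and recognition on the PyTorch CRNN model; both are included in the OCR module.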
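
The folder loop described above can be exercised without the OCR models by passing the recognizer in as a parameter. The following is a minimal sketch: transcribe_folder and fake_recognize are hypothetical names, and the result layout (a dict whose values carry the text at index 1) is an assumption taken from how the article's code reads result[key][1]. Unlike the article's code, this sketch does not delete an existing output folder and does not save the framed image:

```python
import pathlib
import tempfile


def transcribe_folder(input_dir, output_dir, recognize):
    """Run `recognize` on every file in input_dir; write one .txt per input.

    `recognize` is a placeholder for the OCR call: it takes a file path and
    is assumed to return a dict whose values hold the text at index 1.
    """
    input_dir = pathlib.Path(input_dir)
    output_dir = pathlib.Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for image_file in sorted(input_dir.iterdir()):
        if not image_file.is_file():
            continue
        result = recognize(image_file)
        txt_file = output_dir / (image_file.stem + ".txt")
        with open(txt_file, "w", encoding="utf-8") as txt_f:
            for key in result:
                txt_f.write(result[key][1] + "\n")
        written.append(txt_file)
    return written


def fake_recognize(image_file):
    """Stand-in for the real OCR model, used only to exercise the loop."""
    return {0: (None, f"title of {image_file.name}"), 1: (None, "body text")}


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        in_dir = pathlib.Path(tmp, "input_images")
        in_dir.mkdir()
        (in_dir / "image0.png").write_bytes(b"")
        out_dir = pathlib.Path(tmp, "out")
        for path in transcribe_folder(in_dir, out_dir, fake_recognize):
            print(path.name, "->", path.read_text(encoding="utf-8").splitlines())
```

With the real models in place, a wrapper around single_pic_proc() that returns only its first value could be passed as recognize.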

Sample output

The code is as follows:

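import pathlib
import cv2 as cv

output_dir = pathlib.Path("./output_images_with_boxes")

# To show a random output image instead (requires numpy imported as np):
# image = cv.imread(str(np.random.choice(list(output_dir.iterdir()), 1)[0]))
image = cv.imread(str(output_dir / "image7.png"))
# Resizing to the image's own (width, height) leaves it unchanged;
# adjust size_reshaped to scale the image.
size_reshaped = (int(image.shape[1]), int(image.shape[0]))
image = cv.resize(image, size_reshaped)
cv.imshow("image", image)
cv.waitKey(0)
cv.destroyAllWindows()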

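In the figure, the left side shows the original PDF slide and the right side shows the transcribed output text; the transcription is highly accurate.

The recognized text can be printed as follows: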

# output_dir is the pathlib.Path defined in the previous snippet.
filename = output_dir / "image7.txt"
with open(filename, "r") as text:
    for line in text.readlines():
        print(line.strip("\n"))
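
Since the pipeline writes one text file per slide, it can be convenient to join them into a single transcript. The following is a sketch under the assumption that the file names contain the page number (image0.txt, image1.txt, and so on); page_number and merge_transcripts are hypothetical helpers. Sorting by the extracted number keeps image10.txt after image2.txt:

```python
import pathlib
import re
import tempfile


def page_number(path):
    """Return the first integer found in the file name, or -1 if none."""
    match = re.search(r"\d+", path.stem)
    return int(match.group()) if match else -1


def merge_transcripts(txt_dir, merged_path):
    """Concatenate all .txt files in txt_dir in numeric page order."""
    txt_dir = pathlib.Path(txt_dir)
    merged_path = pathlib.Path(merged_path)
    # Exclude the merged file itself in case it lives in the same folder.
    pages = sorted(
        (p for p in txt_dir.glob("*.txt") if p.resolve() != merged_path.resolve()),
        key=page_number,
    )
    with open(merged_path, "w", encoding="utf-8") as out:
        for page in pages:
            out.write(f"--- {page.name} ---\n")
            out.write(page.read_text(encoding="utf-8").rstrip("\n") + "\n")
    return [p.name for p in pages]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        folder = pathlib.Path(tmp)
        for n in (10, 2, 1):
            (folder / f"image{n}.txt").write_text(f"slide {n}\n", encoding="utf-8")
        print(merge_transcripts(folder, folder / "all_slides.txt"))
```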

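With the approach above, you end up with a capable tool for transcribing all kinds of documents, from detecting and recognizing handwritten notes to picking out arbitrary text in photos. Having your own OCR tool for processing text is much better than relying on external software to transcribe documents.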
