How to use Python to copy characters from each file in the current folder into the same-named file in another folder (there are many files)?

Posted 2023-05-03

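Using Python, I want to go through all the text files in one folder, extract specific content from each one, and write it to a file in another folder under the same file name. There are many files, so I don't want to type the file names one by one. How should I do this? Any advice is appreciated.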

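import os

# Settings
# Set the source and target folder paths yourself
_TARGET_DIR = "./copied_files/"
_SOURCE_DIR = "./source_files/"

# Your own function for extracting the specific information:
def extract(filename):
    # Return the extracted information (here: every line, unchanged)
    with open(filename, "r", encoding="utf-8") as f:
        info = f.readlines()
    return info

# Make sure the target folder exists
os.makedirs(_TARGET_DIR, exist_ok=True)

# Use os.listdir() to get all entries in the source folder.
# Some systems contain hidden files whose names start with ".", so skip them,
# and skip subfolders as well.
files = [file for file in os.listdir(_SOURCE_DIR)
         if not file.startswith(".") and os.path.isfile(os.path.join(_SOURCE_DIR, file))]

for filename in files:
    # 1. Read the file and extract the information:
    print("Processing {}...".format(filename))
    info = extract(os.path.join(_SOURCE_DIR, filename))
    # 2. Create a file with the same name in the target folder and write the information
    # Adjust the writing part as needed
    with open(os.path.join(_TARGET_DIR, filename), "w", encoding="utf-8") as f:
        for line in info:
            f.write(line)

print("Done!")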
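
The extract() above is only a stub that returns every line. As a minimal sketch of what a real extractor might look like, assuming the "specified content" is every line that contains a keyword (the keyword ERROR here is purely hypothetical), it could be replaced with:

```python
# Hypothetical keyword: only lines containing it are kept.
KEYWORD = "ERROR"


def extract(filename, keyword=KEYWORD):
    """Return the lines of `filename` that contain `keyword` (newlines kept)."""
    with open(filename, "r", encoding="utf-8") as f:
        # Iterating over a file yields one line at a time, including "\n"
        return [line for line in f if keyword in line]
```

Because the loop writes whatever list extract() returns, this version can be dropped in without other changes; a source file with no matching lines simply produces an empty target file.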

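Reference answer A:

import os
import re
reg = re.compile("regex for the specified content")
source = "path of one folder"
target = "path of the other folder"
for filename in os.listdir(source):
    fullname = os.path.join(source, filename)
    targetfile = os.path.join(target, filename)
    if os.path.isfile(fullname) and os.path.splitext(filename)[1].lower() == ".txt":
        with open(fullname, encoding="utf-8") as f:
            match = reg.search(f.read())
        if match:  # search() returns None when nothing matches
            with open(targetfile, "w", encoding="utf-8") as f:
                f.write(match.group(0))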
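
reg.search() returns only the first match, and None when nothing matches. If every occurrence is wanted, re.findall() can be used instead. A sketch, assuming the wanted content is described by a hypothetical date pattern and that matches should be written one per line (copy_matches is a hypothetical name):

```python
import os
import re

# Hypothetical pattern: replace it with a regex for the content you need.
PATTERN = re.compile(r"\d{4}-\d{2}-\d{2}")


def copy_matches(source, target, pattern=PATTERN, ext=".txt"):
    """Write all matches of `pattern` in each `ext` file of `source` to a
    same-named file in `target`; return the list of file names written."""
    os.makedirs(target, exist_ok=True)
    written = []
    for filename in os.listdir(source):
        fullname = os.path.join(source, filename)
        if not (os.path.isfile(fullname)
                and os.path.splitext(filename)[1].lower() == ext):
            continue
        with open(fullname, encoding="utf-8") as f:
            matches = pattern.findall(f.read())
        if not matches:
            # Nothing to extract, so no target file is created
            continue
        with open(os.path.join(target, filename), "w", encoding="utf-8") as f:
            f.write("\n".join(matches))
        written.append(filename)
    return written
```

If the pattern contains capturing groups, findall() returns the groups rather than the whole match, so use non-capturing groups (?:...) or finditer() in that case.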

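Reference answer B:

import os, re
src_path = ''  # src_path is the source folder path
des_path = ''  # target folder path

pattern = re.compile(r'')  # regex for the specified content

for root, paths, files in os.walk(src_path):
    for file in files:  # go through the source files
        with open(os.path.join(root, file), 'r') as f1:  # open the source file
            match = pattern.search(f1.read())  # extract the content with the regex; adjust the regex as needed
        if match:
            with open(os.path.join(des_path, file), 'w') as f2:  # create and open a new file with the same name
                f2.write(match.group(0))  # write the extracted content to the new file

Reference answer C:

#! /usr/bin/env python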
# coding=utf-8
import os
import shutil
import time
def copy_and_rename(fpath_input, fpath_output):
    for file in os.listdir(fpath_input):
        oldname = os.path.join(fpath_input, file)
        newname_1 = os.path.join(fpath_output,
                                 os.path.splitext(file)[0] + "_1.jpg")
        newname_2 = os.path.join(fpath_output,
                                 os.path.splitext(file)[0] + "_2.jpg")
        newname_3 = os.path.join(fpath_output,
                                 os.path.splitext(file)[0] + "_3.jpg")
        shutil.copyfile(oldname, newname_1)
        shutil.copyfile(oldname, newname_2)
        shutil.copyfile(oldname, newname_3)
if __name__ == '__main__':
    print('start ...')
    t1 = time.time() * 1000
    #time.sleep(1) #1s
    fpath_input = "D:/123/0708/"
    fpath_output = "D:/345/0708/"
    copy_and_rename(fpath_input, fpath_output)
    t2 = time.time() * 1000
    print('take time:' + str(t2 - t1) + 'ms')
    print('end.')
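
Answer C copies whole files and gives each copy a new name. If the goal is just to copy every file into another folder under the same name, shutil.copy2() is enough. A minimal sketch; the function name copy_same_name is hypothetical:

```python
import os
import shutil


def copy_same_name(src_dir, dst_dir):
    """Copy every regular file in src_dir into dst_dir, keeping its name."""
    os.makedirs(dst_dir, exist_ok=True)
    copied = []
    for name in os.listdir(src_dir):
        src = os.path.join(src_dir, name)
        if os.path.isfile(src):  # skip subfolders
            # copy2 also tries to preserve metadata such as the modification time
            shutil.copy2(src, os.path.join(dst_dir, name))
            copied.append(name)
    return sorted(copied)
```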

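Reference answer D: Take a look at os.walk(), which can handle the traversal.
Use a regex to pull out the matching content, then write it to the other folder.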
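
A sketch of the approach answer D describes, assuming .txt files, UTF-8 text and a caller-supplied regex. Unlike answer B it keeps the subfolder structure, so same-named files in different subfolders do not overwrite each other (walk_and_extract is a hypothetical name):

```python
import os
import re


def walk_and_extract(src_root, dst_root, pattern):
    """Apply `pattern` to every .txt file under src_root (recursively) and write
    the matches to the same relative path under dst_root."""
    regex = re.compile(pattern)
    written = []
    for root, _dirs, files in os.walk(src_root):
        # Mirror the current subfolder under dst_root ("." for the top level)
        rel = os.path.relpath(root, src_root)
        out_dir = os.path.normpath(os.path.join(dst_root, rel))
        for name in files:
            if not name.lower().endswith(".txt"):
                continue
            with open(os.path.join(root, name), encoding="utf-8") as f:
                matches = regex.findall(f.read())
            if matches:
                os.makedirs(out_dir, exist_ok=True)
                with open(os.path.join(out_dir, name), "w", encoding="utf-8") as f:
                    f.write("\n".join(matches))
                written.append(os.path.normpath(os.path.join(rel, name)))
    return sorted(written)
```

Keep dst_root outside src_root; otherwise os.walk() may pick up the files it has just written.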

How to download file from website that requires login information using Python?

Posted: 2014-05-13 05:13:24

Question:

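I'm trying to download some data from a website using Python. If you just copy and paste the URL, it shows nothing unless you fill in the login information. I have the login name and password, but how should I include them in Python?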

My current code is:

import urllib, urllib2, cookielib

username = my_user_name
password = my_pwd

link = 'www.google.com' # just for instance
cj = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
login_data = urllib.urlencode({'username' : username, 'j_password' : password})

opener.open(link, login_data)
resp = opener.open(link,login_data)
print resp.read()

No error pops up, but resp.read() is a bunch of CSS with only a message like "You must log in to read the news here".

So how do I get the page as it looks after logging in?
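
urllib2 and cookielib exist only in Python 2. A rough Python 3 equivalent of the code above, using urllib.request and http.cookiejar, might look like the sketch below. Two assumptions: the credentials are POSTed to the login form's action URL rather than to the data page itself, and the field names are whatever the site's form actually uses (the helper names here are hypothetical).

```python
import http.cookiejar
import urllib.parse
import urllib.request


def make_opener():
    """Return an opener plus the CookieJar it stores the session cookie in."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def encode_form(fields):
    """URL-encode form fields; in Python 3 the POST body must be bytes."""
    return urllib.parse.urlencode(fields).encode("utf-8")


def login_and_fetch(login_url, page_url, fields):
    """POST `fields` to login_url, then GET page_url with the same cookies."""
    opener, _jar = make_opener()
    # Send the credentials to the form's action URL, not to the data page
    opener.open(login_url, encode_form(fields)).close()
    # The opener re-sends any cookie the login response set
    with opener.open(page_url) as resp:
        return resp.read().decode("utf-8", errors="replace")
```

For example, login_and_fetch(login_url, page_url, {"username": "me", "password": "secret"}), with the site's real URLs and field names substituted.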

I just noticed that the site requires 3 entries:

Company: 

Username: 

Password:

I have all of these, but how do I put all three into the login variable?

If I run it without the login step, like this:

cj = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))

opener.open(dd)
resp = opener.open(dd)

print resp.read()

Here is the printed output:

<DIV id=header>
<DIV id=strapline><!-- login_display -->
<P><FONT color=#000000>All third party users of this website and/or data produced by the Baltic do so at their own risk. The Baltic owes no duty of care or any other obligation to any party other than the contractual obligations which it owes to its direct contractual partners. </FONT></P><IMG src="images/top-strap.gif"> <!-- template [strapline]--></DIV><!-- end strapline -->
<DIV id=memberNav>
<FORM class=members id=form1 name=form1 action=client_login/client_authorise.asp?action=login method=post onsubmits="return check()">
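
The printed form shows where the credentials have to go: it POSTs to client_login/client_authorise.asp?action=login, relative to the page URL. To find the exact field names for company, username and password (including any hidden fields the server may expect), the form can be parsed with the standard library's html.parser. A sketch; the class name FormFieldParser is hypothetical:

```python
from html.parser import HTMLParser


class FormFieldParser(HTMLParser):
    """Collect each <form>'s action, method and named <input> fields."""

    def __init__(self):
        super().__init__()
        self.forms = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, so FORM/INPUT match too
        attrs = dict(attrs)
        if tag == "form":
            self.forms.append({
                "action": attrs.get("action"),
                "method": (attrs.get("method") or "get").lower(),
                "fields": {},
            })
        elif tag == "input" and self.forms and attrs.get("name"):
            # Keep default values: hidden inputs often carry values the server expects back
            self.forms[-1]["fields"][attrs["name"]] = attrs.get("value") or ""
```

After p.feed(html), p.forms[0]["fields"] lists the field names; fill in the company, username and password values and POST them to urllib.parse.urljoin(page_url, p.forms[0]["action"]) within one cookie-keeping session.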

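Comments:

It doesn't work; print resp.read() still returns "This data is only accessible by subscription. Click here for a free trial."

@André I noticed that the page requires 3 login entries. I have all of them, but I'm not sure how to put them into login_info. I've edited the question, but I'm not sure whether that's what you asked for. In the print resp.read() result, I did not find

Answer 1: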

Use Scrapy to scrape that data.

Then you can do this:

from scrapy import FormRequest, Spider

class LoginSpider(Spider):
    name = 'example.com'
    start_urls = ['http://www.example.com/users/login.php']

    def parse(self, response):
        # Fill in and submit the login form found on the start page
        return [FormRequest.from_response(response,
                    formdata={'username': 'john', 'password': 'secret'},
                    callback=self.after_login)]

    def after_login(self, response):
        # check that the login succeeded before going on
        # (response.body is bytes in Python 3)
        if b"authentication failed" in response.body:
            self.logger.error("Login failed")
            return

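Comments:

This might work, but I don't think he needs such a big library for a trivial task like logging in... The same can be done in a couple of lines with Python-Requests or even urllib.

I don't have Scrapy at the moment; I would have to ask IT to install it, because Python is on the server...

Answer 2: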

This code should work, using Python-Requests. Just replace ... with the actual domain and, of course, the login data.

from requests import Session

s = Session() # this session will hold the cookies

# here we first login and get our session cookie
s.post("http://.../client_login/client_authorise.asp?action=login", data={"companyName": "some_company", "password": "some_password", "username": "some_user", "status": ""})

# now we're logged in and can request any page
resp = s.get("http://.../").text

print(resp)

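Comments:

Thanks, but the resp variable still contains "This data is only accessible by subscription. Click here for a free trial."..... I'm sure the login details are correct.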
