Software Engineering Blog Archiving Tool (Personal Use)



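# -*- coding: utf-8 -*-
# @Time     : 2021/6/21 16:51
# @Author   : Xxg
# @Site     :
# @File     : 作业归档完善版.py
# @Software : PyCharm
import requests
from lxml import etree
import docx  # provided by the python-docx package

# Request headers; the User-Agent value is left blank in the original post.
headers = {
    "User-Agent": ""
}
# URL of the blog list page to archive; left blank in the original post.
url = ''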

response = requests.get(url, headers=headers)
html = etree.HTML(response.text)
# print(html)
# Post date, title, and summary as shown on the list page
date = html.xpath('//div[@class="dayTitle"]/a/text()')
name = html.xpath('//div[@class="postTitle"]/a/span/text()')
zhaiyao = html.xpath('//div[@class="postCon"]/div[@class="c_b_p_desc"]/text()')
# Links to the full text of each post ("read more")
yueduquanwen = html.xpath('//div[@class="postCon"]/div[@class="c_b_p_desc"]/a/@href')
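for i in range(len(yueduquanwen)):
    url1 = yueduquanwen[i]
    # url1 = "https://www.cnblogs.com/sakura-xxg/category/1990334.html"
    response1 = requests.get(url1, headers=headers)
    html_son = etree.HTML(response1.text)
    title = html_son.xpath('//div[@class="post"]/h1[@class="postTitle"]/a/span/text()')
    print(title)
    # The class value in this XPath is blank in the original post
    content = html_son.xpath('//div[@class=""]/p/text()')
    print(content)
    # Named post_date so it does not overwrite the list-page `date` above
    post_date = html_son.xpath('//div[@class="postDesc"]/span[@id="post-date"]/text()')
    print(post_date)
    if not title:
        continue  # no title found, so there is no file name to save under
    # Create the docx document object
    file = docx.Document()
    # xpath() returns a list, so join it into a single string for the paragraph
    file.add_paragraph("".join(post_date))
    for j in range(len(content)):
        file.add_paragraph(content[j])
    file.save("D:\\" + title[0].strip() + ".docx")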
    # for j in range(len(content)):
    #   file.add_paragraph(content[j])
    # date_son = html.xpath('//div[@class="dayTitle"]/a/text()')
    # name_son = html.xpath('//div[@class="postTitle"]/a/span/text()')
    # zhaiyao_son = html.xpath('//div[@class="postCon"]/div[@class="c_b_p_desc"]/text()')
    # print(date_son)
    # print(zhaiyao_son)
print(yueduquanwen)
# print(date[0])
# print(name[0].replace(" ","").replace("\n",""))
# print(zhaiyao[0].replace("\n",""))
# print(zhaiyao[0])

# Save as Word
# for n in range(len(date)):
#     file = docx.Document()
#     file.add_paragraph(date[n])
#     file.add_paragraph(zhaiyao[2*n].replace("\n",""))
#     # file.save("F:\\word\\"+name[n].replace(" ","").replace("\n","")+".docx")
#     print(date[n])
#     print(zhaiyao[2*n])
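
The script builds each output path directly from the scraped post title. On Windows that can fail when a title contains characters such as `:`, `/` or `?`, or stray newlines. Below is a minimal sketch of a clean-up helper that could be applied to the title before saving. The name `safe_filename` and its `fallback` and `max_length` parameters are hypothetical and do not appear in the original script.

```python
import re

# Characters that Windows does not allow in file names, plus control characters.
_INVALID_CHARS = re.compile(r'[<>:"/\\|?*\x00-\x1f]')


def safe_filename(title, fallback="untitled", max_length=100):
    """Turn a scraped post title into a string usable as a file name.

    Hypothetical helper: the name, fallback and max_length are assumptions.
    """
    # Collapse runs of whitespace (including newlines) into single spaces.
    name = " ".join(title.split())
    # Replace every invalid character with an underscore.
    name = _INVALID_CHARS.sub("_", name)
    # Truncate, then drop trailing dots and spaces, which Windows also rejects.
    name = name[:max_length].rstrip(". ")
    return name or fallback


if __name__ == "__main__":
    print(safe_filename("  Homework 3: What/Why?\n"))
```

In the loop above, the save call could then be written as `file.save("D:\\" + safe_filename(title[0]) + ".docx")`.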

 
