Python: reading and writing Excel

Posted 2020-10-16 亡灵族


Preface: compiled by the editors of 小常识网 (cha138.com). This article covers reading and writing Excel files with Python.

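Recently my boss asked me to pull data from a website. Doing it by hand was too slow, so I found some Python snippets online and wrote a script to do it.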

```python
import re

import requests
import xlrd
from bs4 import BeautifulSoup
from xlutils.copy import copy

# URL patterns for the two page layouts (dots escaped, numeric IDs required)
pattern1 = r'^https://ews-aln-core\.cisco\.com/applmgmt/view-appl/\d+$'
pattern2 = r'^https://ews-aln-core\.cisco\.com/applmgmt/view-endpoint/\d+$'
pattern3 = r'^https://ews-aln-core\.cisco\.com/applmgmt/view-appl/by-name/'


def read_excel(path):
    # Open the workbook (.xls only: xlutils/xlwt cannot write .xlsx)
    workbook = xlrd.open_workbook(path)
    # workbook.sheet_names() lists all sheets

    # Get a sheet by index or by name; indexes start at 0
    sheet1 = workbook.sheet_by_index(0)

    # sheet1.name, sheet1.nrows and sheet1.ncols give the name, row count and column count
    for i in range(sheet1.nrows):
        # Column 0 holds the URL; str.replace returns a new string, so keep the result
        url = str(sheet1.cell_value(i, 0)).replace("'", "")
        print(url, i)
        soup = get_beautifulSoup(get_responseHtml(url))
        try:
            table = soup.find("table", class_="main_table_layout")
            if pattern_match(url, pattern1) or pattern_match(url, pattern3):
                header = table.find("tr", class_="centered sub_section_header")
            elif pattern_match(url, pattern2):
                header = table.find("tr", class_="centered")
            else:
                print("no pattern")
                continue
            # The priority is the last <td> of the first centered row after the header
            priority = header.find_next("tr", align="center").find_all("td")
            priorityNumber = 'P' + get_last_td(priority)
        except (AttributeError, IndexError):
            # The page does not have the expected layout: report it and skip the row
            print("not found: " + url)
            continue
        write_excel(path, i, 1, priorityNumber)


def write_excel(path, row, col, value):
    # Reopen, copy, write one cell and save: one full cycle per row
    oldwb = xlrd.open_workbook(path)
    wb = copy(oldwb)  # copy() takes the xlrd Book, not the file name
    ws = wb.get_sheet(0)
    ws.write(row, col, value)
    wb.save(path)


def get_last_td(result):
    # Text of the last <td> in the row
    return result[-1].get_text(strip=True)


def get_beautifulSoup(html):
    return BeautifulSoup(html, 'html.parser', from_encoding='utf-8')


def get_responseHtml(url):
    # A browser User-Agent so the server does not reject the request as a crawler
    headers = {
        'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_3) AppleWebKit/537.36 '
                      '(KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36'}
    response = requests.get(url, auth=(userName, passWord), headers=headers, timeout=30)
    response.raise_for_status()
    return response.content


def pattern_match(s, pattern, flags=0):
    return re.match(pattern, s, flags)


if __name__ == '__main__':
    userName = '*'
    passWord = '*'
    path = r'*'
    read_excel(path)
```
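The table lookup inside read_excel raises an unhelpful exception when a page lacks the expected layout, and it can only be tried against the live site with credentials. As a minimal sketch, assuming the layout the script targets (a table with class main_table_layout, a header row, then a row with align="center"), the lookup can be pulled into a function that returns None instead of raising, so it can be checked offline against a saved HTML snippet. The name last_cell_after_header is hypothetical, not part of the original script.

```python
from bs4 import BeautifulSoup


def last_cell_after_header(html, header_class):
    """Return the text of the last <td> in the first align="center" row
    after the header row, or None if the expected layout is missing.

    Hypothetical helper; assumes the page layout targeted by read_excel.
    """
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", class_="main_table_layout")
    if table is None:
        return None
    # header_class may be a single class ("centered") or the exact
    # attribute value ("centered sub_section_header")
    header = table.find("tr", class_=header_class)
    if header is None:
        return None
    row = header.find_next("tr", align="center")
    if row is None:
        return None
    cells = row.find_all("td")
    # get_text(strip=True) also works when the cell wraps its text in tags
    return cells[-1].get_text(strip=True) if cells else None
```

Returning None lets the caller decide whether to skip the row or write a placeholder, instead of relying on a broad try/except.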

There were quite a few pitfalls along the way:

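1. At first I used an .xlsx file, and after saving it could no longer be opened. Switching to .xls fixed it: xlwt, which xlutils.copy uses for writing, only produces the legacy .xls format, so saving under an .xlsx name gives a file Excel cannot open.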

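2. I copied the headers from the web. With a browser User-Agent the server does not treat the script as a crawler; without it, in my case the request failed with: http.client.RemoteDisconnected: Remote end closed connection without response.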

3. The argument to copy must be the workbook object returned by xlrd.open_workbook, not the .xls file name; otherwise you get: AttributeError: 'str' object has no attribute 'datemode'.

4. I found a very helpful blog post: "Adding data to an existing Excel .xls file in Python, i.e. opening the Excel file and writing new data".

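5. At first I wanted to save to a new file under a different path, but that did not work: on every pass through the for loop, write_excel copies the source Excel file again, so the new file ended up with only one inserted row (the last one).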

6. Regular expression syntax: "Regular Expressions - Syntax" and "Python Regular Expressions".

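7. For Beautiful Soup in Python, a very thorough reference: Beautiful Soup 4.2.0 documentation.

8. A demo that scrapes novels: "Python3 Web Crawler (7): Scraping Novels with Beautiful Soup".

9. I had never written Python before; this was my first script. It took half a day, and there is still plenty of room for improvement.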
