(Custom Python utility) Comparing complex XML files and generating an HTML report of the differences

Posted 2020-10-23, IT Automation

Design goal:
  Compare complex XML documents while ignoring the order in which sibling nodes appear.

Main topics covered:

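1. ElementTree library ------- XML parsing:

- Import ElementTree: import xml.etree.ElementTree as ET
- Parse the XML and locate the root node:
  - Parse an XML file directly and get the root: tree = ET.parse('country_data.xml'); root = tree.getroot()
  - Parse a string: root = ET.fromstring(country_data_as_string)
- Iterating over the root yields its child elements, from which you can read whichever fields you need. For example, given <APP_KEY channel='CSDN'> hello123456789 </APP_KEY>:
  - tag: the element name, which identifies what kind of data the element holds; here APP_KEY
  - attrib: the attributes, stored as a dictionary; here {'channel': 'CSDN'}
  - text: the text content, which can hold data; here hello123456789 (plus the surrounding spaces)
  - tail: the text that follows the element's closing tag; it is optional and does not occur in this example.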
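The four fields can be checked with a short script. This is a minimal sketch: the `<config>` wrapper element and the variable names are assumptions made for the illustration, and only the APP_KEY element comes from the text above.

```python
import xml.etree.ElementTree as ET

# The <config> wrapper is hypothetical; only APP_KEY comes from the example above.
SAMPLE = "<config><APP_KEY channel='CSDN'> hello123456789 </APP_KEY></config>"

root = ET.fromstring(SAMPLE)   # parse from a string and get the root element
app_key = root[0]              # first child of the root

tag = app_key.tag              # element name
attrib = app_key.attrib        # attributes as a dict
text = app_key.text.strip()    # text content without the surrounding spaces
tail = app_key.tail            # text after the closing tag; None here

print(tag, attrib, text, tail)
```

Note that `text` keeps the leading and trailing spaces from the document unless you strip them, which matters when two files differ only in whitespace.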

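2. difflib library ------- provides classes and functions for comparing sequences; it can diff files and produce the result either as plain text or as an HTML comparison page.

This tool uses the difflib.HtmlDiff class, which builds an HTML table showing the differences between two files. It can display the full text or only the changed lines with some surrounding context.

Its constructor is: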

__init__(tabsize=8, wrapcolumn=None, linejunk=None, charjunk=IS_CHARACTER_JUNK)

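- tabsize: the number of spaces a tab character stands for; defaults to 8
- wrapcolumn: optional; the column at which long lines are wrapped. The default, None, means lines are not wrapped. (Tip: setting it makes the HTML much easier to read.)
- linejunk and charjunk: optional; passed on to ndiff(), which HtmlDiff uses to compute the side-by-side differences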

Public method (generates a complete HTML file containing a table that shows the differences):

make_file(fromlines, tolines [, fromdesc][, todesc][, context][, numlines])

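- fromlines and tolines: the content to compare, each given as a list of strings
- fromdesc and todesc: optional; the column headings for fromlines and tolines in the report; default to empty strings
- context and numlines: optional. When context is True only the differences and their surrounding context are shown; when it is False the full text is shown. numlines defaults to 5. When context is True it controls how many context lines surround each difference; when context is False it controls how many lines are shown before a highlighted difference when you jump with the "next" links (with 0, the next difference is placed at the very top of the browser window with no leading context)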
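Put together, the constructor and make_file() can be used as follows. This is a small sketch rather than the tool's own code: the sample lines, the headings "expected" and "actual", and the parameter values are made up for the illustration.

```python
import difflib

# Two hypothetical versions of the same XML fragment, one line per list item.
expected_lines = ["<a>1</a>", "<b>2</b>", "<c>3</c>"]
actual_lines = ["<a>1</a>", "<b>20</b>", "<c>3</c>"]

# wrapcolumn keeps long XML lines from stretching the table.
differ = difflib.HtmlDiff(tabsize=4, wrapcolumn=80)

# context=True shows only the changed lines plus numlines lines around them.
report_html = differ.make_file(
    expected_lines,
    actual_lines,
    fromdesc="expected",
    todesc="actual",
    context=True,
    numlines=2,
)

print(len(report_html), "characters of HTML generated")
```

The returned string is a complete HTML page, so it can be written to a file and opened in a browser as is.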

3. platform library -------- detects the current operating system
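A typical use is to pick an output location that suits the operating system. The sketch below is only an assumption about how that could look; the function name `report_path`, the default file name, and the directories are hypothetical and are not taken from the tool.

```python
import os
import platform
import tempfile


def report_path(filename="xml_diff.html"):
    """Return a per-OS location for the HTML report (hypothetical layout)."""
    system = platform.system()  # e.g. 'Windows', 'Linux', 'Darwin'
    if system == "Windows":
        base = os.environ.get("TEMP", tempfile.gettempdir())
    else:
        base = "/tmp"
    return os.path.join(base, filename)


print(platform.system(), report_path())
```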

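4. logger (robot.api.logger) -------- when the script runs under Robot Framework this makes a visible difference, because it lets you customize what appears in the log

Robot Framework is pleasant to use, probably because its built-in test report already covers ordinary needs, so few people try to modify it or add content of their own, such as a hyperlink that leads to more detail. As a result I spent a long time searching online without finding anything relevant and in the end had to read the source code.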
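Adding a hyperlink to the Robot Framework log can be done by logging a message as HTML. The sketch below assumes the report has already been written to disk; `report_link` and `log_report` are hypothetical helper names, while `logger.info(..., html=True)` is the public robot.api.logger call. The import is guarded so the snippet also runs where Robot Framework is not installed.

```python
import html

try:
    from robot.api import logger  # present when Robot Framework is installed
except ImportError:
    logger = None


def report_link(path, label="XML diff report"):
    """Build an <a> element that points at the generated report."""
    return '<a href="%s">%s</a>' % (html.escape(path, quote=True), html.escape(label))


def log_report(path):
    """Write the link to the Robot Framework log as HTML, if available."""
    link = report_link(path)
    if logger is not None:
        logger.info(link, html=True)  # html=True stops the markup being escaped
    return link


print(log_report("diff.html"))
```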

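Regrets and things to improve:

  One part was originally meant to be handled recursively, but the data passed between calls easily got tangled along the way. I plan to clean this part up later.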
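For reference, a recursive version might look roughly like the sketch below. It is not the author's code: the names `to_sorted_list`, `to_xml`, `_sort_key`, and `MAX_DEPTH` are hypothetical, the explicit sort key is an assumption added so that sorting also works on Python 3 (where dictionaries cannot be compared directly), and namespaces are not handled.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr

INDENT = "  "
MAX_DEPTH = 10  # assumed cap on how many levels are expanded


def _sort_key(entry):
    """Order entries by tag, then attributes, then body (assumed ordering)."""
    tag, attrib, body = entry
    return (tag, sorted(attrib.items()), repr(body))


def to_sorted_list(node, depth=1):
    """Return [tag, attrib, body]; body is sorted child entries or [text]."""
    if len(node) > 0 and depth < MAX_DEPTH:
        children = (to_sorted_list(child, depth + 1) for child in node)
        body = sorted(children, key=_sort_key)
    else:
        body = [node.text]
    return [node.tag, node.attrib, body]


def to_xml(entry, depth=0):
    """Render a [tag, attrib, body] entry back into indented XML text."""
    tag, attrib, body = entry
    attrs = "".join(" %s=%s" % (k, quoteattr(v)) for k, v in sorted(attrib.items()))
    pad = INDENT * depth
    if isinstance(body[0], list):
        inner = "\n".join(to_xml(child, depth + 1) for child in body)
        return "%s<%s%s>\n%s\n%s</%s>" % (pad, tag, attrs, inner, pad, tag)
    return "%s<%s%s>%s</%s>" % (pad, tag, attrs, escape(body[0] or ""), tag)


first = ET.fromstring("<r><b x='1'>2</b><a>1</a></r>")
second = ET.fromstring("<r><a>1</a><b x='1'>2</b></r>")
print(to_xml(to_sorted_list(first)))
```

Because both documents normalize to the same structure, `to_xml(...).splitlines()` produces identical line lists for them, and those lists are what would be handed to a line-based diff.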

############################## The code follows; the attached file can be saved locally and run to see the result ##################################################

# coding=utf-8
import re
import xml.etree.ElementTree as ET  # XML parsing
import difflib   # text comparison
import datetime  # timestamps
import platform  # detects the operating system (Windows, Linux, ...)
import os
from robot.api import logger    # optional, comment out if unused: Robot Framework scripts write a log when they run, and this module lets you customize it

# listafter: convert the parsed XML into a sorted list: (tag, attrib, (tag, attrib, text))
# This function is called by xmltolist() below; to see the exact structure, print the result returned by that function.
def listafter(listcom1):
    listcomarr1 = []
    text1 = []
    listcomarr1.append(listcom1.tag)
    listcomarr1.append(listcom1.attrib)
    if len(listcom1) > 0:
        for listcom2 in listcom1:
            listcomarr2 = []
            text2 = []
            listcomarr2.append(listcom2.tag)
            listcomarr2.append(listcom2.attrib)
            if len(listcom2) > 0:
                for listcom3 in listcom2:
                    listcomarr3 = []
                    text3 = []
                    listcomarr3.append(listcom3.tag)
                    listcomarr3.append(listcom3.attrib)
                    if len(listcom3) > 0:
                        for listcom4 in listcom3:
                            listcomarr4 = []
                            text4 = []
                            listcomarr4.append(listcom4.tag)
                            listcomarr4.append(listcom4.attrib)
                            if len(listcom4) > 0:
                                for listcom5 in listcom4:
                                    listcomarr5 = []
                                    text5 = []
                                    listcomarr5.append(listcom5.tag)
                                    listcomarr5.append(listcom5.attrib)
                                    if len(listcom5) > 0:
                                        for listcom6 in listcom5:
                                            listcomarr6 = []
                                            text6 = []
                                            listcomarr6.append(listcom6.tag)
                                            listcomarr6.append(listcom6.attrib)
                                            if len(listcom6) > 0:
                                                for listcom7 in listcom6:
                                                    listcomarr7 = []
                                                    text7 = []
                                                    listcomarr7.append(listcom7.tag)
                                                    listcomarr7.append(listcom7.attrib)
                                                    if len(listcom7) > 0:
                                                        for listcom8 in listcom7:
                                                            listcomarr8 = []
                                                            text8 = []
                                                            listcomarr8.append(listcom8.tag)
                                                            listcomarr8.append(listcom8.attrib)
                                                            if len(listcom8) > 0:
                                                                for listcom9 in listcom8:
                                                                    listcomarr9 = []
                                                                    text9 = []
                                                                    listcomarr9.append(listcom9.tag)
                                                                    listcomarr9.append(listcom9.attrib)
                                                                    # Start: check whether to descend another level
                                                                    if len(listcom9) > 0:
                                                                        for listcom10 in listcom9:
                                                                            listcomarr10 = []
                                                                            text10 = []
                                                                            listcomarr10.append(listcom10.tag)
                                                                            listcomarr10.append(listcom10.attrib)
                                                                            listcomarr10.append([listcom10.text])
                                                                            text9.append(listcomarr10)
                                                                    else:
                                                                        text9.append(listcom9.text)
                                                                    # End: check whether to descend another level
                                                                    # sort the nested (two-dimensional) list
                                                                    text9 = sorted(text9)
                                                                    listcomarr9.append(text9)
                                                                    text8.append(listcomarr9)
                                                            else:
                                                                text8.append(listcom8.text)
                                                            text8 = sorted(text8)
                                                            listcomarr8.append(text8)
                                                            text7.append(listcomarr8)
                                                    else:
                                                        text7.append(listcom7.text)
                                                    text7 = sorted(text7)
                                                    listcomarr7.append(text7)
                                                    text6.append(listcomarr7)
                                            else:
                                                text6.append(listcom6.text)
                                            text6 = sorted(text6)
                                            listcomarr6.append(text6)
                                            text5.append(listcomarr6)
                                    else:
                                        text5.append(listcom5.text)
                                    text5 = sorted(text5)
                                    listcomarr5.append(text5)
                                    text4.append(listcomarr5)
                            else:
                                text4.append(listcom4.text)
                            text4 = sorted(text4)
                            listcomarr4.append(text4)
                            text3.append(listcomarr4)
                    else:
                        text3.append(listcom3.text)
                    text3 = sorted(text3)
                    listcomarr3.append(text3)
                    text2.append(listcomarr3)
            else:
                text2.append(listcom2.text)
            text2 = sorted(text2)
            listcomarr2.append(text2)
            text1.append(listcomarr2)
    else:
        text1.append(listcom1.text)
    text1 = sorted(text1)
    listcomarr1.append(text1)
    return listcomarr1

# Convert the XML content into a sorted list. Returns three values: the processed spmlxmllist, the unprocessed header spmlstart, and the unprocessed trailer spmlend.
# spmlstart and spmlend hold the header and trailer that need no processing, which keeps the work to a minimum.
def xmltolist(spml):
    if spml.find("<spml:") != -1:
        startnum = re.search(r'<spml:[^>]*>', spml).span()[1]
        endnum = spml.rfind("</spml:")
        spmlstart = spml[:startnum].strip()
        spmlend = spml[endnum:].strip()
        spmlxml = '''<spml:modifyRequest xmlns:spml='{spml}' xmlns:subscriber="{subscriber}" xmlns:xsi="{xsi}">\n%s</spml:modifyRequest>''' % (
            spml[startnum:endnum].strip())
    elif spml.find("<PlexViewRequest") != -1:
        startnum = re.search(r'<PlexViewRequest[^>]*>', spml).span()[1]
        endnum = spml.rfind("</PlexViewRequest>")
        spmlstart = spml[:startnum].strip()
        spmlend = spml[endnum:].strip()
        spmlxml = '''<PlexViewRequest>\n%s</PlexViewRequest>''' % (spml[startnum:endnum].strip())
    else:
        spmlstart = ""
        spmlend = ""
        spmlxml = spml
    # print spmlstart
    # print spmlend
    # print spmlxml
    tree = ET.fromstring(spmlxml)
    spmlxmllist = listafter(tree)
    return spmlxmllist, spmlstart, spmlend

# Turn the spmlxmllist produced by xmltolist back into XML (sibling nodes in the XML are now in sorted order)
def listtoxml(spmllist1):
    kong = "  "
    spmltag1 = spmllist1[0]
    spmlattrib1 = ""
    bodyxml1 = ""
    if spmllist1[1] != {}:
        for key, value in spmllist1[1].items():
            spmlattrib1 += " %s='%s'" % (key, value)
    startxml1 = "<%s%s>" % (spmltag1, spmlattrib1)
    endxml1 = "</%s>" % (spmltag1)
    spmlxml1 = ""
    if isinstance(spmllist1[2][0], list):
        spmlxml2 = ""
        for spmllist2 in spmllist1[2]:
            spmltag2 = spmllist2[0]
            spmlattrib2 = ""
            bodyxml2 = ""
            if spmllist2[1] != {}:
                for key, value in spmllist2[1].items():
                    spmlattrib2 += " %s='%s'" % (key, value)
            startxml2 = "<%s%s>" % (spmltag2, spmlattrib2)
            endxml2 = "</%s>" % (spmltag2)
            if isinstance(spmllist2[2][0], list):
                spmlxml3 = ""
                for spmllist3 in spmllist2[2]:
                    spmltag3 = spmllist3[0]
                    spmlattrib3 = ""
                    bodyxml3 = ""
                    if spmllist3[1] != {}:
                        for key, value in spmllist3[1].items():
                            spmlattrib3 += " %s='%s'" % (key, value)
                    startxml3 = "<%s%s>" % (spmltag3, spmlattrib3)
                    endxml3 = "</%s>" % (spmltag3)
                    if isinstance(spmllist3[2][0], list):
                        spmlxml4 = ""
                        for spmllist4 in spmllist3[2]:
                            spmltag4 = spmllist4[0]
                            spmlattrib4 = ""
                            bodyxml4 = ""
                            if spmllist4[1] != {}:
                                for key, value in spmllist4[1].items():
                                    spmlattrib4 += " %s='%s'" % (key, value)
                            startxml4 = "<%s%s>" % (spmltag4, spmlattrib4)
                            endxml4 = "</%s>" % (spmltag4)
                            if isinstance(spmllist4[2][0], list):
                                spmlxml5 = ""
                                for spmllist5 in spmllist4[2]:
                                    spmltag5 = spmllist5[0]
                                    spmlattrib5 = ""
                                    bodyxml5 = ""
                                    if spmllist5[1] != {}:
                                        for key, value in spmllist5[1].items():
                                            spmlattrib5 += " %s='%s'" % (key, value)
                                    startxml5 = "<%s%s>" % (spmltag5, spmlattrib5)
                                    endxml5 = "</%s>" % (spmltag5)
                                    if isinstance(spmllist5[2][0], list):
                                        spmlxml6 = ""
                                        for spmllist6 in spmllist5[2]:
                                            spmltag6 = spmllist6[0]
                                            spmlattrib6 = ""
                                            bodyxml6 = ""
                                            if spmllist6[1] != {}:
                                                for key, value in spmllist6[1].items():
                                                    spmlattrib6 += " %s='%s'" % (key, value)
                                            startxml6 = "<%s%s>" % (spmltag6, spmlattrib6)
                                            endxml6 = "</%s>" % (spmltag6)
                                            if isinstance(spmllist6[2][0], list):
                                                spmlxml7 = ""
                                                for spmllist7 in spmllist6[2]:
                                                    spmltag7 = spmllist7[0]
                                                    spmlattrib7 = ""
                                                    bodyxml7 = ""
                                                    if spmllist7[1] != {}:
                                                        for key, value in spmllist7[1].items():
                                                            spmlattrib7 += " %s='%s'" % (key, value)
                                                    startxml7 = "<%s%s>" % (spmltag7, spmlattrib7)
                                                    endxml7 = "</%s>" % (spmltag7)
                                                    if isinstance(spmllist7[2][0], list):
                                                        spmlxml8 = ""
                                                        for spmllist8 in spmllist7[2]:
                                                            spmltag8 = spmllist8[0]
                                                            spmlattrib8 = ""
                                                            bodyxml8 = ""
                                                            if spmllist8[1] != {}:
                                                                for key, value in spmllist8[1].items():
                                                                    spmlattrib8 += " %s='%s'" % (key, value)
                                                            startxml8 = "<%s%s>" % (spmltag8, spmlattrib8)
                                                            endxml8 = "</%s>" % (spmltag8)
                                                            if isinstance(spmllist8[2][0], list):
                                                                spmlxml9 = ""
                                                                for spmllist9 in spmllist8[2]:
                                                                    spmltag9 = spmllist9[0]
                                                                    spmlattrib9 = ""
                                                                    bodyxml9 = ""
                                                                    if spmllist9[1] != {}:
                                                                        for key, value in spmllist9[1].items():
                                                                            spmlattrib9 += " %s='%s'" % (key, value)
                                                                    startxml9 = "<%s%s>" % (spmltag9, spmlattrib9)
                                                                    endxml9 = "</%s>" % (spmltag9)
                                                                    if isinstance(spmllist9[2][0], list):
                                                                        spmlxml10 = ""
                                                                        for spmllist10 in spmllist9[2]:
                                                                            spmltag10 = spmllist10[0]
                                                                            spmlattrib10 = ""
                                                                            bodyxml10 = ""
                                                                            if spmllist10[1] != {}:
                                                                                for key, value in spmllist10[1].items():
                                                                                    spmlattrib10 += " %s='%s'" % (
                                                                                        key, value)
                                                                            startxml10 = "<%s%s>" % (
                                                                                spmltag10, spmlattrib10)
                                                                            endxml10 = "</%s>" % (spmltag10)
                                                                            bodyxml10 = spmllist10[2][0]
                                                                            spmlxml10 += "\n%s%s%s%s" % (
                                                                                kong * 9, startxml10, bodyxml10,
                                                                                endxml10)
                                                                        spmlxml9 += "\n%s%s%s\n%s%s" % (
                                                                            kong * 8, startxml9, spmlxml10, kong * 8,
                                                                            endxml9)
                                                                    else:
                                                                        bodyxml9 = spmllist9[2][0]
                                                                        spmlxml9 += "\n%s%s%s%s" % (
                                                                            kong * 8, startxml9, bodyxml9, endxml9)
                                                                spmlxml8 += "\n%s%s%s\n%s%s" % (
                                                                    kong * 7, startxml8, spmlxml9, kong * 7, endxml8)
                                                            else:
                                                                bodyxml8 = spmllist8[2][0]
                                                                spmlxml8 += "\n%s%s%s%s" % (
                                                                    kong * 7, startxml8, bodyxml8, endxml8)
                                                        spmlxml7 += "\n%s%s%s\n%s%s" % (
                                                            kong * 6, startxml7, spmlxml8, kong * 6, endxml7)
                                                    else: