(Custom Python utility) Comparing complex XML files and generating an HTML view of the differences
Design goal:
Compare complex XML documents while ignoring the order of sibling nodes under the same parent.
Main topics covered:
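1. The ElementTree library ------- XML parsing:
- Import ElementTree:
import xml.etree.ElementTree as ET
- Parse the XML and find the root node:
  - Parse an XML file directly and get the root node:
tree = ET.parse('country_data.xml')
root = tree.getroot()
  - Parse a string:
root = ET.fromstring(country_data_as_string)
- Iterate over the root node to reach its child nodes, then read whatever fields you need. For example, for <APP_KEY channel='CSDN'> hello123456789 </APP_KEY>:
  - tag: the element name, which identifies what kind of data the element holds; here APP_KEY
  - attrib: the attributes, stored as a dictionary; here {'channel': 'CSDN'}
  - text: the text string inside the element, which can hold data; here hello123456789 (plus the surrounding spaces)
  - tail: the trailing string after the element's closing tag; it is optional and does not appear in this example.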
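The four fields above can be checked with a short script. This is a minimal sketch: the `<config>` wrapper and the `SAMPLE` string are made up for illustration, and only the `APP_KEY` element comes from the text.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document built around the APP_KEY element from the text.
SAMPLE = "<config><APP_KEY channel='CSDN'> hello123456789 </APP_KEY>\n</config>"

root = ET.fromstring(SAMPLE)   # parse from a string; root is the <config> element
app_key = root[0]              # first child of the root

print(app_key.tag)             # element name
print(app_key.attrib)          # attributes as a dict
print(repr(app_key.text))      # text inside the element, whitespace included
print(repr(app_key.tail))      # text between </APP_KEY> and the next tag
```

Note that `text` keeps its leading and trailing spaces, so a comparison that should ignore formatting probably needs to call `strip()` on it first.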
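2. The difflib library ------- classes and functions for comparing sequences. It can compare files and produce the differences either as text or as an HTML comparison page.
This tool uses the class difflib.HtmlDiff, which builds an HTML table showing the differences between files. It can show the full text or only the changed lines with their surrounding context.
Its constructor is:
__init__(tabsize=8, wrapcolumn=None, linejunk=None, charjunk=IS_CHARACTER_JUNK)
- tabsize: the number of spaces a tab stands for; defaults to 8
- wrapcolumn: optional; the column at which long lines are wrapped. Defaults to None, which means no wrapping (worth setting: it makes the HTML much easier to read)
- linejunk and charjunk: optional; passed through to ndiff(), which HtmlDiff uses to compute the side-by-side differences
Public method (generates a complete HTML file containing a table that shows the differences):
make_file(fromlines, tolines [, fromdesc][, todesc][, context][, numlines])
- fromlines and tolines: the content to compare, each a list of strings
- fromdesc and todesc: optional; the column titles for fromlines and tolines; both default to an empty string
- context and numlines: optional. When context is True only the differences and their context are shown; when it is False the full text is shown. numlines defaults to 5. When context is True, numlines controls how many context lines surround each difference. When context is False, it controls how many lines are shown before a difference highlight when jumping with the "next" links (with 0, the "next" links place the next highlight at the very top of the browser window with no leading context)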
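A minimal sketch of the HtmlDiff usage described above. The two line lists, the column titles and the output file name are invented for illustration; the keyword arguments are the ones listed in the text.

```python
import difflib
import os
import tempfile

# Hypothetical inputs: each side is a list of strings, one per line.
old_lines = ["<a>1</a>", "<b>2</b>", "<c>3</c>"]
new_lines = ["<a>1</a>", "<b>two</b>", "<c>3</c>"]

# wrapcolumn keeps long XML lines from stretching the table.
differ = difflib.HtmlDiff(tabsize=4, wrapcolumn=80)

# context=True shows only the changed lines plus numlines lines around them.
html_page = differ.make_file(
    old_lines, new_lines,
    fromdesc="expected", todesc="actual",
    context=True, numlines=2,
)

# Write the page somewhere harmless; the file name is an assumption.
report_path = os.path.join(tempfile.gettempdir(), "diff_report.html")
with open(report_path, "w", encoding="utf-8") as handle:
    handle.write(html_page)
print(report_path)
```

Opening the written file in a browser shows the two columns side by side with the changed line highlighted.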
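3. The platform library -------- detects the current operating system
4. The logger library (robot.api.logger) -------- if you run under Robot Framework, it lets you customize what appears in the log, and the difference is easy to see
Robot Framework is pleasant to use. Probably because its test reports already cover normal needs, few people want to modify them or add content of their own, such as a hyperlink that leads to more detail. As a result I spent a long time searching online without finding any material on this part, and in the end had to read the source code.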
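One way the two libraries could work together is sketched below: platform decides how a local report path becomes a link, and the link is then handed to the logger as HTML. The helper names `file_url` and `html_link` and the `result.html` file name are hypothetical, and whether the original tool builds its link this way is an assumption. The Robot Framework call is left as a comment so the snippet runs without it installed.

```python
import os
import platform


def file_url(path):
    """Turn a local path into a file:// URL, depending on the OS."""
    path = os.path.abspath(path).replace("\\", "/")
    if platform.system() == "Windows":
        return "file:///" + path   # drive-letter paths need the extra slash
    return "file://" + path        # POSIX absolute paths already start with "/"


def html_link(path, label="diff report"):
    """Build an HTML anchor that an HTML-aware log can render as a link."""
    return '<a href="%s">%s</a>' % (file_url(path), label)


link = html_link("result.html")
print(link)

# Under Robot Framework the link could be written to the log as HTML:
# from robot.api import logger
# logger.info(link, html=True)
```

Passing `html=True` asks the logger to treat the message as markup instead of escaping it, which is what makes the hyperlink clickable in the log.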
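Regrets and future improvements:
One part of the code was originally meant to be recursive, but the data passed between levels kept getting mixed up, so the levels are written out by hand instead. I plan to clean this part up later.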
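For reference, the recursive approach could look roughly like the sketch below. It is not the author's code: `canonical_lines` is a hypothetical helper that renders each element to text first and sorts siblings by that text, which also avoids comparing attribute dictionaries directly (something Python 3 does not allow). It strips text and ignores mixed content and tail text, which is an assumption about what the comparison needs.

```python
import xml.etree.ElementTree as ET


def canonical_lines(elem, depth=0):
    """Return the element as indented text lines, siblings in sorted order."""
    indent = " " * depth
    attrs = "".join(" %s='%s'" % (k, v) for k, v in sorted(elem.attrib.items()))
    start = "<%s%s>" % (elem.tag, attrs)
    end = "</%s>" % elem.tag
    children = list(elem)
    if not children:
        # Leaf node: keep the text on the same line as the tags.
        text = (elem.text or "").strip()
        return [indent + start + text + end]
    # Recurse first, then sort the rendered child blocks (lists of strings).
    blocks = sorted(canonical_lines(child, depth + 1) for child in children)
    lines = [indent + start]
    for block in blocks:
        lines.extend(block)
    lines.append(indent + end)
    return lines


# Two hypothetical documents that differ only in sibling order.
doc_a = ET.fromstring("<r><x id='1'>a</x><y><n>2</n><m>1</m></y></r>")
doc_b = ET.fromstring("<r><y><m>1</m><n>2</n></y><x id='1'>a</x></r>")

print("\n".join(canonical_lines(doc_a)))
print(canonical_lines(doc_a) == canonical_lines(doc_b))
```

Because each call returns its own list of lines, no state is shared between levels, which sidesteps the data-passing confusion mentioned above. The resulting line lists can be fed straight into HtmlDiff.make_file as fromlines and tolines.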
############################## Code follows; the attached files can be downloaded and run locally to see the result ##############################
# coding=utf-8
import re
import xml.etree.ElementTree as ET  # XML parsing library
import difflib  # file comparison library
import datetime  # date and time library
import platform  # detects the operating system: Windows, Linux, ...
import os
from robot.api import logger  # comment out if not needed: Robot Framework writes a log when scripts run, and this library lets you customize it

# listafter: convert the parsed XML into an ordered list: (tag, attrib, (tag, attrib, text))
# This function is called by xmltolist() below; to see its exact output, call that function and print the result
def listafter(listcom1):
    listcomarr1 = []
    text1 = []
    listcomarr1.append(listcom1.tag)
    listcomarr1.append(listcom1.attrib)
    if len(listcom1) > 0:
        for listcom2 in listcom1:
            listcomarr2 = []
            text2 = []
            listcomarr2.append(listcom2.tag)
            listcomarr2.append(listcom2.attrib)
            if len(listcom2) > 0:
                for listcom3 in listcom2:
                    listcomarr3 = []
                    text3 = []
                    listcomarr3.append(listcom3.tag)
                    listcomarr3.append(listcom3.attrib)
                    if len(listcom3) > 0:
                        for listcom4 in listcom3:
                            listcomarr4 = []
                            text4 = []
                            listcomarr4.append(listcom4.tag)
                            listcomarr4.append(listcom4.attrib)
                            if len(listcom4) > 0:
                                for listcom5 in listcom4:
                                    listcomarr5 = []
                                    text5 = []
                                    listcomarr5.append(listcom5.tag)
                                    listcomarr5.append(listcom5.attrib)
                                    if len(listcom5) > 0:
                                        for listcom6 in listcom5:
                                            listcomarr6 = []
                                            text6 = []
                                            listcomarr6.append(listcom6.tag)
                                            listcomarr6.append(listcom6.attrib)
                                            if len(listcom6) > 0:
                                                for listcom7 in listcom6:
                                                    listcomarr7 = []
                                                    text7 = []
                                                    listcomarr7.append(listcom7.tag)
                                                    listcomarr7.append(listcom7.attrib)
                                                    if len(listcom7) > 0:
                                                        for listcom8 in listcom7:
                                                            listcomarr8 = []
                                                            text8 = []
                                                            listcomarr8.append(listcom8.tag)
                                                            listcomarr8.append(listcom8.attrib)
                                                            if len(listcom8) > 0:
                                                                for listcom9 in listcom8:
                                                                    listcomarr9 = []
                                                                    text9 = []
                                                                    listcomarr9.append(listcom9.tag)
                                                                    listcomarr9.append(listcom9.attrib)
                                                                    # Start: check whether to descend another level
                                                                    if len(listcom9) > 0:
                                                                        for listcom10 in listcom9:
                                                                            listcomarr10 = []
                                                                            text10 = []
                                                                            listcomarr10.append(listcom10.tag)
                                                                            listcomarr10.append(listcom10.attrib)
                                                                            listcomarr10.append([listcom10.text])
                                                                            text9.append(listcomarr10)
                                                                    else:
                                                                        text9.append(listcom9.text)
                                                                    # End: check whether to descend another level
                                                                    # sort the two-dimensional list
                                                                    text9 = sorted(text9)
                                                                    listcomarr9.append(text9)
                                                                    text8.append(listcomarr9)
                                                            else:
                                                                text8.append(listcom8.text)
                                                            text8 = sorted(text8)
                                                            listcomarr8.append(text8)
                                                            text7.append(listcomarr8)
                                                    else:
                                                        text7.append(listcom7.text)
                                                    text7 = sorted(text7)
                                                    listcomarr7.append(text7)
                                                    text6.append(listcomarr7)
                                            else:
                                                text6.append(listcom6.text)
                                            text6 = sorted(text6)
                                            listcomarr6.append(text6)
                                            text5.append(listcomarr6)
                                    else:
                                        text5.append(listcom5.text)
                                    text5 = sorted(text5)
                                    listcomarr5.append(text5)
                                    text4.append(listcomarr5)
                            else:
                                text4.append(listcom4.text)
                            text4 = sorted(text4)
                            listcomarr4.append(text4)
                            text3.append(listcomarr4)
                    else:
                        text3.append(listcom3.text)
                    text3 = sorted(text3)
                    listcomarr3.append(text3)
                    text2.append(listcomarr3)
            else:
                text2.append(listcom2.text)
            text2 = sorted(text2)
            listcomarr2.append(text2)
            text1.append(listcomarr2)
    else:
        text1.append(listcom1.text)
    text1 = sorted(text1)
    listcomarr1.append(text1)
    return listcomarr1

# Convert the XML content into an ordered list. Returns three values: the processed spmlxmllist,
# the untouched header spmlstart, and the untouched tail spmlend
# spmlstart and spmlend hold the header and tail that need no processing, which keeps the work down
def xmltolist(spml):
    if spml.find("<spml:") != -1:
        startnum = re.search(r'<spml:[^>]*>', spml).span()[1]
        endnum = spml.rfind("</spml:")
        spmlstart = spml[:startnum].strip()
        spmlend = spml[endnum:].strip()
        spmlxml = '''<spml:modifyRequest xmlns:spml='{spml}' xmlns:subscriber="{subscriber}" xmlns:xsi="{xsi}">\n%s</spml:modifyRequest>''' % (
            spml[startnum:endnum].strip())
    elif spml.find("<PlexViewRequest") != -1:
        startnum = re.search(r'<PlexViewRequest[^>]*>', spml).span()[1]
        endnum = spml.rfind("</PlexViewRequest>")
        spmlstart = spml[:startnum].strip()
        spmlend = spml[endnum:].strip()
        spmlxml = '''<PlexViewRequest>\n%s</PlexViewRequest>''' % (spml[startnum:endnum].strip())
    else:
        spmlstart = ""
        spmlend = ""
        spmlxml = spml
    # print spmlstart
    # print endspml
    # print spmlxml
    tree = ET.fromstring(spmlxml)
    spmlxmllist = listafter(tree)
    return spmlxmllist, spmlstart, spmlend

# Turn the spmlxmllist produced by xmltolist back into XML (sibling content is now in sorted order)
def listtoxml(spmllist1):
    kong = " "
    spmltag1 = spmllist1[0]
    spmlattrib1 = ""
    bodyxml1 = ""
    if spmllist1[1] != {}:
        for key, value in spmllist1[1].items():
            spmlattrib1 += " %s='%s'" % (key, value)
    startxml1 = "<%s%s>" % (spmltag1, spmlattrib1)
    endxml1 = "</%s>" % (spmltag1)
    spmlxml1 = ""
    if isinstance(spmllist1[2][0], list):
        spmlxml2 = ""
        for spmllist2 in spmllist1[2]:
            spmltag2 = spmllist2[0]
            spmlattrib2 = ""
            bodyxml2 = ""
            if spmllist2[1] != {}:
                for key, value in spmllist2[1].items():
                    spmlattrib2 += " %s='%s'" % (key, value)
            startxml2 = "<%s%s>" % (spmltag2, spmlattrib2)
            endxml2 = "</%s>" % (spmltag2)
            if isinstance(spmllist2[2][0], list):
                spmlxml3 = ""
                for spmllist3 in spmllist2[2]:
                    spmltag3 = spmllist3[0]
                    spmlattrib3 = ""
                    bodyxml3 = ""
                    if spmllist3[1] != {}:
                        for key, value in spmllist3[1].items():
                            spmlattrib3 += " %s='%s'" % (key, value)
                    startxml3 = "<%s%s>" % (spmltag3, spmlattrib3)
                    endxml3 = "</%s>" % (spmltag3)
                    if isinstance(spmllist3[2][0], list):
                        spmlxml4 = ""
                        for spmllist4 in spmllist3[2]:
                            spmltag4 = spmllist4[0]
                            spmlattrib4 = ""
                            bodyxml4 = ""
                            if spmllist4[1] != {}:
                                for key, value in spmllist4[1].items():
                                    spmlattrib4 += " %s='%s'" % (key, value)
                            startxml4 = "<%s%s>" % (spmltag4, spmlattrib4)
                            endxml4 = "</%s>" % (spmltag4)
                            if isinstance(spmllist4[2][0], list):
                                spmlxml5 = ""
                                for spmllist5 in spmllist4[2]:
                                    spmltag5 = spmllist5[0]
                                    spmlattrib5 = ""
                                    bodyxml5 = ""
                                    if spmllist5[1] != {}:
                                        for key, value in spmllist5[1].items():
                                            spmlattrib5 += " %s='%s'" % (key, value)
                                    startxml5 = "<%s%s>" % (spmltag5, spmlattrib5)
                                    endxml5 = "</%s>" % (spmltag5)
                                    if isinstance(spmllist5[2][0], list):
                                        spmlxml6 = ""
                                        for spmllist6 in spmllist5[2]:
                                            spmltag6 = spmllist6[0]
                                            spmlattrib6 = ""
                                            bodyxml6 = ""
                                            if spmllist6[1] != {}:
                                                for key, value in spmllist6[1].items():
                                                    spmlattrib6 += " %s='%s'" % (key, value)
                                            startxml6 = "<%s%s>" % (spmltag6, spmlattrib6)
                                            endxml6 = "</%s>" % (spmltag6)
                                            if isinstance(spmllist6[2][0], list):
                                                spmlxml7 = ""
                                                for spmllist7 in spmllist6[2]:
                                                    spmltag7 = spmllist7[0]
                                                    spmlattrib7 = ""
                                                    bodyxml7 = ""
                                                    if spmllist7[1] != {}:
                                                        for key, value in spmllist7[1].items():
                                                            spmlattrib7 += " %s='%s'" % (key, value)
                                                    startxml7 = "<%s%s>" % (spmltag7, spmlattrib7)
                                                    endxml7 = "</%s>" % (spmltag7)
                                                    if isinstance(spmllist7[2][0], list):
                                                        spmlxml8 = ""
                                                        for spmllist8 in spmllist7[2]:
                                                            spmltag8 = spmllist8[0]
                                                            spmlattrib8 = ""
                                                            bodyxml8 = ""
                                                            if spmllist8[1] != {}:
                                                                for key, value in spmllist8[1].items():
                                                                    spmlattrib8 += " %s='%s'" % (key, value)
                                                            startxml8 = "<%s%s>" % (spmltag8, spmlattrib8)
                                                            endxml8 = "</%s>" % (spmltag8)
                                                            if isinstance(spmllist8[2][0], list):
                                                                spmlxml9 = ""
                                                                for spmllist9 in spmllist8[2]:
                                                                    spmltag9 = spmllist9[0]
                                                                    spmlattrib9 = ""
                                                                    bodyxml9 = ""
                                                                    if spmllist9[1] != {}:
                                                                        for key, value in spmllist9[1].items():
                                                                            spmlattrib9 += " %s='%s'" % (key, value)
                                                                    startxml9 = "<%s%s>" % (spmltag9, spmlattrib9)
                                                                    endxml9 = "</%s>" % (spmltag9)
                                                                    if isinstance(spmllist9[2][0], list):
                                                                        spmlxml10 = ""
                                                                        for spmllist10 in spmllist9[2]:
                                                                            spmltag10 = spmllist10[0]
                                                                            spmlattrib10 = ""
                                                                            bodyxml10 = ""
                                                                            if spmllist10[1] != {}:
                                                                                for key, value in spmllist10[1].items():
                                                                                    spmlattrib10 += " %s='%s'" % (
                                                                                        key, value)
                                                                            startxml10 = "<%s%s>" % (
                                                                                spmltag10, spmlattrib10)
                                                                            endxml10 = "</%s>" % (spmltag10)
                                                                            bodyxml10 = spmllist10[2][0]
                                                                            spmlxml10 += "\n%s%s%s%s" % (
                                                                                kong * 9, startxml10, bodyxml10,
                                                                                endxml10)
                                                                        spmlxml9 += "\n%s%s%s\n%s%s" % (
                                                                            kong * 8, startxml9, spmlxml10, kong * 8,
                                                                            endxml9)
                                                                    else:
                                                                        bodyxml9 = spmllist9[2][0]
                                                                        spmlxml9 += "\n%s%s%s%s" % (
                                                                            kong * 8, startxml9, bodyxml9, endxml9)
                                                                spmlxml8 += "\n%s%s%s\n%s%s" % (
                                                                    kong * 7, startxml8, spmlxml9, kong * 7, endxml8)
                                                            else:
                                                                bodyxml8 = spmllist8[2][0]
                                                                spmlxml8 += "\n%s%s%s%s" % (
                                                                    kong * 7, startxml8, bodyxml8, endxml8)
                                                        spmlxml7 += "\n%s%s%s\n%s%s" % (
                                                            kong * 6, startxml7, spmlxml8, kong * 6, endxml7)
                                                    else:
                                                        # ... (the listing is cut off at this point in the source)