lxml parse error: ValueError

Posted 2021-03-08 zhoujun007


1: lxml parse error

1. The error message is as follows:

html = etree.HTML(xml)  # the line that raises the error

ValueError: Unicode strings with encoding declaration are not supported. Please use bytes input or XML fragments without declaration.

Cause of the error:

The response from requests.get was read through
r.text, which returns a str (unicode)
# The first few lines of the response body:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">

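Analysis:

The payload is really HTML, but it opens with an XML declaration that also specifies the encoding 'UTF-8'. lxml refuses to parse a str (text that is already decoded) when it carries such an encoding declaration, hence the ValueError.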
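The failure can be reproduced without any network request. The following is a minimal sketch, assuming lxml is installed; the variable `text` is a hypothetical stand-in for r.text and is not taken from the original response.

```python
from lxml import etree

# Hypothetical stand-in for r.text: a str that starts with an XML declaration
# carrying an encoding attribute, like the response shown above.
text = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    '<html xmlns="http://www.w3.org/1999/xhtml"><body><p>hi</p></body></html>'
)

try:
    etree.HTML(text)
    error_message = None
except ValueError as exc:
    # lxml rejects str input that declares its own encoding.
    error_message = str(exc)

print(error_message)
```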

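Solution

Have the requests.get response return content, which is of type bytes
# This spot may cause problems elsewhere (the decode() call was removed, so bytes are returned instead of str)
return response.content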
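As a rough illustration of the fix, the sketch below passes bytes to etree.HTML instead of a str. It assumes lxml is installed; `parse_html` is a hypothetical helper name, and `content` simulates response.content by encoding a sample document rather than calling requests.get.

```python
from lxml import etree


def parse_html(content: bytes):
    """Parse raw response bytes (what response.content returns) with lxml."""
    # bytes input is accepted even when the document starts with an
    # XML declaration that names an encoding.
    return etree.HTML(content)


# Simulated response.content: the same kind of document, but as bytes.
content = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    '<html xmlns="http://www.w3.org/1999/xhtml"><body><p>hi</p></body></html>'
).encode("utf-8")

root = parse_html(content)
paragraphs = root.xpath("//p/text()")
print(root.tag, paragraphs)
```

If a str is all that is available, encoding it back to bytes (for example text.encode("utf-8")) before calling etree.HTML should work along the same lines, provided the encoding matches the one declared in the document.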
