PHP - How to handle 'utf-16', us-ascii encoded html string to save correctly in DomDocument?


Posted: 2019-05-01 04:04:53

Question:

I am working on a PHP project that fetches emails and displays them on screen. For one of the emails, it fetches the following HTML:

    <html>
    <head>

    <META http-equiv="Content-Type" content="text/html; charset=utf-16">

    <style type="text/css">
          TD {
          font-family: Verdana,Tahoma,Arial, "Sans Serif";
          font-size: 10pt;
          }
          BODY {
          font-family: Verdana,Tahoma,Arial, "Sans Serif";
          font-size: 10pt;
          }
        </style>



    </head>

      <body bgcolor="#eeeeee"><img    src="https://trademe.tmcdn.co.nz/images/1pixel.gif?gen=20181128"><table cellspacing="0" cellpadding="0"  bgcolor="white" align="center" style="border-left: 1px #CCCCCC solid; border-right: 1px #CCCCCC solid; border-top: 1px #CCCCCC solid;">
      <tr>

        <td  colspan="4">&nbsp;</td>

      </tr>

      <tr>

        <td ></td>

        <td><a href="https://www.trademe.co.nz/Track.aspx?site=2018112820201&amp;tm=email&amp;et=201&amp;mt=75D6A1C7-4DEA-4B06-A3E9-6A12C1B41937" style="text-decoration: underline;"><img border="0"    src="https://trademe.tmcdn.co.nz/images/new-brand-2016/common/tm-logo-2016-246x48-v1.gif?gen=2018112820201"></a><img src="https://api.trademe.co.nz/tracking/collect?evt=open&amp;tm=email&amp;et=201&amp;mt=75D6A1C7-4DEA-4B06-A3E9-6A12C1B41937&amp;tid=EB71C99D-BEB4-445F-B62B-C172AC5A4CF4"></td>

        <td align="center"></td>

        <td ></td>

      </tr>

      <tr>

        <td ></td>

        <td colspan="2">

          <hr size="0" color="#CCCCCC">

          <center><small>Security Note: Trade Me will never ask you for your password via email</small></center>

          <hr size="0" color="#CCCCCC">

        </td>

        <td ></td>

      </tr>

      <tr>

        <td ></td>

        <td colspan="2" style="padding-left: 10px; padding-top: 10px;"><small>

      This is an automated email regarding listing #: 1847238571</small><br><br>

    Hi Matthew,

    <br><br><div>

      A member has asked a question on your listing for "2.4KW 2400W 3KVA 24VDC Pure Sine Wave Power Inverter Solar Caravan Off Grid LCD".

    </div><br><table  cellpadding="3" cellspacing="0" border="0">

            <tr>

              <td align="center" ><img    src="https://trademe.tmcdn.co.nz/images/icon_question.gif">&nbsp;</td>

              <td>what is the warranty like? &nbsp;&nbsp;<small><i>posted by:&nbsp;</i></small>&nbsp;<b><a href="https://www.trademe.co.nz/Members/Listings.aspx?member=4187691&amp;tm=email&amp;et=201&amp;mt=75D6A1C7-4DEA-4B06-A3E9-6A12C1B41937" style="text-decoration: underline;">matihegarty</a></b>

    (<a href="https://www.trademe.co.nz/Members/Feedback.aspx?member=4187691&amp;tm=email&amp;et=201&amp;mt=75D6A1C7-4DEA-4B06-A3E9-6A12C1B41937" style="text-decoration: underline;">5</a>&nbsp;<a href="https://www.trademe.co.nz/Members/Feedback.aspx?member=4187691&amp;tm=email&amp;et=201&amp;mt=75D6A1C7-4DEA-4B06-A3E9-6A12C1B41937"><img align="absmiddle" border="0" src="https://www.trademe.co.nz/images/star.gif"></a>)

  &nbsp;&nbsp;&nbsp;<small>8:54 pm, Wed 28 Nov</small></td>

            </tr>

          </table><br><br><center><b><font size="3"><a href="https://www.trademe.co.nz/a.asp?id=1847238571&amp;qna=true#qna&amp;tm=email&amp;et=201&amp;mt=75D6A1C7-4DEA-4B06-A3E9-6A12C1B41937" style="text-decoration: underline;">Answer this question</a></font></b></center><br><br><div>

      We recommend you answer all questions on your listings to help buyers make informed decisions. Questions on vehicle listings created in Trade Me Motors will be displayed automatically. For other listings, questions will only be displayed if answered.

    </div><br><br>

    Happy trading!

    <br><br>

    The Trade Me team

    <br><a href="https://www.trademe.co.nz/?tm=email&amp;et=201&amp;mt=75D6A1C7-4DEA-4B06-A3E9-6A12C1B41937" style="text-decoration: underline;">www.trademe.co.nz</a><br><br><small>

      If you don't wish to receive these emails or prefer plain text email, please update your

      <a href="https://www.trademe.co.nz/MyTradeMe/EmailOptions.aspx?tm=email&amp;et=201&amp;mt=75D6A1C7-4DEA-4B06-A3E9-6A12C1B41937" style="text-decoration: underline;">email options</a></small></td>

        <td ></td>

      </tr>

      <tr>

        <td colspan="3">

          <table cellspacing="0" cellpadding="0" border="0"  align="center" style="background-color:White;">

            <tr>

              <td align="center"><br><small><img   src="https://trademe.tmcdn.co.nz/images/3/common/triangle.gif">&nbsp;<font color="#666666">advertisement</font></small><br><br></td>

            </tr>

          </table>

          <table cellspacing="0" cellpadding="0" border="0"  align="center" style="background-color:#9A9A9A;">

            <tr>

              <td><a href="https://www.trademe.co.nz/Link.aspx?i=101247"><img style="border-width:0;" src="https://trademe.tmcdn.co.nz/photoserver/adserver/TMI0003-700x70-mates-FA.png?e="   ></a></td>

            </tr>

          </table>

        </td>

      </tr>

    </table>

  </body>

</html>

My program does this:

    $cleanMessage = new DOMDocument();
    @$cleanMessage->loadHTML($this->bodyHTML); //To clean the html code for unclosed td table tags and other 

    $this->message = $cleanMessage->saveHTML();

But my output is:

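��http://www.w3.org/TR/REC-html40/loose.dtd"> TD font-family: Verdana,Tahoma,Arial, "Sans Serif"; font-size: 10pt; BODY font-family: Verdana,Tahoma,Arial, "Sans Serif"; font-size: 10pt; � Security Note: Trade Me will never ask you for your password via email This is an automated email regarding listing #: 1847238571 Hi Matthew, A member has asked a question on your listing for "2.4KW 2400W 3KVA 24VDC Pure Sine Wave Power Inverter Solar Caravan Off Grid LCD".

� what is the warranty like? ��posted by:��matihegarty (5�) ���8:54 pm, Wed 28 Nov Answer this question We recommend you answer all questions on your listings to help buyers make informed decisions. Questions on vehicle listings created in Trade Me Motors will be displayed automatically. For other listings, questions will only be displayed if answered.

Answer 1: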

If you are seeing strange characters, replace `charset=utf-16` with `utf-8` or `ISO-8859-1` in your HTML:

    $this->bodyHTML = str_replace("charset=utf-16", "charset=utf-8", $this->bodyHTML);
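The replacement above only matches the exact lowercase string `charset=utf-16`, and it helps only when the bytes are not really UTF-16 (for example, when the mail library has already decoded the body to ASCII or UTF-8 but the `META` tag still claims UTF-16). A slightly more defensive variant might look like the sketch below. The helper name `normalizeHtmlCharset` is hypothetical and not from the original post, and the BOM check is an assumption about how genuinely UTF-16 input would arrive. It needs the `mbstring` and `dom` extensions.

```php
<?php
/**
 * Hypothetical helper (not part of the original answer): make the declared
 * charset agree with the actual bytes before DOMDocument parses the HTML.
 */
function normalizeHtmlCharset(string $html): string
{
    // Assumption: a UTF-16 byte order mark means the bytes really are UTF-16,
    // so convert them to UTF-8 and drop the BOM.
    $bom = substr($html, 0, 2);
    if ($bom === "\xFF\xFE") {
        $html = mb_convert_encoding(substr($html, 2), 'UTF-8', 'UTF-16LE');
    } elseif ($bom === "\xFE\xFF") {
        $html = mb_convert_encoding(substr($html, 2), 'UTF-8', 'UTF-16BE');
    }

    // Rewrite the declared charset case-insensitively, keeping an optional quote.
    $fixed = preg_replace('/charset\s*=\s*(["\']?)utf-16(?:le|be)?/i', 'charset=${1}utf-8', $html);

    // preg_replace returns null on a regex engine error; fall back to the input.
    return $fixed ?? $html;
}

// Shortened stand-in for the email body from the question.
$html = '<html><head><META http-equiv="Content-Type" content="text/html; charset=utf-16"></head>'
      . '<body><table><tr><td>Hi Matthew,</td></tr></table></body></html>';

$doc = new DOMDocument();
libxml_use_internal_errors(true);   // collect parser warnings instead of silencing with @
$doc->loadHTML(normalizeHtmlCharset($html));
libxml_clear_errors();

$output = $doc->saveHTML();
echo $output;
```

Inside the class from the question, the same idea would be `$cleanMessage->loadHTML(normalizeHtmlCharset($this->bodyHTML));`. Using `libxml_use_internal_errors(true)` rather than `@` is a design choice, not a requirement: it keeps the malformed-markup warnings available through `libxml_get_errors()` if they are ever needed for debugging.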
