PHP errors parsing XML (RSS feed)
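【Title】: PHP errors parsing XML (RSS feed) 【Posted】: 2012-05-03 20:02:33 【Question】: I'm using a PHP class based on the one found in this answer to parse five RSS feeds. Four of the five work without any trouble, but one of them gives me some errors. Is it malformed XML, or something else? I don't control the source of the RSS feed, but if the problem is on their end I would like to notify the owner.
Thanks in advance.
PHP errors: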
Warning: simplexml_load_string() [function.simplexml-load-string]: Entity: line 35: parser error : xmlParseEntityRef: no name in _rss.php on line 59
Warning: simplexml_load_string() [function.simplexml-load-string]: ne is June 30. The award will be presented at the 84th AHIMA Annual Convention & in _rss.php on line 59
Warning: simplexml_load_string() [function.simplexml-load-string]: ^ in _rss.php on line 59
Warning: simplexml_load_string() [function.simplexml-load-string]: Entity: line 64: parser error : EntityRef: expecting ';' in _rss.php on line 59
Warning: simplexml_load_string() [function.simplexml-load-string]: e code modifications presented at the ICD-9-CM Coordination and Maintenance (C&M in _rss.php on line 59
Warning: simplexml_load_string() [function.simplexml-load-string]: ^ in _rss.php on line 59
XML / RSS feed (online at http://ahima.org/RSS/News-Alerts-RSS.aspx):
<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0">
<channel>
<generator>RSS Builder by AHIMA</generator>
<title>News And Alerts</title>
<link>http://www.ahima.org/</link>
<description>News and Alerts from AHIMA.ORG</description>
<language>en-us</language>
<managingEditor>webmaster@ahima.org</managingEditor>
<webMaster>webmaster@ahima.org</webMaster>
<copyright>2010 AHIMA</copyright>
<item>
<title>Exclusive Coverage of AHIMA ICD-10 Summit</title>
<pubDate>4/13/2012 2:39:54 PM</pubDate>
<link>http://journal.ahima.org/icdsummit/</link>
<author>webmaster@ahima.org</author>
<category>News - Alerts</category>
<description>The summit takes place April 16–17 in Baltimore, MD, and explores the challenges and opportunities involved in the transition to the ICD-10-CM/PCS coding systems. The Journal’s coverage begins April 11 with session previews and comments from the presenters. Keep up to date on the summit by checking this site daily, subscribing to the RRS feed, and following @JournalofAHIMA on Twitter. Follow the Twitter hash tag #ICD10Summit for updates from summit attendees.</description>
</item><item>
<title>AHIMA: Remain Focused on Expediting ICD-10 Implementation</title>
<pubDate>4/10/2012 2:18:22 PM</pubDate>
<link>http://www.ahima.org/downloads/pdfs/pr/press-releases/HHS%20Announces%20IDC-10%20Delay.pdf</link>
<author>webmaster@ahima.org</author>
<category>News - Alerts</category>
<description>CHICAGO – April 10, 2012 – In light of the U.S. Department of Health and Human Services (HHS) proposed one-year delay in implementing ICD-10-CM or ICD-10-PCS for HIPAA covered entities, AHIMA encouraged organizations to remain focused on their implementation efforts.
</description>
</item><item>
<title>Call for Nominations: New AHIMA Grace Award</title>
<pubDate>3/30/2012 11:45:09 AM</pubDate>
<link>/about/grace.aspx</link>
<author>webmaster@ahima.org</author>
<category>News - Alerts</category>
<description>Grace Award: In Recognition of Excellence in Health Information Management will honor healthcare delivery organizations that demonstrate effective and innovative approaches in using health information to deliver quality healthcare.
Nomination applications are now available, and the submission deadline is June 30. The award will be presented at the 84th AHIMA Annual Convention & Exhibit in Chicago, September 29-October 4.</description>
</item><item>
<title>Practice Brief: Mobile Device Security</title>
<pubDate>4/13/2012 2:44:19 PM</pubDate>
<link>http://library.ahima.org/xpedio/groups/public/documents/ahima/bok1_049463.hcsp?dDocName=bok1_049463</link>
<author>webmaster@ahima.org</author>
<category>News - Alerts</category>
<description>Mobile devices have pervaded the everyday work environment in healthcare. An organization may use mobile devices to improve clinician workflow, bedside information gathering and reporting, or a host of other care delivery applications. In some cases, individuals may use their own mobile devices to meet their personal workflow requirements.
Whatever purpose the device serves, healthcare organizations must be prepared to understand all the issues related to mobile device use.
This practice brief reviews the legal and regulatory requirements that affect mobile device use in healthcare. It also provides best practices for ensuring appropriate safeguards are in place to protect all electronic protected health information (ePHI) used and processed within mobile devices.</description>
</item><item>
<title>Workflow and EHRs in Small Medical Practices </title>
<pubDate>4/13/2012 2:45:16 PM</pubDate>
<link>http://perspectives.ahima.org/index.php?option=com_content&view=article&id=247:workflow-and-electronic-health-records-in-small-medical-practices&catid=42:electronic-records&Itemid=88</link>
<author>webmaster@ahima.org</author>
<category>News - Alerts</category>
<description>This paper analyzes the workflow and implementation of electronic health record (EHR) systems across different functions in small physician offices. We characterize the differences in the offices based on the levels of computerization in terms of workflow, sources of time delay, and barriers to using EHR systems to support the entire workflow.
The study was based on a combination of questionnaires, interviews, in situ observations, and data collection efforts. This study was not intended to be a full-scale time-and-motion study with precise measurements but was intended to provide an overview of the potential sources of delays while performing office tasks. The study follows an interpretive model of case studies rather than a large-sample statistical survey of practices. To identify time-consuming tasks, workflow maps were created based on the aggregated data from the offices. The results from the study show that specialty physicians are more favorable toward adopting EHR systems than primary care physicians are. The barriers to adoption of EHR systems by primary care physicians can be attributed to the complex workflows that exist in primary care physician offices, leading to nonstandardized workflow structures and practices. Also, primary care physicians would benefit more from EHR systems if the systems could interact with external entities.
</description>
</item><item>
<title>AHIMA Comments on Proposed Modification to ICD-9 Procedure Codes</title>
<pubDate>4/13/2012 2:46:58 PM</pubDate>
<link>http://www.ahima.org/downloads/pdfs/advocacy/AHIMA%20comments_CM_procedure_0312.pdf</link>
<author>webmaster@ahima.org</author>
<category>News - Alerts</category>
<description>The American Health Information Management Association (AHIMA) respectfully submits the following comments on the proposed procedure code modifications presented at the ICD-9-CM Coordination and Maintenance (C&M) Committee meeting held on March 5.</description>
</item><item>
<title>AHIMA Foundation Establishes Research Innovation and Leadership Institute</title>
<pubDate>4/13/2012 2:47:47 PM</pubDate>
<link>http://ahimafoundation.org/PolicyResearch/RILI.aspx</link>
<author>webmaster@ahima.org</author>
<category>News - Alerts</category>
<description>For the HIM profession to remain relevant and influential we must have a dynamic and expanding knowledge base and defined set of desired skills and expertise.
To remain relevant we need to expand our knowledge base and stakeout our content area of expertise through mission and discipline critical research. This research must meet standards of scientific rigor and set the foundation for knowledge creation, innovative concept development, and thought leadership.
To increase influence we need to disseminate knowledge through scholarly processes and publications that inform best practices and influence policy makers. Scholarship must demonstrate our unique expertise and content knowledge base within the healthcare industry. Furthermore, knowledge transfer or dissemination will increase AHIMA brand recognition and enhance brand prestige and prominence.
To sustain a systematic research initiative AHIMA has established a centralized, high performing Research Innovation and Leadership Institute (RILI) as an enduring mission critical component of the AHIMA Foundation.
</description>
</item>
</channel>
</rss>
PHP code:
<?php
if ( !function_exists( 'strip_html_tags' ) ) {
    function strip_html_tags( $text )
    {
        $text = preg_replace(
            array(
                // Remove invisible content
                '@<head[^>]*?>.*?</head>@siu',
                '@<style[^>]*?>.*?</style>@siu',
                '@<script[^>]*?.*?</script>@siu',
                '@<object[^>]*?.*?</object>@siu',
                '@<embed[^>]*?.*?</embed>@siu',
                '@<applet[^>]*?.*?</applet>@siu',
                '@<noframes[^>]*?.*?</noframes>@siu',
                '@<noscript[^>]*?.*?</noscript>@siu',
                '@<noembed[^>]*?.*?</noembed>@siu',
                // Add line breaks before and after blocks
                '@</?((address)|(blockquote)|(center)|(del))@iu',
                '@</?((div)|(h[1-9])|(ins)|(isindex)|(p)|(pre))@iu',
                '@</?((dir)|(dl)|(dt)|(dd)|(li)|(menu)|(ol)|(ul))@iu',
                '@</?((table)|(th)|(td)|(caption))@iu',
                '@</?((form)|(button)|(fieldset)|(legend)|(input))@iu',
                '@</?((label)|(select)|(optgroup)|(option)|(textarea))@iu',
                '@</?((frameset)|(frame)|(iframe))@iu',
            ),
            array(
                ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ',
                "\n\$0", "\n\$0", "\n\$0", "\n\$0", "\n\$0", "\n\$0",
                "\n\$0", "\n\$0",
            ),
            $text );
        return strip_tags( $text );
    }
}
class BlogPost
{
    var $date;
    var $ts;
    var $link;
    var $title;
    var $text;
    var $author;
    var $summary;
    var $full;
}
class BlogFeed
{
    var $posts = array();

    function BlogFeed($file_or_url)
    {
        if(!eregi('^http:', $file_or_url))
            $feed_uri = $_SERVER['DOCUMENT_ROOT'] .'/shared/xml/'. $file_or_url;
        else
            $feed_uri = $file_or_url;

        $xml_source = file_get_contents($feed_uri);
        $x = simplexml_load_string($xml_source);

        if (count($x) == 0) return;

        foreach($x->channel->item as $item)
        {
            $post = new BlogPost();
            $post->date = (string) $item->pubDate;
            $post->ts = strtotime($item->pubDate);
            $post->link = (string) $item->link;
            $post->title = (string) $item->title;
            $post->text = (string) strip_html_tags( $item->description );
            $post->full = (string) $item->description;
            $post->author = (string) $item->author;

            $summary = strip_html_tags( $post->text );
            $max_len = 300;
            if(strlen($summary) > $max_len)
                $summary = substr($summary, 0, $max_len) . '...';

            $post->summary = $summary;
            $this->posts[] = $post;
        }
    }
}
$blogs = array(
    'http://www.hhs.gov/rss/news/hhsnews.xml',
    'http://ahima.org/RSS/News-Alerts-RSS.aspx',
    'http://www.healthcareitnews.com/rss/news',
    'http://www.healthcareitnews.com/resource/feed',
    'http://www.modernhealthcare.com/section/rss05&mime=xml'
);
foreach( $blogs as $k=>$v )
{
    $blog = new BlogFeed($v);
    foreach ( $blog->posts as $one_item )
    {
        /* ... */
    }
}
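【Comments】:
Try this: simplehtmldom.sourceforge.net. It's old, but it works well.
The XML feed contains errors, which is probably why SimpleXML can't parse it.
【Answer 1】: As mentioned in other answers and comments, your source XML is broken, and XML parsers are supposed to reject invalid input. libxml has a "recover" mode that would let you load this broken XML, but you would lose the "&sid" part, so it wouldn't help.
If you are lucky and feeling adventurous, you could try to fix up the input somehow so that it parses. You can use some string replacement to escape the ampersands that look like they are in the query part of a URL.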
$xml = file_get_contents('broken.xml');
// replace & followed by a bunch of letters, numbers
// and underscores and an equal sign with &
$xml = preg_replace('#&(?=[a-z_0-9]+=)#', '&', $xml);
$sxe = simplexml_load_string($xml);
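Of course, this is nothing more than a hack, and the only good way to resolve your situation is to ask your XML provider to fix their generator. If it produces broken XML, who knows what other errors are going unnoticed?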
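To see how far this lookahead patch reaches, here is a minimal sketch that runs it over a made-up sample. The `$sample` string and the `example.org` URL are assumptions for illustration, not taken from the real feed. The pattern only matches an ampersand followed by lowercase letters, digits, or underscores and then `=`, so ampersands in ordinary text (such as "Convention & Exhibit" or "(C&M)") are left alone and the document still fails to parse.

```php
<?php
// Hypothetical sample with two kinds of bare ampersand:
// one inside a URL query string, two inside ordinary text.
$sample = '<r><link>http://example.org/?a=1&view=article</link>'
        . '<d>Convention & Exhibit (C&M)</d></r>';

// The replacement from the answer above: only "&name=" style ampersands are escaped.
$patched = preg_replace('#&(?=[a-z_0-9]+=)#', '&amp;', $sample);

// Collect parser errors internally instead of emitting PHP warnings.
libxml_use_internal_errors(true);
$parsed = simplexml_load_string($patched);
$stillBroken = ($parsed === false);
libxml_clear_errors();

echo $patched, "\n";
echo $stillBroken ? "still not well-formed\n" : "parsed\n";
```

On input like this the query-string ampersand is repaired, but the text ampersands would need a broader rule or a fix at the source.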
【Discussion】:
【Answer 2】: Well, it may not be pretty compared with fixing the feed, but here is a solution:
$xml_source = str_replace(array("&amp;", "&"), array("&", "&amp;"), file_get_contents($feed_uri));
$x = simplexml_load_string($xml_source);
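First I replace every &amp; with a plain &, so that when I then convert every & back to &amp; the ampersands that were already escaped do not get escaped twice.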
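One limitation of the two-step replacement is that other entities already present in the feed, such as `&lt;` or `&#169;`, would become `&amp;lt;` and `&amp;#169;`. A possible variant, sketched here with a hypothetical helper name `escape_bare_ampersands` and a made-up sample string, escapes only ampersands that do not already start something shaped like an entity or character reference. It is a heuristic, not a parser: text such as `AT&T;` would be mistaken for an entity, and named entities that XML does not define (for example `&nbsp;`) would still be rejected.

```php
<?php
// Hypothetical helper: escape "&" unless it begins "&name;", "&#123;" or "&#x1F;".
function escape_bare_ampersands(string $xml): string
{
    $result = preg_replace(
        '/&(?!(?:[A-Za-z_][A-Za-z0-9_.-]*|#[0-9]+|#x[0-9A-Fa-f]+);)/',
        '&amp;',
        $xml
    );
    // preg_replace returns null on failure; fall back to the untouched input.
    return $result ?? $xml;
}

// Made-up sample mixing bare ampersands with entities that are already escaped.
$sample = '<r><l>http://example.org/?a=1&view=2</l>'
        . '<d>Convention & Exhibit (C&M) &amp; &lt;</d></r>';

$fixed = escape_bare_ampersands($sample);
$sx = simplexml_load_string($fixed);

echo $fixed, "\n";
echo (string) $sx->d, "\n";
```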
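【Discussion】:
Great (y)!!! I think that just saved me a few more hours; I had already spent 3 hours on this. Thanks a lot.
【Answer 3】: The problem is in the XML, specifically the "&" character in the phrase "84th AHIMA Annual Convention & Exhibit", which should be escaped. You can run any XML you are working with through an online XML validator (for example http://www.xmlvalidation.com/) to find out whether it has problems.
【Discussion】: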
+1 Thanks for this resource. The W3C validator is one of my best friends, so I should have known that an XML validation service existed.
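The same kind of check can also be done locally with PHP's libxml error functions, which is handy when you want line numbers to pass on to the feed owner. This is a minimal sketch; the function name `xml_well_formedness_errors` and the sample string are assumptions for illustration, and it checks well-formedness only, not validity against a schema.

```php
<?php
// Hypothetical helper: return a list of parser messages, empty if the XML is well-formed.
function xml_well_formedness_errors(string $xml): array
{
    // Switch to internal error collection and remember the previous setting.
    $previous = libxml_use_internal_errors(true);
    libxml_clear_errors();

    simplexml_load_string($xml);

    $messages = array();
    foreach (libxml_get_errors() as $error) {
        $messages[] = sprintf(
            'line %d, column %d: %s',
            $error->line,
            $error->column,
            trim($error->message)
        );
    }

    // Restore the global libxml state for the rest of the script.
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return $messages;
}

// Made-up sample with an unescaped ampersand on its second line.
$broken = "<a>\n<b>Convention & Exhibit</b>\n</a>";
foreach (xml_well_formedness_errors($broken) as $message) {
    echo $message, "\n";
}
```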