Parsing XML with xml2

Preface: This article was compiled by the editors of 小常识网 (cha138.com) and covers parsing XML with xml2. We hope you find it useful.

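I am trying to parse an XML file with the xml2 library. xml_name(doc) and xml_children(doc) both produce the expected output, but when I try to extract data with xml_find_all it returns {xml_nodeset (0)}. What am I doing wrong?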

library(xml2)

doc<-read_xml('<export xmlns="http://eu.europa.ec/fpi/fsd/export" generationDate="2018-06-15T19:29:31.078+02:00" globalFileId="117284">
    <sanctionEntity designationDetails="" unitedNationId="" euReferenceNumber="EU.36.64" logicalId="1">
        <regulation regulationType="amendment" organisationType="commission" publicationDate="2018-02-16" entryIntoForceDate="2018-02-16" numberTitle="2018/223 (OJ L43)" programme="ZWE" logicalId="110201">
            <publicationUrl>http://eur-lex.europa.eu/legal-content/EN/TXT/PDF/?uri=CELEX:32018R0223&amp;from=EN</publicationUrl>
        </regulation>
        <subjectType code="person" classificationCode="P"/>
        <nameAlias firstName="Robert" middleName="Gabriel" lastName="Mugabe" wholeName="Robert Gabriel Mugabe" function="Former President" gender="M" title="" nameLanguage="" strong="true" regulationLanguage="en" logicalId="1">
            <regulationSummary regulationType="amendment" publicationDate="2018-02-16" numberTitle="2018/223 (OJ L43)" publicationUrl="http://eur-lex.europa.eu/legal-content/EN/TXT/PDF/?uri=CELEX:32018R0223&amp;from=EN"/>
        </nameAlias>
        <birthdate circa="false" calendarType="GREGORIAN" city="" zipCode="" birthdate="1924-02-21" dayOfMonth="21" monthOfYear="2" year="1924" region="" place="" countryIso2Code="00" countryDescription="UNKNOWN" regulationLanguage="en" logicalId="1">
            <regulationSummary regulationType="amendment" publicationDate="2005-06-16" numberTitle="898/2005 (OJ L153)" publicationUrl="http://eur-lex.europa.eu/LexUriServ/LexUriServ.do?uri=OJ:L:2005:153:0009:0014:EN:PDF"/>
        </birthdate>
        <identification diplomatic="false" knownExpired="false" knownFalse="false" reportedLost="false" revokedByIssuer="false" issuedBy="" latinNumber="" nameOnDocument="" number="AD001095" region="" countryIso2Code="00" countryDescription="UNKNOWN" identificationTypeCode="passport" identificationTypeDescription="National passport" regulationLanguage="en" logicalId="315">
            <remark>(passport)</remark>
            <regulationSummary regulationType="amendment" publicationDate="2012-02-22" numberTitle="151/2012 (OJ L49)" publicationUrl="http://eur-lex.europa.eu/LexUriServ/LexUriServ.do?uri=OJ:L:2012:049:0002:0016:EN:PDF"/>
        </identification>
    </sanctionEntity>
    <sanctionEntity designationDate="2002-02-21" designationDetails="" unitedNationId="" euReferenceNumber="EU.36.64" logicalId="1">
        <remark>Date of designation referred to in Article 7 (2): 21.2.2002.</remark>
        <regulation regulationType="amendment" organisationType="commission" publicationDate="2002-09-13" entryIntoForceDate="2002-09-13" numberTitle="1643/2002 (OJ L247)" programme="ZWE" logicalId="1296">
            <publicationUrl>http://eur-lex.europa.eu/LexUriServ/LexUriServ.do?uri=OJ:L:2002:247:0022:0024:EN:PDF</publicationUrl>
        </regulation>
        <subjectType code="person" classificationCode="P"/>
        <nameAlias firstName="Robert" middleName="Gabriel" lastName="Mugabe" wholeName="Robert Gabriel Mugabe" function="president" gender="M" title="" nameLanguage="" strong="true" regulationLanguage="en" logicalId="101396">
            <regulationSummary regulationType="amendment" publicationDate="2002-09-13" numberTitle="1643/2002 (OJ L247)" publicationUrl="http://eur-lex.europa.eu/LexUriServ/LexUriServ.do?uri=OJ:L:2002:247:0022:0024:EN:PDF"/>
        </nameAlias>
        <birthdate circa="false" calendarType="GREGORIAN" city="Kutama" zipCode="" birthdate="1924-02-21" dayOfMonth="21" monthOfYear="2" year="1924" region="" place="" countryIso2Code="00" countryDescription="UNKNOWN" regulationLanguage="en" logicalId="101395">
            <regulationSummary regulationType="amendment" publicationDate="2002-09-13" numberTitle="1643/2002 (OJ L247)" publicationUrl="http://eur-lex.europa.eu/LexUriServ/LexUriServ.do?uri=OJ:L:2002:247:0022:0024:EN:PDF"/>
        </birthdate>
    </sanctionEntity>
    </export>
')

xml_name(doc)
xml_children(doc)

xml_find_all(doc, ".//sanctionEntity")   # returns {xml_nodeset (0)}

xml_find_all(doc, ".//publicationDate")   # also returns {xml_nodeset (0)}
Answer

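The document declares a default namespace (xmlns="http://eu.europa.ec/fpi/fsd/export"), so every element belongs to that namespace. In XPath, a step with an unprefixed name such as sanctionEntity only matches elements that are in no namespace, which is why the query finds nothing. Try matching on the element name instead: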

xml_find_all(doc, ".//*[name()='sanctionEntity']") 
{xml_nodeset (2)}
[1] <sanctionEntity designationDetails="" unitedNationId="" euReferenceNumber="EU.36.64" logicalId="1">
   ...
[2] <sanctionEntity designationDate="2002-02-21" designationDetails="" unitedNationId="" euReferenceNumber= ..
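Matching on name() sidesteps the namespace. xml2 can also address it directly: it gives a default namespace the prefix d1 (you can inspect this with xml_ns(doc)), and xml_find_all() accepts a namespace map through its ns argument, so prefixing the element names is enough. A minimal sketch, run on a trimmed excerpt of the export document rather than the full file:

```r
library(xml2)

# Trimmed excerpt of the export document; it keeps the default namespace
# declaration that makes unprefixed XPath names miss.
doc <- read_xml('<export xmlns="http://eu.europa.ec/fpi/fsd/export" globalFileId="117284">
  <sanctionEntity euReferenceNumber="EU.36.64" logicalId="1">
    <subjectType code="person" classificationCode="P"/>
  </sanctionEntity>
  <sanctionEntity designationDate="2002-02-21" euReferenceNumber="EU.36.64" logicalId="1">
    <subjectType code="person" classificationCode="P"/>
  </sanctionEntity>
</export>')

# xml2 assigns the default (unprefixed) namespace the prefix "d1"
ns <- xml_ns(doc)
print(ns)

# Unprefixed name: no match, because the elements live in the default namespace
missed <- xml_find_all(doc, ".//sanctionEntity")

# Prefixed name resolved through the namespace map: matches both entities
entities <- xml_find_all(doc, ".//d1:sanctionEntity", ns = ns)
print(length(entities))
```

Since ns defaults to xml_ns(x), xml_find_all(doc, ".//d1:sanctionEntity") behaves the same here; passing it explicitly just makes the prefix mapping visible.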

library(purrr)  # map(); purrr also re-exports the %>% pipe
xml_find_all(doc, ".//*[name()='sanctionEntity']") %>% map(xml_find_all, "./*")
[[1]]
{xml_nodeset (5)}
[1] <regulation regulationType="amendment" organisationType="commission" publicationDate="2018-02-16" entry ...
[2] <subjectType code="person" classificationCode="P"/>
[3] <nameAlias firstName="Robert" middleName="Gabriel" lastName="Mugabe" wholeName="Robert Gabriel Mugabe"  ...
[4] <birthdate circa="false" calendarType="GREGORIAN" city="" zipCode="" birthdate="1924-02-21" dayOfMonth= ...
[5] <identification diplomatic="false" knownExpired="false" knownFalse="false" reportedLost="false" revoked ...

[[2]]
{xml_nodeset (5)}
[1] <remark>Date of designation referred to in Article 7 (2): 21.2.2002.</remark>
[2] <regulation regulationType="amendment" organisationType="commission" publicationDate="2002-09-13" entry ...
[3] <subjectType code="person" classificationCode="P"/>
[4] <nameAlias firstName="Robert" middleName="Gabriel" lastName="Mugabe" wholeName="Robert Gabriel Mugabe"  ...
[5] <birthdate circa="false" calendarType="GREGORIAN" city="Kutama" zipCode="" birthdate="1924-02-21" dayOf ...
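The second query in the question, .//publicationDate, would still come back empty once the namespace is handled, because publicationDate is an attribute of <regulation> (and of <regulationSummary>), not an element. Below is a sketch of two ways to read it. It assumes the d1 prefix that xml2 assigns to the default namespace and uses a trimmed excerpt of the document:

```r
library(xml2)

# Trimmed excerpt: only the <regulation> elements and their dates are kept
doc <- read_xml('<export xmlns="http://eu.europa.ec/fpi/fsd/export">
  <sanctionEntity logicalId="1">
    <regulation publicationDate="2018-02-16" numberTitle="2018/223 (OJ L43)"/>
  </sanctionEntity>
  <sanctionEntity logicalId="1">
    <regulation publicationDate="2002-09-13" numberTitle="1643/2002 (OJ L247)"/>
  </sanctionEntity>
</export>')

# Option 1: select the <regulation> elements, then read the attribute
regs <- xml_find_all(doc, ".//d1:regulation")
dates_attr <- xml_attr(regs, "publicationDate")

# Option 2: select the attribute nodes with the @ axis and take their values.
# Unprefixed attributes are in no namespace, so @publicationDate needs no prefix.
date_nodes <- xml_find_all(doc, ".//d1:regulation/@publicationDate")
dates_xpath <- xml_text(date_nodes)

print(dates_attr)
```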

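If you would rather keep the unprefixed XPath from the question, xml_ns_strip() removes default namespaces from the document in place. The sketch below does that and then collects a few per-entity fields into a data frame. The choice of columns is only illustrative. xml_find_first() is used so that each sanctionEntity yields exactly one row even when a child element or attribute is missing (xml_attr() then gives NA):

```r
library(xml2)

# Trimmed excerpt of the export document, default namespace included
doc <- read_xml('<export xmlns="http://eu.europa.ec/fpi/fsd/export">
  <sanctionEntity euReferenceNumber="EU.36.64" logicalId="1">
    <nameAlias wholeName="Robert Gabriel Mugabe" function="Former President"/>
    <birthdate birthdate="1924-02-21" city=""/>
  </sanctionEntity>
  <sanctionEntity designationDate="2002-02-21" euReferenceNumber="EU.36.64" logicalId="1">
    <nameAlias wholeName="Robert Gabriel Mugabe" function="president"/>
    <birthdate birthdate="1924-02-21" city="Kutama"/>
  </sanctionEntity>
</export>')

# Remove the default namespace (modifies doc in place) so unprefixed names match
xml_ns_strip(doc)

entities <- xml_find_all(doc, ".//sanctionEntity")

# One node per entity, keeping rows aligned with `entities`
aliases <- xml_find_first(entities, "./nameAlias")
births <- xml_find_first(entities, "./birthdate")

entity_df <- data.frame(
  designationDate = xml_attr(entities, "designationDate"),  # NA when absent
  wholeName = xml_attr(aliases, "wholeName"),
  city = xml_attr(births, "city"),
  stringsAsFactors = FALSE
)
print(entity_df)
```

Stripping is convenient for one-off scripts, but it throws away the default namespace; if the file mixes several namespaces, the prefixed d1: form is the safer choice.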