使用 powershell 将嵌套的复杂 XML 解析为 CSV
Posted
技术标签:
【中文标题】使用 powershell 将嵌套的复杂 XML 解析为 CSV【英文标题】:Parsing nested complex XML to CSV using powershell 【发布时间】:2022-01-17 08:31:27 【问题描述】:早安,
我是 powershell 新手,我正在尝试将复杂的 xml 解析为 CSV:
这里是xml代码
<LoyaltyCustomer Action="E">
<Retailer Id="1">
<HouseHold BuyingUnitInternalKey="2" HouseHoldExternalId="-1" SendEmail="false">
<Members>
<Member MemberInternalKey="2" MemberExternalId="-1" IsMainMember="true" LastName="Internal" FirstName="Use" StartDate="2012-10-02T12:42:00" RedemptionPrivileges="0" MemberStatus="0" AdressNormalizationUpdate="N">
<Cards>
<Card Id="-1" CardStatus="" IssueDate="2012-10-02T12:42:00" ExpirationDate="2056-12-31T00:00:00" />
</Cards>
<Stores>
<Store Id="26" StoreTypeId="1" IsHomeStore="true" />
</Stores>
</Member>
</Members>
</HouseHold>
<HouseHold BuyingUnitInternalKey="3" HouseHoldExternalId="244003000001" Country="11" State="223" City="Calgary" Street1="Main St" StreetNum="203" PostalCode="R4C 3R1" POBox="999999" HomePhone="438-439-1246" EMailAddress="zabana@zabana.ca" SendEmail="true">
<Accounts>
<Account Id="1" EarnValue="6422.0000" RedeemValue="6049.0000" Balance="373.0000" LastUpdate="2013-05-10" HousekeepingBalance="0.0000" HousekeepingTotalAccumulated="0.0000" HousekeepingTotalRedeemed="0.0000" />
<Account Id="3" EarnValue="10.0000" RedeemValue="8.0000" Balance="2.0000" LastUpdate="2013-02-20" HousekeepingBalance="0.0000" HousekeepingTotalAccumulated="0.0000" HousekeepingTotalRedeemed="0.0000" />
<Account Id="4" EarnValue="10.0000" RedeemValue="7.0000" Balance="3.0000" LastUpdate="2013-02-20" HousekeepingBalance="0.0000" HousekeepingTotalAccumulated="0.0000" HousekeepingTotalRedeemed="0.0000" />
<Account Id="5" EarnValue="8.0000" RedeemValue="5.0000" Balance="3.0000" LastUpdate="2013-04-18" HousekeepingBalance="0.0000" HousekeepingTotalAccumulated="0.0000" HousekeepingTotalRedeemed="0.0000" />
<Account Id="6" EarnValue="9.0000" RedeemValue="8.0000" Balance="1.0000" LastUpdate="2013-02-20" HousekeepingBalance="0.0000" HousekeepingTotalAccumulated="0.0000" HousekeepingTotalRedeemed="0.0000" />
<Account Id="7" EarnValue="7028.0000" RedeemValue="6500.0000" Balance="528.0000" LastUpdate="2017-07-10" HousekeepingBalance="0.0000" HousekeepingTotalAccumulated="0.0000" HousekeepingTotalRedeemed="0.0000" />
<Account Id="8" EarnValue="269319.0000" RedeemValue="269000.0000" Balance="319.0000" LastUpdate="2019-07-10" HousekeepingBalance="0.0000" HousekeepingTotalAccumulated="0.0000" HousekeepingTotalRedeemed="0.0000" />
<Account Id="10" EarnValue="688968.0000" RedeemValue="682641.0000" Balance="6327.0000" LastUpdate="2019-07-10" HousekeepingBalance="0.0000" HousekeepingTotalAccumulated="0.0000" HousekeepingTotalRedeemed="0.0000" />
</Accounts>
<HouseHoldSegments>
<Segment Id="100" Status="1" AttachmentSourceId="1" />
<Segment Id="500" Status="1" AttachmentSourceId="4" />
<Segment Id="502" Status="1" AttachmentSourceId="8" />
<Segment Id="531" Status="1" AttachmentSourceId="1" />
</HouseHoldSegments>
<Members>
<Member MemberInternalKey="3" MemberExternalId="244003000001" IsMainMember="true" LastName="zabana" FirstName="Mike" BirthDate="1970-11-04" DriversLicense="drvlic" NationalInsuranceNumber="socsec" Remarks="Test Account" MobilePhoneNumber="438-439-1246" Gender="1" Title="1" StartDate="2013-01-21T14:28:00" EffectiveDate="2013-02-08T13:58:00" RedemptionPrivileges="0" LanguageId="0" NumberOfFamilyMembers="4" Anonimity="0" MemberStatus="1" ReceiptLayoutId="1" AdressNormalizationUpdate="N" UpdatedDate="2019-07-20T14:47:00" CommercialDriversLicense="comdrvlic">
<Cards>
<Card Id="244003000001" CardStatus="1" IssueDate="2013-01-21T00:00:00" ExpirationDate="2056-12-31T23:59:00" />
</Cards>
<Stores>
<Store Id="24" StoreTypeId="1" IsHomeStore="true" />
</Stores>
<MemberAttributes>
<Attribute Id="10004" Value="zabana MANAGEMENTS" />
</MemberAttributes>
<MemberAdditionalAddress />
<FamilyMembers>
<FamilyMember FamilyMemberId="1" TypeId="2" Name="Kathy" />
<FamilyMember FamilyMemberId="2" TypeId="1" Name="Melissa" BirthdayDate="1997-06-06" Gender="2" />
<FamilyMember FamilyMemberId="3" TypeId="1" Name="Trent" BirthdayDate="2000-08-29" Gender="1" />
</FamilyMembers>
</Member>
<Member MemberInternalKey="23612" MemberExternalId="244003000002" IsMainMember="false" LastName="zabana" FirstName="Kathy" BirthDate="1970-07-02" Remarks="Test Account for Family testing" Gender="2" Title="3" StartDate="2013-03-11T13:16:00" RedemptionPrivileges="0" LanguageId="0" PostOption="1" NumberOfFamilyMembers="1" Anonimity="0" MemberStatus="1" AdressNormalizationUpdate="N" UpdatedDate="2019-07-15T19:44:00">
</Member>
<Member MemberInternalKey="33421" MemberExternalId="244003000003" IsMainMember="false" LastName="zabana" FirstName="Trent" BirthDate="2000-08-29" Remarks="Test account to see how a different address and email affect the member export" Gender="1" Title="1" StartDate="2013-03-27T14:41:00" EffectiveDate="2017-06-26T13:55:00" RedemptionPrivileges="3" LanguageId="0" PostOption="1" NumberOfFamilyMembers="1" Anonimity="0" MemberStatus="1" AdressNormalizationUpdate="N" UpdatedDate="2019-07-05T18:07:00">
</Member>
</Members>
</HouseHold>
</Retailer>
</LoyaltyCustomer>
我想要得到的是这个输出:
HouseHoldExternalId IsMainMember MemberExternalId FirstName LastName StartDate Id Balance
------------------- ------------ ---------------- --------- -------- -------------- -- -------
-1 TRUE -1 Use Internal 10/2/2012 12:42
244003000001 TRUE 244003000001 Mike Zabana 1/21/2013 14:28
244003000001 FALSE 244003000002 Kathy zabana 3/11/2013 13:16
244003000001 FALSE 244003000003 Trent zabana 3/27/2013 14:41
244003000001 1 373
244003000001 3 2
244003000001 4 3
244003000001 5 3
244003000001 6 1
244003000001 7 528
244003000001 8 319
244003000001 10 6327
我在 PowerShell 中使用以下脚本:
$data = New-Object xml;
$data.load("C:\temp\parsing\mehdi.xml")
$mehdi = [pscustomobject]@
"HouseHoldExternalId" = $data.SelectNodes("/LoyaltyCustomer/Retailer/HouseHold").HouseHoldExternalId |Out-String
"MemberExternalId" = $data.SelectNodes("/LoyaltyCustomer/Retailer/HouseHold/Members/Member").MemberExternalId |Out-String
"IsMainMember" = $data.SelectNodes("/LoyaltyCustomer/Retailer/HouseHold/Members/Member").IsMainMember |Out-String
"LastName" = $data.SelectNodes("/LoyaltyCustomer/Retailer/HouseHold/Members/Member").LastName |Out-String
"FirstName" = $data.SelectNodes("/LoyaltyCustomer/Retailer/HouseHold/Members/Member").FirstName |Out-String
"StartDate" = $data.SelectNodes("/LoyaltyCustomer/Retailer/HouseHold/Members/Member").StartDate |Out-String
"Id" = $data.SelectNodes("/LoyaltyCustomer/Retailer/HouseHold/Accounts/Account").Id |Out-String
"Balance" = $data.SelectNodes("/LoyaltyCustomer/Retailer/HouseHold/Accounts/Account").Balance |Out-String
$mehdi | ConvertTo-Csv -NoTypeInformation -Delimiter ";" | Set-Content -Path C:\temp\parsing\test2.csv -Encoding UTF8
我得到的是所有数据都在 HouseholdExternalID 列下。
我怎样才能获得相同的输出?我将感谢您的帮助和建议。谢谢
【问题讨论】:
拜托,format your code 并添加 XML 的 text 表示和脚本的预期结果 (CSV)。不要使用屏幕截图。 请添加 CSV 应如何纯文本的最小表示。 对不起圣地亚哥,我按照要求添加了 XML 的文本表示,CSV 和 PowerShell 脚本中的预期输出。谢谢 【参考方案1】:似乎这应该可以将您的 XML 转换为 object[]
,之后您可以将其导出为 CSV。基本上,您的预期输出是 3 个数组的组合:
$xml.LoyaltyCustomer.Retailer.Household
获取 HouseHoldExternalId
$xml.LoyaltyCustomer.Retailer.Household.Members.Member
获取 IsMainMember、MemberExternalId、FirstName、LastName 和 StartDate
$xml.LoyaltyCustomer.Retailer.Household.Accounts.Account
获取Id和Balance
$xml = [xml]::new()
$xml.Load("C:\temp\parsing\mehdi.xml")
$psobjectOut =
param($element, $subelement)
[pscustomobject]@
HouseHoldExternalId = $element.HouseHoldExternalId
IsMainMember = $subelement.IsMainMember
MemberExternalId = $subelement.MemberExternalId
FirstName = $subelement.FirstName
LastName = $subelement.LastName
StartDate = '0' -f $subelement.StartDate -as [datetime]
Id = $subelement.Id
Balance = $subelement.Balance
$result = foreach($element in $xml.LoyaltyCustomer.Retailer.Household)
foreach($subelement in $element.Members.Member)
& $psobjectOut -element $element -subelement $subelement
foreach($subelement in $element.Accounts.Account)
& $psobjectOut -element $element -subelement $subelement
检查$result
:
PS /> $result | Format-Table
HouseHoldExternalId IsMainMember MemberExternalId FirstName LastName StartDate Id Balance
------------------- ------------ ---------------- --------- -------- --------- -- -------
-1 true -1 Use Internal 10/2/2012 12:42:00 PM
244003000001 true 244003000001 Mike zabana 1/21/2013 2:28:00 PM
244003000001 false 244003000002 Kathy zabana 3/11/2013 1:16:00 PM
244003000001 false 244003000003 Trent zabana 3/27/2013 2:41:00 PM
244003000001 1 373.0000
244003000001 3 2.0000
244003000001 4 3.0000
244003000001 5 3.0000
244003000001 6 1.0000
244003000001 7 528.0000
244003000001 8 319.0000
244003000001 10 6327.0000
【讨论】:
谢谢圣地亚哥,它按预期工作。感谢您的帮助。 @MehdiC 乐于提供帮助 :)【参考方案2】:添加到Santiago's fine answer 我想使用 xPath 来攻击它。因此,我使用 an 建立了他的工作并提出了以下调整 t
[XML]$XML = [XML]( Get-Content 'C:\Scripts\Example.xml' )
$HouseHolds = Select-Xml -Xml $Xml -XPath '/LoyaltyCustomer/Retailer/HouseHold'
$Output =
For($i = 0; $i -lt $HouseHolds.Count; ++$i )
$HouseHoldID = $HouseHolds[$i].Node.HouseHoldExternalId
$HouseHoldXpath = "/LoyaltyCustomer/Retailer/HouseHold[$($i+1)]"
Select-Xml -Xml $XML -XPath "$HouseHoldXpath/Members/Member|$HouseHoldXpath/Accounts/Account" |
ForEach-Object
[PSCustomObject]@
HouseHoldExternalId = $HouseHoldID
IsMainMember = $_.Node.IsMainMember
MemberExternalId = $_.Node.MemberExternalId
FirstName = $_.Node.FirstName
LastName = $_.Node.LastName
StartDate = $_.Node.StartDate -as [DateTime]
Id = $_.Node.Id
Balance = $_.Node.Balance
$Output | Format-Table -AutoSize
返回:
HouseHoldExternalId IsMainMember MemberExternalId FirstName LastName StartDate Id Balance
------------------- ------------ ---------------- --------- -------- --------- -- -------
-1 true -1 Use Internal 10/2/2012 12:42:00 PM
244003000001 1 373.0000
244003000001 3 2.0000
244003000001 4 3.0000
244003000001 5 3.0000
244003000001 6 1.0000
244003000001 7 528.0000
244003000001 8 319.0000
244003000001 10 6327.0000
244003000001 true 244003000001 Mike zabana 1/21/2013 2:28:00 PM
244003000001 false 244003000002 Kathy zabana 3/11/2013 1:16:00 PM
244003000001 false 244003000003 Trent zabana 3/27/2013 2:41:00 PM
由于组合了 xPath 语句,因此只需要 1 个内部循环。因此,实际上并不需要预先打包表达式以输出 PSCustomObject。也就是说,排序有点偏离,可能是由于 xPath 查询的运行方式。所有的结果都在那里。
较早的版本没有如此积极地使用 xPath,但它的顺序是正确的:
[XML]$XML = [XML]( Get-Content 'C:\Scripts\Example.xml' )
$HouseHolds = Select-Xml -Xml $Xml -XPath '/LoyaltyCustomer/Retailer/HouseHold'
$ObjectExpression =
[PSCustomObject]@
HouseHoldExternalId = $HouseHoldID
IsMainMember = $_.Node.IsMainMember
MemberExternalId = $_.Node.MemberExternalId
FirstName = $_.Node.FirstName
LastName = $_.Node.LastName
StartDate = $_.Node.StartDate -as [DateTime]
Id = $_.Node.Id
Balance = $_.Node.Balance
$Output =
For($i = 0; $i -lt $HouseHolds.Count; ++$i )
$HouseHoldID = $HouseHolds[$i].Node.HouseHoldExternalId
$HouseHoldXpath = "01" -f $HouseHolds[$i].Pattern, "[$($i+1)]"
Select-Xml -Xml $XML -XPath "$HouseHoldXpath/Members/Member" |
ForEach-Object & $ObjectExpression
Select-Xml -Xml $XML -XPath "$HouseHoldXpath/Accounts/Account" |
ForEach-Object & $ObjectExpression
$Output | Format-Table -AutoSize
【讨论】:
以上是关于使用 powershell 将嵌套的复杂 XML 解析为 CSV的主要内容,如果未能解决你的问题,请参考以下文章
powershell 将嵌套文件移动到顶级目录#Snippet
使用 Powershell .Net 方法将 XML 转换为 HTML