Working with Office Documents Using OpenXML
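The OpenXML SDK lets you read and modify PowerPoint, Word, and Excel documents programmatically, which is handy when many documents need the same edit.
Two tasks came up in a recent project: 1. replace the Alt Text of the images in a document; 2. replace the ScreenTip of the hyperlinks in a document. The documents were PowerPoint and Word files. Opening each Word document by hand and fixing the images one at a time wastes a lot of time, and Word has no find-and-replace for image Alt Text, so I decided to do the batch find and replace in code.
First, install the OpenXML SDK from the NuGet Package Manager console: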
Install-Package DocumentFormat.OpenXml
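Then add the namespace references. System.Linq and System.Text are needed as well, because the listings below use FirstOrDefault and StringBuilder.
    using System.Linq;
    using System.Text;
    using DocumentFormat.OpenXml.Office;
    using DocumentFormat.OpenXml.Packaging;
    using DocumentFormat.OpenXml.Presentation;
    using DocumentFormat.OpenXml.Wordprocessing;
    using DocumentFormat.OpenXml.Drawing.Wordprocessing;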
In Word, an image is stored as the following XML. The Alt Text lives in the descr and title attributes of the wp:docPr element:
    <w:drawing>
      <wp:inline distT="0" distB="0" distL="0" distR="0">
        <wp:extent cx="2743438" cy="2792210"/>
        <wp:effectExtent l="0" t="0" r="0" b="8255"/>
        <wp:docPr id="1" name="Picture 1" descr="this is alt text description of horse icon." title="title of horse Icon"/>
        <wp:cNvGraphicFramePr>
          <a:graphicFrameLocks xmlns:a="http://schemas.openxmlformats.org/drawingml/2006/main" noChangeAspect="1"/>
        </wp:cNvGraphicFramePr>
        <a:graphic xmlns:a="http://schemas.openxmlformats.org/drawingml/2006/main">
          <a:graphicData uri="http://schemas.openxmlformats.org/drawingml/2006/picture">
            <pic:pic xmlns:pic="http://schemas.openxmlformats.org/drawingml/2006/picture">
              <pic:nvPicPr>
                <pic:cNvPr id="1" name="1.png"/>
                <pic:cNvPicPr/>
              </pic:nvPicPr>
              <pic:blipFill>
                <a:blip r:embed="rId6">
                  <a:extLst>
                    <a:ext uri="{28A0092B-C50C-407E-A947-70E740481C1C}">
                      <a14:useLocalDpi xmlns:a14="http://schemas.microsoft.com/office/drawing/2010/main" val="0"/>
                    </a:ext>
                  </a:extLst>
                </a:blip>
                <a:stretch>
                  <a:fillRect/>
                </a:stretch>
              </pic:blipFill>
              <pic:spPr>
                <a:xfrm>
                  <a:off x="0" y="0"/>
                  <a:ext cx="2743438" cy="2792210"/>
                </a:xfrm>
                <a:prstGeom prst="rect">
                  <a:avLst/>
                </a:prstGeom>
              </pic:spPr>
            </pic:pic>
          </a:graphicData>
        </a:graphic>
      </wp:inline>
    </w:drawing>
Code to find the image Alt Text in Word:
    using (WordprocessingDocument doc = WordprocessingDocument.Open(file, false))
    {
        MainDocumentPart mainPart = doc.MainDocumentPart;
        StringBuilder sb = new StringBuilder();
        // Collect the drawings once instead of re-querying the body for every image part.
        var drawings = mainPart.Document.Body.Descendants<Drawing>().ToList();
        int imageIndex = 0;
        foreach (var image in mainPart.ImageParts)
        {
            imageIndex++;
            string id = mainPart.GetIdOfPart(image);
            string title = string.Empty;
            string description = string.Empty;
            foreach (var drawing in drawings)
            {
                // Match the drawing to the image part through the r:embed id of its blip.
                var blip = drawing.Descendants<DocumentFormat.OpenXml.Drawing.Blip>().FirstOrDefault();
                if (blip == null || blip.Embed == null || blip.Embed.Value != id)
                {
                    continue;
                }
                // wp:docPr sits under wp:inline or wp:anchor, so look it up by type
                // rather than assuming the picture is inline.
                var docPr = drawing.Descendants<DocumentFormat.OpenXml.Drawing.Wordprocessing.DocProperties>().FirstOrDefault();
                if (docPr != null)
                {
                    title = docPr.Title != null ? docPr.Title.Value : string.Empty;
                    description = docPr.Description != null ? docPr.Description.Value : string.Empty;
                }
            }
            sb.AppendLine(string.Format("{0}: {1}={2}", imageIndex, title, description));
        }
        // textToReturn is declared by the enclosing method, as in the other listings.
        textToReturn = sb.ToString();
    }
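The listing above only reads the Alt Text, while the task was to replace it. The following is a minimal sketch of the replacement step, not the original project's code: the class name `WordAltTextReplacer`, the method `ReplaceAltText`, and its parameters `oldText` and `newText` are hypothetical. It assumes a plain substring replacement is what is wanted and only touches drawings in the main document body (headers and footers are separate parts and are not covered).

```csharp
using System;
using System.Linq;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;
using DW = DocumentFormat.OpenXml.Drawing.Wordprocessing;

public static class WordAltTextReplacer
{
    // Hypothetical helper: replaces oldText with newText in the descr attribute
    // of every wp:docPr element and returns how many descriptions were changed.
    public static int ReplaceAltText(string file, string oldText, string newText)
    {
        if (string.IsNullOrEmpty(oldText))
        {
            throw new ArgumentException("oldText must not be empty.", nameof(oldText));
        }

        int changed = 0;
        // isEditable = true, otherwise the package is opened read-only.
        using (WordprocessingDocument doc = WordprocessingDocument.Open(file, true))
        {
            Document document = doc.MainDocumentPart.Document;
            foreach (var docPr in document.Body.Descendants<DW.DocProperties>().ToList())
            {
                string description = docPr.Description?.Value;
                if (description != null && description.Contains(oldText))
                {
                    docPr.Description = description.Replace(oldText, newText);
                    changed++;
                }
            }
            document.Save();
        }
        return changed;
    }
}
```

Calling `WordAltTextReplacer.ReplaceAltText(path, "horse", "pony")` in a loop over a folder of .docx files would give the batch behaviour described at the start of the post.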
Code to find the image Alt Text in PowerPoint:
    using (PresentationDocument doc = PresentationDocument.Open(file, false))
    {
        StringBuilder sb = new StringBuilder();
        int sdIndex = 0;
        foreach (var part in doc.PresentationPart.SlideParts)
        {
            sdIndex++;
            int picIndex = 0;
            foreach (var imagePart in part.GetPartsOfType<ImagePart>())
            {
                picIndex++;
                string id = part.GetIdOfPart(imagePart);
                // Find the p:pic element whose blip references this image part.
                var picture = part.Slide.Descendants<DocumentFormat.OpenXml.Presentation.Picture>()
                    .FirstOrDefault(p => p.BlipFill != null && p.BlipFill.Blip != null && p.BlipFill.Blip.Embed == id);
                if (picture == null)
                {
                    // The image part is not used by a picture element on this slide
                    // (for example, it is a shape fill), so there is no Alt Text to report.
                    continue;
                }
                var props = picture.NonVisualPictureProperties.NonVisualDrawingProperties;
                sb.AppendLine(string.Format("{0}-{1}: {2}={3}", sdIndex, picIndex, props.Title, props.Description));
            }
        }
        textToReturn = sb.ToString();
    }
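For the replacement half of the task in PowerPoint, a sketch along the same lines might look like this. The names `SlideAltTextReplacer`, `ReplaceAltText`, `oldText`, and `newText` are hypothetical, and the sketch assumes only picture elements (p:pic) matter; other shapes also carry descriptions but are skipped here.

```csharp
using System;
using System.Linq;
using DocumentFormat.OpenXml.Packaging;
using P = DocumentFormat.OpenXml.Presentation;

public static class SlideAltTextReplacer
{
    // Hypothetical helper: replaces oldText with newText in the description
    // of every picture on every slide and returns the number of changes.
    public static int ReplaceAltText(string file, string oldText, string newText)
    {
        if (string.IsNullOrEmpty(oldText))
        {
            throw new ArgumentException("oldText must not be empty.", nameof(oldText));
        }

        int changed = 0;
        using (PresentationDocument doc = PresentationDocument.Open(file, true))
        {
            foreach (var slidePart in doc.PresentationPart.SlideParts)
            {
                foreach (var picture in slidePart.Slide.Descendants<P.Picture>().ToList())
                {
                    // p:nvPicPr/p:cNvPr holds the descr and title attributes.
                    var props = picture.NonVisualPictureProperties?.NonVisualDrawingProperties;
                    string description = props?.Description?.Value;
                    if (description != null && description.Contains(oldText))
                    {
                        props.Description = description.Replace(oldText, newText);
                        changed++;
                    }
                }
                slidePart.Slide.Save();
            }
        }
        return changed;
    }
}
```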
In Word, a hyperlink is stored as the following XML. The ScreenTip is the w:tooltip attribute:
    <w:hyperlink r:id="rId7" w:tooltip="Baidu Company" w:history="1">
      <w:r w:rsidR="00007220" w:rsidRPr="00FC5FDE">
        <w:rPr>
          <w:rStyle w:val="Hyperlink"/>
        </w:rPr>
        <w:t>www.baidu.com</w:t>
      </w:r>
    </w:hyperlink>
Code to find the hyperlink ScreenTip in Word:
    using (WordprocessingDocument doc = WordprocessingDocument.Open(file, false))
    {
        MainDocumentPart mainPart = doc.MainDocumentPart;
        var hyperlinks = mainPart.Document.Body.Descendants<Hyperlink>();
        StringBuilder sb = new StringBuilder();
        int hlIndex = 0;
        foreach (var hyperlink in hyperlinks)
        {
            hlIndex++;
            string url = string.Empty;
            // Resolve the r:id of the hyperlink to its target URL through the part's relationships.
            // Links to bookmarks inside the document have no r:id, so url stays empty for them.
            foreach (var item in mainPart.HyperlinkRelationships)
            {
                if (item.Id == hyperlink.Id)
                {
                    url = item.Uri.OriginalString;
                    break;
                }
            }
            string toolTip = hyperlink.Tooltip;
            sb.AppendLine(string.Format("{0}: {1}={2}", hlIndex, url, toolTip));
        }
        textToReturn = sb.ToString();
    }
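To go from finding a ScreenTip to replacing it, the w:tooltip attribute can be rewritten in place. The sketch below is an illustration under assumptions rather than the original project's code: `WordTooltipReplacer`, `ReplaceTooltips`, `oldText`, and `newText` are hypothetical names, and only hyperlinks in the main document body are visited.

```csharp
using System;
using System.Linq;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

public static class WordTooltipReplacer
{
    // Hypothetical helper: replaces oldText with newText in the w:tooltip
    // attribute of every hyperlink and returns the number of changes.
    public static int ReplaceTooltips(string file, string oldText, string newText)
    {
        if (string.IsNullOrEmpty(oldText))
        {
            throw new ArgumentException("oldText must not be empty.", nameof(oldText));
        }

        int changed = 0;
        using (WordprocessingDocument doc = WordprocessingDocument.Open(file, true))
        {
            Document document = doc.MainDocumentPart.Document;
            foreach (var hyperlink in document.Body.Descendants<Hyperlink>().ToList())
            {
                string toolTip = hyperlink.Tooltip?.Value;
                if (toolTip != null && toolTip.Contains(oldText))
                {
                    hyperlink.Tooltip = toolTip.Replace(oldText, newText);
                    changed++;
                }
            }
            document.Save();
        }
        return changed;
    }
}
```

Hyperlinks without a tooltip attribute are left untouched, so the method never adds a ScreenTip where none existed.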
Code to find the hyperlink ScreenTip in PowerPoint:
    using (PresentationDocument doc = PresentationDocument.Open(file, false))
    {
        StringBuilder sb = new StringBuilder();
        int sdIndex = 0;
        foreach (var part in doc.PresentationPart.SlideParts)
        {
            sdIndex++;
            var hyperLinks = part.Slide.Descendants<DocumentFormat.OpenXml.Drawing.HyperlinkType>();
            int hlIndex = 0;
            foreach (var hyperLink in hyperLinks)
            {
                hlIndex++;
                string url = string.Empty;
                foreach (var item in part.HyperlinkRelationships)
                {
                    if (item.Id == hyperLink.Id)
                    {
                        // Use OriginalString, as in the Word listing: Authority returns only
                        // the host and port, and throws for relative URIs.
                        url = item.Uri.OriginalString;
                        break;
                    }
                }
                var toolTip = hyperLink.Tooltip;
                sb.AppendLine(string.Format("{0}-{1}: {2}={3}", sdIndex, hlIndex, url, toolTip));
            }
        }
        textToReturn = sb.ToString();
    }
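The same in-place rewrite works for PowerPoint ScreenTips. This is a sketch under assumptions: `SlideTooltipReplacer`, `ReplaceTooltips`, `oldText`, and `newText` are hypothetical names, and it relies on the same `HyperlinkType` base class the listing above queries, so click and hover hyperlinks are both visited.

```csharp
using System;
using System.Linq;
using DocumentFormat.OpenXml.Packaging;
using A = DocumentFormat.OpenXml.Drawing;
using P = DocumentFormat.OpenXml.Presentation;

public static class SlideTooltipReplacer
{
    // Hypothetical helper: replaces oldText with newText in the tooltip
    // attribute of every hyperlink on every slide and returns the number of changes.
    public static int ReplaceTooltips(string file, string oldText, string newText)
    {
        if (string.IsNullOrEmpty(oldText))
        {
            throw new ArgumentException("oldText must not be empty.", nameof(oldText));
        }

        int changed = 0;
        using (PresentationDocument doc = PresentationDocument.Open(file, true))
        {
            foreach (var slidePart in doc.PresentationPart.SlideParts)
            {
                foreach (var link in slidePart.Slide.Descendants<A.HyperlinkType>().ToList())
                {
                    string toolTip = link.Tooltip?.Value;
                    if (toolTip != null && toolTip.Contains(oldText))
                    {
                        link.Tooltip = toolTip.Replace(oldText, newText);
                        changed++;
                    }
                }
                slidePart.Slide.Save();
            }
        }
        return changed;
    }
}
```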