Working with Office Documents Using OpenXML

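With the OpenXML library you can access PowerPoint, Word, Excel, and other Office documents programmatically, which is convenient when many documents need the same edit.

Two tasks came up in a recent project: 1. replace the Alt Text of the images in a document; 2. replace the ScreenTip of the hyperlinks in a document. The documents here are PPT and Word files. Opening each Word document by hand and fixing the images one by one wastes time, and Word has no find-and-replace for image Alt Text, so I decided to do the batch find and replace in code.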

 

First, install the OpenXML SDK through the NuGet Package Manager:

Install-Package DocumentFormat.OpenXml 


Then add references to the namespaces.

using System.Linq;
using System.Text;
using DocumentFormat.OpenXml.Office;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Presentation;
using DocumentFormat.OpenXml.Wordprocessing;
using DocumentFormat.OpenXml.Drawing.Wordprocessing;

In Word, an image is stored as XML in the following form:

        <w:drawing>
          <wp:inline distT="0" distB="0" distL="0" distR="0">
            <wp:extent cx="2743438" cy="2792210"/>
            <wp:effectExtent l="0" t="0" r="0" b="8255"/>
            <wp:docPr id="1" name="Picture 1" descr="this is alt text description of horse icon." title="title of horse Icon"/>
            <wp:cNvGraphicFramePr>
              <a:graphicFrameLocks xmlns:a="http://schemas.openxmlformats.org/drawingml/2006/main" noChangeAspect="1"/>
            </wp:cNvGraphicFramePr>
            <a:graphic xmlns:a="http://schemas.openxmlformats.org/drawingml/2006/main">
              <a:graphicData uri="http://schemas.openxmlformats.org/drawingml/2006/picture">
                <pic:pic xmlns:pic="http://schemas.openxmlformats.org/drawingml/2006/picture">
                  <pic:nvPicPr>
                    <pic:cNvPr id="1" name="1.png"/>
                    <pic:cNvPicPr/>
                  </pic:nvPicPr>
                  <pic:blipFill>
                    <a:blip r:embed="rId6">
                      <a:extLst>
                        <a:ext uri="{28A0092B-C50C-407E-A947-70E740481C1C}">
                          <a14:useLocalDpi xmlns:a14="http://schemas.microsoft.com/office/drawing/2010/main" val="0"/>
                        </a:ext>
                      </a:extLst>
                    </a:blip>
                    <a:stretch>
                      <a:fillRect/>
                    </a:stretch>
                  </pic:blipFill>
                  <pic:spPr>
                    <a:xfrm>
                      <a:off x="0" y="0"/>
                      <a:ext cx="2743438" cy="2792210"/>
                    </a:xfrm>
                    <a:prstGeom prst="rect">
                      <a:avLst/>
                    </a:prstGeom>
                  </pic:spPr>
                </pic:pic>
              </a:graphicData>
            </a:graphic>
          </wp:inline>
        </w:drawing>

Code to look up image Alt Text in Word:

            // file and textToReturn are declared by the surrounding method.
            using (WordprocessingDocument doc = WordprocessingDocument.Open(file, false))
            {
                MainDocumentPart mainPart = doc.MainDocumentPart;
                StringBuilder sb = new StringBuilder();

                // Collect the drawings once instead of re-querying the body for every image part.
                var drawings = mainPart.Document.Body.Descendants<Drawing>().ToList();
                int imageIndex = 0;
                foreach (var image in mainPart.ImageParts)
                {
                    imageIndex++;
                    string id = mainPart.GetIdOfPart(image);
                    string title = string.Empty;
                    string description = string.Empty;
                    foreach (var drawing in drawings)
                    {
                        var blip = drawing.Descendants<DocumentFormat.OpenXml.Drawing.Blip>().FirstOrDefault();
                        // wp:docPr carries the Alt Text and sits under wp:inline or wp:anchor.
                        var docPr = drawing.Descendants<DocProperties>().FirstOrDefault();
                        if (blip != null && docPr != null && blip.Embed == id)
                        {
                            title = docPr.Title != null ? docPr.Title.Value : string.Empty;
                            description = docPr.Description != null ? docPr.Description.Value : string.Empty;
                        }
                    }

                    sb.AppendLine(string.Format("{0}: {1}={2}", imageIndex, title, description));
                }

                textToReturn = sb.ToString();
            }
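
The lookup above only reads the Alt Text. For the replacement task described at the start, a minimal sketch could look like the following. WordAltTextTools and its two methods are hypothetical names, not SDK members, and plain substring replacement is an assumption about what "replace" should mean here. The file is edited in place, so it is safer to run this on a copy.

```csharp
using System;
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Drawing.Wordprocessing;

// Hypothetical helper class, not part of the OpenXML SDK.
public static class WordAltTextTools
{
    // Replaces oldText with newText in the descr attribute of every wp:docPr
    // below root and returns the number of elements that were changed.
    public static int ReplaceAltText(OpenXmlElement root, string oldText, string newText)
    {
        if (string.IsNullOrEmpty(oldText))
        {
            throw new ArgumentException("oldText must not be empty.", "oldText");
        }

        int changed = 0;
        foreach (var docPr in root.Descendants<DocProperties>())
        {
            string current = docPr.Description != null ? docPr.Description.Value : null;
            if (!string.IsNullOrEmpty(current) && current.Contains(oldText))
            {
                docPr.Description = current.Replace(oldText, newText);
                changed++;
            }
        }
        return changed;
    }

    // Opens the .docx for editing (second argument true) and saves only if something changed.
    public static int ReplaceAltTextInFile(string file, string oldText, string newText)
    {
        using (WordprocessingDocument doc = WordprocessingDocument.Open(file, true))
        {
            var document = doc.MainDocumentPart.Document;
            int changed = ReplaceAltText(document, oldText, newText);
            if (changed > 0)
            {
                document.Save();
            }
            return changed;
        }
    }
}
```

Splitting the element-level method from the file-level method keeps the replacement logic usable on an in-memory element tree, without touching a file.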

The corresponding Alt Text lookup for PPT:

            using (PresentationDocument doc = PresentationDocument.Open(file, false))
            {
                StringBuilder sb = new StringBuilder();

                int sdIndex = 0;
                foreach (var part in doc.PresentationPart.SlideParts)
                {
                    sdIndex++;
                    int picIndex = 0;
                    foreach (var imagePart in part.GetPartsOfType<ImagePart>())
                    {
                        picIndex++;
                        string id = part.GetIdOfPart(imagePart);
                        var picture = part.Slide.Descendants<DocumentFormat.OpenXml.Presentation.Picture>()
                            .FirstOrDefault(p => p.BlipFill != null && p.BlipFill.Blip != null && p.BlipFill.Blip.Embed == id);
                        // An image part may be used by something other than a p:pic element; skip it then.
                        if (picture == null)
                        {
                            continue;
                        }
                        var props = picture.NonVisualPictureProperties.NonVisualDrawingProperties;
                        sb.AppendLine(string.Format("{0}-{1}: {2}={3}", sdIndex, picIndex, props.Title, props.Description));
                    }
                }

                textToReturn = sb.ToString();
            }
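
The same replacement idea carries over to slides. The sketch below is an assumption-laden outline rather than the original project's code: SlideAltTextTools and its methods are hypothetical names, only p:pic elements are touched, and the match is a plain substring replacement. It writes to the file in place.

```csharp
using System;
using DocumentFormat.OpenXml.Packaging;
using P = DocumentFormat.OpenXml.Presentation;

// Hypothetical helper class, not part of the OpenXML SDK.
public static class SlideAltTextTools
{
    // Replaces oldText with newText in the descr attribute of every picture
    // on the slide and returns the number of pictures that were changed.
    public static int ReplaceAltText(P.Slide slide, string oldText, string newText)
    {
        if (string.IsNullOrEmpty(oldText))
        {
            throw new ArgumentException("oldText must not be empty.", "oldText");
        }

        int changed = 0;
        foreach (var picture in slide.Descendants<P.Picture>())
        {
            var nvPicPr = picture.NonVisualPictureProperties;
            var props = nvPicPr != null ? nvPicPr.NonVisualDrawingProperties : null;
            string current = props != null && props.Description != null ? props.Description.Value : null;
            if (!string.IsNullOrEmpty(current) && current.Contains(oldText))
            {
                props.Description = current.Replace(oldText, newText);
                changed++;
            }
        }
        return changed;
    }

    // Walks every slide part of the .pptx and saves the slides that changed.
    public static int ReplaceAltTextInFile(string file, string oldText, string newText)
    {
        int total = 0;
        using (PresentationDocument doc = PresentationDocument.Open(file, true))
        {
            foreach (SlidePart part in doc.PresentationPart.SlideParts)
            {
                int changed = ReplaceAltText(part.Slide, oldText, newText);
                if (changed > 0)
                {
                    part.Slide.Save();
                }
                total += changed;
            }
        }
        return total;
    }
}
```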

 

In Word, the XML for a hyperlink is:

      <w:hyperlink r:id="rId7" w:tooltip="Baidu Company" w:history="1">
        <w:r w:rsidR="00007220" w:rsidRPr="00FC5FDE">
          <w:rPr>
            <w:rStyle w:val="Hyperlink"/>
          </w:rPr>
          <w:t>www.baidu.com</w:t>
        </w:r>
      </w:hyperlink>

Looking up hyperlink ScreenTips in Word:

            using (WordprocessingDocument doc = WordprocessingDocument.Open(file, false))
            {
                MainDocumentPart mainPart = doc.MainDocumentPart;
                var hyperlinks = mainPart.Document.Body.Descendants<Hyperlink>();

                StringBuilder sb = new StringBuilder();

                int hlIndex = 0;
                foreach (var hyperlink in hyperlinks)
                {
                    hlIndex++;
                    string url = string.Empty;

                    var hyperlinkRelationships = mainPart.HyperlinkRelationships;
                    foreach (var item in hyperlinkRelationships)
                    {
                        if (item.Id == hyperlink.Id)
                        {
                            url = item.Uri.OriginalString;
                            break;
                        }
                    }

                    string toolTip = hyperlink.Tooltip;
                    sb.AppendLine(string.Format("{0}: {1}={2}", hlIndex, url, toolTip));
                }

                textToReturn = sb.ToString();
                
            }
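
To replace a ScreenTip rather than just list it, the w:tooltip attribute can be rewritten through the Tooltip property of Hyperlink. The following is a minimal sketch under the same assumptions as before: WordScreenTipTools is a hypothetical name, the match is a plain substring replacement, and only w:hyperlink elements in the main document body are covered.

```csharp
using System;
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

// Hypothetical helper class, not part of the OpenXML SDK.
public static class WordScreenTipTools
{
    // Replaces oldText with newText in the w:tooltip attribute of every
    // w:hyperlink below root and returns the number of links changed.
    public static int ReplaceScreenTip(OpenXmlElement root, string oldText, string newText)
    {
        if (string.IsNullOrEmpty(oldText))
        {
            throw new ArgumentException("oldText must not be empty.", "oldText");
        }

        int changed = 0;
        foreach (var hyperlink in root.Descendants<Hyperlink>())
        {
            string current = hyperlink.Tooltip != null ? hyperlink.Tooltip.Value : null;
            if (!string.IsNullOrEmpty(current) && current.Contains(oldText))
            {
                hyperlink.Tooltip = current.Replace(oldText, newText);
                changed++;
            }
        }
        return changed;
    }

    // Opens the .docx for editing and saves only if a ScreenTip changed.
    public static int ReplaceScreenTipInFile(string file, string oldText, string newText)
    {
        using (WordprocessingDocument doc = WordprocessingDocument.Open(file, true))
        {
            var document = doc.MainDocumentPart.Document;
            int changed = ReplaceScreenTip(document.Body, oldText, newText);
            if (changed > 0)
            {
                document.Save();
            }
            return changed;
        }
    }
}
```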

Looking up hyperlink ScreenTips in PPT:

            using (PresentationDocument doc = PresentationDocument.Open(file, false))
            {
                StringBuilder sb = new StringBuilder();

                int sdIndex = 0;
                foreach (var part in doc.PresentationPart.SlideParts)
                {
                    sdIndex++;
                    var hyperLinks = part.Slide.Descendants<DocumentFormat.OpenXml.Drawing.HyperlinkType>();
                    int hlIndex = 0;
                    foreach (var hyperLink in hyperLinks)
                    {
                        hlIndex++;
                        string url = string.Empty;
                        foreach (var item in part.HyperlinkRelationships)
                        {
                            if (item.Id == hyperLink.Id)
                            {
                                // OriginalString keeps the full target; Authority would return only host and port.
                                url = item.Uri.OriginalString;
                                break;
                            }
                        }
                        var toolTip = hyperLink.Tooltip;

                        sb.AppendLine(string.Format("{0}-{1}: {2}={3}", sdIndex, hlIndex, url, toolTip));
                    }

                }

                textToReturn = sb.ToString();
            }
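
Replacing a ScreenTip on a slide follows the same pattern, this time through the Tooltip property that HyperlinkType exposes. As with the earlier sketches, SlideScreenTipTools is a hypothetical name, the match is a plain substring replacement, and the presentation is modified in place.

```csharp
using System;
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Packaging;
using A = DocumentFormat.OpenXml.Drawing;

// Hypothetical helper class, not part of the OpenXML SDK.
public static class SlideScreenTipTools
{
    // Replaces oldText with newText in the tooltip attribute of every DrawingML
    // hyperlink element below root and returns the number of links changed.
    public static int ReplaceScreenTip(OpenXmlElement root, string oldText, string newText)
    {
        if (string.IsNullOrEmpty(oldText))
        {
            throw new ArgumentException("oldText must not be empty.", "oldText");
        }

        int changed = 0;
        foreach (var hyperlink in root.Descendants<A.HyperlinkType>())
        {
            string current = hyperlink.Tooltip != null ? hyperlink.Tooltip.Value : null;
            if (!string.IsNullOrEmpty(current) && current.Contains(oldText))
            {
                hyperlink.Tooltip = current.Replace(oldText, newText);
                changed++;
            }
        }
        return changed;
    }

    // Walks every slide part of the .pptx and saves the slides that changed.
    public static int ReplaceScreenTipInFile(string file, string oldText, string newText)
    {
        int total = 0;
        using (PresentationDocument doc = PresentationDocument.Open(file, true))
        {
            foreach (SlidePart part in doc.PresentationPart.SlideParts)
            {
                int changed = ReplaceScreenTip(part.Slide, oldText, newText);
                if (changed > 0)
                {
                    part.Slide.Save();
                }
                total += changed;
            }
        }
        return total;
    }
}
```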

 
