Unable to insert EMF into Word using Python

Posted 2023-03-29


[Title]: Unable to insert EMF into Word using Python [Posted]: 2021-08-12 18:17:00 [Question]:

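I need to insert SVG files into Word. Since we cannot do that directly, I plan to convert the SVG to EMF and insert that instead. Converting SVG to EMF with inkscape works well. However, I cannot come up with the right code to insert the EMF into Word. I followed the steps explained by Alvaro in this post. The steps are shown in the attachment -

Here is my code -

However, when I run the code shown in the attachment, it still throws docx.image.exceptions.UnrecognizedImageError. A contributor to the library on GitHub claims that the library has resolved this issue. If so, please let me know whether I am missing anything.

I can insert the EMF file manually without problems. I am attaching the document created by inserting the EMF. This EMF was downloaded from the internet for testing.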

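[Comments]:

It looks like the problem is in the image. The EMF format is notorious for its glitches. Probably some images can be handled this way and some cannot. Can you paste the image into a DOCX file manually? A DOCX file is basically a zip archive with xml and media files. You can make a DOCX file manually, save it, unzip it and see how your EMF is stored. Then figure out how to do the same thing for another file.

@YuriKhristich - Thanks for your reply. I am able to insert the EMF file into Word. I just updated my question with the document attached. Please take a look and let me know.

@YuriKhristich - Sorry, you are right. I missed that *** does not allow us to upload files. Is there any way to share them with you? If you share your email address, I can mail them to you.

Just uploaded the document here - file.io/29Vhavlq5Mak This is the EMF file - file.io/9n8QTizYYlT2

[Answer 1]: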

It seems the docx module does not work with EMF files.

Here is a workaround:

import shutil
import zipfile

temp_dir = "_temp"

old_docx = "doc.docx"
new_docx = "doc_new.docx"

old_emf = temp_dir + "/word/media/image1.emf"
new_emf = "new_image.emf"


# unpack content of the docx file into the temp folder

with zipfile.ZipFile(old_docx, "r") as z:
    files = z.namelist()
    for f in files: z.extract(f, temp_dir)


# replace the image

shutil.copyfile(new_emf, old_emf)


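# pack all files from temp folder back into the new docx file
# ("w" starts a fresh archive; "a" would append duplicate entries
# if doc_new.docx already exists)

with zipfile.ZipFile(new_docx, "w", zipfile.ZIP_DEFLATED) as z:
    for f in files: z.write(temp_dir + "/" + f, f)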


# remove the temp folder

shutil.rmtree(temp_dir)

Typical structure of a docx file:

doc.docx
│
├─ [Content_Types].xml
│
├─ _rels
│  └─ .rels
│
├─ docProps
│  ├─ app.xml
│  └─ core.xml
│
└─ word
   ├─ document.xml    <-- text is here
   ├─ fontTable.xml
   ├─ settings.xml
   ├─ webSettings.xml
   ├─ styles.xml
   │
   ├─ _rels
   │  └─ document.xml.rels
   │
   ├─ theme
   │  └─ theme1.xml
   │
   └─ media
      └─ image1.emf   <-- your image is here

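The script unpacks the contents of doc.docx into the temporary folder _temp, then replaces image1.emf in the temporary folder with another file, new_image.emf, from the current directory. It then packs the contents of the temporary folder back into doc_new.docx and deletes the temporary folder.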

Note: the new image in doc_new.docx will have the same size as the old image.

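So the workflow can look like this: you make a template docx file, place a template emf picture in it manually and save the docx file. Then you take a new emf image, put it next to the docx file and run the script. This gives you a new docx file with the new emf image.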
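
Before running the script against a template, it may help to confirm that the template really stores its picture as word/media/image1.emf, since the script hard-codes that path. The following is only a minimal sketch using the standard zipfile module; the function name list_media_parts is made up for illustration and is not part of the answer.

```python
import zipfile


def list_media_parts(docx_path):
    """Return the archive names of all files stored under word/media/.

    Hypothetical helper: a docx file is a zip archive, so the embedded
    pictures can be listed without unpacking anything to disk.
    """
    with zipfile.ZipFile(docx_path) as z:
        return sorted(
            name for name in z.namelist()
            if name.startswith("word/media/") and not name.endswith("/")
        )
```

Calling list_media_parts("doc.docx") on the template should show the name that old_emf has to point at; if Word saved the picture under a different name, that variable would need to be adjusted.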

I suppose you have many emf images, so it makes sense to add a few lines to the script so that it can take several images and produce several docx files.
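
One possible shape for that extension is sketched below. It is not the answer's code: instead of unpacking to a temp folder it copies the archive entry by entry and swaps a single member in memory, and the names replace_member and make_docx_per_emf, as well as the folder layout, are assumptions made for illustration.

```python
import zipfile
from pathlib import Path


def replace_member(src_docx, dst_docx, member, new_bytes):
    """Copy src_docx to dst_docx, swapping the bytes of one archive member."""
    with zipfile.ZipFile(src_docx) as zin, \
         zipfile.ZipFile(dst_docx, "w", zipfile.ZIP_DEFLATED) as zout:
        for info in zin.infolist():
            # Every entry is copied unchanged except the one being replaced.
            data = new_bytes if info.filename == member else zin.read(info.filename)
            zout.writestr(info, data)


def make_docx_per_emf(template_docx, emf_dir, out_dir,
                      member="word/media/image1.emf"):
    """Create one docx per *.emf file in emf_dir, based on the template."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    created = []
    for emf in sorted(Path(emf_dir).glob("*.emf")):
        target = out_dir / (emf.stem + ".docx")
        replace_member(template_docx, target, member, emf.read_bytes())
        created.append(target)
    return created
```

As with the single-file script, every output document keeps the picture size stored in the template, so this only gives good results when the images share the template's dimensions.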

If all the emf images have the same size, it will work fine. If they have different sizes, more coding is needed to handle the xml data.

Update

I have figured out how to get the size of an emf image. So here is the full solution:

from docx import Document
import shutil
import zipfile

temp_dir = "_temp"
old_docx = "doc.docx"
new_docx = "doc_new.docx"
old_emf  = temp_dir + "/word/media/image1.emf" # don't change this line
new_emf  = "img5.emf"

# unpack content of the docx file into temp folder
with zipfile.ZipFile(old_docx, "r") as z:
    files = z.namelist()
    for f in files: z.extract(f, temp_dir)

# replace the image
shutil.copyfile(new_emf, old_emf)

# pack all files from temp folder back into the new docx file
# ("w" starts a fresh archive instead of appending to an existing one)
with zipfile.ZipFile(new_docx, "w", zipfile.ZIP_DEFLATED) as z:
    for f in files: z.write(temp_dir + "/" + f, f)

# remove temp folder
shutil.rmtree(temp_dir)

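# get sizes of the emf image: the header stores the bounds as little-endian
# integers, with "right" at byte offset 16 and "bottom" at byte offset 20
with open(new_emf, "rb") as f:
    f.read(16)
    w1, w2 = f.read(1).hex(), f.read(1).hex()
    f.read(2)
    h1, h2 = f.read(1).hex(), f.read(1).hex()

# combine low and high byte, then scale to EMU (762 = 914400 / 1200)
width  = int(str(w2) + str(w1), 16) * 762
height = int(str(h2) + str(h1), 16) * 762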

# open the new docx file and set the sizes for the image
doc = Document(new_docx)
img = doc.inline_shapes[0]  # assumes the first inline shape is the replaced image
img.width  = width
img.height = height

doc.save(new_docx)
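
The size calculation above reads the low two bytes of the bounds rectangle and multiplies by 762, which appears to assume 1200 device units per inch. As a hedged alternative, the EMF header also carries a frame rectangle measured in hundredths of a millimetre, which does not depend on the device resolution. The sketch below reads that field with struct; the function name emf_frame_size_emu is made up for illustration, and whether the frame or the bounds gives the better result for a particular file is something to check by hand.

```python
import struct

# 914400 EMU per inch divided by 2540 hundredths of a millimetre per inch
EMU_PER_HUNDREDTH_MM = 360


def emf_frame_size_emu(header_bytes):
    """Return (width, height) in EMU from the frame field of an EMF header.

    Layout assumed here: record type at offset 0, bounds at offset 8,
    frame (left, top, right, bottom) at offset 24, signature at offset 40.
    """
    if len(header_bytes) < 44:
        raise ValueError("need at least the first 44 bytes of the file")
    (record_type,) = struct.unpack_from("<I", header_bytes, 0)
    if record_type != 1 or header_bytes[40:44] != b" EMF":
        raise ValueError("not an EMF header")
    left, top, right, bottom = struct.unpack_from("<4i", header_bytes, 24)
    return ((right - left) * EMU_PER_HUNDREDTH_MM,
            (bottom - top) * EMU_PER_HUNDREDTH_MM)
```

In the script above this could be used by reading the first 44 bytes of new_emf in binary mode and assigning the two returned values to img.width and img.height.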

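[Comments]:

Thanks, @Yuri. Will try this tomorrow and let you know.

I was able to run it, and it transfers the EMF from doc to doc_new. But please tell me, how do I use it?

This script changes the EMF image in doc.docx to another EMF image (taken from the current folder) and saves a new docx file named doc_new.docx. How to use it is up to you. Perhaps, I don't know, if you have hundreds of emf files, you could put them in a folder and (after a minimal modification of this script) make hundreds of docx files, each containing the corresponding EMF image. But you haven't mentioned what you want to do.

My main goal is to insert SVG files into Word. We use SVG because the image does not get distorted when the resolution changes. The images will mostly contain plots generated by my team. Here is a sample - file.io/IotKrLUtqdUY Since we could not find any way to insert SVG, we plan to convert SVG to EMF using inkscape and want to build a script that inserts all the plots into a Word document. If you have any ideas for handling our requirement, please let me know.

See my other solution based on the win32com module.

[Answer 2]: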

Here is another solution based on the win32com module and the MS Word API:

from pathlib import Path
import win32com.client

cur_dir  = Path.cwd()                                   # get current folder
pictures = list((cur_dir / "pictures").glob("*.emf"))   # get a list of pictures
word_app = win32com.client.Dispatch("Word.Application") # run Word
doc      = word_app.Documents.Add()                     # create a new docx file

for pict in pictures:                                   # insert all pictures
    doc.InlineShapes.AddPicture(str(pict))              # pass a plain string path to the COM call

doc.SaveAs(str(cur_dir / "pictures.docx"))              # save the docx file
doc.Close()                                             # close docx
word_app.Quit()                                         # close Word

Put your EMF images into the subfolder pictures and run this script. Afterwards you will get the file pictures.docx in the current folder, containing all these EMF images.
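
The script expects the EMF files to exist already; the question mentions producing them from SVG with inkscape. A minimal sketch of building that conversion command is shown below. It assumes the Inkscape 1.x command line, where --export-filename selects the output file and the format follows from its extension; older releases used different option names. The function name inkscape_emf_command and the folder names are hypothetical.

```python
from pathlib import Path


def inkscape_emf_command(svg_path, emf_dir):
    """Build the argument list for converting one SVG file to EMF.

    Assumes the Inkscape 1.x CLI; nothing is executed here.
    """
    svg_path = Path(svg_path)
    emf_path = Path(emf_dir) / (svg_path.stem + ".emf")
    return ["inkscape", str(svg_path), "--export-filename=" + str(emf_path)]
```

The returned list can be passed to subprocess.run(cmd, check=True) for each SVG file, after which the pictures folder holds the EMF files that the win32com script inserts.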

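[Comments]:

You are awesome! Thank you very much. This works fine. Let me try the solution provided by kiwiwings. He provided a Java project. I will check whether it works before closing this question.

Keep in mind that win32com works on Windows only.

Will do, thanks. Your win32com solution seems to meet our requirements. I will accept it as the answer to my question. Thank you very much for your time and your valuable suggestions.

[Answer 3]: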

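SVG can be added to Word directly - just try it manually in Word (2016). I have created an example Java project as a POC for your use case. There is no need to call inkscape, because the fallback PNG is created on the fly via Batik.

Of course, the OP asked for a Python solution - but if python-openxml lacks some feature, it may take more effort to get this working in Python than to invoke a Java runtime.

Regarding the solution via EMF - note that there are several ways to determine the bounds. In the EMF renderer I implemented in POI, I scan the Window and Viewport records by default and only use the EMF header bounds if I cannot find anything else or if scanning is disabled via a configuration option. This usually gives me better results.

The relevant code snippet of the example project is as follows: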

public class AddSvgToDocument {
    public static void main(String[] args) throws IOException, InvalidFormatException {
        File tmplDocx = new File(args[0]);
        File svgFile = new File(args[1]);
        File outDocx = new File(args[2]);

        try (FileInputStream fis = new FileInputStream(tmplDocx);
             XWPFDocument doc = new XWPFDocument(fis)) {

            SVGImageRenderer rnd = new SVGImageRenderer();
            try (FileInputStream fis2 = new FileInputStream(svgFile)) {
                rnd.loadImage(fis2, PictureData.PictureType.SVG.contentType);
            }

            Rectangle2D nativeDim = rnd.getNativeBounds();
            double widthPx = 500;
            double heightPx = widthPx * nativeDim.getHeight() / nativeDim.getWidth();

            BufferedImage bi = rnd.getImage(new Dimension2DDouble(widthPx, heightPx));
            ByteArrayOutputStream bos = new ByteArrayOutputStream(100_000);
            ImageIO.write(bi, "PNG", bos);

            XWPFRun run = doc.createParagraph().createRun();

            int widthEmu = Units.pixelToEMU((int)widthPx);
            int heightEmu = Units.pixelToEMU((int)heightPx);
            XWPFPicture pic = run.addPicture(new ByteArrayInputStream(bos.toByteArray()), PictureData.PictureType.PNG.ooxmlId, "image.png", widthEmu, heightEmu);
            CTOfficeArtExtensionList extLst = pic.getCTPicture().getBlipFill().getBlip().addNewExtLst();
            addExt(extLst, "28A0092B-C50C-407E-A947-70E740481C1C"
                , "http://schemas.microsoft.com/office/drawing/2010/main", "a14:useLocalDpi"
                , "val", "0");

            addExt(extLst, "96DAC541-7B7A-43D3-8B79-37D633B846F1"
                , "http://schemas.microsoft.com/office/drawing/2016/SVG/main", "asvg:svgBlip"
                , "r:embed", addSVG(doc, svgFile));

            try (FileOutputStream fos = new FileOutputStream(outDocx)) {
                doc.write(fos);
            }
        }
    }

    private static void addExt(CTOfficeArtExtensionList extLst, String uri, String namespace, String name, String attribute, String value) {
        CTOfficeArtExtension ext = extLst.addNewExt();
        ext.setUri(uri);
        XmlCursor cur = ext.newCursor();
        cur.toEndToken();
        String[] prefixName = name.split(":");
        cur.beginElement(new QName(namespace, prefixName[1], prefixName[0]));
        cur.insertNamespace(prefixName[0], namespace);
        if (attribute.contains(":")) {
            prefixName = attribute.split(":");
            String prefix = prefixName[0];
            String attrNamespace = DEFAULT_XML_OPTIONS
                .getSaveSuggestedPrefixes().entrySet().stream()
                .filter(me -> prefix.equals(me.getValue()))
                .map(Map.Entry::getKey)
                .findFirst().orElse(null);
            cur.insertAttributeWithValue(new QName(attrNamespace, prefixName[1], prefix), value);
        } else {
            cur.insertAttributeWithValue(attribute, value);
        }
        cur.dispose();
    }

    private static String addSVG(XWPFDocument doc, File svgFile) throws InvalidFormatException, IOException {
        // SVG is not thoroughly supported as of POI 5.0.0, hence we need to go the long way instead of adding a picture
        OPCPackage pkg = doc.getPackage();
        String svgNameTmpl = "/word/media/image#.svg";
        int svgImageIdx = pkg.getUnusedPartIndex(svgNameTmpl);
        PackagePartName svgPPName = PackagingURIHelper.createPartName(svgNameTmpl.replace("#", Integer.toString(svgImageIdx)));
        PackagePart svgPart = pkg.createPart(svgPPName, PictureData.PictureType.SVG.contentType);

        try (FileInputStream fis = new FileInputStream(svgFile);
             OutputStream os = svgPart.getOutputStream()) {
            IOUtils.copy(fis, os);
        }
        PackageRelationship svgRel = doc.getPackagePart().addRelationship(svgPPName, TargetMode.INTERNAL, IMAGE_PART);
        return svgRel.getId();
    }
}

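[Comments]:

Thanks for your reply. I have cloned the project into my Eclipse. Please note that I am not an active developer. Can you tell me the steps to run this project?

As described in the github readme, you need to install gradle and then use gradle distZip to create a zip file containing all dependencies/jars. There is also a shell script showing how the program is invoked.

Thanks for your time. For now, I can achieve my goal with what Yuri suggested - using win32com. I will stick with it for the time being. If anything changes on our side, I will get in touch with you.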
