Converting a WoNeF database file into a Solr-format synonyms file

# -*- coding: utf-8 -*-
import unicodedata
import xml.etree.ElementTree as ET


# Add every synonym under the given SYNONYM element to the entry for `key`,
# skipping literals that are already listed.
def extendDictEntry(synonyms, key, xmlSynonyms):
    for child in xmlSynonyms:
        childText = child.text
        # Skip child elements that carry no text (child.text is None).
        if childText and childText not in synonyms[key]:
            synonyms[key].append(childText)
    return synonyms


# Build the synonyms dictionary from the WoNeF file.
def buildSynonymDictionary():
    tree = ET.parse('wonef-fscore-0.1.xml')
    root = tree.getroot()
    synonyms = {}

    # Fill the synonyms dictionary.
    for synset in root:
        for child in synset:
            if child.tag == "SYNONYM":
                for literal in child:
                    currLiteralText = literal.text
                    if not currLiteralText:
                        continue
                    if currLiteralText not in synonyms:
                        # Create a new entry that starts with the literal itself.
                        synonyms[currLiteralText] = [currLiteralText]
                    # Add the text of every literal in this SYNONYM element.
                    extendDictEntry(synonyms, currLiteralText, child)

    return synonyms


# Strip accents: decompose to NFD, then drop combining marks (category Mn).
def removeAccents(text):
    return ''.join(c for c in unicodedata.normalize('NFD', text)
                   if unicodedata.category(c) != 'Mn')


# Write the synonyms file in the Solr format.
def writeSolrSynonymFile():
    synonyms = buildSynonymDictionary()
    # Open the output explicitly as UTF-8 so the result does not depend on the locale.
    with open("solr_synonym.txt", "w", encoding="utf-8") as out:
        out.write("# Solr Synonyms File \n\n")

        for key in synonyms:
            out.write(
                removeAccents(key) +
                " => " +
                removeAccents(", ".join(synonyms[key])) +
                "\n")


writeSolrSynonymFile()
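
To try the conversion logic without the real `wonef-fscore-0.1.xml`, a minimal Python 3 sketch on an in-memory sample might look like the following. The sample data, the element names `WN`, `SYNSET` and `LITERAL`, and the helper names `strip_accents`, `build_synonyms` and `to_solr_lines` are illustrative assumptions; only the `SYNONYM` tag and the `key => a, b, c` line layout come from the script above.

```python
import unicodedata
import xml.etree.ElementTree as ET

# Hypothetical sample. WN, SYNSET and LITERAL are assumed element names;
# only the SYNONYM tag is taken from the original script.
SAMPLE_XML = """<WN>
  <SYNSET>
    <SYNONYM><LITERAL>école</LITERAL><LITERAL>établissement</LITERAL></SYNONYM>
  </SYNSET>
  <SYNSET>
    <SYNONYM><LITERAL>école</LITERAL><LITERAL>collège</LITERAL></SYNONYM>
  </SYNSET>
</WN>"""


def strip_accents(text):
    """Drop combining marks (category Mn) after NFD decomposition."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(c for c in decomposed if unicodedata.category(c) != "Mn")


def build_synonyms(root):
    """Map each literal to the literals sharing a SYNONYM element with it."""
    synonyms = {}
    for synset in root:
        for child in synset:
            if child.tag != "SYNONYM":
                continue
            literals = [lit.text for lit in child if lit.text]
            for literal in literals:
                # Each entry starts with the literal itself, as in the script.
                entry = synonyms.setdefault(literal, [literal])
                for other in literals:
                    if other not in entry:
                        entry.append(other)
    return synonyms


def to_solr_lines(synonyms):
    """Render one 'key => a, b, c' line per dictionary entry."""
    return [strip_accents(key) + " => " + strip_accents(", ".join(values))
            for key, values in synonyms.items()]


root = ET.fromstring(SAMPLE_XML)
lines = to_solr_lines(build_synonyms(root))
for line in lines:
    print(line)
```

For this sample the first printed line is `ecole => ecole, etablissement, college`: the two synsets that contain `école` are merged into a single entry, and the accents are stripped from both sides of the mapping.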
