ruby: Adding furigana to text with the text analysis Web API provided by Yahoo! JAPAN
<?xml version="1.0" encoding="UTF-8"?>
<ResultSet xmlns="urn:yahoo:jp:jlp:FuriganaService" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="urn:yahoo:jp:jlp:FuriganaService http://jlp.yahooapis.jp/FuriganaService/V1/furigana.xsd">
<Result>
<WordList>
<Word>
<Surface>ルビ</Surface>
<Furigana>るび</Furigana>
<Roman>rubi</Roman>
</Word>
<Word>
<Surface>振り</Surface>
<Furigana>ぶり</Furigana>
<Roman>buri</Roman>
<SubWordList>
<SubWord>
<Surface>振</Surface>
<Furigana>ぶ</Furigana>
<Roman>bu</Roman>
</SubWord>
<SubWord>
<Surface>り</Surface>
<Furigana>り</Furigana>
<Roman>ri</Roman>
</SubWord>
</SubWordList>
</Word>
<Word>
<Surface>:</Surface>
</Word>
<Word>
<Surface>漢字</Surface>
<Furigana>かんじ</Furigana>
<Roman>kanzi</Roman>
</Word>
<Word>
<Surface>かな交じり</Surface>
<Furigana>かなまじり</Furigana>
<Roman>kanamaziri</Roman>
<SubWordList>
<SubWord>
<Surface>かな</Surface>
<Furigana>かな</Furigana>
<Roman>kana</Roman>
</SubWord>
<SubWord>
<Surface>交</Surface>
<Furigana>ま</Furigana>
<Roman>ma</Roman>
</SubWord>
<SubWord>
<Surface>じり</Surface>
<Furigana>じり</Furigana>
<Roman>ziri</Roman>
</SubWord>
</SubWordList>
</Word>
<Word>
<Surface>文</Surface>
<Furigana>ぶん</Furigana>
<Roman>bun</Roman>
</Word>
<Word>
<Surface>に</Surface>
<Furigana>に</Furigana>
<Roman>ni</Roman>
</Word>
<Word>
<Surface>、</Surface>
</Word>
<Word>
<Surface>ひらがな</Surface>
<Furigana>ひらがな</Furigana>
<Roman>hiragana</Roman>
</Word>
<Word>
<Surface>と</Surface>
<Furigana>と</Furigana>
<Roman>to</Roman>
</Word>
<Word>
<Surface>ローマ字</Surface>
<Furigana>ろーまじ</Furigana>
<Roman>ro-mazi</Roman>
<SubWordList>
<SubWord>
<Surface>ローマ</Surface>
<Furigana>ろーま</Furigana>
<Roman>ro-ma</Roman>
</SubWord>
<SubWord>
<Surface>字</Surface>
<Furigana>じ</Furigana>
<Roman>zi</Roman>
</SubWord>
</SubWordList>
</Word>
<Word>
<Surface>の</Surface>
<Furigana>の</Furigana>
<Roman>no</Roman>
</Word>
<Word>
<Surface>ふりがな</Surface>
<Furigana>ふりがな</Furigana>
<Roman>hurigana</Roman>
</Word>
<Word>
<Surface>(</Surface>
</Word>
<Word>
<Surface>ルビ</Surface>
<Furigana>るび</Furigana>
<Roman>rubi</Roman>
</Word>
<Word>
<Surface>)</Surface>
</Word>
<Word>
<Surface>を</Surface>
<Furigana>を</Furigana>
<Roman>wo</Roman>
</Word>
<Word>
<Surface>付け</Surface>
<Furigana>つけ</Furigana>
<Roman>tuke</Roman>
<SubWordList>
<SubWord>
<Surface>付</Surface>
<Furigana>つ</Furigana>
<Roman>tu</Roman>
</SubWord>
<SubWord>
<Surface>け</Surface>
<Furigana>け</Furigana>
<Roman>ke</Roman>
</SubWord>
</SubWordList>
</Word>
<Word>
<Surface>ます</Surface>
<Furigana>ます</Furigana>
<Roman>masu</Roman>
</Word>
<Word>
<Surface>。</Surface>
</Word>
</WordList>
</Result>
</ResultSet>
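The response nests per-character readings under `<SubWordList>`, so a naive search for every `<Surface>` below a `<Word>` would mix word-level and sub-word values. The following is a minimal offline sketch of reading only the word-level fields, assuming the response shape shown above. It uses REXML rather than Mechanize/Nokogiri so that it runs without network access, and the helper names `child_elements`, `child_text` and `parse_words` are hypothetical, not part of any library.

```ruby
require 'rexml/document'

# Trimmed copy of the response above: one word with sub-words, one symbol.
sample_xml = <<~XML
  <ResultSet xmlns="urn:yahoo:jp:jlp:FuriganaService">
    <Result>
      <WordList>
        <Word>
          <Surface>付け</Surface>
          <Furigana>つけ</Furigana>
          <Roman>tuke</Roman>
          <SubWordList>
            <SubWord><Surface>付</Surface><Furigana>つ</Furigana><Roman>tu</Roman></SubWord>
            <SubWord><Surface>け</Surface><Furigana>け</Furigana><Roman>ke</Roman></SubWord>
          </SubWordList>
        </Word>
        <Word>
          <Surface>。</Surface>
        </Word>
      </WordList>
    </Result>
  </ResultSet>
XML

# Direct child elements of node with the given name (hypothetical helper).
# Only direct children are inspected, so <SubWord> entries nested under
# <SubWordList> never leak into the word-level result.
def child_elements(node, name)
  node.children.select { |c| c.is_a?(REXML::Element) && c.name == name }
end

# Text of the first direct child with the given name, or '' when absent
# (symbols such as "。" carry no <Furigana> or <Roman>).
def child_text(node, name)
  child = child_elements(node, name).first
  child ? child.text.to_s : ''
end

# Returns [[Surface, Furigana, Roman], ...] for the word-level entries.
def parse_words(xml)
  root = REXML::Document.new(xml).root
  result = child_elements(root, 'Result').first
  word_list = child_elements(result, 'WordList').first
  child_elements(word_list, 'Word').map do |word|
    %w[Surface Furigana Roman].map { |name| child_text(word, name) }
  end
end

words = parse_words(sample_xml)
p words
```

Walking direct children avoids having to modify the parsed document, at the cost of spelling out the `Result` / `WordList` / `Word` path by hand.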
require 'pp'
require 'cgi'
require 'mechanize'

# Yahoo! Furigana API
# Ref: http://developer.yahoo.co.jp/webapi/jlp/furigana/v1/furigana.html
class Furigana
  attr_accessor :sentence

  def initialize(appid = '')
    @appid = appid
    @a = Mechanize.new
  end

  # Send the request and return a [[Surface, Furigana, Roman], ...] array
  def send_request
    # Request the XML data with the GET method
    @xml = @a.get(request_url).xml
    word.map { |element|
      %w(Surface Furigana Roman).map { |target_node|
        element.search(target_node).text
      }
    }
  end
  # Alias so that calls read as sentence_is(...).and_request_send
  alias_method :and_request_send, :send_request

  def word
    # Extract the <Word/> nodes
    xml = @xml.search('Word')
    # Empty the <SubWord/> nodes so that only word-level values remain
    # (note: this modifies @xml in place)
    xml.search('SubWord').each { |element|
      element.children.remove
    }
    xml
  end

  def surface
    @xml.search('Surface').map { |element| element.text }
  end

  def furigana
    @xml.search('Furigana').map { |element| element.text }
  end

  # Set the sentence and return self so that calls can be chained
  def sentence_is(str)
    @sentence = str
    self
  end

  private

  def request_url
    "#{base_url}#{service_name}?#{appid_param}&#{sentence_param}"
  end

  def service_name
    'furigana'
  end

  # Named *_param so that it does not clash with the public accessors
  def appid_param
    "appid=#{CGI.escape(@appid.to_s)}"
  end

  def base_url
    'http://jlp.yahooapis.jp/FuriganaService/V1/'
  end

  def sentence_param
    # URL-encode the sentence so that characters such as & or # cannot
    # break the query string
    "sentence=#{CGI.escape(@sentence.to_s)}"
  end
end
# Usage:
appid = 'LongLongYOUREappIDfooabcdBRAbrabrafoobarBrabRabrabrabra-'
f = Furigana.new(appid)
#f.sentence = 'ルビ振り:漢字かな交じり文に、ひらがなとローマ字のふりがな(ルビ)を付けます。'
#pp res = f.send_request
pp f.sentence_is('ルビ振り:漢字かな交じり文に、ひらがなとローマ字のふりがな(ルビ)を付けます。').and_request_send
=begin
[["ルビ", "るび", "rubi"],
["振り", "ぶり", "buri"],
[":", "", ""],
["漢字", "かんじ", "kanzi"],
["かな交じり", "かなまじり", "kanamaziri"],
["文", "ぶん", "bun"],
["に", "に", "ni"],
["、", "", ""],
["ひらがな", "ひらがな", "hiragana"],
["と", "と", "to"],
["ローマ字", "ろーまじ", "ro-mazi"],
["の", "の", "no"],
["ふりがな", "ふりがな", "hurigana"],
["(", "", ""],
["ルビ", "るび", "rubi"],
[")", "", ""],
["を", "を", "wo"],
["付け", "つけ", "tuke"],
["ます", "ます", "masu"],
["。", "", ""]]
=end
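One possible use of the result array above is to render it as HTML `<ruby>` markup. The sketch below is only an illustration under stated assumptions: the method name `to_ruby_html` is hypothetical, the input is a hand-copied slice of the output above (no API call is made), and a word is annotated only when its reading is non-empty and differs from its surface form.

```ruby
require 'cgi'

# Build HTML ruby markup from [surface, furigana, roman] triples
# (hypothetical helper, not part of the Furigana class above).
def to_ruby_html(words)
  words.map { |surface, furigana, _roman|
    text = CGI.escapeHTML(surface)
    if furigana.to_s.empty? || furigana == surface
      # Symbols and words already written in hiragana need no annotation
      text
    else
      "<ruby>#{text}<rt>#{CGI.escapeHTML(furigana)}</rt></ruby>"
    end
  }.join
end

# A slice of the result shown above
words = [
  ["漢字", "かんじ", "kanzi"],
  ["かな交じり", "かなまじり", "kanamaziri"],
  ["文", "ぶん", "bun"],
  ["に", "に", "ni"],
  ["、", "", ""]
]

html = to_ruby_html(words)
puts html
```

With this rule, katakana words such as ルビ would also receive an annotation because their hiragana reading differs from the surface; whether that is desirable depends on the page being produced.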