


【Title】: Is there a way to convert number words to Integers? 【Posted】: 2010-10-04 08:06:35 【Question】:

I need to convert one into 1, two into 2, and so on.



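See also: ***.com/questions/70161/…
Maybe this will help: pastebin.com/WwFCjYtt
If anyone is still looking for an answer to this, I took inspiration from all the answers below and created a python package: github.com/careless25/text2digits
I used the examples below to develop and extend this process, but for Spanish, for future reference: github.com/elbaulp/text2digits_es
For anyone arriving here who is not looking for a Python solution, here is the parallel C# question: Convert words (string) to Int, and here is the Java one: Converting Words to Numbers in Java
【Answer 1】: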

Most of this code sets up the numwords dict, which is only done on the first call.

def text2int(textnum, numwords={}):
    if not numwords:
      units = [
        "zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
        "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
        "sixteen", "seventeen", "eighteen", "nineteen",
      ]

      tens = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy", "eighty", "ninety"]

      scales = ["hundred", "thousand", "million", "billion", "trillion"]

      numwords["and"] = (1, 0)
      for idx, word in enumerate(units):    numwords[word] = (1, idx)
      for idx, word in enumerate(tens):     numwords[word] = (1, idx * 10)
      for idx, word in enumerate(scales):   numwords[word] = (10 ** (idx * 3 or 2), 0)

    current = result = 0
    for word in textnum.split():
        if word not in numwords:
          raise Exception("Illegal word: " + word)

        scale, increment = numwords[word]
        current = current * scale + increment
        if scale > 100:
            result += current
            current = 0

    return result + current

print(text2int("seven billion one hundred million thirty one thousand three hundred thirty seven"))
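To make the (scale, increment) bookkeeping easier to follow, here is a minimal tracing sketch. The names `build_numwords` and `trace_text2int` are my own and are not part of the answer; the table is built on each call instead of being cached in a default argument, which is a simplification.

```python
UNITS = [
    "zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
    "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
    "sixteen", "seventeen", "eighteen", "nineteen",
]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]
SCALES = ["hundred", "thousand", "million", "billion", "trillion"]


def build_numwords():
    """Map each number word to a (scale, increment) pair."""
    numwords = {"and": (1, 0)}
    for idx, word in enumerate(UNITS):
        numwords[word] = (1, idx)
    for idx, word in enumerate(TENS):
        if word:  # skip the two empty placeholders
            numwords[word] = (1, idx * 10)
    for idx, word in enumerate(SCALES):
        numwords[word] = (10 ** (idx * 3 or 2), 0)
    return numwords


def trace_text2int(textnum):
    """Return (total, steps); each step is (word, current, result) after that word."""
    numwords = build_numwords()
    current = result = 0
    steps = []
    for word in textnum.split():
        scale, increment = numwords[word]  # raises KeyError on unknown words
        current = current * scale + increment
        if scale > 100:  # thousand and above: bank the group, start a new one
            result += current
            current = 0
        steps.append((word, current, result))
    return result + current, steps


total, steps = trace_text2int("two thousand three hundred five")
for step in steps:
    print(step)
print(total)
```

Each tuple shows how "hundred" multiplies the group being built while "thousand" and larger words move the group into the running result.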


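FYI, this will not work for dates. Try: print text2int("nineteen ninety six") # 115
The correct way to write 1996 in words is "one thousand nine hundred ninety six". If you want to support years, you'll need different code.
Marc Burns's ruby gem does it. I recently forked it to add support for years. You can call ruby code from python.
It breaks on "hundred and six"; try print(text2int("hundred and Six")) .. also print(text2int("thousand"))
The "expected result"? I suppose different users have different expectations. Personally, mine is that it wouldn't be called with that input, since it is not a valid number. It is two.
【Answer 2】: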

I just released a Python module to PyPI called word2number for exactly this purpose. https://github.com/akshaynagpal/w2n


pip install word2number

Make sure your pip is updated to the latest version.


from word2number import w2n

print(w2n.word_to_num("two million three thousand nine hundred and eighty four"))


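Tried your package. I would suggest handling strings like "1 million" or "1M". w2n.word_to_num("1 million") throws an error.
@Ray Thanks for trying it out. Could you please raise an issue at github.com/akshaynagpal/w2n/issues? You can also contribute if you want to. Otherwise, I will definitely look into this in the next release. Thanks again!
Robert, open source software is all about people improving it collaboratively. I wanted a library, and saw that people wanted one too. So I made it. It may not be ready for production-level systems or conform to textbook buzzwords. But it serves the purpose. Also, it would be great if you could submit a PR so that it can be improved further for all users.
Does it do calculations? Say: nineteen % fifty-seven? Or any other operator, i.e. +, 6, * and /
Not as of now, @S.Jackson.
【Answer 3】: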


def text2int (textnum, numwords={}):
    if not numwords:
        units = [
        "zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
        "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
        "sixteen", "seventeen", "eighteen", "nineteen",
        ]

        tens = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy", "eighty", "ninety"]

        scales = ["hundred", "thousand", "million", "billion", "trillion"]

        numwords["and"] = (1, 0)
        for idx, word in enumerate(units):  numwords[word] = (1, idx)
        for idx, word in enumerate(tens):       numwords[word] = (1, idx * 10)
        for idx, word in enumerate(scales): numwords[word] = (10 ** (idx * 3 or 2), 0)

    ordinal_words = {'first':1, 'second':2, 'third':3, 'fifth':5, 'eighth':8, 'ninth':9, 'twelfth':12}
    ordinal_endings = [('ieth', 'y'), ('th', '')]

    textnum = textnum.replace('-', ' ')

    current = result = 0
    curstring = ""
    onnumber = False
    for word in textnum.split():
        if word in ordinal_words:
            scale, increment = (1, ordinal_words[word])
            current = current * scale + increment
            if scale > 100:
                result += current
                current = 0
            onnumber = True
        else:
            for ending, replacement in ordinal_endings:
                if word.endswith(ending):
                    word = "%s%s" % (word[:-len(ending)], replacement)

            if word not in numwords:
                if onnumber:
                    curstring += repr(result + current) + " "
                curstring += word + " "
                result = current = 0
                onnumber = False
            else:
                scale, increment = numwords[word]

                current = current * scale + increment
                if scale > 100:
                    result += current
                    current = 0
                onnumber = True

    if onnumber:
        curstring += repr(result + current)

    return curstring


 >>> text2int("I want fifty five hot dogs for two hundred dollars.")
 I want 55 hot dogs for 200 dollars.

There could be issues if you already have, say, "$200" in the text. But this was really rough.


I took this and other code snippets from here and turned them into a python library: github.com/careless25/text2digits
【Answer 4】:

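I needed something a bit different, since my input comes from a speech-to-text conversion and the right answer is not always to sum the numbers. For example, "my zipcode is one two three four five" should not convert to "my zipcode is 15".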

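I took Andrew's answer and tweaked it to handle a few other cases that people highlighted as errors, and also added support for examples like the zipcode one I mentioned above. Some basic test cases are shown below, but I'm sure there is still room for improvement.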

def is_number(x):
    if type(x) == str:
        x = x.replace(',', '')
    try:
        float(x)
    except (TypeError, ValueError):
        return False
    return True

def text2int (textnum, numwords={}):
    units = [
        'zero', 'one', 'two', 'three', 'four', 'five', 'six', 'seven', 'eight',
        'nine', 'ten', 'eleven', 'twelve', 'thirteen', 'fourteen', 'fifteen',
        'sixteen', 'seventeen', 'eighteen', 'nineteen',
    ]
    tens = ['', '', 'twenty', 'thirty', 'forty', 'fifty', 'sixty', 'seventy', 'eighty', 'ninety']
    scales = ['hundred', 'thousand', 'million', 'billion', 'trillion']
    ordinal_words = {'first':1, 'second':2, 'third':3, 'fifth':5, 'eighth':8, 'ninth':9, 'twelfth':12}
    ordinal_endings = [('ieth', 'y'), ('th', '')]

    if not numwords:
        numwords['and'] = (1, 0)
        for idx, word in enumerate(units): numwords[word] = (1, idx)
        for idx, word in enumerate(tens): numwords[word] = (1, idx * 10)
        for idx, word in enumerate(scales): numwords[word] = (10 ** (idx * 3 or 2), 0)

    textnum = textnum.replace('-', ' ')

    current = result = 0
    curstring = ''
    onnumber = False
    lastunit = False
    lastscale = False

    def is_numword(x):
        if is_number(x):
            return True
        if x in numwords:
            return True
        return False

    def from_numword(x):
        if is_number(x):
            scale = 0
            increment = int(x.replace(',', ''))
            return scale, increment
        return numwords[x]

    for word in textnum.split():
        if word in ordinal_words:
            scale, increment = (1, ordinal_words[word])
            current = current * scale + increment
            if scale > 100:
                result += current
                current = 0
            onnumber = True
            lastunit = False
            lastscale = False
        else:
            for ending, replacement in ordinal_endings:
                if word.endswith(ending):
                    word = "%s%s" % (word[:-len(ending)], replacement)

            if (not is_numword(word)) or (word == 'and' and not lastscale):
                if onnumber:
                    # Flush the current number we are building
                    curstring += repr(result + current) + " "
                curstring += word + " "
                result = current = 0
                onnumber = False
                lastunit = False
                lastscale = False
            else:
                scale, increment = from_numword(word)
                onnumber = True

                if lastunit and (word not in scales):
                    # Assume this is part of a string of individual numbers to
                    # be flushed, such as a zipcode "one two three four five"
                    curstring += repr(result + current)
                    result = current = 0

                if scale > 1:
                    current = max(1, current)

                current = current * scale + increment
                if scale > 100:
                    result += current
                    current = 0

                lastscale = False
                lastunit = False
                if word in scales:
                    lastscale = True
                elif word in units:
                    lastunit = True

    if onnumber:
        curstring += repr(result + current)

    return curstring


one two three -> 123
three forty five -> 345
three and forty five -> 3 and 45
three hundred and forty five -> 345
three hundred -> 300
twenty five hundred -> 2500
three thousand and six -> 3006
three thousand six -> 3006
nineteenth -> 19
twentieth -> 20
first -> 1
my zip is one two three four five -> my zip is 12345
nineteen ninety six -> 1996
fifty-seventh -> 57
one million -> 1000000
first hundred -> 100
I will buy the first thousand -> I will buy the 1000  # probably should leave ordinal in the string
thousand -> 1000
hundred and six -> 106
1 million -> 1000000
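The zip code requirement can also be looked at in isolation. The following is a minimal sketch, not the answer's implementation: the helper name `join_spoken_digits` and the `min_run` parameter are hypothetical, and only the ten single-digit words are recognized.

```python
DIGITS = {
    "zero": "0", "one": "1", "two": "2", "three": "3", "four": "4",
    "five": "5", "six": "6", "seven": "7", "eight": "8", "nine": "9",
}


def join_spoken_digits(text, min_run=2):
    """Replace runs of at least min_run single-digit words with a digit string."""
    out = []
    run = []  # digit words collected since the last non-digit word

    def flush():
        if len(run) >= min_run:
            out.append("".join(DIGITS[w] for w in run))
        else:
            out.extend(run)  # too short to be a code: keep the words as they are
        run.clear()

    for word in text.split():
        if word in DIGITS:
            run.append(word)
        else:
            flush()
            out.append(word)
    flush()
    return " ".join(out)


print(join_spoken_digits("my zip is one two three four five"))
```

A lone digit word such as the "one" in "one apple" is left alone, which is a design choice for this sketch rather than something the answer specifies.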


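I took your answer and fixed some bugs. Added support for "twenty ten" -> 2010 and all tens in general. You can find it here: github.com/careless25/text2digits
Does it do calculations? Say: nineteen % fifty-seven? Or any other operator, i.e. +, 6, * and /
@S.Jackson It doesn't do calculations. If your text snippet is a valid equation in python, I suppose you could use this to first convert to integers and then eval the result (assuming you are familiar with and comfortable with the security concerns of that). So "ten + five" becomes "10 + 5", and then eval("10 + 5") gives you 15. This would only handle the simplest of cases though. No floats, no parentheses to control order, and no support for saying plus/minus/etc. in speech-to-text.
【Answer 5】: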


def text2int(textnum, numwords={}):
    if not numwords:
        units = [
        "zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
        "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
        "sixteen", "seventeen", "eighteen", "nineteen",
        ]

        tens = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy", "eighty", "ninety"]

        scales = ["hundred", "thousand", "million", "billion", "trillion"]

        numwords["and"] = (1, 0)
        for idx, word in enumerate(units):  numwords[word] = (1, idx)
        for idx, word in enumerate(tens):       numwords[word] = (1, idx * 10)
        for idx, word in enumerate(scales): numwords[word] = (10 ** (idx * 3 or 2), 0)

    ordinal_words = {'first':1, 'second':2, 'third':3, 'fifth':5, 'eighth':8, 'ninth':9, 'twelfth':12}
    ordinal_endings = [('ieth', 'y'), ('th', '')]

    textnum = textnum.replace('-', ' ')

    current = result = 0
    for word in textnum.split():
        if word in ordinal_words:
            scale, increment = (1, ordinal_words[word])
        else:
            for ending, replacement in ordinal_endings:
                if word.endswith(ending):
                    word = "%s%s" % (word[:-len(ending)], replacement)

            if word not in numwords:
                raise Exception("Illegal word: " + word)

            scale, increment = numwords[word]
        current = current * scale + increment
        if scale > 100:
            result += current
            current = 0

    return result + current
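The `numwords={}` default used by the functions in this thread is evaluated once, when the function is defined, so every call shares the same dict. Here that sharing is what makes the table a cache, but it is also why the pattern is often discouraged. A minimal sketch of the behaviour, with hypothetical function names:

```python
def remember_default(word, cache={}):
    """The default dict is created once and reused by every call."""
    cache[word] = len(cache)
    return cache


def remember_fresh(word, cache=None):
    """The usual alternative: create a new dict unless one is passed in."""
    if cache is None:
        cache = {}
    cache[word] = len(cache)
    return cache


first = remember_default("one")
second = remember_default("two")
print(first is second)         # both names point at the shared default dict
print(sorted(second))          # entries from both calls are present
print(remember_fresh("two"))   # starts from an empty dict each time
```

If the caching is wanted without the shared default, one option is to build the table once at module level and have the function read it from there.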


Note: this returns zero for hundredth, thousandth, etc. Use one hundredth to get 100.

Mutable default arguments are an anti-pattern.
【Answer 6】:


>>> number = {'one':1,
...           'two':2,
...           'three':3,}
>>> number['two']
2






for i in range(10):
   myDict[30 + i] = "thirty-" + singleDigitsDict[i]

If you need something more extensive, then it looks like you will need natural language processing tools. This article might be a good starting point.
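The loop fragment above hints at generating the lookup table instead of typing it by hand. A minimal sketch of that idea for 0 to 99 follows; `myDict` and `singleDigitsDict` are not defined in the post, so the names here (`UNITS`, `TENS`, `build_word_to_number`) are my own, and the table maps words to numbers since that is the direction the question asks for.

```python
UNITS = [
    "zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
    "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
    "sixteen", "seventeen", "eighteen", "nineteen",
]
TENS = {
    "twenty": 20, "thirty": 30, "forty": 40, "fifty": 50,
    "sixty": 60, "seventy": 70, "eighty": 80, "ninety": 90,
}


def build_word_to_number():
    """Build a word -> int table for 0..99 using hyphenated compounds."""
    table = {word: value for value, word in enumerate(UNITS)}
    for tens_word, tens_value in TENS.items():
        table[tens_word] = tens_value
        for unit_value in range(1, 10):
            # e.g. "thirty" + "-" + "four" -> 34
            table[f"{tens_word}-{UNITS[unit_value]}"] = tens_value + unit_value
    return table


WORD_TO_NUMBER = build_word_to_number()
print(WORD_TO_NUMBER["thirty-four"])
print(len(WORD_TO_NUMBER))
```

Anything beyond this range ("hundred", "thousand", ...) needs actual parsing rather than a flat table, as the other answers show.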



Use the Python package: WordToDigits

pip install wordtodigits

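It can find numbers written in word form in a sentence and convert them to the proper numeric format. It also takes care of the decimal part, if present. The word representation of numbers can be anywhere in the passage.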


def parse_int(string):
    ONES = {'zero': 0,
            'one': 1,
            'two': 2,
            'three': 3,
            'four': 4,
            'five': 5,
            'six': 6,
            'seven': 7,
            'eight': 8,
            'nine': 9,
            'ten': 10,
            'eleven': 11,
            'twelve': 12,
            'thirteen': 13,
            'fourteen': 14,
            'fifteen': 15,
            'sixteen': 16,
            'seventeen': 17,
            'eighteen': 18,
            'nineteen': 19,
            'twenty': 20,
            'thirty': 30,
            'forty': 40,
            'fifty': 50,
            'sixty': 60,
            'seventy': 70,
            'eighty': 80,
            'ninety': 90,
            }

    numbers = []
    for token in string.replace('-', ' ').split(' '):
        if token in ONES:
            numbers.append(ONES[token])
        elif token == 'hundred':
            numbers[-1] *= 100
        elif token == 'thousand':
            numbers = [x * 1000 for x in numbers]
        elif token == 'million':
            numbers = [x * 1000000 for x in numbers]
    return sum(numbers)

Tested with 700 random numbers in the range 1 to one million; it works well.
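A random round-trip check like the one described could look as follows. This is a sketch under assumptions: the post does not show how the test was done, so `to_words` and `below_thousand` are hypothetical helpers written for this example, and `parse_int` is a compact copy of the token rules above.

```python
import random

ONES = {
    'zero': 0, 'one': 1, 'two': 2, 'three': 3, 'four': 4, 'five': 5, 'six': 6,
    'seven': 7, 'eight': 8, 'nine': 9, 'ten': 10, 'eleven': 11, 'twelve': 12,
    'thirteen': 13, 'fourteen': 14, 'fifteen': 15, 'sixteen': 16,
    'seventeen': 17, 'eighteen': 18, 'nineteen': 19, 'twenty': 20,
    'thirty': 30, 'forty': 40, 'fifty': 50, 'sixty': 60, 'seventy': 70,
    'eighty': 80, 'ninety': 90,
}
WORDS = {value: word for word, value in ONES.items()}


def parse_int(string):
    """Same token rules as the parse_int shown above."""
    numbers = []
    for token in string.replace('-', ' ').split(' '):
        if token in ONES:
            numbers.append(ONES[token])
        elif token == 'hundred':
            numbers[-1] *= 100
        elif token == 'thousand':
            numbers = [x * 1000 for x in numbers]
        elif token == 'million':
            numbers = [x * 1000000 for x in numbers]
    return sum(numbers)


def below_thousand(n):
    """Words for 1..999, e.g. 342 -> 'three hundred forty-two'."""
    parts = []
    if n >= 100:
        parts += [WORDS[n // 100], 'hundred']
        n %= 100
    if n >= 20:
        tens, unit = n - n % 10, n % 10
        parts.append(WORDS[tens] + ('-' + WORDS[unit] if unit else ''))
    elif n > 0:
        parts.append(WORDS[n])
    return ' '.join(parts)


def to_words(n):
    """Words for 1..1_000_000, without 'and'."""
    if n == 1_000_000:
        return 'one million'
    parts = []
    if n >= 1000:
        parts += [below_thousand(n // 1000), 'thousand']
        n %= 1000
    if n:
        parts.append(below_thousand(n))
    return ' '.join(parts)


rng = random.Random(0)  # fixed seed so the run is repeatable
samples = [rng.randint(1, 1_000_000) for _ in range(700)]
failures = [n for n in samples if parse_int(to_words(n)) != n]
print(len(failures))
```

The tested range matters: because "thousand" multiplies every earlier entry in the list, an input such as "one million two hundred thousand" comes out as 1000200000 with these rules, so values above one million would need a different grouping strategy.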



Made a change so that text2int(scale) returns the correct conversion. For example, text2int("hundred") => 100.

import re

numwords = {}

def text2int(textnum):

    if not numwords:

        units = [ "zero", "one", "two", "three", "four", "five", "six",
                "seven", "eight", "nine", "ten", "eleven", "twelve",
                "thirteen", "fourteen", "fifteen", "sixteen", "seventeen",
                "eighteen", "nineteen"]

        tens = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", 
                "seventy", "eighty", "ninety"]

        scales = ["hundred", "thousand", "million", "billion", "trillion", 
                'quadrillion', 'quintillion', 'sextillion', 'septillion', 
                'octillion', 'nonillion', 'decillion' ]

        numwords["and"] = (1, 0)
        for idx, word in enumerate(units): numwords[word] = (1, idx)
        for idx, word in enumerate(tens): numwords[word] = (1, idx * 10)
        for idx, word in enumerate(scales): numwords[word] = (10 ** (idx * 3 or 2), 0)

    ordinal_words = {'first':1, 'second':2, 'third':3, 'fifth':5, 
            'eighth':8, 'ninth':9, 'twelfth':12}
    ordinal_endings = [('ieth', 'y'), ('th', '')]
    current = result = 0
    tokens = re.split(r"[\s-]+", textnum)
    for word in tokens:
        if word in ordinal_words:
            scale, increment = (1, ordinal_words[word])
        else:
            for ending, replacement in ordinal_endings:
                if word.endswith(ending):
                    word = "%s%s" % (word[:-len(ending)], replacement)

            if word not in numwords:
                raise Exception("Illegal word: " + word)

            scale, increment = numwords[word]

        if scale > 1:
            current = max(1, current)

        current = current * scale + increment
        if scale > 100:
            result += current
            current = 0

    return result + current
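Two small pieces of the function above carry most of the behaviour being discussed: the implicit "one" in front of a bare scale word, and the suffix rules that turn ordinals into cardinals. The sketch below isolates both; `step` and `ordinal_to_cardinal` are hypothetical names, and unlike the loop above the ordinal helper returns on the first matching ending.

```python
IRREGULAR_ORDINALS = {
    "first": "one", "second": "two", "third": "three", "fifth": "five",
    "eighth": "eight", "ninth": "nine", "twelfth": "twelve",
}
ORDINAL_ENDINGS = [("ieth", "y"), ("th", "")]


def ordinal_to_cardinal(word):
    """Map an ordinal word to its cardinal word, e.g. 'twentieth' -> 'twenty'."""
    if word in IRREGULAR_ORDINALS:
        return IRREGULAR_ORDINALS[word]
    for ending, replacement in ORDINAL_ENDINGS:
        if word.endswith(ending):
            return word[: -len(ending)] + replacement
    return word  # not an ordinal as far as these rules can tell


def step(current, scale, increment, implicit_one=True):
    """Apply one (scale, increment) pair to the running group value."""
    if implicit_one and scale > 1:
        current = max(1, current)  # a bare "hundred" is read as "one hundred"
    return current * scale + increment


print(ordinal_to_cardinal("twentieth"), ordinal_to_cardinal("hundredth"))
print(step(0, 100, 0), step(0, 100, 0, implicit_one=False))
```

The suffix rules are purely textual, so a non-number word ending in "th" (for example "month") is also shortened; the surrounding code relies on the later numwords lookup to reject such words.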


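I think the correct English spelling of 100 is "one hundred".
@recursive you are absolutely right, but the advantage of this code is that it handles "hundredth" (perhaps that is what Dawa was trying to highlight). From the sound of the description, the other similar code needed "one hundredth", which is not always the commonly used term (e.g. "she picked out the hundredth item to discard").
【Answer 11】: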

Marc Burns's ruby gem does it. I recently forked it to add support for years. You can call ruby code from python.

  require 'numbers_in_words'
  require 'numbers_in_words/duck_punch'

  nums = ["fifteen sixteen", "eighty five sixteen",  "nineteen ninety six",
          "one hundred and seventy nine", "thirteen hundred", "nine thousand two hundred and ninety seven"]
  nums.each { |n| p n; p n.in_numbers }

Results:
"fifteen sixteen" 1516
"eighty five sixteen" 8516
"nineteen ninety six" 1996
"one hundred and seventy nine" 179
"thirteen hundred" 1300
"nine thousand two hundred and ninety seven" 9297
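For readers who would rather stay in Python, the year-style readings in these results ("nineteen ninety six" as 1996) can be approximated by concatenating two-digit groups. This is a hedged sketch of that heuristic only, not a port of the gem: `read_groups` is a hypothetical name, and scale words such as "hundred" are deliberately rejected.

```python
UNITS = {word: value for value, word in enumerate([
    "zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
    "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
    "sixteen", "seventeen", "eighteen", "nineteen",
])}
TENS = {
    "twenty": 20, "thirty": 30, "forty": 40, "fifty": 50,
    "sixty": 60, "seventy": 70, "eighty": 80, "ninety": 90,
}


def read_groups(text):
    """Read words as consecutive groups below 100 and concatenate their digits."""
    groups = []
    after_tens = False  # True when the previous word was twenty..ninety
    for word in text.replace("-", " ").split():
        if word in TENS:
            groups.append(TENS[word])
            after_tens = True
        elif word in UNITS:
            value = UNITS[word]
            if after_tens and 1 <= value <= 9:
                groups[-1] += value  # "ninety" + "six" -> 96
            else:
                groups.append(value)
            after_tens = False
        else:
            raise ValueError(f"unsupported word: {word}")
    return int("".join(str(group) for group in groups))


for phrase in ["fifteen sixteen", "eighty five sixteen", "nineteen ninety six"]:
    print(phrase, read_groups(phrase))
```

Inputs like "thirteen hundred" raise ValueError here; combining this reading with the scale-based parser from the other answers would need a rule for deciding which interpretation applies.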


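Please don't call ruby code from python or python code from ruby. They are close enough that something like this should just get ported over.
Agreed, but until it is ported, calling ruby code is better than nothing.
It is not very complex; @recursive has provided logic below (a few lines of code) that can be used.
Actually, it seems to me that "fifteen sixteen" is wrong?
@yekta Right, I think recursive's answer is good within the scope of an SO answer. However, the gem provides a complete package with tests and other features. Anyhow, I think both have their place.
【Answer 12】: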

A quick solution is to use inflect.py to generate a dictionary for translation.

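inflect.py has a number_to_words() function that turns a number (e.g. 2) into its word form (e.g. 'two'). Unfortunately, its reverse (which would let you avoid the translation dictionary route) is not offered. All the same, you can use that function to build the translation dictionary: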

>>> import inflect
>>> p = inflect.engine()
>>> word_to_number_mapping = {}
>>>
>>> for i in range(1, 100):
...     word_form = p.number_to_words(i)  # 1 -> 'one'
...     word_to_number_mapping[word_form] = i
...
>>> print(word_to_number_mapping['one'])
1
>>> print(word_to_number_mapping['eleven'])
11
>>> print(word_to_number_mapping['forty-three'])
43

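If you are willing to commit some time, it might be possible to examine the inner workings of inflect.py's number_to_words() function and build your own code to do this dynamically (I haven't tried to do this).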



I took @recursive's logic and converted it to Ruby. I've also hardcoded the lookup table, so it is not as cool, but it might help a newbie understand what is going on.

WORDNUMS = {"zero"=> [1,0], "one"=> [1,1], "two"=> [1,2], "three"=> [1,3],
            "four"=> [1,4], "five"=> [1,5], "six"=> [1,6], "seven"=> [1,7], 
            "eight"=> [1,8], "nine"=> [1,9], "ten"=> [1,10], 
            "eleven"=> [1,11], "twelve"=> [1,12], "thirteen"=> [1,13], 
            "fourteen"=> [1,14], "fifteen"=> [1,15], "sixteen"=> [1,16], 
            "seventeen"=> [1,17], "eighteen"=> [1,18], "nineteen"=> [1,19], 
            "twenty"=> [1,20], "thirty" => [1,30], "forty" => [1,40], 
            "fifty" => [1,50], "sixty" => [1,60], "seventy" => [1,70], 
            "eighty" => [1,80], "ninety" => [1,90],
            "hundred" => [100,0], "thousand" => [1000,0], 
            "million" => [1000000, 0]}

def text_2_int(string)
  numberWords = string.gsub('-', ' ').split(/ /) - %w{and}
  current = result = 0
  numberWords.each do |word|
    scale, increment = WORDNUMS[word]
    current = current * scale + increment
    if scale > 100
      result += current
      current = 0
    end
  end
  return result + current
end

I was looking to handle strings like two thousand one hundred and forty-six.




def words_to_number(words):
    numbers = {"zero":0, "a":1, "half":0.5, "quarter":0.25, "one":1,"two":2,
               "three":3, "four":4,"five":5,"six":6,"seven":7,"eight":8,
               "nine":9, "ten":10,"eleven":11,"twelve":12, "thirteen":13,
               "fourteen":14, "fifteen":15,"sixteen":16,"seventeen":17,
               "eighteen":18,"nineteen":19, "twenty":20,"thirty":30, "forty":40,
               "fifty":50,"sixty":60,"seventy":70, "eighty":80,"ninety":90}

    groups = {"hundred":100, "thousand":1_000, 
              "lac":1_00_000, "lakh":1_00_000, 
              "million":1_000_000, "crore":10**7, 
              "billion":10**9, "trillion":10**12}
    split_at = ["and", "plus"]
    n = 0
    skip = False
    words_array = words.split(" ")
    for i, word in enumerate(words_array):
        if not skip:
            if word in groups:
                n*= groups[word]
            elif word in numbers:
                n += numbers[word]
            elif word in split_at:
                # parse everything after "and"/"plus" separately and add it on
                skip = True
                remaining = ' '.join(words_array[i+1:])
                n += words_to_number(remaining)
            else:
                try:
                    n += float(word)
                except ValueError as e:
                    raise ValueError(f"Invalid word {word}") from e
    return n


print(words_to_number("a million and one"))
>> 1000001

print(words_to_number("one crore and one"))
>> 10000001

print(words_to_number("0.5 million one"))
>> 500001.0

print(words_to_number("half million and one hundred"))
>> 500100.0

>> 0.25

print(words_to_number("one hundred plus one"))
>> 101


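I did some more tests: "seventeen hundred" = 1700 and "one thousand and seven hundred" = 1700, BUT "one thousand seven hundred" = (one thousand seven) hundred = 1007 * 100 = 100700. Is it technically wrong to say "one thousand seven hundred" instead of "one thousand AND seven hundred"?!
【Answer 15】: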


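import pandas as pd
mylist = pd.Series(['one','two','three'])
mylist1 = []
for x in range(len(mylist)):
    # assumes w2n was imported with: from word2number import w2n
    mylist1.append(w2n.word_to_num(mylist[x]))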


What is w2n? It is not defined anywhere.
【Answer 16】:

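This code only works for numbers up to 99, both word to int and int to word (the rest would need another 10-20 lines of code and simple logic; this is just simple code for beginners):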

num = input("Enter the number you want to convert : ")
mydict = {'1': 'One', '2': 'Two', '3': 'Three', '4': 'Four', '5': 'Five','6': 'Six', '7': 'Seven', '8': 'Eight', '9': 'Nine', '10': 'Ten','11': 'Eleven', '12': 'Twelve', '13': 'Thirteen', '14': 'Fourteen', '15': 'Fifteen', '16': 'Sixteen', '17': 'Seventeen', '18': 'Eighteen', '19': 'Nineteen'}
mydict2 = ['', '', 'Twenty', 'Thirty', 'Forty', 'Fifty', 'Sixty', 'Seventy', 'Eighty', 'Ninety']

if num.isdigit():
    # digits -> words
    if(int(num) < 20):
        print(" :---> " + mydict[num])
    else:
        var1 = int(num) % 10
        var2 = int(num) // 10
        # mydict has no entry for 0, so round tens print only the tens word
        print(" :---> " + mydict2[var2] + mydict.get(str(var1), ''))
else:
    # words -> digits
    num = num.lower()
    dict_w = {'one': 1, 'two': 2, 'three': 3, 'four': 4, 'five': 5, 'six': 6, 'seven': 7, 'eight': 8, 'nine': 9, 'ten': 10, 'eleven': 11, 'twelve': 12, 'thirteen': 13, 'fourteen': 14, 'fifteen': 15, 'sixteen': 16, 'seventeen': 17, 'eighteen': 18, 'nineteen': 19}
    mydict2 = ['', '', 'twenty', 'thirty', 'forty', 'fifty', 'sixty', 'seventy', 'eighty', 'ninety']
    divide = num[num.find("ty")+2:]
    if num:
        if(num in dict_w.keys()):
            print(" :---> " + str(dict_w[num]))
        elif divide == '' :
            for i in range(0, len(mydict2)):
                if mydict2[i] == num:
                    print(" :---> " + str(i * 10))
        else :
            str3 = 0
            str1 = num[num.find("ty")+2:]
            str2 = num[:-len(str1)]
            for i in range(0, len(mydict2)):
                if mydict2[i] == str2:
                    str3 = i
            if str2 not in mydict2:
                print("----->Invalid Input<-----")
            else:
                try:
                    print(" :---> " + str((str3*10) + dict_w[str1]))
                except KeyError:
                    print("----->Invalid Input<-----")
    else:
        print("----->Please Enter Input<-----")


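Please explain what this code does, and how it does that. That way your answer is more valuable to those who don't understand coding that well yet.
If the user gives digits as input, the program returns the number in words, and vice versa, for example 5->five and five->5. The program works for numbers below 100, but can be extended to any range by just adding a few lines of code.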





