Find the word closest to a particular string?

【Title】Find the word which is closest to a particular string? 【Posted】2013-10-08 01:54:51 【Question】:

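We produce a user reconciliation report, and for it we need to find the email ID assigned to a particular user.

For example:

The user names in the client report may look like this: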

Sathish K
Sathya A

But in our consolidated report, the actual user names look like this:

Sathish Kothandam
Sathya Arjun

So I created a macro:

Sub test()
    Dim t As String
    t = "Sathish K"
    MsgBox getemailId(t)
End Sub

Function getemailId(ByVal findString As String) As Variant
    Dim rng As Range

    ' Column B contains the user name; column C contains that user's email ID
    With ActiveWorkbook.Sheets("CONSOLIDATED").Range("B:B")
        ' LookAt:=xlPart lets "Sathish K" match "Sathish Kothandam"
        Set rng = .Find(What:=findString, LookIn:=xlValues, LookAt:=xlPart)
        If Not rng Is Nothing Then
            getemailId = rng.Offset(0, 1).Value
        Else
            getemailId = 0    ' no matching user name found
        End If
    End With
End Function

My macro works perfectly in the scenario above, but sometimes I receive user names like these:

Satish Kothandam
Sathiya Arjun

In that case it returns 0. Is there any way to achieve my goal? I hope I have explained it clearly.

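【Comments】:

If you can put the data into an MS Access table, you can use SOUNDEX. See this link, and this link for a Soundex implementation for Excel.

Hi Santosh, thanks for the suggestion. But the Excel Soundex from that link only works for a few words, not all of them. I downloaded the example Excel workbook from that site and checked.

I tested it and it works for me. Can you give me an example where it does not work?

The same as my examples above: Sathish-Satish, Sathya-Satya.

【Answer 1】:

Please see the sample code below.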

Sub test()

Dim str1 As String, str2 As String
Dim str1c As String, str2c As String

str1 = "Sathish"
str2 = "Satish"

str1c = SOUNDEX(str1)
str2c = SOUNDEX(str2)

MsgBox str1c = str2c

End Sub

Function SOUNDEX(ByVal Surname As String) As String
' Developed by Richard J. Yanco
' This function follows the Soundex rules given at
' http://home.utah-inter.net/kinsearch/Soundex.html

    Dim Result As String
    Dim Location As Integer

'   ByVal above keeps the caller's string unchanged by this UCase
    Surname = UCase(Surname)

'   First character must be a letter (an empty string also returns "")
    If Len(Surname) = 0 Then
        SOUNDEX = ""
        Exit Function
    ElseIf Asc(Left(Surname, 1)) < 65 Or Asc(Left(Surname, 1)) > 90 Then
        SOUNDEX = ""
        Exit Function
    Else
'       St. is converted to Saint
        If Left(Surname, 3) = "ST." Then
            Surname = "SAINT" & Mid(Surname, 4)
        End If

'       Convert to Soundex: letters to their appropriate digit,
'                     A,E,I,O,U,Y ("slash letters") to slashes
'                     H,W, and everything else to zero-length string

        Result = Left(Surname, 1)
        For Location = 2 To Len(Surname)
            Result = Result & Category(Mid(Surname, Location, 1))
        Next Location

'       Remove double letters
        Location = 2
        Do While Location < Len(Result)
            If Mid(Result, Location, 1) = Mid(Result, Location + 1, 1) Then
                Result = Left(Result, Location) & Mid(Result, Location + 2)
            Else
                Location = Location + 1
            End If
        Loop

'       If category of 1st letter equals 2nd character, remove 2nd character
        If Category(Left(Result, 1)) = Mid(Result, 2, 1) Then
            Result = Left(Result, 1) & Mid(Result, 3)
        End If

'       Remove slashes
        For Location = 2 To Len(Result)
            If Mid(Result, Location, 1) = "/" Then
                Result = Left(Result, Location - 1) & Mid(Result, Location + 1)
            End If
        Next

'       Trim or pad with zeroes as necessary
        Select Case Len(Result)
            Case 4
                SOUNDEX = Result
            Case Is < 4
                SOUNDEX = Result & String(4 - Len(Result), "0")
            Case Is > 4
                SOUNDEX = Left(Result, 4)
        End Select
    End If
End Function

Private Function Category(c) As String
'   Returns a Soundex code for a letter
    Select Case True
        Case c Like "[AEIOUY]"
            Category = "/"
        Case c Like "[BPFV]"
            Category = "1"
        Case c Like "[CSKGJQXZ]"
            Category = "2"
        Case c Like "[DT]"
            Category = "3"
        Case c = "L"
            Category = "4"
        Case c Like "[MN]"
            Category = "5"
        Case c = "R"
            Category = "6"
        Case Else 'This includes H and W, spaces, punctuation, etc.
            Category = ""
    End Select
End Function
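
To connect this back to the question, here is a minimal sketch of a lookup that compares names by sound instead of by exact text. It assumes the SOUNDEX function above is in the same module and reuses the sheet name and column layout from the question (names in column B, email IDs in column C). `SoundsLike` and `getEmailIdBySound` are hypothetical names introduced only for this illustration, and requiring the same number of words in both names is a simplifying assumption.

```vba
' Hypothetical helper: True when both names have the same number of words
' and each pair of words has the same non-empty Soundex code.
Function SoundsLike(ByVal nameA As String, ByVal nameB As String) As Boolean
    Dim partsA() As String, partsB() As String
    Dim codeA As String
    Dim i As Long

    ' WorksheetFunction.Trim also collapses repeated inner spaces
    partsA = Split(Application.WorksheetFunction.Trim(nameA), " ")
    partsB = Split(Application.WorksheetFunction.Trim(nameB), " ")

    If UBound(partsA) < 0 Then Exit Function                  ' empty name: False
    If UBound(partsA) <> UBound(partsB) Then Exit Function    ' word counts differ: False

    For i = 0 To UBound(partsA)
        codeA = SOUNDEX(partsA(i))
        ' An empty code means the word did not start with a letter
        If codeA = "" Or codeA <> SOUNDEX(partsB(i)) Then Exit Function
    Next i

    SoundsLike = True
End Function

' Hypothetical variant of getemailId: scans column B for the first name
' that sounds like findString and returns the email ID from column C.
Function getEmailIdBySound(ByVal findString As String) As Variant
    Dim ws As Worksheet
    Dim lastRow As Long, r As Long

    Set ws = ActiveWorkbook.Sheets("CONSOLIDATED")
    lastRow = ws.Cells(ws.Rows.Count, "B").End(xlUp).Row

    getEmailIdBySound = 0    ' same "not found" value as in the question
    For r = 1 To lastRow
        ' Assumes column B holds plain text (cells with error values are not handled)
        If SoundsLike(findString, CStr(ws.Cells(r, "B").Value)) Then
            getEmailIdBySound = ws.Cells(r, "C").Value
            Exit Function
        End If
    Next r
End Function
```

With this sketch, "Satish Kothandam" should match a row containing "Sathish Kothandam", because "Satish" and "Sathish" share a Soundex code. An abbreviated name such as "Sathish K" would not match, since "K" and "Kothandam" have different codes, so the original partial-text Find is still the better first attempt for that case.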

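【Comments】:

Thanks for this, Santosh. This is what I wanted. Sorry for misunderstanding the function at first :)

【Answer 2】:

You can use the Levenshtein algorithm. It computes the edit distance between two strings: the minimum number of single-character insertions, deletions, and substitutions needed to turn one string into the other.

Source: Wikimedia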

Function levenshtein(a As String, b As String) As Integer

    Dim i As Integer
    Dim j As Integer
    Dim cost As Integer
    Dim d() As Integer
    Dim min1 As Integer
    Dim min2 As Integer
    Dim min3 As Integer

    If Len(a) = 0 Then
        levenshtein = Len(b)
        Exit Function
    End If

    If Len(b) = 0 Then
        levenshtein = Len(a)
        Exit Function
    End If

    ReDim d(Len(a), Len(b))

    For i = 0 To Len(a)
        d(i, 0) = i
    Next

    For j = 0 To Len(b)
        d(0, j) = j
    Next

    For i = 1 To Len(a)
        For j = 1 To Len(b)
            If Mid(a, i, 1) = Mid(b, j, 1) Then
                cost = 0
            Else
                cost = 1
            End If

            ' Candidate costs for the three possible edits
            min1 = (d(i - 1, j) + 1)            ' deletion
            min2 = (d(i, j - 1) + 1)            ' insertion
            min3 = (d(i - 1, j - 1) + cost)     ' substitution (or match when cost = 0)

            ' VBA itself has no Min() function. It could be emulated like this:
'            If min1 <= min2 And min1 <= min3 Then
'                d(i, j) = min1
'            ElseIf min2 <= min1 And min2 <= min3 Then
'                d(i, j) = min2
'            Else
'                d(i, j) = min3
'            End If
            ' In Excel, Min() is available as a method of the
            ' WorksheetFunction object, so we use that instead
            d(i, j) = Application.WorksheetFunction.Min(min1, min2, min3)
        Next
    Next
    levenshtein = d(Len(a), Len(b))

End Function
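
As a rough illustration of how this distance could drive the lookup from the question, the sketch below scans column B and returns the email ID of the closest name. It assumes the levenshtein function above is in the same module and reuses the question's sheet name and column layout. `getEmailIdByDistance` and its `maxDistance` threshold (default 3) are hypothetical and would need tuning against real data.

```vba
' Hypothetical helper: returns the email ID (column C) of the column-B name
' with the smallest Levenshtein distance to findString, or 0 when no name
' is within maxDistance edits.
Function getEmailIdByDistance(ByVal findString As String, _
                              Optional ByVal maxDistance As Integer = 3) As Variant
    Dim ws As Worksheet
    Dim lastRow As Long, r As Long, bestRow As Long
    Dim candidate As String
    Dim dist As Integer, bestDist As Integer

    Set ws = ActiveWorkbook.Sheets("CONSOLIDATED")
    lastRow = ws.Cells(ws.Rows.Count, "B").End(xlUp).Row
    bestDist = maxDistance + 1    ' only distances below this value can win

    For r = 1 To lastRow
        ' Assumes column B holds plain text (cells with error values are not handled)
        candidate = CStr(ws.Cells(r, "B").Value)
        If Len(candidate) > 0 Then
            ' Compare in upper case so that case differences cost nothing
            dist = levenshtein(UCase$(findString), UCase$(candidate))
            If dist < bestDist Then
                bestDist = dist
                bestRow = r
            End If
        End If
    Next r

    If bestRow > 0 Then
        getEmailIdByDistance = ws.Cells(bestRow, "C").Value
    Else
        getEmailIdByDistance = 0    ' same "not found" value as in the question
    End If
End Function
```

For the names in the question, "Satish Kothandam" is one edit away from "Sathish Kothandam", and "Sathiya Arjun" is one edit away from "Sathya Arjun", so both fall well inside the default threshold. A tighter threshold reduces false matches between different people with similar names, at the cost of missing heavier misspellings.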
