NSString localizedCompare: inconsistent results given longer strings

Posted 2023-02-27

[Asked]: 2017-06-28 18:07:35

[Question]:

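We are trying to use an NSFetchedResultsController to return people's names and populate a UITableView in sorted order, using localizedCompare:. We are also trying to provide a section index in the UI (the right-hand column showing the first character of each section). We give the NSFetchedResultsController a selector on our entity that returns the section each entity should belong to (specifically, the first character of the person's name, capitalized).

When dealing with people's names that use Unicode code points, we ran into problems. NSFetchedResultsController complains that the entities are not sorted by section.

Specifically: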

reason=The fetched object at index 103 has an out of order section name 'Ø. Objects must be sorted by section name', 
reason = "The fetched object at index 103 has an out of order section name '\U00d8. Objects must be sorted by section name'";

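The issue appears to be that the comparison value returned by localizedCompare: differs for a whole "word" versus its leading character.

The following tests pass, but I would expect the comparison results for ("Ø" and "O") and for ("Østerhus" and "Osypowicz") to be consistent.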

- (void)testLocalizedSortOrder300
{
    NSString *str1 = @"Osowski";
    NSString *str2 = @"Østerhus";
    NSString *str3 = @"Osypowicz";

    NSString *letter1 = @"O";
    NSString *letter2 = @"Ø";

    //localizedCompare:

    //"Osowski" < "Østerhus"
    NSComparisonResult res = [str1 localizedCompare:str2];
    XCTAssertTrue(res == NSOrderedAscending, @"(localizedCompare:) Expected '%@' and '%@' to be NSOrderedAscending, but got %@", str1, str2, res == NSOrderedSame ? @"NSOrderedSame" : @"NSOrderedDescending");

    //"Østerhus" < "Osypowicz"
    res = [str2 localizedCompare:str3];
    XCTAssertTrue(res == NSOrderedAscending, @"(localizedCompare:) Expected '%@' and '%@' to be NSOrderedAscending, but got %@", str2, str3, res == NSOrderedSame ? @"NSOrderedSame" : @"NSOrderedDescending");

    //"O" < "Ø"
    res = [letter1 localizedCompare:letter2];
    XCTAssertTrue(res == NSOrderedAscending, @"(localizedCompare:) Expected '%@' and '%@' to be NSOrderedAscending, but got %@", letter1, letter2, res == NSOrderedSame ? @"NSOrderedSame" : @"NSOrderedDescending");
}

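So, ultimately, the question is: given a person's name (or any other string) that uses Unicode code points, how do we properly (in a localized manner) return a section name that corresponds to the sort order of localizedCompare:?

Additionally, what is going on with localizedCompare: apparently treating "Ø" and "O" as NSOrderedSame when they are followed by additional characters?

[Comments]:

What is the locale? I cannot reproduce the problem.

@MartinR "en_US"

This seems to be a similar question: ***.com/questions/2167857/…, perhaps one of the answers there helps.

[Answer 1]: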

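I expect localizedCompare: is using a specific combination of NSStringCompareOptions flags that causes this behavior. https://developer.apple.com/documentation/foundation/nsstringcompareoptions?preferredLanguage=occ

Using compare:options: with NSDiacriticInsensitiveSearch turned on may get you the result you are looking for.

For generating the section index, it might be best to strip the extended characters from the value first, and then take the first letter. Something like:

[[str1 stringByFoldingWithOptions:NSCaseInsensitiveSearch | NSDiacriticInsensitiveSearch locale:[NSLocale currentLocale]] substringToIndex:1]

This way, a name starting with an accented letter such as "Édward" is converted to "Edward" before you take the first letter for the section.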
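As a rough illustration of the folding idea above, here is a minimal sketch. The helper name FoldedSectionLetter is hypothetical and not part of the answer. It differs from the one-liner in two assumed ways: it folds only diacritics and uppercases the result explicitly (as far as I know, folding with NSCaseInsensitiveSearch lowercases the string), and it takes the first composed character sequence rather than the first UTF-16 unit. Letters without a canonical decomposition, such as "Ø", may come through unchanged.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (an assumption for illustration, not from the answer):
// returns the first letter of a name with diacritics folded away, uppercased.
static NSString *FoldedSectionLetter(NSString *name)
{
    if (name.length == 0) {
        return @"";   // nothing to index
    }
    // Strip diacritics, e.g. "Édward" becomes "Edward".
    NSString *folded = [name stringByFoldingWithOptions:NSDiacriticInsensitiveSearch
                                                 locale:[NSLocale currentLocale]];
    if (folded.length == 0) {
        return @"";   // guard: folding left nothing behind
    }
    // Take the whole first user-visible character, not just one UTF-16 unit.
    NSRange first = [folded rangeOfComposedCharacterSequenceAtIndex:0];
    return [[folded substringWithRange:first] localizedUppercaseString];
}
```

Usage would look like `NSString *section = FoldedSectionLetter(person.personName);`, where `person.personName` is an assumed property name.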

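[Comments]:

Thanks @a-goodale. Unfortunately we cannot use compare:options: or similar. We are backed by a SQLite store, so our options are limited. Specifically: "The supported sort selectors for SQLite are compare: and caseInsensitiveCompare:, localizedCompare:, localizedCaseInsensitiveCompare:, and localizedStandardCompare:. The latter is Finder-like sorting, and what most people should use most of the time. In addition, you cannot sort on transient properties using the SQLite store." See developer.apple.com/library/content/documentation/Cocoa/… Likewise, stringByFoldingWithOptions:locale: still produces "Ø" for "Østerhus", which puts "Østerhus" in a different section and leads to the same odd problem of localizedCompare: apparently treating "Ø" and "O" as NSOrderedSame when followed by additional characters. :(

[Answer 2]: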

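Yeah, been there. The only solution I found was to create a second field holding a simplified version of the characters (I can't remember the method) and store it as a second field used for searching. Not super elegant, but it works.

[Comments]:

Thanks. Yes, I think the only reliable way is to store the normalized section names in the database so that the FRC can sort on them.

[Answer 3]: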

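The approach that ultimately solved this was to store normalized section names in the database.

The SO post suggested by @MartinR led me to https://***.com/a/13292767/397210, which discusses this approach and was the key "aha" moment for solving it.

While this does not explain the odd behavior of localizedCompare: apparently treating "Ø" and "O" as NSOrderedSame when followed by additional characters, it is IMHO a more robust and complete solution, and in our testing it works for all Unicode code points.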

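The steps involved:

Add a field to the entity that holds the normalized section name* (sectionName) and store its value in the database.

Pass sectionName as the sectionNameKeyPath to NSFetchedResultsController's -initWithFetchRequest:managedObjectContext:sectionNameKeyPath:cacheName:.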

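For the sort descriptors used by the fetch request passed to the NSFetchedResultsController, be sure to sort first by section name and then by how the contents of the section should be sorted (e.g. person name), taking care to use the localized versions of the comparison selectors. For example: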

[NSSortDescriptor sortDescriptorWithKey:@"sectionName" ascending:YES selector:@selector(localizedStandardCompare:)],
[NSSortDescriptor sortDescriptorWithKey:@"personName" ascending:YES selector:@selector(localizedCaseInsensitiveCompare:)]

Test.
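Put together, the controller setup described in these steps might look roughly like the sketch below. The entity name "Person" and the helper MakePeopleController are assumptions made for illustration; sectionName and personName are the keys used in the sort descriptors above.

```objc
#import <CoreData/CoreData.h>

// Hypothetical helper: the "Person" entity name is an assumption.
static NSFetchedResultsController *MakePeopleController(NSManagedObjectContext *context)
{
    NSFetchRequest *request = [NSFetchRequest fetchRequestWithEntityName:@"Person"];

    // Sort by the stored section name first, then by the name within each section.
    request.sortDescriptors = @[
        [NSSortDescriptor sortDescriptorWithKey:@"sectionName"
                                      ascending:YES
                                       selector:@selector(localizedStandardCompare:)],
        [NSSortDescriptor sortDescriptorWithKey:@"personName"
                                      ascending:YES
                                       selector:@selector(localizedCaseInsensitiveCompare:)]
    ];

    // The section key path matches the first sort key, so the fetched
    // objects arrive already grouped by section.
    return [[NSFetchedResultsController alloc] initWithFetchRequest:request
                                               managedObjectContext:context
                                                 sectionNameKeyPath:@"sectionName"
                                                          cacheName:nil];
}
```

The caller would still need to invoke performFetch: on the returned controller and handle any error it reports.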

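*Normalized section name

When dealing with Unicode, we need to be careful about assuming what the first "character" is. A single "character" may be made up of more than one character. See https://www.objc.io/issues/9-strings/unicode/ and also "Compare arabic strings with special characters ios".

This is the approach I used to generate the normalized section name: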

    NSString *decomposedString = name.decomposedStringWithCanonicalMapping;
    NSRange firstCharRange = [decomposedString rangeOfComposedCharacterSequenceAtIndex:0];
    NSString *firstChar = [decomposedString substringWithRange:firstCharRange];
    retVal = [firstChar localizedUppercaseString];
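For readers who want something they can paste in, the fragment above could be wrapped in a small function like the sketch below. The function name SectionNameForPersonName and the empty-string guard are my assumptions; the guard is there because rangeOfComposedCharacterSequenceAtIndex: raises an exception when the index is out of range.

```objc
#import <Foundation/Foundation.h>

// Hypothetical wrapper around the fragment above; the name is an assumption.
static NSString *SectionNameForPersonName(NSString *name)
{
    if (name.length == 0) {
        return @"";   // assumed fallback section for empty names
    }
    // Canonically decompose so that equivalent spellings normalize the same way.
    NSString *decomposedString = name.decomposedStringWithCanonicalMapping;
    // Take the full first composed character sequence (base letter plus any combining marks).
    NSRange firstCharRange = [decomposedString rangeOfComposedCharacterSequenceAtIndex:0];
    NSString *firstChar = [decomposedString substringWithRange:firstCharRange];
    return [firstChar localizedUppercaseString];
}
```

One way to use it is to set the stored attribute whenever the name changes, for example `person.sectionName = SectionNameForPersonName(person.personName);`, where `person` is an assumed managed object with those two attributes.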

Hopefully this approach is clear and useful to others, and thanks to everyone for the help.
