Swift: what is the right way to split up a [String] resulting in a [[String]] with a given subarray size?

【Posted】: 2014-12-11 07:26:54 【Question】:

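Starting with a large [String] and a given subarray size, what is the best way to split this array into smaller arrays? (The last array will be smaller than the given subarray size.)

Concrete example:

Split up ["1","2","3","4","5","6","7"] with a max split size of 2

The code would produce [["1","2"],["3","4"],["5","6"],["7"]]

Obviously I could do this a little more manually, but I feel like a Swift operation such as map() or reduce() may do what I want really beautifully.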

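【Comments on the question】:

On what basis do you want to do the split? Given that you are talking about "page size", the font and size must be important. Why are you trying to do this yourself rather than letting the OS do the text layout?

What do you mean by page size?

@GaryMakin Sorry, updated now. It's just a set split size, i.e. split the array into smaller arrays of max size 100.

@Jordan, as fun as these are, this is not really what SO is for. You may want to ask these questions in the #swift-lang IRC channel.

I asked almost the same question while searching for a Swift equivalent of Ruby's each_cons function ***.com/q/39756309/78336

【Answer 1】:

I don't think you'll want to use map or reduce. Map is for applying a function to each individual element of an array, while reduce is for collapsing (flattening) an array into a single value. What you want to do is slice the array into subarrays of a certain size. This snippet uses slices.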

let arr = ["1","2","3","4","5","6","7"]
let splitSize = 2

var newArr = [[String]]()
var i = 0
while i < arr.count {
    // Take splitSize elements, or whatever is left for the final slice.
    let slice: ArraySlice<String>
    if i + splitSize >= arr.count {
        slice = arr[i..<arr.count]
    } else {
        slice = arr[i..<i+splitSize]
    }
    newArr.append(Array(slice))
    i += slice.count
}
print(newArr)

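【Comments】:

This solution works in Swift 2.2 through 3.0, which is a plus! And I'd argue it is more readable, at least until we all learn the latest flavor of the "new-speak"... I mean Swift.

【Answer 2】:

I wouldn't call it beautiful, but here's a method using map: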

// Swift 2 syntax
let numbers = ["1","2","3","4","5","6","7"]
let splitSize = 2
let chunks = numbers.startIndex.stride(to: numbers.count, by: splitSize).map {
  numbers[$0 ..< $0.advancedBy(splitSize, limit: numbers.endIndex)]
}

The stride(to:by:) method gives you the index of the first element of each chunk, so you can map those indices to slices of the source array using advancedBy(distance:limit:).
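To make the role of the stride concrete, here is a minimal sketch in current Swift syntax. It assumes the free function stride(from:to:by:) and a plain min(_:_:) clamp in place of the Swift 2 methods used above; the names starts and chunks are only illustrative.

```swift
let numbers = ["1", "2", "3", "4", "5", "6", "7"]
let splitSize = 2

// The stride yields the start index of every chunk.
let starts = Array(stride(from: 0, to: numbers.count, by: splitSize))
print(starts) // [0, 2, 4, 6]

// Each start index is mapped to a slice whose end is clamped to the array's end,
// so the last chunk may be shorter than splitSize.
let chunks = starts.map { start in
    Array(numbers[start ..< min(start + splitSize, numbers.count)])
}
print(chunks) // [["1", "2"], ["3", "4"], ["5", "6"], ["7"]]
```

The clamp only matters for the final start index (6 here), where start + splitSize would run past the end of the array.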

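A more "functional" approach would simply be to recurse over the array, like so:

func chunkArray<T>(_ s: [T], splitSize: Int) -> [[T]] {
    if s.count <= splitSize {
        // Base case: what is left fits in a single chunk.
        return [s]
    } else {
        // Peel off the first splitSize elements and recurse on the remainder.
        return [Array<T>(s[0..<splitSize])] + chunkArray(Array<T>(s[splitSize..<s.count]), splitSize: splitSize)
    }
}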

【Comments】:

Swift 2.0 let chunks = stride(from: 0, to: numbers.count, by: splitSize).map( numbers[$0..

This is now invalid with the new XC 7 Beta 6

【Answer 3】:

The above is very clever, but it makes my head hurt. I had to revert to a less swifty approach.

For Swift 2.0

var chunks = [[Int]]()
var temp = [Int]()
var splitSize = 3

var x = [1,2,3,4,5,6,7]

for element in x {
    if temp.count < splitSize {
        temp.append(element)
    }
    if temp.count == splitSize {
        chunks.append(temp)
        temp.removeAll()
    }
}

// Append the final, possibly shorter, chunk.
if !temp.isEmpty {
    chunks.append(temp)
}

// Playground result: [[1, 2, 3], [4, 5, 6], [7]]

【Comments】:

【Answer 4】:

I like Nate Cook's answer. It looks like Swift has moved on since it was written, so here's my take on it as an extension to Array:

// Swift 2 syntax
extension Array {
    func chunk(chunkSize : Int) -> Array<Array<Element>> {
        return 0.stride(to: self.count, by: chunkSize)
            .map { Array(self[$0..<$0.advancedBy(chunkSize, limit: self.count)]) }
    }
}

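Note that it returns [] for negative numbers and, as written above, can result in a fatal error. You'll have to put a guard in if you want to prevent that.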

func testChunkByTwo() {
    let input = [1,2,3,4,5,6,7]
    let output = input.chunk(2)
    let expectedOutput = [[1,2], [3,4], [5,6], [7]]
    XCTAssertEqual(expectedOutput, output)
}

func testByOne() {
    let input = [1,2,3,4,5,6,7]
    let output = input.chunk(1)
    let expectedOutput = [[1],[2],[3],[4],[5],[6],[7]]
    XCTAssertEqual(expectedOutput, output)
}

func testNegative() {
    let input = [1,2,3,4,5,6,7]
    let output = input.chunk(-2)
    // An explicit type is needed for the empty literal.
    let expectedOutput: [[Int]] = []
    XCTAssertEqual(expectedOutput, output)
}
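
One way to add the guard mentioned in the note, sketched in current Swift syntax. The method name safeChunked(by:) and the decision to return an empty array for zero or negative sizes are assumptions made for illustration, not part of the answer.

```swift
extension Array {
    /// Hypothetical guarded variant: returns [] instead of trapping
    /// when chunkSize is zero or negative.
    func safeChunked(by chunkSize: Int) -> [[Element]] {
        guard chunkSize > 0 else { return [] }
        return stride(from: 0, to: count, by: chunkSize).map {
            // Swift.min is spelled out because Array has its own min() method.
            Array(self[$0 ..< Swift.min($0 + chunkSize, count)])
        }
    }
}

let byTwo = [1, 2, 3, 4, 5, 6, 7].safeChunked(by: 2)
let byZero = [1, 2, 3].safeChunked(by: 0)
print(byTwo)  // [[1, 2], [3, 4], [5, 6], [7]]
print(byZero) // []
```

Whether an invalid size should return an empty result or fail loudly with a precondition is a design choice; the sketch picks the silent option only to mirror the behaviour described for negative sizes.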

【Comments】:

【Answer 5】:

With Swift 5, depending on your needs, you can choose one of the following five ways to solve your problem.


1. Using AnyIterator in a Collection extension method

AnyIterator is a good candidate for iterating over the indices of an object that conforms to the Collection protocol in order to return subsequences of that object. In a Collection protocol extension, you can declare a chunked(by:) method with the following implementation:

extension Collection {

    func chunked(by distance: Int) -> [[Element]] {
        precondition(distance > 0, "distance must be greater than 0") // prevents infinite loop

        var index = startIndex
        let iterator: AnyIterator<Array<Element>> = AnyIterator({
            let newIndex = self.index(index, offsetBy: distance, limitedBy: self.endIndex) ?? self.endIndex
            defer { index = newIndex }
            let range = index ..< newIndex
            return index != self.endIndex ? Array(self[range]) : nil
        })

        return Array(iterator)
    }

}

Usage:

let array = ["1", "2", "3", "4", "5", "6", "7", "8", "9"]
let newArray = array.chunked(by: 2)
print(newArray) // prints: [["1", "2"], ["3", "4"], ["5", "6"], ["7", "8"], ["9"]]

2. Using the stride(from:to:by:) function in an Array extension method

Array indices are of type Int and conform to the Strideable protocol. Therefore, you can use stride(from:to:by:) and advanced(by:) with them. In an Array extension, you can declare a chunked(by:) method with the following implementation:

extension Array {

    func chunked(by distance: Int) -> [[Element]] {
        let indicesSequence = stride(from: startIndex, to: endIndex, by: distance)
        let array: [[Element]] = indicesSequence.map {
            let newIndex = $0.advanced(by: distance) > endIndex ? endIndex : $0.advanced(by: distance)
            //let newIndex = self.index($0, offsetBy: distance, limitedBy: self.endIndex) ?? self.endIndex // also works
            return Array(self[$0 ..< newIndex])
        }
        return array
    }

}

Usage:

let array = ["1", "2", "3", "4", "5", "6", "7", "8", "9"]
let newArray = array.chunked(by: 2)
print(newArray) // prints: [["1", "2"], ["3", "4"], ["5", "6"], ["7", "8"], ["9"]]

3. Using a recursive approach in an Array extension method

Based on Nate Cook's recursive code, you can declare a chunked(by:) method in an Array extension with the following implementation:

extension Array {

    func chunked(by distance: Int) -> [[Element]] {
        precondition(distance > 0, "distance must be greater than 0") // prevents infinite loop

        if self.count <= distance {
            return [self]
        } else {
            let head = [Array(self[0 ..< distance])]
            let tail = Array(self[distance ..< self.count])
            return head + tail.chunked(by: distance)
        }
    }

}

Usage:

let array = ["1", "2", "3", "4", "5", "6", "7", "8", "9"]
let newArray = array.chunked(by: 2)
print(newArray) // prints: [["1", "2"], ["3", "4"], ["5", "6"], ["7", "8"], ["9"]]

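4. Using a for loop and batches in a Collection extension method

Chris Eidhof and Florian Kugler show in the Swift Talk #33 - Sequence & Iterator (Collections #2) video how to use a simple for loop to fill batches of sequence elements and append them to an array once each batch is complete. In a Collection extension, you can declare a chunked(by:) method with the following implementation:

extension Collection {

    func chunked(by distance: Int) -> [[Element]] {
        var result: [[Element]] = []
        var batch: [Element] = []

        for element in self {
            batch.append(element)

            if batch.count == distance {
                result.append(batch)
                batch = []
            }
        }

        // Append the final, possibly shorter, batch.
        if !batch.isEmpty {
            result.append(batch)
        }

        return result
    }

}

Usage: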

let array = ["1", "2", "3", "4", "5", "6", "7", "8", "9"]
let newArray = array.chunked(by: 2)
print(newArray) // prints: [["1", "2"], ["3", "4"], ["5", "6"], ["7", "8"], ["9"]]

5. Using a custom struct that conforms to the Sequence and IteratorProtocol protocols

If you don't want to create extensions of Sequence, Collection or Array, you can create a custom struct that conforms to the Sequence and IteratorProtocol protocols. This struct should have the following implementation:

struct BatchSequence<T>: Sequence, IteratorProtocol {

    private let array: [T]
    private let distance: Int
    private var index = 0

    init(array: [T], distance: Int) {
        precondition(distance > 0, "distance must be greater than 0") // prevents infinite loop
        self.array = array
        self.distance = distance
    }

    mutating func next() -> [T]? {
        guard index < array.endIndex else { return nil }
        let newIndex = index.advanced(by: distance) > array.endIndex ? array.endIndex : index.advanced(by: distance)
        defer { index = newIndex }
        return Array(array[index ..< newIndex])
    }

}

Usage:

let array = ["1", "2", "3", "4", "5", "6", "7", "8", "9"]
let batchSequence = BatchSequence(array: array, distance: 2)
let newArray = Array(batchSequence)
print(newArray) // prints: [["1", "2"], ["3", "4"], ["5", "6"], ["7", "8"], ["9"]]

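【Comments】:

Hi, do you have a Swift 3 version of that extension method?

Great answer, thanks! Note that option 4 has what I consider odd behavior if the array being chunked is empty. It returns [] instead of [[]]. Option 3 behaves as I would expect.

【Answer 6】:

I'll just throw my hat in the ring here with another implementation, based on AnyGenerator (named AnyIterator since Swift 3).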

extension Array {
    func chunks(_ size: Int) -> AnyIterator<[Element]> {
        if size == 0 {
            return AnyIterator {
                return nil
            }
        }

        let indices = stride(from: startIndex, to: count, by: size)
        var generator = indices.makeIterator()

        return AnyIterator {
            guard let i = generator.next() else {
                return nil
            }

            // Step back from i + size until j is a valid index (handles the short last chunk).
            var j = self.index(i, offsetBy: size)
            repeat {
                j = self.index(before: j)
            } while j >= self.endIndex

            return self[i...j].lazy.map { $0 }
        }
    }
}

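I prefer this method since it relies exclusively on generators, which can have a non-negligible positive memory impact when dealing with large arrays.

For your specific example, here's how it would work:

let chunks = Array(["1","2","3","4","5","6","7"].chunks(2))

Result: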

[["1", "2"], ["3", "4"], ["5", "6"], ["7"]]
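
The memory benefit only applies while the chunks are consumed one at a time; wrapping the iterator in Array(...) as in the example above materializes all of them at once. A minimal sketch of lazy consumption follows. It uses the standard library's sequence(state:next:) rather than the extension above so that it is self-contained, and the function name lazyChunks(of:size:) is hypothetical.

```swift
// Hypothetical helper: yields chunks on demand instead of building them all up front.
func lazyChunks<T>(of array: [T], size: Int) -> UnfoldSequence<ArraySlice<T>, Int> {
    precondition(size > 0, "size must be greater than 0")
    return sequence(state: array.startIndex) { start in
        guard start < array.endIndex else { return nil }
        let end = min(start + size, array.endIndex)
        defer { start = end } // advance the state after the slice is returned
        return array[start ..< end]
    }
}

// Only the first two chunks are ever produced here.
var firstTwo: [[String]] = []
for chunk in lazyChunks(of: ["1", "2", "3", "4", "5", "6", "7"], size: 2).prefix(2) {
    firstTwo.append(Array(chunk))
}
print(firstTwo) // [["1", "2"], ["3", "4"]]
```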

【Comments】:

【Answer 7】:

In Swift 3/4 this would look like the following:

let numbers = ["1","2","3","4","5","6","7"]
let chunkSize = 2
let chunks = stride(from: 0, to: numbers.count, by: chunkSize).map {
    Array(numbers[$0..<min($0 + chunkSize, numbers.count)])
}
// chunks is [["1", "2"], ["3", "4"], ["5", "6"], ["7"]]

As an extension to Array:

extension Array {
    func chunked(by chunkSize: Int) -> [[Element]] {
        return stride(from: 0, to: self.count, by: chunkSize).map {
            Array(self[$0..<Swift.min($0 + chunkSize, self.count)])
        }
    }
}

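Or the slightly more verbose, yet more general:

let numbers = ["1","2","3","4","5","6","7"]
let chunkSize = 2
let chunks: [[String]] = stride(from: 0, to: numbers.count, by: chunkSize).map {
    let end = numbers.endIndex
    let chunkEnd = numbers.index($0, offsetBy: chunkSize, limitedBy: end) ?? end
    return Array(numbers[$0..<chunkEnd])
}

This is more general because I am making fewer assumptions about the type of the index into the collection. In the previous implementation I assumed that indices could be compared and added.

Note that in Swift 3 the functionality of advancing indices has been moved from the indices themselves to the collection.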
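
A small illustration of that change, using String, whose indices are not integers and therefore cannot be advanced with +. The variable names are only illustrative.

```swift
let text = "abcdefg"

// Swift 3 and later: the collection advances the index
// (earlier answers show the Swift 2 spelling, advancedBy(_:limit:), called on the index itself).
let start = text.startIndex
let end = text.index(start, offsetBy: 2, limitedBy: text.endIndex) ?? text.endIndex
print(text[start ..< end]) // ab

// Asking for more than is left returns nil, which is why the code above
// falls back to endIndex with ??.
let tooFar = text.index(start, offsetBy: 100, limitedBy: text.endIndex)
print(tooFar == nil) // true
```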

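【Comments】:

You could use ArraySlice as a more efficient approach, i.e. func chunked(by chunkSize: Int) -> [ArraySlice<Element>], and then drop the Array( ... ) conversion.

How can I edit the extension to make chunks of different sizes? For example, the first array contains 17 elements and the other arrays contain 25?

【Answer 8】:

It would be nice to express Tyler Cloutier's formulation as an extension on Array: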

extension Array {
    func chunked(by chunkSize:Int) -> [[Element]] {
        let groups = stride(from: 0, to: self.count, by: chunkSize).map {
            Array(self[$0..<[$0 + chunkSize, self.count].min()!])
        }
        return groups
    }
}

This gives us a general way to partition an array into chunks.

【Comments】:

Swift.min($0 + chunkSize, self.count) instead of having to create an array

【Answer 9】:

Did you know that any solution in the [a...b] Swift style runs about 10 times slower than the regular one?

for y in 0..<rows {
    var row = [Double]()
    for x in 0..<cols {
        row.append(stream[y * cols + x])
    }
    mat.append(row)
}

Try it out; here is my raw test code:

import Foundation // for Date

let count = 1000000
let cols = 1000
let rows = count / cols
var stream = [Double].init(repeating: 0.5, count: count)

// Regular
var mat = [[Double]]()

let t1 = Date()

for y in 0..<rows {
    var row = [Double]()
    for x in 0..<cols {
        row.append(stream[y * cols + x])
    }
    mat.append(row)
}

print("regular: \(Date().timeIntervalSince(t1))")


//Swift
let t2 = Date()

var mat2: [[Double]] = stride(from: 0, to: stream.count, by: cols).map {
    let end = stream.endIndex
    let chunkEnd = stream.index($0, offsetBy: cols, limitedBy: end) ?? end
    return Array(stream[$0..<chunkEnd])
}

print("swift: \(Date().timeIntervalSince(t2))")

Output:

regular: 0.0449600219726562

swift: 0.49255496263504

【Comments】:

Let me guess. You are benchmarking this in a playground.

【Answer 10】:

New in Swift 4, you can do this efficiently with reduce(into:). Here's an extension on Sequence:

extension Sequence {
    func eachSlice(_ clump:Int) -> [[Self.Element]] {
        return self.reduce(into:[]) { memo, cur in
            if memo.count == 0 {
                return memo.append([cur])
            }
            if memo.last!.count < clump {
                // The last chunk still has room: extend it.
                memo.append(memo.removeLast() + [cur])
            } else {
                // The last chunk is full: start a new one.
                memo.append([cur])
            }
        }
    }
}

Usage:

let result = [1,2,3,4,5,6,7,8,9].eachSlice(2)
// [[1, 2], [3, 4], [5, 6], [7, 8], [9]]

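【Comments】:

【Answer 11】:

In Swift 4 or later you can also extend Collection and return a collection of its SubSequence, so that it can also be used with StringProtocol types (String or Substring). This way it returns a collection of substrings instead of a collection of arrays of characters:

Xcode 10.1 • Swift 4.2.1 or later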

extension Collection {
    func subSequences(limitedTo maxLength: Int) -> [SubSequence] {
        precondition(maxLength > 0, "groups must be greater than zero")
        var start = startIndex
        var subSequences: [SubSequence] = []
        while start < endIndex {
            let end = index(start, offsetBy: maxLength, limitedBy: endIndex) ?? endIndex
            defer { start = end }
            subSequences.append(self[start..<end])
        }
        return subSequences
    }
}


Or, as suggested in the comments by @Jessy, using the sequence(state:next:) function:

public func sequence<T, State>(state: State, next: @escaping (inout State) -> T?) -> UnfoldSequence<T, State>

extension Collection {
    func subSequences(limitedTo maxLength: Int) -> [SubSequence] {
        precondition(maxLength > 0, "groups must be greater than zero")
        return .init(sequence(state: startIndex) { start in
            guard start < self.endIndex else { return nil }
            let end = self.index(start, offsetBy: maxLength, limitedBy: self.endIndex) ?? self.endIndex
            defer { start = end }
            return self[start..<end]
        })
    }
}

Usage

let array = ["1", "2", "3", "4", "5", "6", "7", "8", "9"]
let slices = array.subSequences(limitedTo: 2)  // [ArraySlice(["1", "2"]), ArraySlice(["3", "4"]), ArraySlice(["5", "6"]), ArraySlice(["7", "8"]), ArraySlice(["9"])]
for slice in slices {
    print(slice) // prints one slice per line: ["1", "2"], then ["3", "4"], ["5", "6"], ["7", "8"], ["9"]
}
// To convert from ArraySlice<Element> to Array<Element>
let arrays = slices.map(Array.init)  // [["1", "2"], ["3", "4"], ["5", "6"], ["7", "8"], ["9"]]


extension Collection {
    var singles: [SubSequence] { return subSequences(limitedTo: 1) }
    var pairs:   [SubSequence] { return subSequences(limitedTo: 2) }
    var triples: [SubSequence] { return subSequences(limitedTo: 3) }
    var quads:   [SubSequence] { return subSequences(limitedTo: 4) }
}

Array or ArraySlice of Characters

let chars = ["a","b","c","d","e","f","g","h","i"]
chars.singles  // [["a"], ["b"], ["c"], ["d"], ["e"], ["f"], ["g"], ["h"], ["i"]]
chars.pairs    // [["a", "b"], ["c", "d"], ["e", "f"], ["g", "h"], ["i"]]
chars.triples  // [["a", "b", "c"], ["d", "e", "f"], ["g", "h", "i"]]
chars.quads    // [["a", "b", "c", "d"], ["e", "f", "g", "h"], ["i"]]
chars.dropFirst(2).quads  // [["c", "d", "e", "f"], ["g", "h", "i"]]

StringProtocol elements (String and Substring)

let str = "abcdefghi"
str.singles  // ["a", "b", "c", "d", "e", "f", "g", "h", "i"]
str.pairs    // ["ab", "cd", "ef", "gh", "i"]
str.triples  // ["abc", "def", "ghi"]
str.quads    // ["abcd", "efgh", "i"]
str.dropFirst(2).quads    // ["cdef", "ghi"]

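【Comments】:

This is a good idea! But count might be O(n), so it's best to find some other way of iterating. I put one in my answer.

@Jessy You can simply use a while loop

No, then you'd have to pick a collection type to return, instead of just providing the subsequences as a sequence.

Well, I'd love to see benchmark results on this one

@Jessy I've edited my answer as you suggested. Is there any problem with this approach?

【Answer 12】:

Swift 5.1 - A general solution for all kinds of collections: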

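extension Collection where Index == Int {
    func chunked(by chunkSize: Int) -> [[Element]] {
        // Clamp to endIndex rather than count so that collections whose startIndex is not 0 (e.g. ArraySlice) also work.
        stride(from: startIndex, to: endIndex, by: chunkSize).map { Array(self[$0..<Swift.min($0 + chunkSize, endIndex)]) }
    }
}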

【Comments】:

This isn't general. It requires the collection to be indexed by Int

【Answer 13】:

public extension Optional {
  /// Wraps a value in an `Optional`, based on a condition.
  /// - Parameters:
  ///   - wrapped: A non-optional value.
  ///   - getIsNil: The condition that will result in `nil`.
  init(
    _ wrapped: Wrapped,
    nilWhen getIsNil: (Wrapped) throws -> Bool
  ) rethrows {
    self = try getIsNil(wrapped) ? nil : wrapped
  }
}

public extension Sequence {
  /// Splits a `Sequence` into equal "chunks".
  ///
  /// - Parameter maxArrayCount: The maximum number of elements in a chunk.
  /// - Returns: `Array`s with `maxArrayCount` `count`s,
  ///   until the last chunk, which may be smaller.
  subscript(maxArrayCount maxCount: Int) -> AnySequence<[Element]> {
    .init(
      sequence( state: makeIterator() ) { iterator in
        Optional(
          (0..<maxCount).compactMap { _ in iterator.next() },
          nilWhen: \.isEmpty
        )
      }
    )
  }
}

// [ ["1", "2"], ["3", "4"], ["5", "6"], ["7"] ]
(1...7).map(String.init)[maxArrayCount: 2]

public extension Collection {
  /// Splits a `Collection` into equal "chunks".
  ///
  /// - Parameter maxSubSequenceCount: The maximum number of elements in a chunk.
  /// - Returns: `SubSequence`s with `maxSubSequenceCount` `count`s,
  ///   until the last chunk, which may be smaller.
  subscript(maxSubSequenceCount maxCount: Int) -> AnySequence<SubSequence> {
    .init(
      sequence(state: startIndex) { startIndex in
        guard startIndex < self.endIndex
        else { return nil }

        let endIndex =
          self.index(startIndex, offsetBy: maxCount, limitedBy: self.endIndex)
          ?? self.endIndex
        defer { startIndex = endIndex }
        return self[startIndex..<endIndex]
      }
    )
  }
}

// ["12", "34", "56", "7"]
(1...7).map(String.init).joined()[maxSubSequenceCount: 2]

【Comments】:
