Swift标准库源码之旅 -Collection

2020-10-09 本文已影响0人 Zafir_zzf

背景

Collection协议是继Sequence之后第二基础的一个容器协议. 距离咱们常用的Array其实还差很远.

选一条比较重要的继承链是下面这样的.

Collection -> BidirectionalCollection -> RandomAccessCollection -> Array

此外还有MutableCollection和RangeReplaceableCollection共同构成Array的各个功能。
这么多的协议一开始是让人头大的. 甚至会让人怀疑是否真的需要拆分这么多协议. 带着疑问先从最上面的Collection来吧.

与Sequence的区别

首先Collection协议遵循Sequence协议，它继承了Sequence的所有功能，此外，在其基础上增加了一些能力。

Collection的遍历是无损耗的。即多次遍历的结果都是相同的，而Sequence无法保证可以进行多次遍历
Collection的元素是有限个，一定会迭代结束。所以有一个Count属性可以获取其个数
可以通过索引访问单个元素

实现一个最低要求的Collection

目前有一个Family的数据类型定义如下

struct Family {
    let father: String
    let me: String
    let son: String
}

让这个Family类型成为一个Collection(虽然它其实并不像是一个Collection)只需要实现下面几个方法

struct Family: Collection {
    
    let father: String
    let me: String
    let son: String

    var startIndex: Int { 0 }
    var endIndex: Int { 3 }

    func index(after i: Int) -> Int {
        return i + 1
    }
    
    subscript(position: Int) -> String {
        get {
            switch position {
            case 0: return father
            case 1: return me
            case 2: return son
            default:
                fatalError()
            }
        }
    }
    
}

第一步要定义startIndex和endIndex及Index的类型(这里是Int)。这样就让这个集合有了首和尾，计算count和遍历都要用到这两个index。这里我们的family只有是写死的0和3。

第二步是要表明遍历时如何进行更新索引的index(after)方法。大部分索引类型为Int的集合都是i + 1。但对于有些类型则未必，所以将这个方法交给了外部去实现。

第三部是要实现如何进行索引访问单个值。

这样Collection便具备了上面所说的应该具有的几个能力。但是Collection协议本身的定义比上面看起来复杂的多

Collection协议的定义

protocol Collection: Sequence {
    override associatedtype Element
    associatedtype Index: Comparable
    var startIndex: Index { get }
    var endIndex: Index { get }

    func index(after i: Index) -> Index

    override __consuming func makeIterator() -> Iterator

    subscript(position: Index) -> Element { get }
    subscript(bounds: Range<Index>) -> SubSequence { get }

    associatedtype Indices: Collection = DefaultIndices<Self>
    where Indices.Element == Index, 
          Indices.Index == Index,
          Indices.SubSequence == Indices

    var indices: Indices { get }
  
    var isEmpty: Bool { get }

    var count: Int { get }

    func index(_ i: Index, offsetBy distance: Int) -> Index

    func formIndex(after i: inout Index)
}

虽然定义了很多方法，但除了必须自己实现的几个，其它的都有了默认实现。放到协议的定义里是提供自己实现的可能以供扩展。

比如要实现遍历方法所需要的Iterator，默认类型是IndexingIterator

associatedtype Iterator = IndexingIterator<Self>

这个迭代器通过startIndex/endIndex和index(after:)方法进行遍历逻辑，我们基本不需要自己去定义实现一个别的迭代器类型

extension Collection where Iterator == IndexingIterator<Self> {
  func makeIterator() -> IndexingIterator<Self> {
    return IndexingIterator(_elements: self)
  }
}

我们直接看一下它的next实现吧

mutating func next() -> Elements.Element? {
    if _position == _elements.endIndex { return nil }
    let element = _elements[_position]
    _elements.formIndex(after: &_position)
    return element
  }

简单明了.

不过笔者目前还不知道使用formIndex而不直接用_position = _elements.index(after: index)是为什么

extension Collection {
  public func formIndex(after i: inout Index) {
    i = index(after: i)
 }

一些默认实现

// 如果类型使用默认的IndexingIterator帮其实现makeIterator()
extension Collection where Iterator == IndexingIterator<Self> {
  func makeIterator() -> IndexingIterator<Self> {
    return IndexingIterator(_elements: self)
  }
}

// 时间复杂度是常数
var isEmpty: Bool {
    return startIndex == endIndex
 }

// 如果遵循了RandomAccessCollection复杂度是常数, 不然是O(n)
var count: Int {
    return distance(from: startIndex, to: endIndex)
  }

// 对于RandomAccessCollection此函数需要时间复杂度为常数
func distance(from start: Index, to end: Index) -> Int {
    var start = start
    var count = 0
    while start != end {
      count = count + 1
      formIndex(after: &start)
    }
    return count
  }

var first: Element? {
    let start = startIndex
    if start != endIndex { return self[start] }
    else { return nil }
  }

这个first的实现为什么没有直接用self[startIndex]呢.
我能想到的是如果用startIndex,在判断startIndex != endIndex之后startIndex在其它的线程被修改了到了return self[startIndex]就会发生意料之外的事.
所以先用一个不会改变的常量接下可能会发生改变的属性是确保线程安全的一种方法。

Indices, SubSequence

associatedtype Indices: Collection = DefaultIndices<Self>
    where Indices.Element == Index, 
          Indices.Index == Index,
          Indices.SubSequence == Indices

var indices: Indices { get }

索引集合 Indicaes: A collection of indices for an arbitrary collection

一个Collection所有索引的一个集合. 在索引类型为Int时候因为我们有endIndex和count属性, 笔者目前想不到其使用场景有哪些. 可能看到String类型对此的实现会明白

struct DefaultIndices<Base: MyCollection> {
    let _base: Base
    let _startIndex: Base.Index
    let _endIndex: Base.Index
    
    init(base: Base, startIndex: Base.Index, endIndex: Base.Index) {
        (self._base, self._startIndex, self._endIndex) = (base, startIndex, endIndex)
    }
}

extension DefaultIndices: MyCollection {
    
    typealias Index = Base.Index
    typealias Element = Base.Index
    typealias Indices = DefaultIndices<Base>
    typealias SubSequence = DefaultIndices<Base>
    
    var startIndex: Index { self._startIndex }
    var endIndex: Index { self._endIndex }
    
    subscript(position: Index) -> Index {
        position
    }
    
    var indices: DefaultIndices<Base> {
        self
    }
    
    subscript(bounds: Range<Base.Index>) -> DefaultIndices<Base> {
        .init(base: _base, startIndex: _startIndex, endIndex: _endIndex)
    }
    func index(after i: Index) -> Index {
        _base.index(after: i)
    }
}

切片

associatedtype SubSequence: Collection = Slice<Self>
  where SubSequence.Index == Index,
        Element == SubSequence.Element,
        SubSequence.SubSequence == SubSequence

对集合类型截取其中某一段返回的类型就是切片类型Slice, 也被命名为SubSequence, 它是一个存储了原集合的引用和起始结束索引的类型. 跟Indices很像, 但是它的下标取值方法返回的是原集合的Element.

/// - Complexity: O(1)
subscript(bounds: Range<Index>) -> SubSequence { get }

可以设想,如果想截取集合的某一段不返回一个切片类型而是直接返回一个新集合(比如Array), 多余的空间开销确实是没必要的,因为我们截取之后可能只是用来一次遍历.

struct Slice<Base: MyCollection> {
    let startIndex: Base.Index
    let endIndex: Base.Index
    let base: Base
    
    init(base: Base, bounds: Range<Base.Index>) {
        self.base = base
        self.startIndex = bounds.lowerBound
        self.endIndex = bounds.upperBound
    }
}

extension Slice: MyCollection {
    
    typealias Index = Base.Index
    typealias Element = Base.Element
    typealias Iterator = MyIndexIterator<Slice<Base>>
    
    func index(after i: Base.Index) -> Base.Index {
        base.index(after: i)
    }
    
    subscript(position: Base.Index) -> Base.Element {
        base[position]
    }
}