1.12 Iteration and Comprehensions

2020-07-27 Benedict清水

I. Iteration

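An "iterable" is a relatively new concept in Python. An object counts as iterable if it meets either of two conditions:

- it is a physically stored sequence, or
- it produces one result at a time in the context of an iteration tool (for example, a for loop, a list comprehension, an in membership test, or the built-in map function).

The term iterable refers to an object that supports the iter call. The term iterator refers to an object that supports the next(I) call; an iterator is what iter returns for the iterable passed to it. Generators are objects that support the iteration protocol automatically, so a generator is itself iterable (in fact, it is its own iterator).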
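A minimal sketch of these definitions. A list stands in for the iterable and a generator expression for the generator; both are illustrative choices, not taken from the original text:

```python
# A list is iterable: iter() returns a separate iterator object for it
L = [1, 2, 3]
I = iter(L)
print(I is L)        # False: the list is not its own iterator
print(next(I))       # 1
print(next(I))       # 2
print(next(I))       # 3
try:
    next(I)          # no items left
except StopIteration:
    print("StopIteration")

# A generator supports the protocol automatically and is its own iterator
G = (x * 2 for x in range(3))
print(iter(G) is G)  # True
print(list(G))       # [0, 2, 4]
```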

1. The iteration protocol

We will explain the iteration protocol by looking at how iteration works with a built-in type: the file.

>>> print(open("script.txt").read())
Stray birds of summer come to my window to sing and fly away.
And yellow leaves of autumn, which have no songs, flutter and fall there with a sigh.
If you shed tears when you miss the sun, you also miss the stars.

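In Python, an open file object provides a readline method that reads one line of text from the file at a time. Each call to readline advances to the next line. At the end of the file it returns an empty string, so we can test for the empty string to break out of a loop: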

>>> f = open("script.txt")
>>> f.readline()
'Stray birds of summer come to my window to sing and fly away.\n'
>>> f.readline()
'And yellow leaves of autumn, which have no songs, flutter and fall there with a sigh.\n'
>>> f.readline()
'If you shed tears when you miss the sun, you also miss the stars.'
>>> f.readline()
''
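The loop that the paragraph above describes can be sketched as follows. To keep the example runnable without script.txt, it assumes io.StringIO as a stand-in, since that class offers the same readline interface as a text file:

```python
import io

# Hypothetical stand-in for open("script.txt"): io.StringIO supports readline()
f = io.StringIO("first line\nsecond line\nthird line")

lines = []
while True:
    line = f.readline()
    if not line:          # '' signals end of file; a blank line would be '\n'
        break
    lines.append(line)

print(lines)  # ['first line\n', 'second line\n', 'third line']
```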

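Files also have a method named __next__ with almost the same effect: each call returns the next line of the file. The only notable difference is that at the end of the file, __next__ raises the built-in StopIteration exception instead of returning an empty string: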

>>> f = open("script.txt")
>>> f.__next__()
'Stray birds of summer come to my window to sing and fly away.\n'
>>> f.__next__()
'And yellow leaves of autumn, which have no songs, flutter and fall there with a sigh.\n'
>>> f.__next__()
'If you shed tears when you miss the sun, you also miss the stars.'
>>> f.__next__()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration

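This is what Python calls the iteration protocol:

- every object with a __next__ method advances to its next result on each call;
- it raises StopIteration when it reaches the end of its series of results.

Any object that supports this protocol is called an iterator. Any such object can also be traversed with a for loop or another iteration tool, because all iteration tools work the same way internally: they call __next__ on each iteration and catch the StopIteration exception to decide when to stop.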
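To make the protocol concrete, here is a rough sketch of what a for loop does internally. The helper name manual_for is hypothetical, and io.StringIO again stands in for a real file:

```python
import io

def manual_for(iterable):
    """Collect items the way a for loop does: iter() once, then next() until StopIteration."""
    items = []
    iterator = iter(iterable)       # get an iterator from the iterable
    while True:
        try:
            item = next(iterator)   # calls iterator.__next__()
        except StopIteration:       # end of results: leave the loop
            break
        items.append(item)
    return items

print(manual_for(io.StringIO("a\nb\nc")))  # ['a\n', 'b\n', 'c']
print(manual_for([10, 20, 30]))            # [10, 20, 30]
```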
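Note: the complete iteration protocol includes one extra step: a call to iter before __next__ is used. For objects such as files, iter simply returns the object itself. Let's look at an example: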

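class Squares:
    """Produce the squares of the integers start..stop (inclusive)."""

    def __init__(self, start, stop):
        self.value = start - 1   # __next__ increments before returning
        self.stop = stop

    def __iter__(self):
        # Called once by iter() when the for loop starts
        print("__iter__")
        return self              # this object is its own iterator

    def __next__(self):
        # Called by next() on every pass of the loop
        print("__next__")
        if self.value == self.stop:
            raise StopIteration  # tells the for loop to stop
        self.value += 1
        return self.value ** 2


if __name__ == "__main__":
    s = Squares(1, 5)
    for x in s:
        print(x)

% python squares.py

__iter__
__next__
1
__next__
4
__next__
9
__next__
16
__next__
25
__next__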

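The output starts with "__iter__". This means the built-in function iter is called first, and we intercept that call through the overloaded __iter__ method. In other words, when a for loop starts, it passes the iterable to the built-in iter function and gets an iterator back. That iterator has the required __next__ method. The built-in iter relates to __iter__ the same way next relates to __next__: internally, iter calls the object's __iter__ method.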

Summary:

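Iterable object: the object being iterated over; its __iter__ method is called by the iter function.
Iterator object: the object that iter returns for an iterable, and the one that actually produces values during iteration. Its __next__ method is run by next, and it raises StopIteration when it is finished.
Iterator objects are usually temporary; they are used internally by iteration tools.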
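The Squares class above is its own iterator, so it can be traversed only once. As a sketch of the iterable/iterator split in the summary, the hypothetical Countdown class below returns a fresh iterator from each iter call:

```python
class Countdown:
    """Hypothetical iterable: each iter() call returns a fresh iterator."""

    def __init__(self, start):
        self.start = start

    def __iter__(self):
        return CountdownIterator(self.start)


class CountdownIterator:
    """The iterator: holds the current position and produces values via __next__."""

    def __init__(self, current):
        self.current = current

    def __iter__(self):
        return self          # iterators are conventionally iterable too

    def __next__(self):
        if self.current <= 0:
            raise StopIteration
        value = self.current
        self.current -= 1
        return value


c = Countdown(3)
print(list(c))   # [3, 2, 1]
print(list(c))   # [3, 2, 1]: a new iterator each time, so c can be traversed again
```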

II. List comprehensions

Let's start with an example: traverse a list and add 10 to the value of every element.
Method 1:

>>> L = [1,2,3,4,5]
>>> for i in range(len(L)):
...     L[i] += 10
... 
>>> L
[11, 12, 13, 14, 15]

Method 2:

>>> L = [1,2,3,4,5]
>>> res = []
>>> for x in L:
...     res.append(x + 10)
... 
>>> res
[11, 12, 13, 14, 15]

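Both approaches work, but neither is the most idiomatic "best practice" in Python. Instead, we can replace the loop with a single expression that produces the desired result list: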

>>> L = [1,2,3,4,5]
>>> L = [x + 10 for x in L]
>>> L
[11, 12, 13, 14, 15]

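This is a list comprehension expression. The end result is similar, but the list comprehension takes less code and typically runs faster. The key difference from the for loop in Method 1 is that a list comprehension produces a new list object instead of modifying the original list in place.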
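A small sketch of that difference. The variable alias is added here only as a second reference to the original list:

```python
L = [1, 2, 3, 4, 5]
alias = L                    # a second reference to the same list object

for i in range(len(L)):      # Method 1: modifies the list in place
    L[i] += 10
print(alias)                 # [11, 12, 13, 14, 15]: alias sees the change

L = [x + 10 for x in L]      # the comprehension builds a NEW list and rebinds L
print(L)                     # [21, 22, 23, 24, 25]
print(alias)                 # [11, 12, 13, 14, 15]: the old object is untouched
print(L is alias)            # False
```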

1. List comprehension basics

 L = [x + 10 for x in L]

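List comprehensions are written in square brackets because they are ultimately a way to build a new list. They begin with an arbitrary expression that we compose; here the expression uses the loop variable (x + 10). After it comes what we can now recognize as the header of a for loop, which names the loop variable and an iterable (for x in L).
To run the expression, Python iterates over L inside the interpreter. It assigns x to each element in turn and collects the result of running the expression on the left for each element.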
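Because the comprehension relies on the iteration protocol, the object after in can be any iterable, not only a list. A short sketch with a few built-in iterables, chosen for illustration:

```python
squares = [x ** 2 for x in range(5)]          # a range object
doubled = [c * 2 for c in "abc"]              # a string yields its characters
keys = [k.upper() for k in {"a": 1, "b": 2}]  # iterating a dict yields its keys

print(squares)  # [0, 1, 4, 9, 16]
print(doubled)  # ['aa', 'bb', 'cc']
print(keys)     # ['A', 'B']
```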

2. Using list comprehensions on files

File objects have a readlines method that loads the whole file at once into a list of line strings.

def read_poem():
    with open("poem.txt") as f:   # the file is closed automatically
        lines = f.readlines()     # a list with one string per line
    print(lines)


if __name__ == "__main__":
    read_poem()

['Early in the day it was whispered that we should sail in a boat.\n', 'The time that my journey takes is long and the way of it long.\n', 'I came out on the chariot of the first gleam of light.']

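Notice that every line except the last ends with a newline character, which we want to remove. Whenever we need to perform an operation on every item in a sequence, a list comprehension is a good candidate: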

def read_poem():
    with open("poem.txt") as f:
        lines = f.readlines()
    lines = [line.rstrip() for line in lines]   # strip trailing whitespace, including "\n"
    print(lines)


if __name__ == "__main__":
    read_poem()

['Early in the day it was whispered that we should sail in a boat.', 'The time that my journey takes is long and the way of it long.', 'I came out on the chariot of the first gleam of light.']
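Note that rstrip() with no argument removes all trailing whitespace, not just the newline. The sketch below uses made-up sample lines to contrast it with rstrip("\n"), which removes only newline characters:

```python
lines = ["first line  \n", "second\tline\t\n", "third line"]

# rstrip() removes ALL trailing whitespace, including spaces and tabs
stripped_all = [line.rstrip() for line in lines]

# rstrip("\n") removes only trailing newline characters
stripped_newline = [line.rstrip("\n") for line in lines]

print(stripped_all)      # ['first line', 'second\tline', 'third line']
print(stripped_newline)  # ['first line  ', 'second\tline\t', 'third line']
```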

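Like the for statement, a list comprehension is an iteration tool. So we don't even need to open the file in a separate step: if we open it inside the expression, the list comprehension automatically uses the iteration protocol.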

def read_poem():
    lines = [line.rstrip() for line in open("poem.txt")]
    print(lines)


if __name__ == "__main__":
    read_poem()

['Early in the day it was whispered that we should sail in a boat.', 'The time that my journey takes is long and the way of it long.', 'I came out on the chariot of the first gleam of light.']

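Python scans the file line by line and automatically builds the result list. Because most of the work happens inside the Python interpreter, this may be faster than an equivalent for statement. It also avoids reading the whole file into an intermediate list with readlines first, although the resulting list still holds every line. The speed advantage of list comprehensions is likely to be more noticeable with larger files.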
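The one-line version above never explicitly closes the file. Below is a minimal sketch of the same comprehension inside a with statement, which closes the file when the block exits. The helper read_stripped and the temporary file standing in for poem.txt are hypothetical:

```python
import os
import tempfile

def read_stripped(path):
    """Return the file's lines without trailing whitespace; 'with' closes the file."""
    with open(path) as f:                       # closed automatically on exit
        return [line.rstrip() for line in f]    # iterates the file line by line

# Hypothetical demo file standing in for poem.txt
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("line one\nline two\nline three")
    path = tmp.name

result = read_stripped(path)
print(result)  # ['line one', 'line two', 'line three']
os.remove(path)
```

If only an aggregate is needed, a generator expression such as sum(1 for line in f) avoids building the list at all.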

3. Extended list comprehension syntax

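(1) Filter clauses: if
As a particularly useful extension, the for loop nested in a comprehension can have an associated if clause. The if clause filters out result items for which the test is not true.
Example: keep only the strings in a list that start with "p", converting them to uppercase.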

>>> L = ["smile","perfect","project"]
>>> L = [x.upper() for x in L if x[0] == "p"]
>>> L
['PERFECT', 'PROJECT']
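As a sketch, this is the same filter written as an equivalent for statement. The str.startswith variant is an alternative added here; unlike x[0], it does not raise IndexError on an empty string:

```python
L = ["smile", "perfect", "project"]

# Comprehension with an if clause
res1 = [x.upper() for x in L if x[0] == "p"]

# Equivalent for statement: the if clause becomes a nested if
res2 = []
for x in L:
    if x[0] == "p":
        res2.append(x.upper())

# startswith is safe even for an empty string, where x[0] would fail
res3 = [x.upper() for x in L + [""] if x.startswith("p")]

print(res1, res2, res3)
```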

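(2) Nested loops: for
List comprehensions allow any number of for clauses, and each for clause can have an optional associated if clause.
Example: string concatenation. Join each character x of one string with each character y of another string.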

>>> [x + y for x in "abc" for y in "lmn"]
['al', 'am', 'an', 'bl', 'bm', 'bn', 'cl', 'cm', 'cn']
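A sketch of the equivalent nested for statements. The first for clause becomes the outer loop. The last line adds an illustrative case where each for clause carries its own if clause:

```python
# Comprehension with two for clauses
res1 = [x + y for x in "abc" for y in "lmn"]

# Equivalent nested for statements
res2 = []
for x in "abc":          # first for clause: outer loop
    for y in "lmn":      # second for clause: inner loop
        res2.append(x + y)

# Each for clause may have its own if clause
res3 = [x + y for x in "abc" if x != "b" for y in "lmn" if y != "m"]

print(res1 == res2)  # True
print(res3)          # ['al', 'an', 'cl', 'cn']
```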