Python七号

Python 那些鲜为人知的故事

2019-03-24  本文已影响60人  somenzz

Python, 是一个设计优美的解释型高级语言, 它提供了很多能让程序员感到舒适的功能特性. 但有的时候, Python 的一些输出结果对于初学者来说似乎并不是那么一目了然.

这个有趣的项目意在收集 Python 中那些难以理解和反人类直觉的例子以及鲜为人知的功能特性, 并尝试讨论这些现象背后真正的原理!

虽然下面的有些例子并不一定会让你觉得 WTFs, 但它们依然有可能会告诉你一些你所不知道的 Python 有趣特性. 我觉得这是一种学习编程语言内部原理的好办法, 而且我相信你也会从中获得乐趣!

如果您是一位经验比较丰富的 Python 程序员, 你可以尝试挑战看是否能一次就找到例子的正确答案. 你可能对其中的一些例子已经比较熟悉了, 那这也许能唤起你当年踩这些坑时的甜蜜回忆.

那么, 让我们开始吧...

注意: 所有的示例都在 Python 3.5.2 版本的交互解释器上测试过, 如果不特别说明应该适用于所有 Python 版本.

我个人建议, 最好依次阅读下面的示例, 并对每个示例:

> Strings can be tricky sometimes/微妙的字符串 *

1.

>>> a = "some_string"
>>> id(a)
140420665652016
>>> id("some" + "_" + "string") # 注意两个的id值是相同的.
140420665652016

2.

>>> a = "wtf"
>>> b = "wtf"
>>> a is b
True

>>> a = "wtf!"
>>> b = "wtf!"
>>> a is b
False

>>> a, b = "wtf!", "wtf!"
>>> a is b # 仅适用于3.7版本以下, 3.7以后的返回结果为False.
True

3.

>>> 'a' * 20 is 'aaaaaaaaaaaaaaaaaaaa'
True
>>> 'a' * 21 is 'aaaaaaaaaaaaaaaaaaaaa'
False

很好理解, 对吧?

💡 说明:


> Time for some hash brownies!/是时候来点蛋糕了!

1.

some_dict = {}
some_dict[5.5] = "Ruby"
some_dict[5.0] = "JavaScript"
some_dict[5] = "Python"

Output:

>>> some_dict[5.5]
"Ruby"
>>> some_dict[5.0]
"Python"
>>> some_dict[5]
"Python"

"Python" 消除了 "JavaScript" 的存在?

💡 说明:

  >>> 5 == 5.0
  True
  >>> hash(5) == hash(5.0)
  True

注意: 具有不同值的对象也可能具有相同的哈希值(哈希冲突).


> Return return everywhere!/到处返回!

def some_func():
    try:
        return 'from_try'
    finally:
        return 'from_finally'

Output:

>>> some_func()
'from_finally'

💡 说明:


> Deep down, we're all the same./本质上,我们都一样. *

class WTF:
  pass

Output:

>>> WTF() == WTF() # 两个不同的对象应该不相等
False
>>> WTF() is WTF() # 也不相同
False
>>> hash(WTF()) == hash(WTF()) # 哈希值也应该不同
True
>>> id(WTF()) == id(WTF())
True

💡 说明:

  class WTF(object):
    def __init__(self): print("I")
    def __del__(self): print("D")

Output:

  >>> WTF() is WTF()
  I
  I
  D
  D
  False
  >>> id(WTF()) == id(WTF())
  I
  D
  I
  D
  True

正如你所看到的, 对象销毁的顺序是造成所有不同之处的原因.


> For what?/为什么?

some_string = "wtf"
some_dict = {}
for i, some_dict[i] in enumerate(some_string):
    pass

Output:

>>> some_dict # 创建了索引字典.
{0: 'w', 1: 't', 2: 'f'}

💡 说明:

  for_stmt: 'for' exprlist 'in' testlist ':' suite ['else' ':' suite]

其中 exprlist 指分配目标. 这意味着对可迭代对象中的每一项都会执行类似 {exprlist} = {next_value} 的操作.

一个有趣的例子说明了这一点:

  for i in range(4):
      print(i)
      i = 10

Output:

  0
  1
  2
  3

你可曾觉得这个循环只会运行一次?

💡 说明:

  >>> i, some_dict[i] = (0, 'w')
  >>> i, some_dict[i] = (1, 't')
  >>> i, some_dict[i] = (2, 'f')
  >>> some_dict

> Evaluation time discrepancy/执行时机差异

1.

array = [1, 8, 15]
g = (x for x in array if array.count(x) > 0)
array = [2, 8, 22]

Output:

>>> print(list(g))
[8]

2.

array_1 = [1,2,3,4]
g1 = (x for x in array_1)
array_1 = [1,2,3,4,5]

array_2 = [1,2,3,4]
g2 = (x for x in array_2)
array_2[:] = [1,2,3,4,5]

Output:

>>> print(list(g1))
[1,2,3,4]

>>> print(list(g2))
[1,2,3,4,5]

💡 说明


> is is not what it is!/出人意料的is!

下面是一个在互联网上非常有名的例子.

>>> a = 256
>>> b = 256
>>> a is b
True

>>> a = 257
>>> b = 257
>>> a is b
False

>>> a = 257; b = 257
>>> a is b
True

💡 说明:

is== 的区别

  >>> [] == []
  True
  >>> [] is [] # 这两个空列表位于不同的内存地址.
  False

256 是一个已经存在的对象, 而 257 不是

当你启动Python 的时候, -5256 的数值就已经被分配好了. 这些数字因为经常使用所以适合被提前准备好.

引用自 https://docs.python.org/3/c-api/long.html

当前的实现为-5到256之间的所有整数保留一个整数对象数组, 当你创建了一个该范围内的整数时, 你只需要返回现有对象的引用. 所以改变1的值是有可能的. 我怀疑这种行为在Python中是未定义行为. :-)

>>> id(256)
10922528
>>> a = 256
>>> b = 256
>>> id(a)
10922528
>>> id(b)
10922528
>>> id(257)
140084850247312
>>> x = 257
>>> y = 257
>>> id(x)
140084850247440
>>> id(y)
140084850247344

这里解释器并没有智能到能在执行 y = 257 时意识到我们已经创建了一个整数 257, 所以它在内存中又新建了另一个对象.

ab 在同一行中使用相同的值初始化时,会指向同一个对象.

>>> a, b = 257, 257
>>> id(a)
140640774013296
>>> id(b)
140640774013296
>>> a = 257
>>> b = 257
>>> id(a)
140640774013392
>>> id(b)
140640774013488

> A tic-tac-toe where X wins in the first attempt!/一蹴即至!

# 我们先初始化一个变量row
row = [""]*3 #row i['', '', '']
# 并创建一个变量board
board = [row]*3

Output:

>>> board
[['', '', ''], ['', '', ''], ['', '', '']]
>>> board[0]
['', '', '']
>>> board[0][0]
''
>>> board[0][0] = "X"
>>> board
[['X', '', ''], ['X', '', ''], ['X', '', '']]

我们有没有赋值过3个 "X" 呢?

💡 说明:

当我们初始化 row 变量时, 下面这张图展示了内存中的情况。

而当通过对 row 做乘法来初始化 board 时, 内存中的情况则如下图所示 (每个元素 board[0], board[1]board[2] 都和 row 一样引用了同一列表.)

我们可以通过不使用变量 row 生成 board 来避免这种情况. (这个issue提出了这个需求.)

>>> board = [['']*3 for _ in range(3)]
>>> board[0][0] = "X"
>>> board
[['X', '', ''], ['', '', ''], ['', '', '']]

> The sticky output function/麻烦的输出

funcs = []
results = []
for x in range(7):
    def some_func():
        return x
    funcs.append(some_func)
    results.append(some_func()) # 注意这里函数被执行了

funcs_results = [func() for func in funcs]

Output:

>>> results
[0, 1, 2, 3, 4, 5, 6]
>>> funcs_results
[6, 6, 6, 6, 6, 6, 6]

即使每次在迭代中将 some_func 加入 funcs 前的 x 值都不相同, 所有的函数还是都返回6.

// 再换个例子

>>> powers_of_x = [lambda x: x**i for i in range(10)]
>>> [f(2) for f in powers_of_x]
[512, 512, 512, 512, 512, 512, 512, 512, 512, 512]

💡 说明:

    funcs = []
    for x in range(7):
        def some_func(x=x):
            return x
        funcs.append(some_func)
**Output:**
    >>> funcs_results = [func() for func in funcs]
    >>> funcs_results
    [0, 1, 2, 3, 4, 5, 6]

> is not ... is not is (not ...)/is not ... 不是 is (not ...)

>>> 'something' is not None
True
>>> 'something' is (not None)
False

💡 说明:


> The surprising comma/意外的逗号

Output:

>>> def f(x, y,):
...     print(x, y)
...
>>> def g(x=4, y=5,):
...     print(x, y)
...
>>> def h(x, **kwargs,):
  File "<stdin>", line 1
    def h(x, **kwargs,):
                     ^
SyntaxError: invalid syntax
>>> def h(*args,):
  File "<stdin>", line 1
    def h(*args,):
                ^
SyntaxError: invalid syntax

💡 说明:


> Backslashes at the end of string/字符串末尾的反斜杠

Output:

>>> print("\\ C:\\")
\ C:\
>>> print(r"\ C:")
\ C:
>>> print(r"\ C:\")

    File "<stdin>", line 1
      print(r"\ C:\")
                     ^
SyntaxError: EOL while scanning string literal

💡 说明:

  >>> print(repr(r"wt\"f"))
  'wt\\"f'

> not knot!/别纠结!

x = True
y = False

Output:

>>> not x == y
True
>>> x == not y
  File "<input>", line 1
    x == not y
           ^
SyntaxError: invalid syntax

💡 说明:


> Half triple-quoted strings/三个引号

Output:

>>> print('wtfpython''')
wtfpython
>>> print("wtfpython""")
wtfpython
>>> # 下面的语句会抛出 `SyntaxError` 异常
>>> # print('''wtfpython')
>>> # print("""wtfpython")

💡 说明:

  >>> print("wtf" "python")
  wtfpython
  >>> print("wtf" "") # or "wtf"""
  wtf

> Midnight time doesn't exist?/不存在的午夜?

from datetime import datetime

midnight = datetime(2018, 1, 1, 0, 0)
midnight_time = midnight.time()

noon = datetime(2018, 1, 1, 12, 0)
noon_time = noon.time()

if midnight_time:
    print("Time at midnight is", midnight_time)

if noon_time:
    print("Time at noon is", noon_time)

Output:

('Time at noon is', datetime.time(12, 0))

midnight_time 并没有被输出.

💡 说明:

在Python 3.5之前, 如果 datetime.time 对象存储的UTC的午夜时间(译: 就是 00:00), 那么它的布尔值会被认为是 False. 当使用 if obj: 语句来检查 obj 是否为 null 或者某些“空”值的时候, 很容易出错.


> What's wrong with booleans?/布尔你咋了?

1.

# 一个简单的例子, 统计下面可迭代对象中的布尔型值的个数和整型值的个数
mixed_list = [False, 1.0, "some_string", 3, True, [], False]
integers_found_so_far = 0
booleans_found_so_far = 0

for item in mixed_list:
    if isinstance(item, int):
        integers_found_so_far += 1
    elif isinstance(item, bool):
        booleans_found_so_far += 1

Output:

>>> integers_found_so_far
4
>>> booleans_found_so_far
0

2.

another_dict = {}
another_dict[True] = "JavaScript"
another_dict[1] = "Ruby"
another_dict[1.0] = "Python"

Output:

>>> another_dict[True]
"Python"

3.

>>> some_bool = True
>>> "wtf"*some_bool
'wtf'
>>> some_bool = False
>>> "wtf"*some_bool
''

💡 说明:

  >>> isinstance(True, int)
  True
  >>> isinstance(False, int)
  True
  >>> True == 1 == 1.0 and False == 0 == 0.0
  True

> Class attributes and instance attributes/类属性和实例属性

1.

class A:
    x = 1

class B(A):
    pass

class C(A):
    pass

Output:

>>> A.x, B.x, C.x
(1, 1, 1)
>>> B.x = 2
>>> A.x, B.x, C.x
(1, 2, 1)
>>> A.x = 3
>>> A.x, B.x, C.x
(3, 2, 3)
>>> a = A()
>>> a.x, A.x
(3, 3)
>>> a.x += 1
>>> a.x, A.x
(4, 3)

2.

class SomeClass:
    some_var = 15
    some_list = [5]
    another_list = [5]
    def __init__(self, x):
        self.some_var = x + 1
        self.some_list = self.some_list + [x]
        self.another_list += [x]

Output:

>>> some_obj = SomeClass(420)
>>> some_obj.some_list
[5, 420]
>>> some_obj.another_list
[5, 420]
>>> another_obj = SomeClass(111)
>>> another_obj.some_list
[5, 111]
>>> another_obj.another_list
[5, 420, 111]
>>> another_obj.another_list is SomeClass.another_list
True
>>> another_obj.another_list is some_obj.another_list
True

💡 说明:


> yielding None/生成 None

some_iterable = ('a', 'b')

def some_func(val):
    return "something"

Output:

>>> [x for x in some_iterable]
['a', 'b']
>>> [(yield x) for x in some_iterable]
<generator object <listcomp> at 0x7f70b0a4ad58>
>>> list([(yield x) for x in some_iterable])
['a', 'b']
>>> list((yield x) for x in some_iterable)
['a', None, 'b', None]
>>> list(some_func((yield x)) for x in some_iterable)
['a', 'something', 'b', 'something']

💡 说明:


> Mutating the immutable!/强人所难

some_tuple = ("A", "tuple", "with", "values")
another_tuple = ([1, 2], [3, 4], [5, 6])

Output:

>>> some_tuple[2] = "change this"
TypeError: 'tuple' object does not support item assignment
>>> another_tuple[2].append(1000) # 这里不出现错误
>>> another_tuple
([1, 2], [3, 4], [5, 6, 1000])
>>> another_tuple[2] += [99, 999]
TypeError: 'tuple' object does not support item assignment
>>> another_tuple
([1, 2], [3, 4], [5, 6, 1000, 99, 999])

我还以为元组是不可变的呢...

💡 说明:

(译: 对于不可变对象, 这里指tuple, += 并不是原子操作, 而是 extend= 两个动作, 这里 = 操作虽然会抛出异常, 但 extend 操作已经修改成功了. 详细解释可以看这里)


> The disappearing variable from outer scope/消失的外部变量

e = 7
try:
    raise Exception()
except Exception as e:
    pass

Output (Python 2.x):

>>> print(e)
# prints nothing

Output (Python 3.x):

>>> print(e)
NameError: name 'e' is not defined

💡 说明:

  except E as N:
      foo

会被翻译成

  except E as N:
      try:
          foo
      finally:
          del N

这意味着异常必须在被赋值给其他变量才能在 except 子句之后引用它. 而异常之所以会被清除, 则是由于上面附加的回溯信息(trackback)会和栈帧(stack frame)形成循环引用, 使得该栈帧中的所有本地变量在下一次垃圾回收发生之前都处于活动状态.(译: 也就是说不会被回收)

     def f(x):
         del(x)
         print(x)

     x = 5
     y = [5, 4, 3]
 **Output:**
     >>>f(x)
     UnboundLocalError: local variable 'x' referenced before assignment
     >>>f(y)
     UnboundLocalError: local variable 'x' referenced before assignment
     >>> x
     5
     >>> y
     [5, 4, 3]
    >>> e
    Exception()
    >>> print e
    # 没有打印任何内容!

> When True is actually False/真亦假

True = False
if True == False:
    print("I've lost faith in truth!")

Output:

I've lost faith in truth!

💡 说明:


> From filled to None in one instruction.../从有到无...

some_list = [1, 2, 3]
some_dict = {
  "key_1": 1,
  "key_2": 2,
  "key_3": 3
}

some_list = some_list.append(4)
some_dict = some_dict.update({"key_4": 4})

Output:

>>> print(some_list)
None
>>> print(some_dict)
None

💡 说明:

大多数修改序列/映射对象的方法, 比如 list.append, dict.update, list.sort 等等. 都是原地修改对象并返回 None. 这样做的理由是, 如果操作可以原地完成, 就可以避免创建对象的副本来提高性能. (参考这里)


> Subclass relationships/子类关系 *

Output:

>>> from collections import Hashable
>>> issubclass(list, object)
True
>>> issubclass(object, Hashable)
True
>>> issubclass(list, Hashable)
False

子类关系应该是可传递的, 对吧? (即, 如果 AB 的子类, BC 的子类, 那么 A 应该C 的子类.)

💡 说明:


> The mysterious key type conversion/神秘的键型转换 *

class SomeClass(str):
    pass

some_dict = {'s':42}

Output:

>>> type(list(some_dict.keys())[0])
str
>>> s = SomeClass('s')
>>> some_dict[s] = 40
>>> some_dict # 预期: 两个不同的键值对
{'s': 40}
>>> type(list(some_dict.keys())[0])
str

💡 说明:

  class SomeClass(str):
    def __eq__(self, other):
        return (
            type(self) is SomeClass
            and type(other) is SomeClass
            and super().__eq__(other)
        )

    # 当我们自定义 __eq__ 方法时, Python 不会再自动继承 __hash__ 方法
    # 所以我们也需要定义它
    __hash__ = str.__hash__

  some_dict = {'s':42}

Output:

  >>> s = SomeClass('s')
  >>> some_dict[s] = 40
  >>> some_dict
  {'s': 40, 's': 42}
  >>> keys = list(some_dict.keys())
  >>> type(keys[0]), type(keys[1])
  (__main__.SomeClass, str)

> Let's see if you can guess this?/看看你能否猜到这一点?

a, b = a[b] = {}, 5

Output:

>>> a
{5: ({...}, 5)}

💡 说明:

  (target_list "=")+ (expression_list | yield_expression)

赋值语句计算表达式列表(expression list)(牢记 这可以是单个表达式或以逗号分隔的列表, 后者返回元组)并将单个结果对象从左到右分配给目标列表中的每一项.

  >>> some_list = some_list[0] = [0]
  >>> some_list
  [[...]]
  >>> some_list[0]
  [[...]]
  >>> some_list is some_list[0]
  True
  >>> some_list[0][0][0][0][0][0] == some_list
  True

我们的例子就是这种情况 (a[b][0]a 是相同的对象)

  a, b = {}, 5
  a[b] = a, b

并且可以通过 a[b][0]a 是相同的对象来证明是循环引用

  >>> a[b][0] is a
  True

休息会儿吧!

(完)

个人微信公众号.png
上一篇下一篇

猜你喜欢

热点阅读