Advanced Python: Chapter 1 (Data Structures and Algorithms)

2017-04-05 码农小杨

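Problem 1: How do you filter data in a list, dict, or set by a condition?

The problem:
How do you filter out the negative numbers in a list?
How do you select the items of a dict whose values exceed some number?
How do you select the elements of a set that are divisible by 3?

Solution:
For a list, use the filter function or a list comprehension: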

>>> from random import randint 
>>> data = [ randint(-10,10) for _ in range(10)] 
>>> data 
[-4, 0, 8, -2, -5, -9, 6, 5, 6, 6]
>>> filter(lambda x: x >=0,data) 
<filter object at 0x7f51e28c0c18>
Note that the result is not a list: in Python 3, filter() returns a lazy iterator, so wrap it in list() to see the values.
>>> list(filter(lambda x: x >=0,data)) 
[0, 8, 6, 5, 6, 6]
>>> [ x for x in data if x>= 0] 
[0, 8, 6, 5, 6, 6]

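Of the two approaches, the list comprehension is usually faster than filter() with a lambda, because it avoids a function call per element.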
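
To check the timing claim on your own machine, a minimal sketch using the standard timeit module follows. The data size, the seed, and the repeat count are arbitrary choices for illustration, and the absolute numbers will differ between machines and Python versions.

```python
import random
import timeit

random.seed(0)  # fixed seed, chosen only so runs are repeatable
data = [random.randint(-10, 10) for _ in range(1000)]

def with_filter():
    return list(filter(lambda x: x >= 0, data))

def with_comprehension():
    return [x for x in data if x >= 0]

# Time 1000 calls of each approach.
t_filter = timeit.timeit(with_filter, number=1000)
t_comp = timeit.timeit(with_comprehension, number=1000)
print("filter:        %.4f s" % t_filter)
print("comprehension: %.4f s" % t_comp)
```

Both functions return the same list; only the elapsed time differs.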

For a dict, use a dict comprehension:

Generate a dict:
>>> d = { x: randint(60,100) for x in range(1,21)} 
>>> d
{1: 97, 2: 100, 3: 62, 4: 66, 5: 87, 6: 89, 7: 66, 8: 79, 9: 96, 10: 76, 11: 81, 12: 61, 13: 100, 14: 90, 15: 94, 16: 74,
 17: 80, 18: 76, 19: 81, 20: 82}
Select every item whose value is greater than 90:
>>> {k:v for k,v in d.items() if v>90} 
{1: 97, 2: 100, 15: 94, 13: 100, 9: 96}

For a set, use a set comprehension:

>>> s = set(data) 
>>> { x for x in s  if x % 3 ==0}
{0, 6, -9}
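
The REPL sessions above use random data, so their output changes on every run. Here are the same three patterns on fixed data, as a reproducible sketch. The sample values and the names scores, non_negative, high_scores and multiples_of_3 are made up for illustration.

```python
data = [-4, 0, 8, -2, -5, -9, 6, 5, 6, 6]
scores = {'a': 97, 'b': 62, 'c': 94, 'd': 90}

# List comprehension: keep the non-negative numbers.
non_negative = [x for x in data if x >= 0]
# Dict comprehension: keep the items whose value is above 90.
high_scores = {k: v for k, v in scores.items() if v > 90}
# Set comprehension: keep the elements divisible by 3.
multiples_of_3 = {x for x in set(data) if x % 3 == 0}

print(non_negative)                  # [0, 8, 6, 5, 6, 6]
print(high_scores)                   # {'a': 97, 'c': 94}
print(multiples_of_3 == {0, 6, -9})  # True
```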

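Problem 2: How do you name each element of a tuple to make a program more readable?

The problem:
A student information system stores its data in a fixed format:
(name, age, sex, email address)

Because there are many students, each record is stored as a tuple to reduce storage overhead:
('Wex',24,'female','wex@qq.com')
('Fangyw',23,'female','fangyw@163.com')
('Pandz',25,'male','pandz@qq.com')
......

Fields are accessed by index, and code full of bare indexes is hard to read. How can this be fixed?

Solution:
Option 1: define a series of numeric constants, similar to the enum types of other languages.
Option 2: use collections.namedtuple from the standard library in place of the built-in tuple.

First, the enum-style constants: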

>>> NAME = 0 
>>> AGE = 1 
>>> SEX = 2 
>>> EMAIL = 3 
>>> student = ('Wex',24,'female','wex@qq.com') 
>>> student[NAME] 
'Wex'

The constants can also be assigned in one step by unpacking a range:
>>> NAME,AGE,SEX,EMAIL=range(4) 
>>> student[NAME] 
'Wex'
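
Python 3.4 and later also ship a real enumeration type in the standard enum module. The sketch below is an alternative to the hand-written constants, not something the solution above uses, and the class name Field is made up for this example. Because IntEnum members are integers, they can index a tuple directly.

```python
from enum import IntEnum

class Field(IntEnum):
    NAME = 0
    AGE = 1
    SEX = 2
    EMAIL = 3

student = ('Wex', 24, 'female', 'wex@qq.com')
print(student[Field.NAME])   # Wex
print(student[Field.EMAIL])  # wex@qq.com
```

Grouping the constants in a class keeps them out of the module namespace, at the cost of a slightly longer access expression.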

Next, namedtuple:

First, the signature and docstring of the namedtuple function:
┌──────────────────────────────────────────────────────────────────────────────────────────────┐
│ namedtuple: (typename, field_names, verbose=False, rename=False)                             │
│ Returns a new subclass of tuple with named fields.                                           │
│                                                                                              │
│ >>> Point = namedtuple('Point', ['x', 'y'])                                                  │
│ >>> Point.__doc__                   # docstring for the new class                            │
│ 'Point(x, y)'                                                                                │
│ >>> p = Point(11, y=22)             # instantiate with positional args or keywords           │
│ >>> p[0] + p[1]                     # indexable like a plain tuple                           │
│ 33                                                                                           │
│ >>> x, y = p                        # unpack like a regular tuple                            │
│ >>> x, y                                                                                     │
│ (11, 22)                                                                                     │
│ >>> p.x + p.y                       # fields also accessible by name                         │
│ 33                                                                                           │
│ >>> d = p._asdict()                 # convert to a dictionary                                │
│ >>> d['x']                                                                                   │
│ 11                                                                                           │
│ >>> Point(**d)                      # convert from a dictionary                              │
│ Point(x=11, y=22)                                                                            │
│ >>> p._replace(x=100)            # _replace() is like str.replace() but targets named fields │
│ Point(x=100, y=22)                                                                           │
└──────────────────────────────────────────────────────────────────────────────────────────────┘
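The function returns a subclass of the built-in tuple class. The first argument is the name of the subclass, and the second is the list of field names.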

>>> from collections import namedtuple 
>>> namedtuple('Student',['name','age','sex','email']) 
<class '__console__.Student'>
>>> Student = namedtuple('Student',['name','age','sex','email']) 
The class can be called directly to create a tuple:
>>> s1 = Student('Wex',24,'female','wex@qq.com')  
>>> s1 
Student(name='Wex', age=24, sex='female', email='wex@qq.com')
Keyword arguments work too:
>>> s2 = Student(name='Fyw', age=22, sex='female', email='fyw@qq.com')
>>> s2 
Student(name='Fyw', age=22, sex='female', email='fyw@qq.com')
Values are read directly as attributes:
>>> s1.name 
'Wex'
>>> s2.age 
22
Each instance is still a tuple, because the class is a tuple subclass:
>>> type(s1) 
<class '__console__.Student'>
>>> isinstance(s1,tuple) 
True
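
In the student system the records already exist as plain tuples, so a likely next step is converting them in bulk. The sketch below uses the documented namedtuple helpers _make, _replace and _asdict. The rows list and the variable names are made up for illustration.

```python
from collections import namedtuple

Student = namedtuple('Student', ['name', 'age', 'sex', 'email'])

rows = [
    ('Wex', 24, 'female', 'wex@qq.com'),
    ('Fangyw', 23, 'female', 'fangyw@163.com'),
]
# _make builds an instance from any iterable of field values.
students = [Student._make(row) for row in rows]

names = [s.name for s in students]
print(names)  # ['Wex', 'Fangyw']

# namedtuples are immutable; _replace returns a modified copy.
older = students[0]._replace(age=25)
print(older.age, students[0].age)  # 25 24

# _asdict converts a record to a mapping of field name to value.
print(students[1]._asdict()['email'])  # fangyw@163.com
```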

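Problem 3: How do you count how often each element occurs in a sequence?

The problem:
1. In a random sequence such as [12,1,2,3,4,5,4,3,4,5,...], find the 3 most frequent elements and how many times each occurs.
2. Count word frequencies in an English article and find the 10 most frequent words and how many times each occurs.

For the random sequence:
Using a plain loop: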

>>> from random import randint 
Create a random sequence:
>>> data = [randint(0,20) for _ in range(0,30)] 
Use the values of the sequence as dict keys, each with an initial count of 0:
>>> c = dict.fromkeys(data,0) 
Loop over the sequence and add one to the count of each value seen:
>>> for x in data:
...     c[x] += 1 
...     
... 
>>> c 
{0: 2, 2: 1, 3: 2, 4: 1, 5: 3, 6: 1, 7: 1, 9: 1, 10: 1, 12: 1, 14: 3, 15: 2, 16: 3, 17: 2, 18: 2, 19: 3, 20: 1}
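
The loop above only builds the counts. To finish the task by hand, the items can be sorted by count in descending order and the first three taken. A minimal sketch on fixed data follows. The sample list is made up, and dict.get is used here in place of dict.fromkeys.

```python
data = [5, 3, 5, 7, 3, 5, 9, 7, 5, 3, 7, 7, 7]

counts = {}
for x in data:
    # get() returns 0 the first time a value is seen.
    counts[x] = counts.get(x, 0) + 1

# Sort (element, count) pairs by count, highest first, and keep three.
top3 = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)[:3]
print(top3)  # [(7, 5), (5, 4), (3, 3)]
```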

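Solution:
Use a collections.Counter object.
Pass the sequence to the Counter constructor to get a Counter, a dict subclass that maps each element to its frequency.
The Counter.most_common(n) method returns a list of the n most frequent elements with their counts.
First, the signature and docstring of Counter: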

┌──────────────────────────────────────────────────────────────────────────────────────────────┐
│ Counter: (*args, **kwds)                                                                     │
│ Create a new, empty Counter object.  And if given, count elements                            │
│ from an input iterable.  Or, initialize the count from another mapping                       │
│ of elements to their counts.                                                                 │
│                                                                                              │
│ >>> c = Counter()                           # a new, empty counter                           │
│ >>> c = Counter('gallahad')                 # a new counter from an iterable                 │
│ >>> c = Counter({'a': 4, 'b': 2})           # a new counter from a mapping                   │
│ >>> c = Counter(a=4, b=2)                   # a new counter from keyword args                │
└──────────────────────────────────────────────────────────────────────────────────────────────┘

Passing the random sequence generated above to Counter returns a dict-like object holding the counts:

>>> from collections import Counter 
>>> c2 = Counter(data)
>>> c2 
Counter({5: 3, 14: 3, 16: 3, 19: 3, 0: 2, 3: 2, 15: 2, 17: 2, 18: 2, 2: 1, 4: 1, 6: 1, 7: 1, 9: 1, 10: 1, 12: 1, 20: 1})
>>> c2[10]
1
>>> c2[5]
3
>>> c2.most_common(3) 
[(5, 3), (14, 3), (16, 3)]
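
Two Counter behaviours are easy to miss in the session above: looking up a missing element returns 0 instead of raising KeyError, and update() adds to existing counts. A short sketch on made-up data:

```python
from collections import Counter

data = [5, 3, 5, 7, 3, 5, 9, 7, 5, 3, 7, 7, 7]
c = Counter(data)
print(c.most_common(3))  # [(7, 5), (5, 4), (3, 3)]
print(c[100])            # 0: missing elements count as zero, no KeyError

# update() adds the counts from another iterable to the existing ones.
c.update([9, 9, 9, 9, 9])
print(c.most_common(1))  # [(9, 6)]
```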

Now count word frequencies in a file:

>>> import re 
Read the file into a single string:
>>> txt = open('vimrc.txt').read()
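Split the text on runs of non-word characters (a raw string keeps the backslash in \W intact):
>>> re.split(r'\W+',txt)
Count the resulting list with Counter:
>>> c3 = Counter(re.split(r'\W+',txt))
Use most_common to get the 10 most frequent words: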
>>> c3.most_common(10)
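
One detail worth knowing: splitting on \W+ produces an empty string when the text starts or ends with a non-word character, and it treats 'The' and 'the' as different words. The sketch below avoids both by using re.findall on lower-cased text. That is a swapped-in technique rather than the approach shown above, and the sample string stands in for the file contents.

```python
import re
from collections import Counter

txt = "The cat and the dog and the bird. The end!"

# The trailing '!' leaves an empty string at the end of the split result.
print(re.split(r'\W+', txt)[-1] == '')  # True

# findall returns only the word runs, so no empty strings appear.
words = re.findall(r'\w+', txt.lower())
c3 = Counter(words)
print(c3.most_common(2))  # [('the', 4), ('and', 2)]
```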

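Problem 4: How do you sort the items of a dict by value?

The problem:
The English scores of a class are stored in a dict:
{'wex':98,'fyw':97,'xyx':99...}
Rank the students by score, from highest to lowest.

Solution:
Use the built-in function sorted, in one of two ways:
1. Use zip to convert the dict into tuples.
2. Pass a key argument to sorted.

Method 1: convert to tuples with zip and sort them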

In [1]:  from random import  randint   

In [2]: d = {x: randint(60,100) for x in 'xyzabc'}  
Sorting a dict directly sorts only its keys:
In [3]: sorted(d)  
Out[3]: ['a', 'b', 'c', 'x', 'y', 'z']
Tuples compare element by element, starting with the first element:
In [4]: (97,'a') > (96,'b') 
Out[4]: True
If the first elements are equal, the second elements are compared:
In [5]: (97,'a') > (97,'b') 
Out[5]: False

In [6]: d.keys()  
Out[6]: dict_keys(['x', 'b', 'z', 'y', 'c', 'a'])

In [7]: d.values()   
Out[7]: dict_values([88, 76, 96, 78, 62, 68])

In [8]: zip(d.values(),d.keys())  
Out[8]: <zip at 0x7f3aac585f48>

In [9]: type(d.keys()) 
Out[9]: dict_keys
In Python 3, zip returns a lazy iterator, so use list() to see its contents:
In [10]: list(zip(d.values(),d.keys())) 
Out[10]: [(88, 'x'), (76, 'b'), (96, 'z'), (78, 'y'), (62, 'c'), (68, 'a')]

In [11]: sorted(list(zip(d.values(),d.keys()))) 
Out[11]: [(62, 'c'), (68, 'a'), (76, 'b'), (78, 'y'), (88, 'x'), (96, 'z')]

Method 2: pass a key argument to sorted

items() yields (key, value) pairs, so the first element of each pair is the key, not the value:
In [12]: d.items() 
Out[12]: dict_items([('x', 88), ('b', 76), ('z', 96), ('y', 78), ('c', 62), ('a', 68)])
The key argument of sorted sets the sort criterion. It is a function called on each item, here each (key, value) tuple x, and the lambda returns the value x[1]:
In [13]: sorted(d.items(),key=lambda x: x[1]) 
Out[13]: [('c', 62), ('a', 68), ('b', 76), ('y', 78), ('x', 88), ('z', 96)]
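
Both sessions above sort in ascending order, while the problem asks for a ranking from the highest score down. Passing reverse=True to sorted handles that. A sketch on the three scores from the problem statement (the variable names are made up):

```python
scores = {'wex': 98, 'fyw': 97, 'xyx': 99}

# Method 1: (value, key) tuples from zip, highest first.
# sorted accepts the zip iterator directly, so list() is not needed.
by_zip = sorted(zip(scores.values(), scores.keys()), reverse=True)
print(by_zip)  # [(99, 'xyx'), (98, 'wex'), (97, 'fyw')]

# Method 2: key function on the items, highest first.
by_key = sorted(scores.items(), key=lambda item: item[1], reverse=True)
print(by_key)  # [('xyx', 99), ('wex', 98), ('fyw', 97)]

# Print a ranking table starting at rank 1.
for rank, (name, score) in enumerate(by_key, start=1):
    print(rank, name, score)
```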

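Problem 5: How do you quickly find the keys common to several dicts?

The problem:
Goals scored per player in each round of La Liga, the Spanish top football division:
Round 1: {'Suárez':1,'Messi':2,'Benzema':1,......}
Round 2: {'Suárez':2,'Bale':1,'Griezmann':2,......}
Round 3: {'Suárez':1,'Torres':2,'Bale':2,......}
......
Find the players who scored in every one of the first N rounds.

Using a simple loop: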

>>> from random import  randint,sample 
Use sample to pick a random set of scoring players:
>>> sample('abcdefg',3) 
['b', 'd', 'a']
Let the number of scorers vary from 3 to 6:
>>> sample('abcdefg',randint(3,6)) 
['c', 'd', 'f', 'b', 'a', 'g']
Build each round with a dict comprehension:
>>> s1 = { x: randint(1,4) for x in sample('abcdefg',randint(3,6))} 
>>> s1
{'f': 3, 'e': 3, 'c': 1, 'b': 4, 'g': 2}
>>> s2 = { x: randint(1,4) for x in sample('abcdefg',randint(3,6))} 
>>> s3 = { x: randint(1,4) for x in sample('abcdefg',randint(3,6))} 
>>> s2
{'f': 2, 'd': 1, 'c': 3, 'b': 3, 'g': 4}
>>> s3
{'e': 4, 'c': 3, 'a': 4, 'b': 3}
>>> inlist = []
>>> for k in s1:
...     if k in s2 and  k in s3:
...         inlist.append(k) 
...         
...     
>>> inlist 
['c', 'b']
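
The loop above is written for exactly three dicts. One way to generalize it to any number of rounds is to keep the dicts in a list and test membership with all(). A sketch using the three dicts from the session above (the names rounds, first, rest and common are made up):

```python
rounds = [
    {'f': 3, 'e': 3, 'c': 1, 'b': 4, 'g': 2},
    {'f': 2, 'd': 1, 'c': 3, 'b': 3, 'g': 4},
    {'e': 4, 'c': 3, 'a': 4, 'b': 3},
]

first, rest = rounds[0], rounds[1:]
# Keep each key of the first round that also appears in every later round.
common = [k for k in first if all(k in r for r in rest)]
print(common)  # ['c', 'b']
```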

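Another solution:
Use set intersection:
Step 1: call each dict's keys() method to get a set-like view of its keys (Python 2 used viewkeys() for this).
Step 2: use map to get the key views of all the dicts.
Step 3: use reduce to take the intersection of all the key views.

With only three rounds: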
>>> s1.keys()
dict_keys(['f', 'e', 'c', 'b', 'g'])
>>> s1.keys() & s2.keys() & s3.keys() 
{'c', 'b'}
With many rounds:
>>> map(dict.keys,[s1,s2,s3]) 
<map object at 0x7f0993531c18>
map applies dict.keys to every dict, and reduce can then intersect the resulting key views:
>>> reduce(lambda a,b: a & b,map(dict.keys,[s1,s2,s3])) 
Traceback (most recent call last):
  File "<input>", line 1, in <module>
    reduce(lambda a,b: a & b,map(dict.keys,[s1,s2,s3]))
NameError: name 'reduce' is not defined
In Python 3, reduce() was removed from the global namespace and now lives in the functools module, so it has to be imported first:
>>> from functools import reduce 
>>> reduce(lambda a,b: a & b,map(dict.keys,[s1,s2,s3]))  
{'c', 'b'}
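
As an alternative to reduce, set.intersection accepts any number of iterables, and iterating a dict yields its keys. The sketch below is a swapped-in variant, not the approach shown above. It assumes the list of rounds is not empty, and the name rounds is made up.

```python
rounds = [
    {'f': 3, 'e': 3, 'c': 1, 'b': 4, 'g': 2},
    {'f': 2, 'd': 1, 'c': 3, 'b': 3, 'g': 4},
    {'e': 4, 'c': 3, 'a': 4, 'b': 3},
]

# Start from the keys of the first round and intersect with all the others.
common = set(rounds[0]).intersection(*rounds[1:])
print(sorted(common))  # ['b', 'c']
```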

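Problem 6: How do you keep a dict ordered?

The problem:
A programming contest system times each contestant. When a contestant finishes the problem, their time is recorded in a dict so that results can be looked up by name afterwards. (The shorter the time, the better the result.)
{'wex':(2,43),'hmm':(5,43),'fyw':(1,23)......}
After the contest, the results must be printed in ranking order. How can this be done?

Solution:
Use collections.OrderedDict.
Replace the built-in dict with an OrderedDict and store the results in the order the contestants finish.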

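from collections import OrderedDict
from random import randint
from time import time

d = OrderedDict()
players = list('ABCDEFGH')
start = time()
for i in range(8):
    input()  # wait for Enter: each key press marks one player finishing
    p = players.pop(randint(0, 7 - i))  # pick a random remaining player
    end = time()
    print(i + 1, p, end - start)
    d[p] = (i + 1, end - start)  # store (rank, elapsed seconds) in finishing order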


print("******")
for k in d:
    print(k,d[k])

Output:

1 G 1.6133077144622803

2 A 1.9757952690124512

3 D 2.3789286613464355

4 F 2.730335235595703

5 B 3.157444715499878

6 H 3.551867961883545

7 C 3.967583179473877

8 E 4.346914291381836
******
G (1, 1.6133077144622803)
A (2, 1.9757952690124512)
D (3, 2.3789286613464355)
F (4, 2.730335235595703)
B (5, 3.157444715499878)
H (6, 3.551867961883545)
C (7, 3.967583179473877)
E (8, 4.346914291381836)
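
The script above waits for key presses, so it cannot run unattended. The sketch below replays the same idea with made-up finishing times: sort the contestants by time, insert them into an OrderedDict in that order, and iteration then follows the ranking. Since Python 3.7 the built-in dict also preserves insertion order, so OrderedDict is mainly needed on older versions or for extras such as move_to_end().

```python
from collections import OrderedDict

# Hypothetical (minutes, seconds) results keyed by contestant name.
times = {'wex': (2, 43), 'hmm': (5, 43), 'fyw': (1, 23)}

ranking = OrderedDict()
by_time = sorted(times.items(), key=lambda kv: kv[1])  # shortest time first
for rank, (name, t) in enumerate(by_time, start=1):
    ranking[name] = (rank, t)

for name in ranking:
    print(name, ranking[name])
# fyw (1, (1, 23))
# wex (2, (2, 43))
# hmm (3, (5, 43))

print(ranking['hmm'][0])  # 3: lookup by name still works
```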

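Problem 7: How do you implement a user history feature (keeping at most n entries)?

The problem:
We have written a simple number-guessing game and want to add a history feature that shows the numbers the user guessed most recently. How can this be done?

Solution:
Store the history in a queue with a capacity of n.
Use deque from the standard library collections module, a double-ended queue.
Before the program exits, pickle can save the queue object to a file so that it can be loaded the next time the program runs.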

>>> from collections import deque
Create a fixed-length queue with deque:
>>> q = deque([],5) 
>>> q.append(1)
>>> q.append(2)
>>> q.append(3)
>>> q.append(4)
>>> q.append(5)
>>> q
deque([1, 2, 3, 4, 5], maxlen=5)
>>> q.append(6)
>>> q 
deque([2, 3, 4, 5, 6], maxlen=5)
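
Two related behaviours of a bounded deque, shown as a short sketch: building one from a longer iterable keeps only the last maxlen items, and appendleft() on a full deque discards from the right end instead of the left.

```python
from collections import deque

# Only the last three of the ten values survive.
q = deque(range(10), maxlen=3)
print(q)  # deque([7, 8, 9], maxlen=3)

# Appending on the left pushes the rightmost item out.
q.appendleft(0)
print(q)         # deque([0, 7, 8], maxlen=3)
print(q.maxlen)  # 3
```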

The game script:

from collections import deque 
from random import randint 

N = randint(0,100)
history = deque([],5) 

def guess(k):
    if k == N:
        print('right')
        return True  # correct guess: tells the main loop to stop
    if k < N:
        print("%s is less-than N" % k)
    else:
        print("%s is greater-than N" % k)
    return False

while True:
    line = input("please input a number:")
    if line.isdigit():
        k = int(line)
        history.append(k)
        if guess(k):
            break
    elif line == "history":
        print(list(history))

The program above implements the game and keeps the five most recent guesses.
The dump() and load() functions of the pickle module save data to a file and read it back:

>>> import pickle
>>> p = ['ew','ferf'] 
>>> pickle.dump(p,open('historyy','wb'))
>>> q2 = pickle.load(open('historyy','rb'))
>>> q2
['ew', 'ferf']
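
The session above pickles a plain list and leaves the files open. A sketch of the same round trip for the history deque itself follows, using with blocks so the files are closed. The file name history.pkl and the temporary directory are assumptions made for this example. Note that pickle data should only be loaded from files you trust.

```python
import os
import pickle
import tempfile
from collections import deque

history = deque([3, 50, 75], maxlen=5)

# Hypothetical location: a fresh temporary directory.
path = os.path.join(tempfile.mkdtemp(), 'history.pkl')

# Save the deque before the program exits.
with open(path, 'wb') as f:
    pickle.dump(history, f)

# Load it back on the next run; maxlen is preserved.
with open(path, 'rb') as f:
    restored = pickle.load(f)

print(restored)  # deque([3, 50, 75], maxlen=5)
```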