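[Week 6] Composite Data Types - Python Programming (Study Notes)
Original article; last updated: 2018-04-19
1. Supplementary content
2. Set type and operations
3. Sequence types and operations
4. Example 9: Basic statistics calculation
5. Dictionary type and operations
6. Module 5: Using the jieba library
7. Example 10: Text word-frequency statistics
8. All code
Original link: Python Programming (语言程序设计), Beijing Institute of Technology
1. Supplementary content
1.1 Review of the previous lesson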
data:image/s3,"s3://crabby-images/3d21c/3d21c4c77fb7a5d4ebc1ebd4dbd7a120a46233b9" alt=""
data:image/s3,"s3://crabby-images/0122c/0122cdc6092ab7b925d508a1fbf06c3642a6ea51" alt=""
data:image/s3,"s3://crabby-images/51c39/51c3961faf68b431229299f12d523967692ebc0d" alt=""
data:image/s3,"s3://crabby-images/4c49f/4c49f5de16fe09431f99af858da1ed275267f86e" alt=""
data:image/s3,"s3://crabby-images/3977d/3977d0b4961ebae0878d76e80d384393535c58e2" alt=""
data:image/s3,"s3://crabby-images/7a688/7a688379f17ea763406bf8913d08221ac695cb9f" alt=""
1.2 Overview of this lesson
data:image/s3,"s3://crabby-images/621c6/621c64e6bd7f5424c5d0a0ec1b6ee0be14286c92" alt=""
data:image/s3,"s3://crabby-images/b56a2/b56a20a0ebbef522da7494ef681da53bcffa79ae" alt=""
data:image/s3,"s3://crabby-images/239e9/239e9f85442b6ed28f06113e6f46a6b24c841396" alt=""
1.3 Exercises and assignments
data:image/s3,"s3://crabby-images/d7f40/d7f40ff31ce9b15e9b73660810b115f2e992c4ba" alt=""
2. Set type and operations
2.1 Unit introduction
data:image/s3,"s3://crabby-images/c2ab6/c2ab67a87a49bef9693e80ac0f0c41a6e0d8665f" alt=""
2.2 Definition of the set type
data:image/s3,"s3://crabby-images/79725/79725e1e4e0db11c0587b31a8f61b209e603f310" alt=""
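"Immutable" means that an element placed in a set cannot be modified. A list, for example, is a mutable type; if a list were placed in a set, the set could break, because a set requires every element to be unique, and an element that changes might become equal to another element. Immutable types such as integers and tuples are therefore valid set elements, while lists are not.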
data:image/s3,"s3://crabby-images/56656/5665612a16908edf3f4f74581384a0eee1ecf578" alt=""
data:image/s3,"s3://crabby-images/3fdb0/3fdb0786c7e6a1bb447920022f73065decad3f0e" alt=""
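Key points about sets:
- A set is written with braces {}, with elements separated by commas
- Every element in a set is unique; there are no duplicates
- The elements of a set have no order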
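The three points above can be tried out in a few lines. This is a minimal sketch; the values in A, B, and C are illustrative and not taken from the course slides.

```python
# Three ways to build a set; duplicates are dropped automatically.
A = {"python", 123, ("python", 123)}   # literal with braces; a tuple is a valid element
B = set("pypy123")                     # from a string: one element per distinct character
C = {"python", 123, "python", 123}     # repeated elements collapse into one

print(len(B))                  # 5 distinct characters: p, y, 1, 2, 3
print(C == {123, "python"})    # True: element order does not matter

# A mutable element such as a list is rejected.
try:
    bad = {[1, 2], 3}
except TypeError as err:
    print("TypeError:", err)
```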
2.3 Set operators
data:image/s3,"s3://crabby-images/19632/19632685ddbe484ef24539de5bc49251d704f91c" alt=""
data:image/s3,"s3://crabby-images/05a5f/05a5f90fceb789676944128ae07782a3d6eda7e4" alt=""
data:image/s3,"s3://crabby-images/9240c/9240cd6efb9cfe015ec81b58d6040a3ad7cbb122" alt=""
data:image/s3,"s3://crabby-images/7eabb/7eabb4a97258b34ac5cfbd3516eb55be52e975b7" alt=""
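The operator slides are not reproduced as text here. As a small sketch of the standard Python set operators (the sets A and B are illustrative values, not necessarily those on the slides):

```python
A = {"p", "y", 123}
B = set("pypy123")     # {'p', 'y', '1', '2', '3'}

union = A | B          # elements in A or in B
difference = A - B     # elements in A but not in B
intersection = A & B   # elements in both A and B
symmetric = A ^ B      # elements in exactly one of the two

print(difference)                    # {123}
print(intersection == {"p", "y"})    # True
print(A <= union, A >= union)        # True False (subset / superset tests)
```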
2.4 Set methods
data:image/s3,"s3://crabby-images/708f1/708f1a80a1668b1df43567954cd313014633f4c5" alt=""
If a program raises an exception, the exception can be caught and handled with try-except.
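A typical place where this matters with sets is removing an element that may not be present. The sketch below assumes the standard set methods discard() and remove(); the element values are illustrative.

```python
A = {"p", "y", 123}

A.discard("missing")      # absent element: discard() silently does nothing
try:
    A.remove("missing")   # absent element: remove() raises KeyError
except KeyError:
    print("'missing' is not in the set")

A.remove("p")             # present element: removed normally
print("p" in A)           # False
```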
data:image/s3,"s3://crabby-images/d5a93/d5a93e172adf563fd93071f55bc022f36d9fbc6a" alt=""
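- Because a set is unordered, iterating over it with for ... in may yield the elements in a different order from the one in which you defined them.
- while True keeps the program looping. This code repeatedly takes an element out of A and prints it; when A is empty, the program exits. It achieves the same effect as the for ... in loop, just expressed differently.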
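Both traversal styles described above can be sketched as follows. The code on the slide is not reproduced as text, so this is a reconstruction under the assumption that it uses pop() inside try-except; the lists seen and popped are added here only to record what each loop visits.

```python
A = {"p", "y", 123}

# Way 1: for ... in visits every element; the order is not guaranteed.
seen = []
for item in A:
    seen.append(item)

# Way 2: pop() removes and returns an arbitrary element and raises
# KeyError once the set is empty, which ends the loop.
popped = []
try:
    while True:
        popped.append(A.pop())
except KeyError:
    pass

print(len(seen), len(popped), A)   # 3 3 set()
```

Note the difference in side effects: the for loop leaves A intact, while the pop() loop empties it.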
2.5 Application scenarios for sets
data:image/s3,"s3://crabby-images/2dfe7/2dfe74fe310b4f0b529316c568f99b2d753b550b" alt=""
>>> "p" in {"p","y",123}
True
>>> {"p","y"} >= {"p","y",123}
False
data:image/s3,"s3://crabby-images/0fa72/0fa723e91f4d020253509b24a9164e8a9987a1d6" alt=""
This is the most important application scenario for sets.
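The slide itself is not available as text; assuming it shows data deduplication, which is the usual way the uniqueness property of sets is put to work, a minimal sketch looks like this (the list contents are illustrative):

```python
ls = ["p", "p", "y", "y", 123]
s = set(ls)          # duplicates disappear when converting to a set
lt = list(s)         # back to a list; the original order is not preserved

# If the original order matters, dict.fromkeys keeps the first occurrence
# of each element (dicts preserve insertion order since Python 3.7).
ordered = list(dict.fromkeys(ls))
print(ordered)       # ['p', 'y', 123]
```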
2.6 Unit summary
data:image/s3,"s3://crabby-images/a92ae/a92ae08e4efca608093b0e1519dd145192d5c6b2" alt=""
3. Sequence types and operations
3.1 Unit introduction
data:image/s3,"s3://crabby-images/59718/5971829fa72b22d7157de4d88c6174bd5e247d2b" alt=""
Mastering sequence types lets you handle many scenarios.
3.2 Definition of sequence types
data:image/s3,"s3://crabby-images/d4f07/d4f07651ceefb79de47c8447bb841139e45008de" alt=""
data:image/s3,"s3://crabby-images/d928a/d928acbc11ae6b9e38a607cf8c616dd66e8bfdfc" alt=""
data:image/s3,"s3://crabby-images/38810/3881085c6b5f659778605bef4768c9ed5410a514" alt=""
3.3 Sequence functions and methods
data:image/s3,"s3://crabby-images/b95e1/b95e149b903503918632b9893bfb0c7442ccde68" alt=""
data:image/s3,"s3://crabby-images/3f51b/3f51b68be0f5c931e80800ece002d70d4267ac06" alt=""
data:image/s3,"s3://crabby-images/5964f/5964f1b98eeefc2307c8788ec2001c04187c8375" alt=""
data:image/s3,"s3://crabby-images/0d552/0d55248fd7403f55b1b1c289dd62e727f39af476" alt=""
Strings are compared in alphabetical (lexicographic) order.
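A short sketch of the common sequence functions, including how min() and max() compare strings. The list contents are illustrative. Strictly speaking, Python compares strings character by character using Unicode code points, which matches alphabetical order for lowercase letters.

```python
ls = ["python", "123", ".io"]

print(len(ls))               # 3
print(min(ls))               # .io     ('.' has the smallest code point)
print(max(ls))               # python  ('p' has the largest code point)
print("python" < "pz")       # True: 'p' == 'p', then 'y' < 'z'
print(ls.index("123"))       # 1: position of the first match
print("python".count("p"))   # 1: number of occurrences
```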
3.4 Tuple type and operations
data:image/s3,"s3://crabby-images/48061/48061af79acc4d71b235d4afc15f935fbe18ca02" alt=""
data:image/s3,"s3://crabby-images/8de11/8de11b5923ba1eb929578df9d64e3669db02d04c" alt=""
data:image/s3,"s3://crabby-images/e0c90/e0c90fadd6183a3b982347a33b04861cf44d1831" alt=""
data:image/s3,"s3://crabby-images/0fb26/0fb262034b7749205b992f132e7fe5e218a87c18" alt=""
3.5 List type and operations
data:image/s3,"s3://crabby-images/7f767/7f7676edc77d5307aef10317d3b33c52ab7ee1ed" alt=""
data:image/s3,"s3://crabby-images/5674e/5674efd17de66a79b1c8be0f117678b868970c57" alt=""
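A list is only really created by [] or list(). Anything else merely gives an existing list a different name; plain assignment with = works this way too.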
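The effect of assignment versus real creation can be checked directly. A minimal sketch with illustrative values:

```python
ls = ["cat", "dog", "tiger", 1024]
lt = ls              # no new list: lt is another name for the same object
lt[0] = "bear"
print(ls[0])         # bear  (the change is visible through both names)
print(lt is ls)      # True

lu = list(ls)        # list() creates a real, independent list
lu[0] = "wolf"
print(ls[0])         # bear  (the original is untouched)
```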
data:image/s3,"s3://crabby-images/07def/07def5c021ad5cbc2a91f01caf1d8feeae82f39e" alt=""
data:image/s3,"s3://crabby-images/a6606/a660627618d5057e3168fe87e25682f6a06a7bf2" alt=""
data:image/s3,"s3://crabby-images/fd9fa/fd9fabcd7d19bf67650f6bd3cc17abb66e796d71" alt=""
List operations generally fall into four groups: adding, deleting, modifying, and looking up elements.
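One operation from each group, as a small sketch using standard list methods (the values are illustrative):

```python
ls = ["cat", "dog", "tiger", 1024]

ls.append(1234)          # add at the end
ls.insert(1, "human")    # add at position 1
del ls[0]                # delete by index
ls.remove("dog")         # delete the first matching value
ls[0] = "bird"           # modify by index
pos = ls.index("tiger")  # look up a position

print(ls)                # ['bird', 'tiger', 1024, 1234]
print(pos)               # 1
```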
data:image/s3,"s3://crabby-images/e42c7/e42c78f25d61d0cf8274baa7ac06f456c6a0d0bf" alt=""
Take out a sheet of paper and write these out from memory.
data:image/s3,"s3://crabby-images/c8d02/c8d0291f798c61476bd8f54776cd9e9e3cbfc52b" alt=""
data:image/s3,"s3://crabby-images/801e9/801e9aef9be2ce9b352fc3fbc9d1d8c428d3adf2" alt=""
data:image/s3,"s3://crabby-images/b7dea/b7dea2daf5a7e519e41c993b2ca3b41041b7af39" alt=""
Once you have mastered all of these, lists should pose no problem.
3.6 Application scenarios for sequence types
data:image/s3,"s3://crabby-images/ad47d/ad47d5f964215973aadcdfa68aac7c0880ef95a2" alt=""
data:image/s3,"s3://crabby-images/bbf39/bbf39ae9149368d1d9adb52b88a3784508891466" alt=""
data:image/s3,"s3://crabby-images/4d443/4d443b7de58d91e2104a32c1fbaaf23e8b43d35c" alt=""
3.7 Unit summary
data:image/s3,"s3://crabby-images/32825/32825d5610edbc0d81883d7676c1a913f46b2b73" alt=""
4. Example 9: Basic statistics calculation
4.1 "Basic statistics calculation": problem analysis
data:image/s3,"s3://crabby-images/98e5d/98e5dc311453f40b37a394aa7cca362c8473c57d" alt=""
data:image/s3,"s3://crabby-images/70f91/70f91aec4a9294b5070376b0191a4bc908c58d2f" alt=""
4.2 "Basic statistics calculation": worked example
data:image/s3,"s3://crabby-images/ca25f/ca25f0d0161d80b2a232cc80b9e6751640dd24c4" alt=""
data:image/s3,"s3://crabby-images/95b83/95b83ee9941b41432bcfccdc278e05152f92b1cf" alt=""
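def getNum():
    # Read numbers from the user until an empty line is entered.
    nums = []
    iNumStr = input("Enter a number (press Enter to quit): ")
    while iNumStr != "":
        # eval follows the course; float(iNumStr) is safer for untrusted input
        nums.append(eval(iNumStr))
        iNumStr = input("Enter a number (press Enter to quit): ")
    return nums

def mean(numbers):
    # Arithmetic mean
    s = 0.0
    for num in numbers:
        s = s + num
    return s / len(numbers)

def dev(numbers, mean):
    # Sample standard deviation (square root of the variance, dividing by n - 1)
    sdev = 0.0
    for num in numbers:
        sdev = sdev + (num - mean)**2
    return pow(sdev / (len(numbers) - 1), 0.5)

def median(numbers):
    # sorted() returns a new list, so the result must be assigned
    numbers = sorted(numbers)
    size = len(numbers)
    if size % 2 == 0:
        med = (numbers[size//2 - 1] + numbers[size//2]) / 2
    else:
        med = numbers[size//2]
    return med

n = getNum()
m = mean(n)
print("Mean: {}, standard deviation: {:.2}, median: {}.".format(m, dev(n, m), median(n)))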
4.3 "Basic statistics calculation": going further
data:image/s3,"s3://crabby-images/9b1ed/9b1ed5bc01b1143ee05a580d07d0f72306d6e3b4" alt=""
data:image/s3,"s3://crabby-images/6e5ae/6e5aef655863e309db85067167b7cf336b66dafb" alt=""
The screenshot here contains an error; it should read "make full use of the built-in functions Python provides".
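In that spirit, the same three values can also be obtained from Python's standard statistics module instead of hand-written functions. This is a sketch of an alternative, not the course's code; the sample data is illustrative.

```python
import statistics

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

print(statistics.mean(data))     # 5.0
print(statistics.median(data))   # 4.5
print(statistics.stdev(data))    # sample standard deviation (divides by n - 1)
```

statistics.stdev uses the same n - 1 denominator as the dev() function in the example; statistics.pstdev divides by n instead.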
5. Dictionary type and operations
5.1 Unit introduction
data:image/s3,"s3://crabby-images/665c4/665c4a069378b61ba367522fd0d77334074a438c" alt=""
5.2 Definition of the dictionary type
data:image/s3,"s3://crabby-images/81dbe/81dbe5ebde6cd436e9d0d67f74908f65e89ff2f9" alt=""
data:image/s3,"s3://crabby-images/391fd/391fd04234b0e42b5ef17152a8e0d64c0422b016" alt=""
data:image/s3,"s3://crabby-images/c4bc4/c4bc436370ef317974eff629782f442390b23e8b" alt=""
data:image/s3,"s3://crabby-images/8e549/8e549d5d023ca4a532a7a252be26a0397dbf6d3a" alt=""
data:image/s3,"s3://crabby-images/70692/7069221a242774ae5382822eaf25491dad13d2d1" alt=""
data:image/s3,"s3://crabby-images/3b6ec/3b6ec5481e4185a7593b98b1dda9c04cb5dda3f0" alt=""
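An empty set cannot be created with {}, because {} creates a dictionary by default. Dictionaries are used very often in programming, so the empty {} is reserved for the dictionary type. To create an empty set, use the set() function.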
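This is easy to verify in a couple of lines:

```python
d = {}        # empty braces create a dictionary
s = set()     # an empty set needs set()
print(type(d).__name__, type(s).__name__)   # dict set
```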
5.3 Dictionary functions and methods
data:image/s3,"s3://crabby-images/a8af4/a8af4aa22c84abbe736ea68aa4f761398191497e" alt=""
Here k refers to the key (the index), not to the stored value.
data:image/s3,"s3://crabby-images/d551f/d551fab027b2080654d0c8a6ce9e03226cfa3828" alt=""
To go through the elements of d.keys() and d.values(), traverse them with a for ... in loop.
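A small sketch of both notes: membership tests look at keys only, and keys(), values(), and items() are traversed with for ... in. The dictionary contents are illustrative.

```python
d = {"China": "Beijing", "USA": "Washington", "France": "Paris"}

print("China" in d)      # True: "China" is a key
print("Beijing" in d)    # False: "Beijing" is only a value

for key in d.keys():
    print(key)
for value in d.values():
    print(value)
for key, value in d.items():
    print(key, "->", value)
```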
data:image/s3,"s3://crabby-images/c7f13/c7f134fed9a6be766f3fe3b13b331c5d755d3c46" alt=""
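The d.get() method is very important; it is used in the examples that follow.
d.pop() returns the value for a key and then deletes that key-value pair from the dictionary.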
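A minimal sketch of the two methods, using the standard signatures get(key, default) and pop(key, default); the dictionary contents are illustrative.

```python
d = {"China": "Beijing", "USA": "Washington", "France": "Paris"}

print(d.get("China", "Islamabad"))      # Beijing: the key exists
print(d.get("Pakistan", "Islamabad"))   # Islamabad: default returned, d is unchanged

capital = d.pop("France", None)         # returns the value and deletes the pair
print(capital, "France" in d)           # Paris False
print(len(d))                           # 2
```

The get-with-default form is what makes counting idioms such as counts[word] = counts.get(word, 0) + 1 work without first checking whether the key exists.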
data:image/s3,"s3://crabby-images/ef1a6/ef1a6bdf115ce53d29dcf3743e4c51031e8e4329" alt=""
data:image/s3,"s3://crabby-images/fa611/fa6116738850c1800b61595ac197df941fa60921" alt=""
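Write these out from memory on a sheet of paper and implement them.
Note that dictionary entries have no positional index; they are accessed by key. (Since Python 3.7 a dict does preserve insertion order, but entries still cannot be indexed by position.)
Most of the functionality of dictionaries is covered here.
5.4 Application scenarios for dictionaries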
data:image/s3,"s3://crabby-images/a1509/a1509c8325010963843644e538e0497882e916c6" alt=""
data:image/s3,"s3://crabby-images/948ca/948ca0586910600be946c76d5bf35039d1807bd7" alt=""
The key k is used to look up its corresponding value.
5.5 Unit summary
data:image/s3,"s3://crabby-images/fa978/fa9784fb27968ef990c828f22bc74d96b209aaad" alt=""
6. Module 5: Using the jieba library
6.1 Introduction to the jieba library
data:image/s3,"s3://crabby-images/febb1/febb16d6f4c402f7b22f9a6fda8e9a35a476cada" alt=""
data:image/s3,"s3://crabby-images/b4ae3/b4ae3f3c9e33988dc2ee7e2352d2fabff249a82a" alt=""
data:image/s3,"s3://crabby-images/15046/15046022f68df5f76de3190e3d32d14e62b88744" alt=""
6.2 How to use the jieba library
data:image/s3,"s3://crabby-images/63335/633350fe7d096f05a2c5ad6238f658e7e149402a" alt=""
Exact mode is the most commonly used mode.
Search-engine mode is used mostly in special situations.
data:image/s3,"s3://crabby-images/cc877/cc8771b68ac9270f969a6fbc66a0a240a4cd1514" alt=""
>>> import jieba
>>> jieba.lcut("中国是一个伟大的国家")
['中国', '是', '一个', '伟大', '的', '国家']
>>> jieba.lcut("中国是一个伟大的国家",cut_all=True)
['中国', '国是', '一个', '伟大', '的', '国家']
Full mode (cut_all=True) returns every possible word it can find, so '国是' shows up as a redundant entry.
data:image/s3,"s3://crabby-images/e3150/e3150156b25c4f54df15f2a848cbe302746c1e35" alt=""
>>> jieba.lcut_for_search("中华人民共和国是伟大的")
['中华', '华人', '人民', '共和', '共和国', '中华人民共和国', '是', '伟大', '的']
data:image/s3,"s3://crabby-images/08341/08341f4615159cc5b0e6c62518689c24abb4c053" alt=""
7. Example 10: Text word-frequency statistics
7.1 "Text word-frequency statistics": problem analysis
data:image/s3,"s3://crabby-images/f7075/f70758fd640ee279bdba7468caf2ffcc047e691e" alt=""
data:image/s3,"s3://crabby-images/0f3b0/0f3b0a6964d0f54f9bf7a096a1432b34a32bed91" alt=""
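English text: Hamlet, word-frequency analysis
https://python123.io/resources/pye/hamlet.txt
Chinese text: Romance of the Three Kingdoms (《三国演义》), character analysis
https://python123.io/resources/pye/threekingdoms.txt
7.2 "Hamlet English word-frequency statistics": worked example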
data:image/s3,"s3://crabby-images/32e56/32e563eef5e681511bf0be38aa701394c9a42263" alt=""
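Words in the text vary in case and are separated by spaces, and the text also contains punctuation such as commas (,) and colons (:). The text therefore has to be normalized first: extracting every individual word is step one. Only on that basis can the number of occurrences of each word be counted.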
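def getText():
    # Assumes hamlet.txt has been downloaded to the working directory.
    with open("hamlet.txt", "r") as f:
        txt = f.read()
    txt = txt.lower()
    # Replace punctuation with spaces so that split() yields clean words.
    for ch in '!"#$%&()*+,-./:;<=>?@[\\]^_`{|}~':
        txt = txt.replace(ch, " ")
    return txt

hamletTxt = getText()
words = hamletTxt.split()
counts = {}  # maps each word to its number of occurrences
for word in words:
    counts[word] = counts.get(word, 0) + 1
items = list(counts.items())
items.sort(key=lambda x: x[1], reverse=True)  # reverse=True sorts from largest to smallest count
for i in range(10):  # the 10 most frequent words
    word, count = items[i]
    print("{0:<10}{1:>5}".format(word, count))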
data:image/s3,"s3://crabby-images/f2134/f21348f3b8081ce92aab558d5a11749e74cf1c52" alt=""
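The same normalize, split, count, and sort steps can also be written with the standard library's collections.Counter and str.translate. This is an alternative sketch rather than the course's code, and it runs on a short inline sentence (an assumption made so that no file is needed).

```python
from collections import Counter
import string

text = "To be, or not to be: that is the question."

# Lowercase, then map every ASCII punctuation character to a space.
table = str.maketrans(string.punctuation, " " * len(string.punctuation))
words = text.lower().translate(table).split()

counts = Counter(words)
for word, count in counts.most_common(2):
    print("{0:<10}{1:>5}".format(word, count))
```

Counter.most_common(n) replaces the manual list(counts.items()) plus sort step, and string.punctuation avoids typing the punctuation characters by hand.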
7.3 "Character appearance statistics in Romance of the Three Kingdoms": worked example (part 1)
data:image/s3,"s3://crabby-images/edd01/edd01423195d003831a98f4288f2feabbf8512ec" alt=""
data:image/s3,"s3://crabby-images/04da5/04da5a57885d648bdc304095e26ccdee3a56a320" alt=""
7.4 "Character appearance statistics in Romance of the Three Kingdoms": worked example (part 2)
data:image/s3,"s3://crabby-images/ef9c5/ef9c5cdd51da086fd6ecc3154b466c483e096cba" alt=""
data:image/s3,"s3://crabby-images/16223/16223700e1753e5c5e571d3dcd911fa8cee57587" alt=""
data:image/s3,"s3://crabby-images/dfa53/dfa531589bd9a715e424d37b786c0d6f9ebf2ecc" alt=""
import jieba

# Assumes threekingdoms.txt has been downloaded to the working directory.
txt = open("threekingdoms.txt", "r", encoding='utf-8').read()
excludes = {"将军","却说","荆州","二人","不可","不能","如此"}  # frequent words that are not character names
words = jieba.lcut(txt)
counts = {}
for word in words:
    if len(word) == 1:  # skip single-character tokens
        continue
    elif word == "诸葛亮" or word == "孔明曰":  # merge different names of the same character
        rword = "孔明"
    elif word == "关公" or word == "云长":
        rword = "关羽"
    elif word == "玄德" or word == "玄德曰":
        rword = "刘备"
    elif word == "孟德" or word == "丞相":
        rword = "曹操"
    else:
        rword = word
    counts[rword] = counts.get(rword, 0) + 1
for word in excludes:
    counts.pop(word, None)  # pop with a default avoids KeyError if the word never occurred
items = list(counts.items())
items.sort(key=lambda x: x[1], reverse=True)
for i in range(10):
    word, count = items[i]
    print("{0:<10}{1:>5}".format(word, count))
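As the list of merged names grows, the elif chain gets long. One possible variation, sketched here and not part of the course code, keeps the name mapping in a dictionary. The words list is a hypothetical stand-in for the output of jieba.lcut(txt), so the sketch runs without jieba or the text file.

```python
# Hypothetical tokens standing in for the output of jieba.lcut(txt).
words = ["孔明", "曰", "诸葛亮", "玄德曰", "将军", "曹操", "丞相", "玄德", "却说"]

# Alias table: every alternative name maps to one canonical name.
aliases = {
    "诸葛亮": "孔明", "孔明曰": "孔明",
    "关公": "关羽", "云长": "关羽",
    "玄德": "刘备", "玄德曰": "刘备",
    "孟德": "曹操", "丞相": "曹操",
}
excludes = {"将军", "却说", "荆州", "二人", "不可", "不能", "如此"}

counts = {}
for word in words:
    # Skip single-character tokens and excluded words up front.
    if len(word) == 1 or word in excludes:
        continue
    name = aliases.get(word, word)   # fall back to the word itself
    counts[name] = counts.get(name, 0) + 1

for name, count in sorted(counts.items(), key=lambda x: x[1], reverse=True):
    print("{0:<10}{1:>5}".format(name, count))
```

Adding another alias or excluded word then only means editing the data, not the loop.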
7.5 "Text word-frequency statistics": going further
data:image/s3,"s3://crabby-images/5b7a5/5b7a51fa96f238fdd529b796404a08c612e168ef" alt=""
data:image/s3,"s3://crabby-images/341e9/341e942ec7f122f5cc940635d907faca35d2376d" alt=""
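8. All code
#CalStatisticsV1.py
def getNum():  # read a variable number of inputs from the user
    nums = []
    iNumStr = input("Enter a number (press Enter to quit): ")
    while iNumStr != "":
        # eval follows the course; float(iNumStr) is safer for untrusted input
        nums.append(eval(iNumStr))
        iNumStr = input("Enter a number (press Enter to quit): ")
    return nums

def mean(numbers):  # compute the mean
    s = 0.0
    for num in numbers:
        s = s + num
    return s / len(numbers)

def dev(numbers, mean):  # compute the sample standard deviation (divides by n - 1)
    sdev = 0.0
    for num in numbers:
        sdev = sdev + (num - mean)**2
    return pow(sdev / (len(numbers)-1), 0.5)

def median(numbers):  # compute the median
    numbers = sorted(numbers)  # sorted() returns a new list, so the result must be assigned
    size = len(numbers)
    if size % 2 == 0:
        med = (numbers[size//2-1] + numbers[size//2])/2
    else:
        med = numbers[size//2]
    return med

n = getNum()  # main program
m = mean(n)
print("Mean: {}, standard deviation: {:.2}, median: {}.".format(m, dev(n,m), median(n)))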
Hamlet word-frequency statistics (uses the Hamlet source text)
#CalHamletV1.py
def getText():
    # Assumes hamlet.txt has been downloaded to the working directory.
    with open("hamlet.txt", "r") as f:
        txt = f.read()
    txt = txt.lower()
    for ch in '!"#$%&()*+,-./:;<=>?@[\\]^_`{|}~':
        txt = txt.replace(ch, " ")  # replace special characters in the text with spaces
    return txt

hamletTxt = getText()
words = hamletTxt.split()
counts = {}
for word in words:
    counts[word] = counts.get(word,0) + 1
items = list(counts.items())
items.sort(key=lambda x:x[1], reverse=True)  # largest count first
for i in range(10):
    word, count = items[i]
    print("{0:<10}{1:>5}".format(word, count))
Character appearance statistics in Romance of the Three Kingdoms, part 1 (uses the source text of 《三国演义》)
#CalThreeKingdomsV1.py
import jieba

# Assumes threekingdoms.txt has been downloaded to the working directory.
txt = open("threekingdoms.txt", "r", encoding='utf-8').read()
words = jieba.lcut(txt)
counts = {}
for word in words:
    if len(word) == 1:  # skip single-character tokens
        continue
    else:
        counts[word] = counts.get(word,0) + 1
items = list(counts.items())
items.sort(key=lambda x:x[1], reverse=True)
for i in range(15):
    word, count = items[i]
    print("{0:<10}{1:>5}".format(word, count))
Character appearance statistics in Romance of the Three Kingdoms, part 2 (uses the source text of 《三国演义》)
#CalThreeKingdomsV2.py
import jieba

excludes = {"将军","却说","荆州","二人","不可","不能","如此"}
# Assumes threekingdoms.txt has been downloaded to the working directory.
txt = open("threekingdoms.txt", "r", encoding='utf-8').read()
words = jieba.lcut(txt)
counts = {}
for word in words:
    if len(word) == 1:
        continue
    elif word == "诸葛亮" or word == "孔明曰":
        rword = "孔明"
    elif word == "关公" or word == "云长":
        rword = "关羽"
    elif word == "玄德" or word == "玄德曰":
        rword = "刘备"
    elif word == "孟德" or word == "丞相":
        rword = "曹操"
    else:
        rword = word
    counts[rword] = counts.get(rword,0) + 1
for word in excludes:
    counts.pop(word, None)  # pop with a default avoids KeyError if the word never occurred
items = list(counts.items())
items.sort(key=lambda x:x[1], reverse=True)
for i in range(10):
    word, count = items[i]
    print("{0:<10}{1:>5}".format(word, count))