Python Course, Day 13

2019-11-20, by code与有荣焉

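I. File Operations and Character Encoding

1. Character encoding

Encoding: converting characters into the corresponding binary sequence is called character encoding (characters -> binary 0s and 1s)
Decoding: converting a binary sequence back into the corresponding characters is called character decoding (binary 0s and 1s -> characters)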

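2. A brief history of character encoding

The birth of ASCII

ASCII is a 7-bit code that defines 128 characters. Each character is stored in 1 byte, and a byte can hold 2**8 = 256 values, so the remaining 128 values were left unused and were later taken up by various extended encodings.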

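The birth of GBK and other national encodings

The 128 spare values left over by ASCII were nowhere near enough to hold the scripts and symbols of other countries, so each country began to define its own character encoding table.

In GBK, one Chinese character takes 2 bytes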

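The birth of Unicode (the universal character set)

Unicode assigns every character and symbol a unique number (a code point). Its early fixed-width encoding used 2 bytes (16 bits) per character, which can represent 2**16 = 65536 characters; the Unicode code space has since grown beyond that, up to U+10FFFF.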

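The birth of UTF-8 (the most widely used today)

Storing every character in at least 2 bytes doubles the size of English text, which needs only 1 byte per character in ASCII. UTF-8 avoids this by encoding Unicode with a variable number of bytes:

ASCII characters are stored in 1 byte
European characters are mostly stored in 2 bytes
East Asian characters are mostly stored in 3 bytes
...

A common Chinese character takes 3 bytes in UTF-8
The same Chinese text therefore takes more space in UTF-8 than in GBK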
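The byte counts above can be checked directly. This is a minimal sketch; the sample characters are chosen for illustration and do not come from the course material.

```python
# Sample characters: an ASCII letter, an accented Latin letter, a Chinese character, an emoji
samples = ['A', 'é', '张', '😀']

# Number of bytes each character takes when encoded as UTF-8
utf8_lengths = [len(ch.encode('utf-8')) for ch in samples]
print(utf8_lengths)  # [1, 2, 3, 4]

# The same Chinese character takes 2 bytes in GBK
gbk_length = len('张'.encode('gbk'))
print(gbk_length)  # 2
```

The fourth sample shows that UTF-8 goes up to 4 bytes for characters outside the 16-bit range.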

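3. Converting between strings and byte sequences

encode() encodes a string into bytes
decode() decodes bytes into a string

Example:

String to byte sequence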

# The variable is named data so that it does not shadow the built-in bytes type
data = '张三'.encode()
print(data)  # b'\xe5\xbc\xa0\xe4\xb8\x89'
print(type(data))  # <class 'bytes'>
data = '张三'.encode('utf-8')
print(data)  # b'\xe5\xbc\xa0\xe4\xb8\x89'
print(type(data))  # <class 'bytes'>
data = '张三'.encode('gbk')
print(data)  # b'\xd5\xc5\xc8\xfd'
print(type(data))  # <class 'bytes'>

encode() uses UTF-8 by default

Byte sequence to string

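# The variable is named data so that it does not shadow the built-in bytes type
data = b'\xe5\xbc\xa0\xe4\xb8\x89'
msg1 = data.decode()
print(msg1)  # 张三
print(type(msg1))  # <class 'str'>

msg1 = data.decode('utf-8')
print(msg1)  # 张三
print(type(msg1))  # <class 'str'>

# Decoding UTF-8 bytes as GBK does not fail here, but the result is garbled
msg1 = data.decode('gbk')
print(msg1)  # 寮犱笁
print(type(msg1))  # <class 'str'>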

The result of encode() is displayed in hexadecimal, but the underlying data is binary.
One byte is written as two hexadecimal digits
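To make the hexadecimal and binary views concrete, here is a small sketch that prints the same bytes both ways. It only uses the built-in bytes.hex() method and format().

```python
data = '张'.encode('utf-8')

# Two hexadecimal digits per byte
hex_text = data.hex()
print(hex_text)  # e5bca0

# The same three bytes written out as bits, 8 bits per byte
bit_text = ' '.join(format(byte, '08b') for byte in data)
print(bit_text)  # 11100101 10111100 10100000
```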

4. What is a file

A computer file is data stored on disk
On a computer, files are stored on disk in binary form

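5. Basic file operations

Function/method	Description
open	Opens a file and returns a file object
read	Reads the file's contents into memory
write	Writes the given content to the file
close	Closes the file

Reading into memory can be thought of as reading into a variable

The read method: reading a file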

# 1. Open the file (file names may be case-sensitive)
file = open("demo2.txt", encoding='utf-8')
print(file)
# 2. Read
text = file.read()
print(text)

# 3. Close
file.close()

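Note:

Encoding and decoding must use the same codec; otherwise the text comes out garbled.
PyCharm saves files as UTF-8 by default
If encoding is omitted, open() uses the platform's locale encoding, which on Chinese-language Windows is GBK (cp936)
Remember to close every file you open. A file left open wastes system resources and can interfere with later access to the file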
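The effect of mismatched codecs can be reproduced in a few lines. This sketch writes a file as GBK and then reads it back twice; the file name and the temporary directory are assumptions made for the example.

```python
import os
import tempfile

# Hypothetical file inside a temporary directory
path = os.path.join(tempfile.mkdtemp(), 'demo_gbk.txt')

# Write the text as GBK
with open(path, 'w', encoding='gbk') as f:
    f.write('张三')

# Reading with the same codec gives the text back
with open(path, encoding='gbk') as f:
    same_codec = f.read()

# Reading these bytes as UTF-8 raises an error
# (with other byte values a mismatch can instead produce garbled text)
try:
    with open(path, encoding='utf-8') as f:
        f.read()
    mismatch_failed = False
except UnicodeDecodeError:
    mismatch_failed = True

print(same_codec == '张三', mismatch_failed)  # True True
```

Passing encoding explicitly on both the write and the read side avoids depending on the platform default.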

The write method: writing a file

# Open the file for writing
f = open("abc.txt", "w", encoding='utf-8')
print(f)
f.write("hello neuedu！\n")
f.write("今天天气真好")

# Close the file
f.close()

open() opens a file for reading by default

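6. read(), readline() and readlines(): differences and usage

The read([size]) method

read([size]) reads size characters starting at the current position (size bytes in binary mode). If size is omitted, it reads to the end of the file. It returns a str in text mode and a bytes object in binary mode.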

f = open("a.txt")
lines = f.read()
print(lines)
print(type(lines))
f.close()
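The size argument behaves differently in text and binary mode. A minimal sketch, assuming a small UTF-8 test file created in a temporary directory (the file name is made up):

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'demo_read.txt')
with open(path, 'w', encoding='utf-8') as f:
    f.write('张三abc')

# Text mode: size counts characters
with open(path, encoding='utf-8') as f:
    text_chunk = f.read(2)

# Binary mode: size counts bytes
with open(path, 'rb') as f:
    byte_chunk = f.read(2)

print(len(text_chunk), type(text_chunk))  # 2 <class 'str'>
print(byte_chunk)  # b'\xe5\xbc'
```

In binary mode the two bytes are only part of the first character, which takes 3 bytes in UTF-8.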

The readline() method

As the name suggests, this method reads one line per call, so it uses little memory and suits large files. It returns a string.

f = open("a.txt")
line = f.readline()
print(type(line))
# readline() returns an empty string at end of file, which ends the loop
while line:
    print(line)
    line = f.readline()
f.close()

The readlines() method

Reads all lines of the file into a list, one line per element. Reading a large file this way uses a lot of memory.

f = open("a.txt")
lines = f.readlines()
print(type(lines))
for line in lines:
    # Print the current line, not the whole list
    print(line)
f.close()

The simplest and fastest way to process text line by line: loop over the file object directly

f = open("a.txt")
for line in f:
    print(line)
f.close()

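7. Reading and writing files with with open() as

Reading or writing a file can raise an IOError, and if that happens, the f.close() that follows is never called. To make sure the file is closed properly whether or not an error occurs, Python provides the with statement, which calls close() for us automatically.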
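A short sketch of the with statement in use. The file name is made up, and the ValueError stands in for any error that might occur while the file is open.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'demo_with.txt')

# The file is closed automatically when the with block ends
with open(path, 'w', encoding='utf-8') as f:
    f.write('hello\n')
print(f.closed)  # True

# It is also closed if the block raises an exception
try:
    with open(path, encoding='utf-8') as g:
        raise ValueError('simulated failure')  # stand-in for an error during processing
except ValueError:
    pass
print(g.closed)  # True
```

No explicit close() call is needed in either case.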

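8. The file pointer

The file pointer marks the position from which data will be read

When a file is first opened, the file pointer usually points to the start of the file
After read is called, the file pointer moves to the end of what was read

Moving the file pointer
Method:
f.seek(offset, whence)
offset is how far to move the file pointer, in bytes
whence is the reference point and takes one of three values:
(1) 0: relative to the start of the file
(2) 1: relative to the current position of the file pointer
(3) 2: relative to the end of the file
Note:

A nonzero offset with whence=1 or whence=2 only works in b (binary) mode
f.tell() returns the current position of the file pointer
Positions count from 0: position 0 from the start (whence=0) is just before the first byte, and offset 0 from the end (whence=2) is just after the last byte, so seek(-1, 2) moves to the last byte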
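The following sketch walks the file pointer through all three whence values. It assumes a 10-byte test file created for the purpose and opens it in binary mode so that relative seeks are allowed.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'demo_seek.bin')
with open(path, 'wb') as f:
    f.write(b'0123456789')

with open(path, 'rb') as f:
    start = f.tell()        # a freshly opened file starts at position 0
    first = f.read(3)       # reads b'012', the pointer is now at 3
    after_read = f.tell()
    f.seek(2, 1)            # skip 2 bytes forward from the current position
    middle = f.read(2)      # reads b'56'
    f.seek(-1, 2)           # 1 byte before the end of the file
    last = f.read()         # reads b'9'
    f.seek(0, 0)            # back to the start
    again = f.read(1)       # reads b'0'

print(start, first, after_read, middle, last, again)
# 0 b'012' 3 b'56' b'9' b'0'
```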

9. Summary of file opening modes

To read or write binary files such as images, use rb, rb+, wb, wb+, ab, ab+
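As a quick illustration of how the basic modes differ, this sketch exercises r, w and a on a throwaway text file (the path is an assumption made for the example):

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'demo_modes.txt')

# r fails if the file does not exist
try:
    open(path, 'r')
    missing_raises = False
except FileNotFoundError:
    missing_raises = True

# w creates the file, or empties it if it already exists
with open(path, 'w') as f:
    f.write('first\n')
with open(path, 'w') as f:
    f.write('second\n')

# a appends to the end of the existing content
with open(path, 'a') as f:
    f.write('third\n')

with open(path, 'r') as f:
    content = f.read()

print(missing_raises)  # True
print(repr(content))   # 'second\nthird\n'
```

Adding + to a mode allows both reading and writing, and adding b switches to binary mode.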

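II. File operation exercises

1. File operations

1) Create a file data.txt with 100000 lines, each holding an integer between 1 and 100.
2) Find the 10 numbers that occur most often in the file and write them to mostNum.txt.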

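import random

# w+ opens the file for writing and reading, emptying it first
with open('data.txt', 'w+') as f:
    for i in range(100000):
        f.write(str(random.randint(1, 100)) + '\n')
    # Move back to the start; otherwise read() returns an empty string
    f.seek(0)
    print(f.read())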

from collections import Counter

# Count how often each number occurs, one number per line
with open('data.txt', 'r') as f:
    counts = Counter(line.strip() for line in f)

with open('mostNum.txt', 'w+') as k:
    # most_common(10) returns the 10 (number, count) pairs with the highest counts
    for num, count in counts.most_common(10):
        k.write(f'{num}--------{count}\n')
    k.seek(0, 0)
    print(k.read())

2. Adding line numbers

1) Write a program that produces a_num.txt from a.txt.
2) The content is the same as a.txt, but each line is prefixed with its line number.

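with open('a.txt', 'r') as f1, open('a_num.txt', 'w+') as f2:
    # Start numbering at 1 so that the first line is line 1
    for i, line in enumerate(f1, 1):
        # Remove only the trailing newline so the rest of the line is unchanged
        f2.write(str(i) + '  ' + line.rstrip('\n') + '\n')
    f2.seek(0, 0)
    print(f2.read())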

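3. Reading non-text files

Images, music and video (non-text files) must be read and written in binary mode (b).

Binary modes: rb, rb+, wb, wb+, ab, ab+
Text modes: rt, rt+, wt, wt+, at, at+, which are equivalent to r, r+, w, w+, a, a+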

f1 = open("a.png", mode='rb')
content = f1.read()
f1.close()
# print(content)

f2 = open('b.png', mode='wb')
f2.write(content)
f2.close()
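For large binary files, reading everything into memory at once may be undesirable. One possible variation, sketched here under assumptions, copies the file in fixed-size chunks; the chunk size, the file names and the generated test data are all invented for the example.

```python
import os
import tempfile

folder = tempfile.mkdtemp()
src = os.path.join(folder, 'src.bin')
dst = os.path.join(folder, 'dst.bin')

# Stand-in for an image: some arbitrary binary data
with open(src, 'wb') as f:
    f.write(bytes(range(256)) * 40)

CHUNK_SIZE = 4096  # arbitrary choice for this sketch

with open(src, 'rb') as fin, open(dst, 'wb') as fout:
    while True:
        chunk = fin.read(CHUNK_SIZE)
        if not chunk:  # read() returns b'' at end of file
            break
        fout.write(chunk)

# Check that the copy is byte-for-byte identical
with open(src, 'rb') as a, open(dst, 'rb') as b:
    copies_match = a.read() == b.read()
print(copies_match)  # True
```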

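III. Modules and packages

Modules

A single .py file is called a module

Ways to import a class or function from a module:

1. Method 1: import module_name

Usage: module_name.function_name()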

import module1
module1.output()

2. Method 2: from module_name import function_name

Usage: function_name()

from module1 import output
output()

3. Method 3: from module_name import *

Usage: function_name()

from module1 import *
output()

* means everything the module exports

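4. Method 4: from module_name import function_name as tt (a name of your choice)

Note that the original function name is not available after this import; only the alias is
Usage: tt()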

from module1 import output as tt
tt()

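5. A module can define a variable __all__ to specify the subset of names it exports:
Effect of __all__: the names listed inside the [] are exactly what from module_name import * imports. If the module has no __all__, every public name is imported. (__all__ only affects the from module_name import * form.)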

__all__ = ['output']

def output():
    print('hello neuedu')

def output2():
    print('hello output2')
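The effect of __all__ can be observed with a throwaway module. This is only a sketch: the module name demo_all_mod is made up, the functions return their text instead of printing it so the result can be checked, and the star import is run through exec() in a separate namespace so that the imported names can be inspected.

```python
import os
import sys
import tempfile

# Write a throwaway module to a temporary folder (demo_all_mod is a made-up name)
folder = tempfile.mkdtemp()
with open(os.path.join(folder, 'demo_all_mod.py'), 'w', encoding='utf-8') as f:
    f.write(
        "__all__ = ['output']\n"
        "def output():\n"
        "    return 'hello neuedu'\n"
        "def output2():\n"
        "    return 'hello output2'\n"
    )
sys.path.insert(0, folder)

# Run the star import in its own namespace so the result can be inspected
namespace = {}
exec('from demo_all_mod import *', namespace)
print('output' in namespace, 'output2' in namespace)  # True False

# A regular import still reaches names that are not listed in __all__
import demo_all_mod
hidden_result = demo_all_mod.output2()
print(hidden_result)  # hello output2
```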

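Packages

A package is a folder that contains a number of .py files together with an __init__.py file.

Ways to import modules and functions from a package:

1. Method 1: from package_name import module_name

Usage: module_name.function_name()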

from neuedu import module3
module3.output()

2. Method 2: from package_name.module_name import function_name

Usage: function_name()

from neuedu.module3 import output
output()

3. Method 3: import package_name.module_name

Usage: package_name.module_name.function_name()

import neuedu.module3
neuedu.module3.output()

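4. Method 4: from package_name import *

Prerequisite: define the __all__ variable in the package's __init__.py (written the same way as for a module). The modules listed in it are the ones imported; if it is not defined, no modules are imported.
Usage: module_name.function_name()
__init__.py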

__all__ = ['module3']

from neuedu import *
module3.output()

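5. Method 5: import package_name

Prerequisite: write from . import module_name in the package's __init__.py. Whichever modules __init__.py imports are the ones available this way.
Usage: package_name.module_name.function_name()
__init__.py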

from . import module3

import neuedu
neuedu.module3.output()
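To try the package imports without setting up a project by hand, the sketch below builds a small package on disk and imports it. The package name demo_pkg is a made-up stand-in for neuedu, and output() returns its text rather than printing it so the result can be checked.

```python
import os
import sys
import tempfile

# Build a throwaway package on disk (demo_pkg is a made-up name standing in for neuedu)
root = tempfile.mkdtemp()
pkg_dir = os.path.join(root, 'demo_pkg')
os.mkdir(pkg_dir)

with open(os.path.join(pkg_dir, 'module3.py'), 'w', encoding='utf-8') as f:
    f.write("def output():\n    return 'hello from module3'\n")

# __init__.py imports the submodule (needed for method 5)
# and lists it in __all__ (needed for method 4)
with open(os.path.join(pkg_dir, '__init__.py'), 'w', encoding='utf-8') as f:
    f.write("from . import module3\n__all__ = ['module3']\n")

sys.path.insert(0, root)

import demo_pkg                        # method 5
from demo_pkg.module3 import output    # method 2

result_pkg = demo_pkg.module3.output()
result_func = output()
print(result_pkg)   # hello from module3
print(result_func)  # hello from module3
```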
