Encoding in Python 3: encode and decode

2017-06-24  吴邪_TicktW

Python's built-in help

encode(...) method of builtins.str instance
S.encode(encoding='utf-8', errors='strict') -> bytes
Encode S using the codec registered for encoding. Default encoding
is 'utf-8'. errors may be given to set a different error
handling scheme. Default is 'strict' meaning that encoding errors raise
a UnicodeEncodeError. Other possible values are 'ignore', 'replace' and
'xmlcharrefreplace' as well as any other name registered with
codecs.register_error that can handle UnicodeEncodeErrors.

decode(encoding='utf-8', errors='strict') method of builtins.bytes instance
Decode the bytes using the codec registered for encoding.

encoding
The encoding with which to decode the bytes.
errors
The error handling scheme to use for the handling of decoding errors.
The default is 'strict' meaning that decoding errors raise a
UnicodeDecodeError. Other possible values are 'ignore' and 'replace'
as well as any other name registered with codecs.register_error that
can handle UnicodeDecodeErrors.
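
To make the errors parameter described in the help text concrete, here is a minimal sketch of how the handlers behave. The sample string, the damaged byte sequence, and all variable names are made up for illustration; only str.encode and bytes.decode come from the help output above.

```python
# Sample text with characters that ASCII cannot represent (illustrative only).
text = 'café 汉字'

# errors='strict' (the default) raises UnicodeEncodeError.
strict_error = None
try:
    text.encode('ascii')
except UnicodeEncodeError as exc:
    strict_error = exc

# 'ignore' drops the characters that cannot be encoded.
ignored = text.encode('ascii', errors='ignore')

# 'replace' substitutes '?' for each character that cannot be encoded.
replaced = text.encode('ascii', errors='replace')

# 'xmlcharrefreplace' writes an XML numeric character reference instead.
xml_refs = text.encode('ascii', errors='xmlcharrefreplace')

# Decoding: valid UTF-8 for '汉' followed by 0xff, which is never valid UTF-8.
damaged = b'\xe6\xb1\x89\xff'
decoded_replace = damaged.decode('utf-8', errors='replace')  # bad byte becomes U+FFFD
decoded_ignore = damaged.decode('utf-8', errors='ignore')    # bad byte is dropped

print(ignored, replaced, xml_refs)
print(repr(decoded_replace), repr(decoded_ignore))
```

Both 'ignore' and 'replace' lose information, so they are probably best reserved for output that only needs to be displayed, not stored or parsed again.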

Code

# Encode the str '汉字' as UTF-8, i.e. turn the characters into the bytes b'\xe6\xb1\x89\xe5\xad\x97'
'汉字'.encode('utf-8')
# Decode those UTF-8 bytes back into the original str
b'\xe6\xb1\x89\xe5\xad\x97'.decode('utf-8')
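
The round trip above can be checked with a short sketch. The variable names are illustrative, and errors='replace' is used in the mismatched decode only so the example cannot raise.

```python
text = '汉字'

# str -> bytes: two characters become six bytes in UTF-8 (three bytes each).
data = text.encode('utf-8')
print(type(data).__name__, len(text), len(data))

# bytes -> str: decoding with the same codec restores the original text.
restored = data.decode('utf-8')
print(restored == text)

# Decoding with a different codec does not restore the text; the same bytes
# are interpreted as different characters (mojibake).
mojibake = data.decode('gbk', errors='replace')
print(mojibake == text)
```

The point is that bytes carry no record of which encoding produced them, so the decoder has to be told the codec that was used to encode.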

Encoding issues on Windows

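# Make text safe to display on a Windows console that uses GBK
# str (Unicode) --encode('gbk')--> bytes --decode('gbk')--> str containing only characters GBK can represent
# With the default errors='strict' this raises UnicodeEncodeError if the text contains a character outside GBK
'其它编码文本'.encode('gbk').decode('gbk')
# errors='ignore' silently drops such characters instead of raising
'其它编码文本'.encode('gbk', 'ignore').decode('gbk')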
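
As a rough illustration of what the GBK round trip does, the sketch below uses a string containing an emoji, which GBK cannot represent. The sample text and variable names are assumptions made for the example.

```python
# '汉字' is representable in GBK; the emoji U+1F600 is not.
text = '汉字\U0001F600'

# The default errors='strict' raises on the emoji.
try:
    text.encode('gbk')
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True

# 'ignore' drops the emoji; 'replace' turns it into '?'.
dropped = text.encode('gbk', 'ignore').decode('gbk')
marked = text.encode('gbk', 'replace').decode('gbk')

print(strict_failed, dropped, marked)
```

Using 'replace' rather than 'ignore' at least leaves a visible marker where a character was lost.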
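# lago1.html is a UTF-8 file, but on Windows open() defaults to the locale encoding
# (typically GBK on a Chinese-language system), so reading it can raise UnicodeDecodeError
with open('lago1.html', 'r') as f:
    content = f.read()
print(content)
# Workaround: read the raw bytes first, then decode them as UTF-8
with open('lago1.html', 'rb') as f:
    content = f.read()
print(content.decode('utf-8'))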
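
An alternative to reading bytes and decoding by hand is to pass encoding='utf-8' to open(), which is a standard parameter of the built-in. The sketch below writes a small stand-in file to a temporary directory so it runs anywhere; the file contents and variable names are made up for the example.

```python
import os
import tempfile

html = '<p>汉字</p>'  # stand-in for the real page contents

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'lago1.html')

    # Write the file as UTF-8 regardless of the platform default.
    with open(path, 'w', encoding='utf-8') as f:
        f.write(html)

    # Text mode with an explicit encoding: no dependence on the locale.
    with open(path, 'r', encoding='utf-8') as f:
        content = f.read()

    # Equivalent result via binary mode plus a manual decode.
    with open(path, 'rb') as f:
        decoded = f.read().decode('utf-8')

print(content == decoded == html)
```

Passing the encoding explicitly keeps the script's behavior the same on Windows, Linux, and macOS.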