The Ultimate Guide to Chinese Text Encoding in Python

2017-12-02 厚土为山


Keywords

python garbled text (mojibake)
python utf8
python utf-8
python unicode
python Chinese
python transcoding
python dict encoding

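str and unicode

Python 2 has two string types:

str : a byte string; it holds encoded bytes (UTF-8 in the examples here, which assume a UTF-8 source file or terminal)
unicode : a string of Unicode code points, independent of any byte encoding
They are two distinct classes, with methods for converting between them.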

str type

a='中'
len(a) == 3 , because the UTF-8 encoding of '中' occupies 3 bytes in memory

unicode type

a=u'中'
len(a) == 1 , because the string contains 1 character
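
The two lengths can be checked directly. The following is a minimal sketch written to run on both Python 2 and Python 3. The names `text` and `data` are illustrative, and the byte string is produced with an explicit encode so that the result does not depend on how the terminal or source file is encoded.

```python
# -*- coding: utf-8 -*-
text = u'中'                  # one Unicode character
data = text.encode('utf-8')   # its UTF-8 byte representation

print(len(text))   # number of characters: 1
print(len(data))   # number of bytes: 3
```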

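str --> unicode

a='中'
a.decode('utf-8')

A str should only call decode, never encode. Calling encode on a str makes Python 2 decode it as ASCII first, which raises UnicodeDecodeError for non-ASCII bytes.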

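unicode --> str

a=u'中'
a.encode('utf-8')

A unicode should only call encode, never decode. Calling decode on a unicode makes Python 2 encode it as ASCII first, which raises UnicodeEncodeError for non-ASCII characters.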
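
One way to avoid calling the wrong method is to check the type before converting. The sketch below assumes UTF-8 by default; `to_text`, `to_bytes`, `TEXT_TYPE` and `BYTES_TYPE` are hypothetical names chosen for illustration, not part of any library. It is written to run on both Python 2 and Python 3.

```python
# -*- coding: utf-8 -*-
TEXT_TYPE = type(u'')    # unicode on Python 2, str on Python 3
BYTES_TYPE = type(b'')   # str on Python 2, bytes on Python 3


def to_text(value, encoding='utf-8'):
    """Decode byte strings; return text unchanged."""
    if isinstance(value, BYTES_TYPE):
        return value.decode(encoding)
    return value


def to_bytes(value, encoding='utf-8'):
    """Encode text; return byte strings unchanged."""
    if isinstance(value, TEXT_TYPE):
        return value.encode(encoding)
    return value


raw = to_bytes(u'中')     # the 3 UTF-8 bytes
back = to_text(raw)       # the original 1-character text
print(len(raw), len(back))
```

Because each helper leaves a value of the target type untouched, calling it twice is harmless, which sidesteps the implicit ASCII conversion described above.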

Printing Chinese in a dict

How can a serialized dict show the Chinese characters themselves, rather than escapes such as '\u4e2d' or '\xe4\xb8\xad'?

>>> a={}
>>> a[1] = u'中'
>>> a
{1: u'\u4e2d'}
>>> print a
{1: u'\u4e2d'}

Method 1:

import json
a={}
a[1] = u'中'

# ensure_ascii=False keeps non-ASCII characters instead of escaping them as \uXXXX
b=json.dumps(a,  ensure_ascii=False)

print b

Result ===> {"1": "中"}
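
To make the effect of `ensure_ascii` concrete, here is a small sketch that compares the default output with the `ensure_ascii=False` output. The variable names are illustrative, and the code is written to run on both Python 2 and Python 3. Note that JSON object keys are always strings, so the integer key 1 comes back as "1".

```python
# -*- coding: utf-8 -*-
import json

data = {1: u'中'}

escaped = json.dumps(data)                       # non-ASCII escaped as \uXXXX
readable = json.dumps(data, ensure_ascii=False)  # non-ASCII characters kept

print(escaped)
# Round trip: the integer key has become the string "1"
print(json.loads(readable) == {u'1': u'中'})
```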

Method 2:

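a={}
a[1] = u'中'
b=str(a)  # or: b = repr(a), or b = `a`

# Strip the u prefix from each string literal, then decode the \uXXXX escapes
b=b.replace('u\'', '\'')
print b.decode('unicode-escape')
# print b.decode('utf-8') does not help: b holds the literal text \u4e2d, not UTF-8 bytes

Result ==> {1: '中'}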

Printing Chinese in a list

a=[]
a.append('中国')
b=str(a)
print b

Result ==> ['\xe4\xb8\xad\xe5\x9b\xbd']

# string_escape turns the \xNN escape text back into raw bytes
b=b.decode('string_escape')
print b

Result ==> ['中国']
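
The string_escape codec exists only in Python 2. As an alternative that does not rely on it, the display string can be built from the elements themselves. The sketch below assumes the list contains only strings and that byte strings are UTF-8; `format_list` is a hypothetical helper name, and the code is written to run on both Python 2 and Python 3.

```python
# -*- coding: utf-8 -*-
def format_list(items, encoding='utf-8'):
    """Return a readable text form of a list of strings (illustrative helper)."""
    parts = []
    for item in items:
        if isinstance(item, bytes):      # str on Python 2, bytes on Python 3
            item = item.decode(encoding)
        parts.append(u"'" + item + u"'")
    return u'[' + u', '.join(parts) + u']'


# One UTF-8 byte string and one text string, mixed in the same list
result = format_list([u'中国'.encode('utf-8'), u'中文'])
print(result == u"['中国', '中文']")
```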

unicode --> unicode-escape

# unicode --> unicode-escape
>>> u'中文测试'.encode('unicode-escape')
'\\u4e2d\\u6587\\u6d4b\\u8bd5'

# unicode-escape --> unicode
>>> '\\u4e2d\\u6587\\u6d4b\\u8bd5'.decode('unicode-escape')
u'\u4e2d\u6587\u6d4b\u8bd5'
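
The two conversions are inverses of each other, which can be verified with a round trip. This is a minimal sketch with illustrative variable names, written to run on both Python 2 and Python 3.

```python
# -*- coding: utf-8 -*-
text = u'中文测试'

escaped = text.encode('unicode-escape')      # ASCII-only bytes with \uXXXX escapes
restored = escaped.decode('unicode-escape')  # back to the original text

print(escaped == b'\\u4e2d\\u6587\\u6d4b\\u8bd5')
print(restored == text)
```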
