Extracting a Japanese font's CMAP and getting the code points of all its glyphs

2020-04-12  千羽之城88

I recently discussed this with @steve-cheug on Zhihu. This post is a backup of what we worked out.

Getting the font's CMAP table

ttx -t cmap KozGoPr6N-Regular.otf
--------------------------------------
Dumping "KozGoPr6N-Regular.otf" to "KozGoPr6N-Regular.ttx"...
Dumping 'cmap' table...

Then use the following Python script on the resulting .ttx file to get the UIDs (the Unicode code points):

#!/usr/bin/python3
# -*- encoding: utf-8 -*-

import os
import sys
import datetime

i=0
#print(datetime.datetime.now())
f = open("out.x", "w", encoding="utf-8")
for line in open(sys.argv[1], encoding="utf-8"):
    if( "CJK" in line ):
        columns = line.split('"')
        if len(columns) >= 4:
            i += 1
            if( "uni" in columns[3] ):
                # <map code="0xff5d" name="uniFF5D"/>
                uid = columns[3].replace('uni','')
                if(len(uid) >= 2):
                    uid = int(uid, 16)
                    if( uid > 10000 ):
                        print(chr(uid) + "  " + str(i))
                        f.write(chr(uid) + '\n')
            elif( 'cid' in columns[3] ):
                # <map code="0xff5b" name="cid28609"/>
                # The number after "cid" is a glyph CID, not a code point,
                # so take the code point from the code attribute instead.
                uid = int(columns[1], 16)
                if( uid > 10000 ):
                    print(chr(uid) + "  " + str(i))
                    f.write(chr(uid) + '\n')
            
f.close()
#print(datetime.datetime.now())

This produces a file named out.x.
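
The string-splitting approach above depends on the exact layout of each line in the .ttx dump. As a rough alternative sketch, the same dump can be read with Python's standard xml.etree.ElementTree, taking the code point from the code attribute of each <map> element. The function name cjk_chars_from_ttx, the sample XML (including its glyph name) and the range 0x4E00 to 0x9FFF (the CJK Unified Ideographs block) are assumptions made for illustration. The XML parser discards the comments that ttx writes, so the filter here works on a code point range rather than on the "CJK" text.

```python
import xml.etree.ElementTree as ET

# Assumed range for illustration: the CJK Unified Ideographs block.
CJK_START, CJK_END = 0x4E00, 0x9FFF


def cjk_chars_from_ttx(xml_text):
    """Return the sorted, de-duplicated CJK characters mapped in a ttx cmap dump."""
    root = ET.fromstring(xml_text)
    found = set()
    # Every cmap subtable holds <map code="0x..." name="..."/> entries.
    for entry in root.iter("map"):
        code = entry.get("code")
        if code is None:
            continue
        codepoint = int(code, 16)  # base 16 accepts the "0x" prefix
        if CJK_START <= codepoint <= CJK_END:
            found.add(chr(codepoint))
    return sorted(found)


# Hand-written sample in the shape of a ttx cmap dump (glyph names are made up).
sample = """<ttFont>
  <cmap>
    <cmap_format_4 platformID="3" platEncID="1" language="0">
      <map code="0x4e00" name="cid01200"/><!-- CJK UNIFIED IDEOGRAPH-4E00 -->
      <map code="0xff5d" name="uniFF5D"/><!-- FULLWIDTH RIGHT CURLY BRACKET -->
    </cmap_format_4>
  </cmap>
</ttFont>"""

print(cjk_chars_from_ttx(sample))
```

For a real dump such as KozGoPr6N-Regular.ttx, the text would be read from the file first; a different range can be substituted if other blocks are wanted.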

Alternatively, a Python script that reads the cmap table directly with fontTools:

#! /usr/bin/env python3
# -*- encoding: utf-8 -*-

# Script to get a TTF font's cmap table
# 2019/12/26
# use: python3 cmap01.py 'NotoSansSC-Kクレ.ttf'

from fontTools.ttLib import TTFont
from fontTools.ttLib.tables._c_m_a_p import CmapSubtable
import sys,os

fontfile = sys.argv[1]
font = TTFont(fontfile)
outfile = open("temp.x", "w", encoding="utf-8") # w=write, a=append

cmap = font['cmap']

for subtable in cmap.tables:
    # Windows platform (3); encodings: Symbol (0), Unicode BMP (1), Unicode full repertoire (10)
    if( subtable.platformID == 3 and subtable.platEncID in [0, 1, 10]):
        for cid in subtable.cmap.items():
            # write the code point (decimal and hex) and the character
            if( cid[0] > 10000 ):
                outfile.write("%s\t%x\t%s\n" % (cid[0], cid[0], chr(cid[0])))
outfile.close()

The result looks like this:

12002   2ee2    ⻢
12004   2ee4    ⻤
12005   2ee5    ⻥
12006   2ee6    ⻦
12008   2ee8    ⻨
12009   2ee9    ⻩
12010   2eea    ⻪
12011   2eeb    ⻫
12012   2eec    ⻬
12013   2eed    ⻭
12014   2eee    ⻮
12015   2eef    ⻯
12016   2ef0    ⻰
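
Because the script visits every matching Windows subtable, a code point that appears in more than one subtable may be written more than once. A minimal sketch of one way around that, assuming fontTools' getBestCmap() is acceptable as the source of the mapping: it returns a single code point to glyph name dictionary, so each code point is listed once. The helper names format_rows and dump_font are hypothetical, and the demonstration mapping at the end uses made-up glyph names.

```python
def format_rows(mapping, minimum=10000):
    """Format code points above `minimum` as 'decimal<TAB>hex<TAB>character' lines."""
    rows = []
    for codepoint in sorted(mapping):
        if codepoint > minimum:
            rows.append("%s\t%x\t%s" % (codepoint, codepoint, chr(codepoint)))
    return rows


def dump_font(fontfile, outpath="temp.x"):
    # fontTools is imported here so format_rows() can be used without it.
    from fontTools.ttLib import TTFont

    # getBestCmap() returns one {code point: glyph name} dict, or None
    # when the font has no Unicode cmap subtable.
    mapping = TTFont(fontfile).getBestCmap() or {}
    with open(outpath, "w", encoding="utf-8") as outfile:
        for row in format_rows(mapping):
            outfile.write(row + "\n")


# Demonstration with a hand-written mapping (glyph names are made up).
print("\n".join(format_rows({0x2EE2: "cid00001", 0x41: "A", 0x2EE4: "cid00002"})))
```

The threshold of 10000 is kept from the script above. Sorting the keys keeps the output in code point order regardless of how the font stores the mapping.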