OCR in Python -- the pytesseract library

2021-03-17 bianruifeng

Install the Python package and the Tesseract engine itself (macOS, Homebrew):

pip3 install pytesseract
brew install tesseract
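pytesseract does not do the recognition itself; it calls the `tesseract` command-line program, so both installs above are needed. A quick sanity check, sketched below under the assumption that the binary is expected on PATH, uses the standard library's `shutil.which`. The helper name `find_tesseract` is hypothetical; `pytesseract.pytesseract.tesseract_cmd` is the setting pytesseract documents for pointing at a binary outside PATH.

```python
import shutil
from typing import Optional


def find_tesseract(name: str = "tesseract") -> Optional[str]:
    """Return the full path of the Tesseract executable, or None if it is not on PATH."""
    return shutil.which(name)


if __name__ == "__main__":
    path = find_tesseract()
    if path is None:
        print("tesseract not found on PATH; install it first (e.g. brew install tesseract)")
    else:
        print("tesseract found at", path)
        # If tesseract lives outside PATH, pytesseract can be pointed at it directly:
        # import pytesseract
        # pytesseract.pytesseract.tesseract_cmd = path
```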

Resource files: two test images, English.png (English text) and Chinese.png (Chinese text).

Create a .py file:

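import pytesseract
from PIL import Image

# Open the two test images (relative paths resolve against the working directory)
im_en = Image.open('English.png')
im_ch = Image.open('Chinese.png')

# With no lang argument, Tesseract uses English ('eng')
print('========Recognize English========')
print(pytesseract.image_to_string(im_en), '\n\n')

# Chinese needs the chi_sim (Simplified Chinese) language data
print('========Recognize Chinese========')
print(pytesseract.image_to_string(im_ch, lang='chi_sim'))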
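If you OCR many files, the two calls above can be wrapped in a small helper. The sketch below is one possible shape, not the author's code: `ocr_file` and `build_lang` are hypothetical names, and the grayscale conversion is an optional preprocessing step assumed here, not something the original post does. It relies on Tesseract accepting several languages joined with `+` (for example `chi_sim+eng` for Chinese text mixed with English). The pytesseract and Pillow imports sit inside `ocr_file`, so `build_lang` can be used and tested without the OCR stack installed.

```python
from typing import Iterable


def build_lang(langs: Iterable[str]) -> str:
    """Join language codes the way Tesseract expects, e.g. ['chi_sim', 'eng'] -> 'chi_sim+eng'."""
    codes = [code.strip() for code in langs if code.strip()]
    if not codes:
        raise ValueError("at least one language code is required")
    return "+".join(codes)


def ocr_file(path: str, langs: Iterable[str] = ("eng",), grayscale: bool = True) -> str:
    """OCR one image file and return the recognized text."""
    import pytesseract
    from PIL import Image

    with Image.open(path) as im:
        # Optional preprocessing (an assumption): a single-channel copy of the image
        img = im.convert("L") if grayscale else im
        return pytesseract.image_to_string(img, lang=build_lang(langs))
```

Usage, for an image that mixes Chinese and English: `print(ocr_file('Chinese.png', langs=('chi_sim', 'eng')))`.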

Running it fails with an error: the file /usr/local/Cellar/tesseract/4.1.1/share/tessdata/chi_sim.traineddata cannot be found.
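The error means Tesseract looks for one `<code>.traineddata` file per language in its tessdata folder, and `chi_sim.traineddata` is not there. The sketch below checks that folder before running OCR. The default path is copied from the error message (Homebrew, Tesseract 4.1.1) and will differ on other installs or versions; `missing_languages` is a hypothetical helper name. Running `tesseract --list-langs` on the command line gives a similar answer.

```python
from pathlib import Path
from typing import Iterable, List

# Path taken from the error message above (Homebrew, Tesseract 4.1.1); adjust for your install.
DEFAULT_TESSDATA = "/usr/local/Cellar/tesseract/4.1.1/share/tessdata"


def missing_languages(langs: Iterable[str], tessdata_dir: str = DEFAULT_TESSDATA) -> List[str]:
    """Return the language codes whose <code>.traineddata file is absent from tessdata_dir."""
    folder = Path(tessdata_dir)
    return [code for code in langs if not (folder / f"{code}.traineddata").is_file()]


if __name__ == "__main__":
    missing = missing_languages(["eng", "chi_sim"])
    print("missing:", missing or "none")
```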
Recognizing Chinese requires an additional language data file. Download it:

Link: https://pan.baidu.com/s/1XN_Dc0EVu4dv_VYQY7IWMQ Extraction code: ghxj
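Put the downloaded Chinese data file in the tessdata folder of the Tesseract-OCR installation (with Homebrew, the directory named in the error message above), and the script runs.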
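The copy step can be scripted, as in the sketch below. It assumes the downloaded file is named `chi_sim.traineddata`, matching the error message; `install_traineddata` and `tessdata_config` are hypothetical helper names. If you would rather not write into the Homebrew folder, Tesseract's `--tessdata-dir` option, passed through pytesseract's `config` argument, points it at another folder instead.

```python
import shutil
from pathlib import Path


def install_traineddata(downloaded: str, tessdata_dir: str) -> Path:
    """Copy a downloaded *.traineddata file into Tesseract's tessdata folder; return the new path."""
    src = Path(downloaded).expanduser()
    if src.suffix != ".traineddata":
        raise ValueError(f"expected a .traineddata file, got {src.name}")
    dest_dir = Path(tessdata_dir).expanduser()
    if not dest_dir.is_dir():
        raise FileNotFoundError(f"tessdata folder not found: {dest_dir}")
    dest = dest_dir / src.name  # keep the file name: Tesseract matches it to lang='chi_sim'
    shutil.copy2(src, dest)
    return dest


def tessdata_config(tessdata_dir: str) -> str:
    """Alternative to copying: build a config string that uses tesseract's --tessdata-dir option."""
    return f'--tessdata-dir "{tessdata_dir}"'
```

Usage (the download path is illustrative): `install_traineddata('~/Downloads/chi_sim.traineddata', '/usr/local/Cellar/tesseract/4.1.1/share/tessdata')`, or, without copying, `pytesseract.image_to_string(im_ch, lang='chi_sim', config=tessdata_config('/path/to/your/tessdata'))`.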

Result: the script prints the text recognized from both images.

See also Baidu's cloud OCR service: https://cloud.baidu.com/doc/OCR/s/zk3h7xw5e
