Python 3.6: Image-to-Text Recognition (OCR)

2018-12-08  夜空最亮的9星

Environment

conda + python3.6 + jupyter notebook

Install dependencies

pip install pillow

pip install tesseract

pip install pytesseract

Install the OCR engine (Tesseract)

For CentOS 7 run the following as root:

yum-config-manager --add-repo https://download.opensuse.org/repositories/home:/Alexander_Pozdnyakov/CentOS_7/
sudo rpm --import https://build.opensuse.org/projects/home:Alexander_Pozdnyakov/public_key
yum update
yum install tesseract 
yum install tesseract-langpack-deu

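After installation, configure the environment variables by adding the export lines below to /etc/profile (run source /etc/profile or log in again for them to take effect):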

vim /etc/profile

export TESSDATA_PREFIX="/usr/share/tesseract/4/tessdata"
export PATH=$PATH:$TESSDATA_PREFIX
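
Before running any OCR code, it can help to confirm from Python that TESSDATA_PREFIX is visible and really points at a directory of language data. The following is only a minimal sketch: check_tessdata is a hypothetical helper name, not part of pytesseract, and it assumes language data files follow the usual <lang>.traineddata naming.

```python
import os


def check_tessdata(env=os.environ):
    """Return (directory, languages) for TESSDATA_PREFIX, or raise if unusable.

    check_tessdata is a hypothetical helper written for this article.
    """
    prefix = env.get("TESSDATA_PREFIX")
    if not prefix:
        raise RuntimeError("TESSDATA_PREFIX is not set")
    if not os.path.isdir(prefix):
        raise RuntimeError("TESSDATA_PREFIX is not a directory: " + prefix)

    suffix = ".traineddata"
    # Each language is stored as <lang>.traineddata, e.g. chi_sim.traineddata
    langs = sorted(
        name[:-len(suffix)]
        for name in os.listdir(prefix)
        if name.endswith(suffix)
    )
    return prefix, langs


if __name__ == "__main__":
    directory, languages = check_tessdata()
    print(directory, languages)
```

If the variable was only just added to /etc/profile, a Jupyter kernel started earlier will not see it until the notebook server is restarted from a fresh shell.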

Check the installed language packs with tesseract --list-langs:

(python36) [root@centos-7 ~]# tesseract --list-langs
List of available languages (3):
chi_sim
eng
osd
(python36) [root@centos-7 ~]# 
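
The same check can be done from a notebook by running the command and parsing its output. This is a rough sketch under a few assumptions: parse_langs and installed_langs are hypothetical helper names, the tesseract binary is on PATH, and the output has the shape shown above (a header line followed by one language code per line).

```python
import subprocess


def parse_langs(output):
    """Extract language codes from the text printed by `tesseract --list-langs`."""
    lines = [line.strip() for line in output.splitlines() if line.strip()]
    # Drop the "List of available languages (N):" header line
    return [
        line for line in lines
        if not line.lower().startswith("list of available languages")
    ]


def installed_langs():
    """Run `tesseract --list-langs` and return the language codes it reports."""
    result = subprocess.run(
        ["tesseract", "--list-langs"],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge streams in case the list goes to stderr
        universal_newlines=True,   # decode bytes to str (works on Python 3.6)
    )
    return parse_langs(result.stdout)


if __name__ == "__main__":
    langs = installed_langs()
    if "chi_sim" not in langs:
        print("chi_sim is missing; installed languages:", langs)
```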

If a language you need is missing, download its language data file and copy it into the /usr/share/tesseract/4/tessdata directory.

Run the code:

# -*- coding: utf-8 -*-
from PIL import Image

import pytesseract

# Everything above is imports; the single line below does the actual text recognition
text = pytesseract.image_to_string(Image.open('img1.jpg'), lang='chi_sim')  # recognize Simplified Chinese text

# text = pytesseract.image_to_string(Image.open('test.png'), lang='eng')  # recognize English text and Arabic numerals

print(text)
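
When an image mixes Chinese and English, Tesseract accepts several language codes joined with "+" in the lang argument, for example chi_sim+eng. The sketch below wraps that in two small functions; build_lang and image_to_text are hypothetical names introduced here for illustration, and every language passed in is assumed to appear in the tesseract --list-langs output.

```python
def build_lang(*codes):
    """Join Tesseract language codes into the 'a+b' form used by the lang argument."""
    if not codes:
        raise ValueError("at least one language code is required")
    return "+".join(codes)


def image_to_text(image_path, *codes):
    """OCR one image using one or more languages (hypothetical helper)."""
    # Imported here so build_lang can be used without the OCR stack installed
    from PIL import Image
    import pytesseract

    with Image.open(image_path) as img:
        return pytesseract.image_to_string(img, lang=build_lang(*codes))


if __name__ == "__main__":
    # img1.jpg is the sample file name used in the article
    print(image_to_text("img1.jpg", "chi_sim", "eng"))
```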

Process every file in a directory:

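from PIL import Image
import pytesseract
import os

path = "/home/imgs"
file_list = os.listdir(path)

# Use a with-block so data.txt is flushed and closed even if OCR fails
with open("data.txt", "w", encoding="utf-8") as fo:
    for file in file_list:
        # Recognize the Simplified Chinese text in each image
        text = pytesseract.image_to_string(Image.open(os.path.join(path, file)), lang='chi_sim')
        print(text)
        fo.write(text)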
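
os.listdir returns everything in the directory, including subdirectories and non-image files, and Image.open will fail on those. One possible variant filters by file extension first. This is a sketch, not the only way to do it: list_image_files, ocr_directory, and the IMAGE_EXTENSIONS tuple are assumptions made for this example.

```python
import os

# Extensions treated as images here; adjust to match your own files (assumption)
IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png", ".bmp", ".tif", ".tiff")


def list_image_files(path):
    """Return the sorted full paths of image files directly inside path."""
    result = []
    for name in sorted(os.listdir(path)):
        full_path = os.path.join(path, name)
        # Skip subdirectories and anything without an image extension
        if os.path.isfile(full_path) and name.lower().endswith(IMAGE_EXTENSIONS):
            result.append(full_path)
    return result


def ocr_directory(path, out_path="data.txt", lang="chi_sim"):
    """OCR every image in path and write all recognized text to out_path."""
    # Imported here so list_image_files works without the OCR stack installed
    from PIL import Image
    import pytesseract

    with open(out_path, "w", encoding="utf-8") as fo:
        for image_path in list_image_files(path):
            with Image.open(image_path) as img:
                fo.write(pytesseract.image_to_string(img, lang=lang))


if __name__ == "__main__":
    ocr_directory("/home/imgs")
```

Sorting the names makes the order of the text in data.txt reproducible between runs, since os.listdir does not guarantee any particular order.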