Recognizing Text in Images with tesseract OCR

2019-05-28 georgesre

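Plenty of software can extract text from images with OCR, including many note-taking apps such as OneNote. Besides these ready-made tools, we can also do OCR ourselves with the open-source tesseract.

Environment:

macOS (other systems work as well; macOS is used here as the example)

Getting started

Here is the script I use myself, along with my usual way of extracting text from images:

Install tesseract with Homebrew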

brew install tesseract
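After installing, it can help to confirm that tesseract is on the PATH and that the language data you plan to pass to `-l` is present. The following is a minimal sketch, not part of the original workflow: the helper name `has_lang` is hypothetical, and it assumes `tesseract --list-langs` prints one language code per line. On Homebrew, additional languages such as chi_sim are, to my knowledge, shipped in the separate `tesseract-lang` formula.

```shell
#!/bin/bash
# has_lang (hypothetical helper): succeed if the language code in $1
# appears as a whole line in the list read from standard input.
has_lang() {
    grep -qxF -- "$1"
}

if command -v tesseract >/dev/null 2>&1; then
    # Print the first line of the version banner
    tesseract --version 2>&1 | head -n 1
    for lang in eng chi_sim; do
        # 2>&1 because some older releases print the list on stderr
        if tesseract --list-langs 2>&1 | has_lang "$lang"; then
            echo "$lang: installed"
        else
            echo "$lang: missing"
        fi
    done
else
    echo "tesseract not found on PATH"
fi
```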

To use the convert command for image format conversion, install ImageMagick

brew install imagemagick
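Depending on the ImageMagick version, the command-line entry point may be `magick` (ImageMagick 7) rather than, or in addition to, the legacy `convert`. A small sketch for checking which one is available; the function name `pick_im_cmd` is made up for this illustration.

```shell
#!/bin/bash
# pick_im_cmd (hypothetical helper): print the first ImageMagick
# entry point found on PATH, preferring the newer "magick" name.
pick_im_cmd() {
    local cmd
    for cmd in magick convert; do
        if command -v "$cmd" >/dev/null 2>&1; then
            echo "$cmd"
            return 0
        fi
    done
    return 1
}

if im_cmd=$(pick_im_cmd); then
    # Show the first line of the version banner
    "$im_cmd" -version | head -n 1
else
    echo "ImageMagick not found on PATH"
fi
```

If only `magick` is reported, the `convert` call in the script can likely be replaced with `magick` using the same arguments.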

Prepare the script tesser.sh

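#!/bin/bash

# Usage:   tesser.sh filename.png
# Example: tesser.sh xxxx.png

# Check usage: exactly one argument (the image file) is required
if [ "$#" -ne 1 ]; then
        echo "Usage:sh tesser filename.png"
        exit 1
fi

png_filename="$1"
tif_filename="tif_temp.tif"

# Convert the image to an uncompressed grayscale TIFF at 200 DPI
convert -density 200 -units PixelsPerInch -type Grayscale +compress "$png_filename" "$tif_filename"
# Run OCR with the English model; the result is written to out.txt
tesseract "$tif_filename" out -l eng
# For Simplified Chinese, use this line instead:
#tesseract "$tif_filename" out -l chi_sim
cat out.txt

# Clean up the temporary files
rm -f "$tif_filename"
rm -f out.txt
exit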
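As a variation on tesser.sh, the sketch below takes the language as an optional second argument and avoids the fixed temporary file names. It relies on tesseract accepting `stdout` as the output base, which prints the recognized text directly instead of writing out.txt. The function name `ocr_image` and the file name `page.tif` are illustrative choices, not part of the original script.

```shell
#!/bin/bash
# ocr_image (hypothetical helper): print the text recognized in image $1,
# using the language in $2 (defaults to eng).
ocr_image() {
    local image="$1" lang="${2:-eng}" tmp_dir tif status
    if [ ! -f "$image" ]; then
        echo "No such file: $image" >&2
        return 1
    fi
    # Private temporary directory, so parallel runs do not collide
    tmp_dir=$(mktemp -d) || return 1
    tif="$tmp_dir/page.tif"
    # Same conversion as in tesser.sh, then OCR straight to standard output
    convert -density 200 -units PixelsPerInch -type Grayscale +compress "$image" "$tif" &&
        tesseract "$tif" stdout -l "$lang"
    status=$?
    rm -rf "$tmp_dir"
    return "$status"
}

# Run only when called with arguments, e.g.: ./ocr.sh scan.png chi_sim
if [ "$#" -ge 1 ]; then
    ocr_image "$@"
fi
```

Saved as a script (the name ocr.sh is only an example), it could be called as `./ocr.sh scan.png` for English or `./ocr.sh scan.png chi_sim` for Simplified Chinese.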

Grant execute permission

chmod u+x tesser.sh

Recognize the text in an image

./tesser.sh
Usage:sh tesser filename.png
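With an image path as the argument, the script prints the recognized text, so the output can be redirected to a file. The sketch below runs it over several images and saves each result next to its source. The function names and the `screenshots/` directory are hypothetical, and it assumes tesser.sh sits in the current directory.

```shell
#!/bin/bash
# txt_name_for (hypothetical helper): map an image path to the .txt path
# that will hold its text, e.g. shots/page1.png -> shots/page1.txt
txt_name_for() {
    echo "${1%.*}.txt"
}

# ocr_all (hypothetical helper): run ./tesser.sh on each given image,
# one at a time, and save the output beside the image.
ocr_all() {
    local img
    for img in "$@"; do
        # Skip arguments that are not regular files (e.g. an unmatched glob)
        [ -f "$img" ] || continue
        ./tesser.sh "$img" > "$(txt_name_for "$img")"
    done
}

# Example: ocr_all screenshots/*.png
```

The images are processed sequentially because tesser.sh reuses the same temporary file names in the working directory.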

Demo:

[Image: demo]

Cloud platform development and operations solutions @george.sre

Homepage: https://geekgoogle.com

GitHub: https://github.com/george-sre

Mail: george.sre@hotmail.com

Jianshu: georgesre

Feel free to get in touch.
