Python-140 Extract Text from an Image

2022-03-23  RashidinAbdu

  1. Extracting English text: English letters and digits in an image can generally be extracted accurately. The steps are:
pip install pytesseract
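
Note that pip installs only the Python wrapper; the Tesseract OCR engine itself is a separate program that has to be installed on the machine. A minimal sketch for checking that the executable can be found before running OCR is shown here. The helper name find_tesseract and its candidate list are assumptions made for illustration and are not part of pytesseract.

```python
import os
import shutil


def find_tesseract(candidates=()):
    """Return the path of a Tesseract executable, or None if none is found.

    Hypothetical helper: explicit candidate paths are checked first, then PATH.
    """
    for path in candidates:
        if os.path.isfile(path):
            return path
    # shutil.which returns None when the program is not on PATH
    return shutil.which("tesseract")


if __name__ == "__main__":
    exe = find_tesseract([r"C:\Program Files\Tesseract-OCR\tesseract.exe"])
    print(exe or "Tesseract not found; install it before calling pytesseract")
```

If a path is returned, it can be assigned to pytesseract.pytesseract.tesseract_cmd.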

The input image:


image.png
# Import modules
from PIL import Image
import pytesseract

# Point pytesseract at the Tesseract executable
pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files\Tesseract-OCR\tesseract.exe"

# Create an image object of PIL library
image = Image.open('C:\\Users\\Administrator\\Desktop\\捕获.PNG')

# Pass the image to pytesseract
# Tesseract has trained data for many languages; 'eng' selects English
image_to_text = pytesseract.image_to_string(image, lang='eng')

# Print the text 
print(image_to_text)

# Write the result to a text file
with open(r'C:\Users\Administrator\Desktop\all.txt', 'w', encoding='utf-8') as f:
    print(image_to_text, file=f)
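
The same steps can be applied to every image in a folder. The following is a rough sketch under stated assumptions, not a definitive implementation: ocr_folder, tesseract_ocr, the ocr parameter and the IMAGE_EXTENSIONS list are hypothetical names chosen here. The OCR step is passed in as a function so that the folder logic can be tried out without Tesseract installed.

```python
import os

# Assumed set of image extensions to process; adjust as needed
IMAGE_EXTENSIONS = (".png", ".jpg", ".jpeg", ".tif", ".tiff", ".bmp")


def tesseract_ocr(image_path, lang="eng"):
    """OCR a single image with pytesseract (imported only when called)."""
    from PIL import Image
    import pytesseract

    with Image.open(image_path) as image:
        return pytesseract.image_to_string(image, lang=lang)


def ocr_folder(folder, ocr=tesseract_ocr):
    """Write <name>.txt next to every image in folder and return the written paths."""
    written = []
    for filename in sorted(os.listdir(folder)):
        stem, ext = os.path.splitext(filename)
        if ext.lower() not in IMAGE_EXTENSIONS:
            continue  # skip files that are not images
        text = ocr(os.path.join(folder, filename))
        out_path = os.path.join(folder, stem + ".txt")
        with open(out_path, "w", encoding="utf-8") as f:
            f.write(text)
        written.append(out_path)
    return written
```

Calling ocr_folder(r'C:\Users\Administrator\Desktop') would then produce one text file per image, provided Tesseract is installed and on the path.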

Output:


image.png

That is, the text in the image was extracted successfully.

Trained data for other languages: Traineddata Files for Version 4.00 + | tessdoc (tesseract-ocr.github.io)
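
Each language is distributed as a file named <code>.traineddata (for example chi_sim.traineddata for Simplified Chinese), which goes into Tesseract's tessdata folder. Its code is then passed as lang, and several codes can be joined with '+'. A small sketch for checking which files are still missing follows; the helper names and the tessdata location are assumptions based on the installation path used earlier.

```python
import os


def build_lang_argument(languages):
    """Join language codes with '+', the form Tesseract accepts for several languages."""
    return "+".join(languages)


def missing_traineddata(tessdata_dir, languages):
    """Return the codes that have no <code>.traineddata file in tessdata_dir."""
    return [
        code for code in languages
        if not os.path.isfile(os.path.join(tessdata_dir, code + ".traineddata"))
    ]


if __name__ == "__main__":
    # Assumed location on Windows; adjust to the actual installation
    tessdata = r"C:\Program Files\Tesseract-OCR\tessdata"
    wanted = ["eng", "chi_sim"]
    print("lang =", build_lang_argument(wanted))
    print("missing:", missing_traineddata(tessdata, wanted))
```

Once nothing is reported missing, the joined string can be used as pytesseract.image_to_string(image, lang='eng+chi_sim').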

image.png

import os

# Folder whose files will be merged, and the output file
merge_dir = "D:\\GRAD_COURSES\\Ph.D_Publications\\2021_Publications\\KCTC-Deposition\\1"
out_path = 'D:\\GRAD_COURSES\\Ph.D_Publications\\2021_Publications\\KCTC-Deposition-Names.txt'

with open(out_path, 'w') as out_file:
    for filename in os.listdir(merge_dir):
        filepath = os.path.join(merge_dir, filename)
        # Copy the file's contents, then add a newline between files
        with open(filepath) as in_file:
            out_file.writelines(in_file)
        out_file.write('\n')