Converting PDF files to a list with pdfminer

2021-11-10  偷油考拉

pdfminer · PyPI

1. Installing pdfminer on CentOS 8

[root@VM-99-12-centos ~]# pip install pdfminer
WARNING: Running pip install with root privileges is generally not a good idea. Try `pip install --user` instead.
Collecting pdfminer
  Downloading http://mirrors.tencentyun.com/pypi/packages/71/a3/155c5cde5f9c0b1069043b2946a93f54a41fd72cc19c6c100f6f2f5bdc15/pdfminer-20191125.tar.gz (4.2MB)
    100% |████████████████████████████████| 4.2MB 135.0MB/s 
Collecting pycryptodome (from pdfminer)
  Downloading http://mirrors.tencentyun.com/pypi/packages/af/ef/bedde9b7a1f237b743eb307e6c247369c2ae5ca6a79b61c064698cfd78cd/pycryptodome-3.10.1-cp35-abi3-manylinux1_x86_64.whl (1.9MB)
    100% |████████████████████████████████| 1.9MB 3.3MB/s 
Installing collected packages: pycryptodome, pdfminer
  Running setup.py install for pdfminer ... done
Successfully installed pdfminer-20191125 pycryptodome-3.10.1
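The install log warns against installing as root and suggests `pip install --user`. On Linux, a user install puts scripts such as pdf2txt.py in ~/.local/bin, and that directory is not always on PATH. The following is a minimal Python sketch for checking the result. The function name `check_pdfminer_install` is hypothetical. It reports whether the package can be imported and whether the pdf2txt.py script can be found. Run it with the same interpreter that pip installed into.

```python
import importlib.util
import shutil


def check_pdfminer_install() -> dict:
    """Hypothetical helper: is pdfminer importable, and is pdf2txt.py on PATH?"""
    return {
        # find_spec returns None when the top-level package cannot be located
        "package_importable": importlib.util.find_spec("pdfminer") is not None,
        # shutil.which returns None when no executable of that name is on PATH
        "pdf2txt_on_path": shutil.which("pdf2txt.py") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_pdfminer_install().items():
        print(f"{name}: {'yes' if ok else 'no'}")
```

If the package is importable but the script is not found, add ~/.local/bin (for a `--user` install) to PATH.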

2. Testing pdf2txt.py

Convert page 3 of the PDF to plain text and save it as pdf-page3.txt:

pdf2txt.py -t text -p 3  -o pdf-page3.txt 62100400348.pdf
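The goal is a list rather than a flat text file. One option is to call the command above from Python and split its output into lines. The sketch below makes several assumptions:

- The helper names (`build_pdf2txt_cmd`, `text_to_lines`, `pdf_page_to_list`) are hypothetical.
- pdf2txt.py is on PATH.
- `-p` accepts 1-based, comma-separated page numbers.
- The output file uses the tool's default UTF-8 encoding.

It reuses only the flags shown in the command above (`-t`, `-p`, `-o`).

```python
import subprocess
from pathlib import Path


def build_pdf2txt_cmd(pdf_path, out_path, pages=None, out_type="text"):
    """Hypothetical helper: build the argv list for pdf2txt.py."""
    cmd = ["pdf2txt.py", "-t", out_type]
    if pages:
        # Assumed: 1-based page numbers, joined with commas (e.g. "3" or "1,2")
        cmd += ["-p", ",".join(str(p) for p in pages)]
    cmd += ["-o", str(out_path), str(pdf_path)]
    return cmd


def text_to_lines(text):
    """Split extracted text into a list of non-blank, whitespace-stripped lines."""
    # str.splitlines() also breaks on form feeds (\x0c), which may separate pages
    return [line.strip() for line in text.splitlines() if line.strip()]


def pdf_page_to_list(pdf_path, out_path, page):
    """Run pdf2txt.py on one page and return the output file's lines as a list."""
    # check=True raises CalledProcessError if pdf2txt.py exits with an error
    subprocess.run(build_pdf2txt_cmd(pdf_path, out_path, pages=[page]), check=True)
    # Assumed: output written in the default UTF-8 codec
    return text_to_lines(Path(out_path).read_text(encoding="utf-8"))


# Usage (requires pdfminer and the PDF from the example):
# lines = pdf_page_to_list("62100400348.pdf", "pdf-page3.txt", 3)
```

Passing an argv list to `subprocess.run` rather than a shell string avoids quoting problems with file names that contain spaces.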