How to use TextGrocery under Python 3

2019-10-11  郭彦超

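TextGrocery is an efficient short-text classification tool. We plan to use it later to train text rules that automatically tag content items. Its author no longer maintains the project, and the latest release supports only Python 2. Using it under Python 3 therefore requires the following modifications.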

First, install TextGrocery with pip:

pip install tgrocery
# The project is no longer maintained; the latest version is 0.14

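Module not found. Python 3 no longer supports implicit relative imports, so imports of sibling modules inside the package need a leading "."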

# 1. Change /home/bigdata/anaconda3/lib/python3.7/site-packages/tgrocery/__init__.py (adjust the path to your own site-packages) to:
from .classifier import *
from .converter import *

# 2. In ./site-packages/tgrocery/classifier.py, add a leading "." to the local imports:
from .converter import GroceryTextConverter
from .learner import *
from .base import *
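You can also script the relative-import edits instead of making them by hand. Below is a minimal sketch. make_imports_relative and patch_file are hypothetical helper names and are not part of tgrocery. The regexes handle only single-module "from X import ..." and "import X" lines, which is the form of the imports shown above.

```python
import re
from pathlib import Path

# "from base import *" style lines; group 2 is the module name.
FROM_RE = re.compile(r"^([ \t]*from[ \t]+)(\w+)([ \t]+import\b)", re.MULTILINE)
# Bare "import base" lines (a single module and nothing else on the line).
IMPORT_RE = re.compile(r"^([ \t]*)import[ \t]+(\w+)[ \t]*$", re.MULTILINE)


def make_imports_relative(source, local_modules):
    """Turn Python 2 implicit relative imports into explicit ones.

    Only modules named in local_modules are rewritten, so imports of
    third-party packages such as jieba stay untouched. Lines that are
    already relative ("from .base import ...") do not match, so running
    it twice is harmless.
    """
    def fix_from(m):
        if m.group(2) in local_modules:
            return m.group(1) + "." + m.group(2) + m.group(3)
        return m.group(0)

    def fix_import(m):
        if m.group(2) in local_modules:
            return m.group(1) + "from . import " + m.group(2)
        return m.group(0)

    source = FROM_RE.sub(fix_from, source)
    return IMPORT_RE.sub(fix_import, source)


def patch_file(path, local_modules):
    """Rewrite one .py file in place (back it up first)."""
    p = Path(path)
    fixed = make_imports_relative(p.read_text(encoding="utf-8"), local_modules)
    p.write_text(fixed, encoding="utf-8")
```

As an example, you could call patch_file(path, {"base", "converter", "learner", "classifier"}) on the .py files in the tgrocery directory to produce edits like the ones listed above. Back up the package before trying it.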

cPickle was removed in Python 3, so install pickle5 as a replacement:

pip install pickle5

vi ./site-packages/tgrocery/converter.py
Replace import cPickle with import pickle5 as cPickle
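In Python 3, cPickle was folded into the standard pickle module, which uses its C implementation automatically. pickle5 is a backport of the newer pickle protocol for Python versions before 3.8. As an alternative sketch, and not the fix used above, a try/except import lets converter.py run under both interpreters without an extra dependency:

```python
# Use Python 2's cPickle if it exists; otherwise fall back to Python 3's
# pickle, which already uses its C accelerator (_pickle) under the hood.
try:
    import cPickle  # Python 2 only
except ImportError:
    import pickle as cPickle

# Round-trip a small dict to show the alias behaves like cPickle did.
data = {"education": 0.0339, "sports": -0.0339}
blob = cPickle.dumps(data)
restored = cPickle.loads(blob)
```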

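# In site-packages/tgrocery/converter.py, make the import of base relative too.
# "import .base" is a SyntaxError in Python 3; use the form that matches the original line:
from . import base        # if the original line is: import base
from .base import *       # if the original line is: from base import *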

print is a function in Python 3 (parentheses are required)

# Edit site-packages/tgrocery/base.py
print(self.draw_table(
    zip(
        ['%.2f%%' % (s * 100) for s in self.accuracy_labels.values()],
        ['%.2f%%' % (s * 100) for s in self.recall_labels.values()]
    ),
    self.accuracy_labels.keys(),
    ('accuracy', 'recall')
))

NameError: name 'unicode' is not defined

Python 3 replaced unicode with str. Replace every occurrence of unicode in site-packages/tgrocery/classifier.py with str.
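Another common approach avoids editing every occurrence: put a small shim at the top of classifier.py that aliases unicode to str on Python 3. This is a sketch with one stated assumption: the file uses unicode only as a type or a one-argument constructor. A call like unicode(s, 'utf-8') becomes str(s, 'utf-8'), which works only when s is bytes.

```python
# On Python 2 the name exists and is left alone; on Python 3 it does not,
# so alias it to str (all Python 3 text is str).
try:
    unicode
except NameError:
    unicode = str

# isinstance checks and one-argument conversions keep working unchanged.
is_text = isinstance("托福", unicode)
as_text = unicode(42)
```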

TypeError: The argument should be plain text

Comment out the following statements:

# vi site-packages/tgrocery/classifier.py
if not isinstance(text, str):
    raise TypeError('The argument should be plain text')
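Commenting out the check silences the error, but it also lets any non-text input through. If the failing input turns out to be bytes (for example, text read from a file opened in binary mode), a gentler option is to decode it first. ensure_text is a hypothetical helper and is not part of tgrocery. The UTF-8 default is also an assumption.

```python
def ensure_text(text, encoding="utf-8"):
    """Return text as str: decode bytes, reject anything else.

    Hypothetical replacement for the isinstance check in classifier.py.
    """
    if isinstance(text, bytes):
        return text.decode(encoding)
    if not isinstance(text, str):
        raise TypeError("The argument should be plain text")
    return text
```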

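Change the directory of jieba's cache file (jieba.cache) to jieba's install directory (by default jieba writes it to the system temp directory):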

# vi site-packages/jieba/__init__.py
self.tmp_dir = "/home/bigdata/anaconda3/lib/python3.7/site-packages/jieba/"
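The path above is hard-coded for one Anaconda install. The following sketch looks up a package's install directory at runtime with importlib instead. package_dir is a hypothetical helper name.

```python
import importlib.util
import os


def package_dir(name):
    """Return the directory an installed package is loaded from."""
    spec = importlib.util.find_spec(name)
    if spec is None or not spec.submodule_search_locations:
        # None: not installed; no search locations: a plain module, not a package
        raise ModuleNotFoundError(name + " is not an installed package")
    return os.path.abspath(list(spec.submodule_search_locations)[0])
```

This also assumes jieba's default tokenizer is exposed as jieba.dt and reads the tmp_dir attribute that the edit above sets. Under that assumption, assigning jieba.dt.tmp_dir = package_dir("jieba") before the first segmentation call should have the same effect without modifying jieba's source.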

AttributeError: 'dict' object has no attribute 'iteritems'

Python 3 dicts have no iteritems(). Replace every iteritems in site-packages/tgrocery/converter.py with items.
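You can script the same substitution as well. This sketch uses a hypothetical helper, drop_iter_methods. It also covers iterkeys() and itervalues(), which Python 3 dicts likewise lack. It only matches zero-argument calls written exactly as .iteritems(), .iterkeys() or .itervalues().

```python
import re

# Python 2 dict methods that return iterators; Python 3 keeps only
# items()/keys()/values(), which return views that iterate the same way.
ITER_METHOD_RE = re.compile(r"\.iter(items|keys|values)\(\)")


def drop_iter_methods(source):
    """Rewrite d.iteritems() -> d.items() (likewise for keys/values)."""
    return ITER_METHOD_RE.sub(r".\1()", source)
```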

Done. The official example now runs as follows:

>>> from tgrocery import Grocery
>>> grocery = Grocery('sample')
>>> train_src = [
...     ('education', '名师指导托福语法技巧:名词的复数形式'),
...     ('education', '中国高考成绩海外认可 是“狼来了”吗?'),
...     ('sports', '图文:法网孟菲尔斯苦战进16强 孟菲尔斯怒吼'),
...     ('sports', '四川丹棱举行全国长距登山挑战赛 近万人参与')
... ]
>>> grocery.train(train_src)
Building prefix dict from the default dictionary ...
Dumping model to file cache /home/bigdata/anaconda3/lib/python3.7/site-packages/jieba/jieba.cache
Loading model cost 0.595 seconds.
Prefix dict has been built succesfully.
*
optimization finished, #iter = 3
Objective value = -1.092381
nSV = 8
<tgrocery.Grocery object at 0x7ffedbea5290>
>>> grocery.predict('考生必读:新托福写作考试评分标准')
<tgrocery.base.GroceryPredictResult object at 0x7ffed68e9610>
>>> grocery.predict('考生必读:新托福写作考试评分标准').accuracy_labels
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'GroceryPredictResult' object has no attribute 'accuracy_labels'
>>> grocery.predict('考生必读:新托福写作考试评分标准').dec_values
{'education': 0.03393735426359166, 'sports': -0.033937354263591644}
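The AttributeError in the session is expected. accuracy_labels appears in the evaluation-report code in base.py (the print edited above), not on the prediction result. To turn dec_values into a label, take the key with the largest decision value. This is a sketch that uses the values printed above. It assumes dec_values is a plain dict mapping label to score, as the output suggests. top_label is a hypothetical helper name.

```python
def top_label(dec_values):
    """Return the label whose decision value is largest."""
    if not dec_values:
        raise ValueError("no decision values")
    return max(dec_values, key=dec_values.get)


# Values copied from the interactive session above.
dec_values = {'education': 0.03393735426359166, 'sports': -0.033937354263591644}
label = top_label(dec_values)
```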
