Preprocessing

2017-08-19 Jakai

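import re
import nltk
from bs4 import BeautifulSoup

# Requires the NLTK stopwords corpus: nltk.download("stopwords")
stopwords = nltk.corpus.stopwords.words("english")
eng_stopwords = set(stopwords)  # a set gives fast membership tests
def clean_text(text):
    text = BeautifulSoup(text, 'html.parser').get_text()  # strip HTML tags
    text = re.sub(r'[^a-zA-Z]', ' ', text)  # non-letters become spaces
    words = text.lower().split()
    words = [w for w in words if w not in eng_stopwords]  # drop stopwords
    return ' '.join(words)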
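
To see what the cleaning steps do to a sample string, here is a minimal sketch that uses only the standard library. It is not the function above. `clean_text_demo`, `_TextExtractor` and `DEMO_STOPWORDS` are hypothetical names introduced for illustration, `html.parser.HTMLParser` stands in for BeautifulSoup, and the tiny hand-picked stopword set stands in for the NLTK English list, so results on real text will differ.

```python
import re
from html.parser import HTMLParser

# Hypothetical stand-in for the NLTK English stopword list (tiny subset).
DEMO_STOPWORDS = {"the", "is", "a", "this", "and", "of"}


class _TextExtractor(HTMLParser):
    """Collects text nodes and ignores tags (stand-in for get_text())."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def clean_text_demo(text, stopwords=DEMO_STOPWORDS):
    # Step 1: strip HTML tags, keeping only the text content.
    parser = _TextExtractor()
    parser.feed(text)
    parser.close()
    text = ''.join(parser.parts)
    # Step 2: replace every non-letter character with a space.
    text = re.sub(r'[^a-zA-Z]', ' ', text)
    # Step 3: lowercase and split on whitespace.
    words = text.lower().split()
    # Step 4: drop stopwords and rejoin with single spaces.
    return ' '.join(w for w in words if w not in stopwords)


print(clean_text_demo("<p>This movie is <b>GREAT</b>, 10/10!</p>"))
# prints: movie great
```

Note that the digits and punctuation in "10/10!" disappear entirely, because the regular expression keeps letters only. If numbers matter for the downstream task, the character class would need to be widened.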
