Life is short, I use Python

Show me the code: Problem 0006

2016-05-11  bluescorpio

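Problem 0006: You have a directory holding a month of your diary entries, all of them .txt files. To sidestep word segmentation, assume the content is entirely in English. For each entry, find the words you consider most important.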

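Approach: this can be solved with a few tools from the collections library, which I covered in another article I just wrote, for example Counter() and its most_common() method.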
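As a quick illustration of those two calls, here is a minimal sketch on a made-up word list (the words are sample data, not taken from any diary). Counter tallies how often each item occurs, and most_common(n) returns the n highest counts as (word, count) pairs.

```python
from collections import Counter

# Sample data invented for this illustration
words = ["rain", "walk", "rain", "tea", "walk", "rain"]

counts = Counter(words)           # maps each word to its number of occurrences
top_two = counts.most_common(2)   # list of (word, count), highest count first

print(top_two)  # [('rain', 3), ('walk', 2)]
```

Called with no argument, most_common() returns every item ordered from most to least frequent.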
The full script is as follows:

#!/usr/bin/env python
# coding=utf-8
import os
import re
from collections import Counter


def get_filepaths(directory):
    """Collect the path of every file under directory, including subdirectories."""
    file_paths = []
    for root, directories, files in os.walk(directory):
        for filename in files:
            file_paths.append(os.path.join(root, filename))
    return file_paths


def counter_more_words(li):
    """Return the ten most frequent words in li, most frequent first."""
    word_dict = Counter(li)
    return [word for word, count in word_dict.most_common(10)]


if __name__ == '__main__':
    for path in get_filepaths(r'C:\diaries'):
        with open(path, 'r') as f:
            # \w+ matches runs of letters, digits and underscores
            word_li = re.findall(r"\w+", f.read())
        # print() with a single argument works on both Python 2 and 3
        print(" ".join(counter_more_words(word_li)))
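One caveat: raw frequency tends to surface words like "the" and "and", which are rarely what anyone would call the most important words of a diary entry. The following is a rough sketch of one possible refinement, not part of the original solution: it lowercases the text and skips a small stop-word list before counting. Both `STOP_WORDS` and `top_keywords` are hypothetical names introduced here, and the stop-word list is a tiny illustrative sample rather than a complete one.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (an assumption for this sketch)
STOP_WORDS = {"the", "a", "an", "and", "i", "to", "of", "in", "it", "was", "my", "is"}


def top_keywords(text, n=3):
    """Return up to n most frequent words, ignoring case and stop words."""
    words = re.findall(r"\w+", text.lower())
    meaningful = [w for w in words if w not in STOP_WORDS]
    return [word for word, count in Counter(meaningful).most_common(n)]


sample = "The rain fell and the rain stayed. I read in the rain."
print(top_keywords(sample, 1))  # ['rain']
```

Without the filter, "the" and "rain" would tie at three occurrences each in this sample, so the filter is what makes "rain" the clear winner.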