鸟语数析

Python: counting how often words appear in a file (part 2)

2019-07-30  SkTj

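import sys
import re

# A "word" is a run of letters, digits or underscores (raw string avoids an invalid-escape warning)
WORD_RE = re.compile(r'\w+')

# Map each word to the list of (line_no, column_no) positions where it occurs
index = {}
with open(sys.argv[1], encoding='utf-8') as fp:
    for line_no, line in enumerate(fp, 1):  # line numbers start at 1
        for match in WORD_RE.finditer(line):
            word = match.group()
            column_no = match.start() + 1  # columns are 1-based too
            location = (line_no, column_no)
            # setdefault returns the existing list, or inserts and returns a new empty one
            index.setdefault(word, []).append(location)

# print in case-insensitive alphabetical order
for word in sorted(index, key=str.upper):
    print(word, index[word])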
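To check the indexing logic without a file on disk, here is a minimal sketch that wraps the same steps in a function taking any iterable of lines. The name `build_index` and the `sample` lines are hypothetical, not from the original; the sketch also swaps `setdefault` for `collections.defaultdict(list)`, which creates the empty list the first time a word is seen.

```python
import re
from collections import defaultdict

WORD_RE = re.compile(r'\w+')


def build_index(lines):
    """Map each word to a list of 1-based (line_no, column_no) positions.

    Hypothetical helper: same logic as the script above, but it reads
    from any iterable of strings instead of the file named in sys.argv[1].
    """
    index = defaultdict(list)  # a missing key starts out as an empty list
    for line_no, line in enumerate(lines, 1):
        for match in WORD_RE.finditer(line):
            index[match.group()].append((line_no, match.start() + 1))
    return dict(index)  # plain dict, so later lookups cannot create keys


# Hypothetical input standing in for the lines of a real file
sample = ["the cat\n", "The dog and the cat\n"]
idx = build_index(sample)
for word in sorted(idx, key=str.upper):
    print(word, idx[word])
# and [(2, 9)]
# cat [(1, 5), (2, 17)]
# dog [(2, 5)]
# the [(1, 1), (2, 13)]
# The [(2, 1)]
```

Because `sorted` is stable, `the` and `The` (equal under `str.upper`) keep the order in which they were first inserted.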

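The title is about word frequency, while the script records positions. Assuming counts are what you want, the frequency of each word is simply the length of its location list, and `collections.Counter` can hold and rank those counts. This is a sketch, not the author's method; the `sample` lines are hypothetical stand-ins for the file contents.

```python
import re
from collections import Counter

WORD_RE = re.compile(r'\w+')

# Hypothetical sample standing in for the lines of the input file
sample = ["the cat\n", "The dog and the cat\n"]

# Build the same word -> [(line_no, column_no), ...] index as the script
index = {}
for line_no, line in enumerate(sample, 1):
    for match in WORD_RE.finditer(line):
        index.setdefault(match.group(), []).append((line_no, match.start() + 1))

# A word's frequency is how many locations were recorded for it
freq = Counter({word: len(locations) for word, locations in index.items()})
for word, count in freq.most_common(3):
    print(word, count)
# the 2
# cat 2
# The 1

# Case-insensitive counts: fold each word before tallying
folded = Counter()
for word, locations in index.items():
    folded[word.casefold()] += len(locations)
```

`most_common` lists words with equal counts in the order they were first encountered, which is why `the` precedes `cat`.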
