Extracting basic email header information with Python

2017-07-17  Tim_Lee

1 Email content

Suppose the email is saved as "1.txt" and has the following content:

From:   Justin-Bieber@entertain.org on behalf of Bieber
Leader [leader@hello.org]
Sent:   2017-07-01 12:48
To: 'staff@hello.org'; custom@hello.org;
Willim Johnson; John Snow
Subject:    The battlefield in Winterfell


I have just met then. More details as soon as possible. So far, so good.

Sent via iPhone 7 plus

2 Extraction approach

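# coding: utf-8
import re

# Counters recording how many lines have contained each header keyword.
from_count = 0
sent_count = 0
to_count = 0
subject_count = 0


def inspect_string(string):
    """Return the header text carried by this line, or None for body lines."""
    global from_count
    global sent_count
    global to_count
    global subject_count

    # Update the counters once per line, before deciding what to return.
    if re.match(".*(From:.*)", string):
        from_count += 1

    if re.match(".*(Sent:.*)", string):
        sent_count += 1

    if re.match(".*(To:.*)", string):
        to_count += 1

    if re.match(".*(Subject:.*)", string):
        subject_count += 1

    keyword_list = ['From:', 'Sent:', 'To:', 'Subject:']
    for keyword in keyword_list:
        # A line containing the keyword: return the text from the keyword on.
        regex_str = ".*({0}.*)".format(keyword)
        match_obj = re.match(regex_str, string)
        if match_obj:
            return match_obj.group(1)

        # While we are between "From:" and "Subject:", return the whole line
        # (including its trailing newline). This covers continuation lines
        # such as "Leader [leader@hello.org]". Because these checks run right
        # after the "From:" test, the "Sent:" and "To:" lines are also
        # returned here as whole lines.
        if from_count > 0 and sent_count < 1:
            return string

        if sent_count > 0 and to_count < 1:
            return string

        if to_count > 0 and subject_count < 1:
            return string

    # Nothing matched: the line is treated as message body.
    return None


# Open in text mode so each line is a str in both Python 2 and Python 3.
with open('1.txt', 'r') as f:
    for line in f:
        result = inspect_string(line)
        if result is None:
            continue
        print(result)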
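
The same idea can also be written without global counters. The sketch below is one possible variant, not the script used in this post. It assumes that every header field starts at the beginning of a line, that the header ends at the `Subject:` line, and that any non-empty header line without a keyword continues the previous field. The name `parse_header`, the `KEYWORDS` tuple, and the returned dictionary layout are hypothetical.

```python
import re

# Keywords that start a header field (the same four used in this post).
KEYWORDS = ('From', 'Sent', 'To', 'Subject')
FIELD_RE = re.compile(r'^\s*({0}):\s*(.*)$'.format('|'.join(KEYWORDS)))


def parse_header(lines):
    """Collect header fields into a dict, folding continuation lines.

    Assumes the header ends right after the 'Subject:' line.
    """
    fields = {}
    current = None  # name of the field currently being filled
    for raw in lines:
        line = raw.strip()
        match = FIELD_RE.match(line)
        if match:
            current = match.group(1)
            fields[current] = match.group(2)
            if current == 'Subject':
                break  # header is over, the rest is the message body
        elif current is not None and line:
            # No keyword: treat the line as a continuation of the last field.
            fields[current] += ' ' + line
    return fields


sample = [
    "From:   Justin-Bieber@entertain.org on behalf of Bieber\n",
    "Leader [leader@hello.org]\n",
    "Sent:   2017-07-01 12:48\n",
    "To: 'staff@hello.org'; custom@hello.org;\n",
    "Willim Johnson; John Snow\n",
    "Subject:    The battlefield in Winterfell\n",
    "\n",
    "I have just met then. More details as soon as possible.\n",
]
header = parse_header(sample)
for name in KEYWORDS:
    print('{0}: {1}'.format(name, header[name]))
```

Returning a dictionary keeps each field on one logical line, so later steps can look up `header['To']` directly instead of tracking which printed line belongs to which field.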

3 Output

From:   Justin-Bieber@entertain.org on behalf of Bieber
Leader [leader@hello.org]

Sent:   2017-07-01 12:48

To: 'staff@hello.org'; custom@hello.org;

Willim Johnson; John Snow

Subject:    The battlefield in Winterfell
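
The `To:` field is printed on two lines and mixes a quoted address with plain addresses and display names. As a possible follow-up step, the sketch below joins those lines and splits them into individual recipients. The helper name `split_recipients` is hypothetical, and the code assumes that recipients are separated by semicolons and that quotes around an address are only decoration, as in the sample email.

```python
def split_recipients(to_lines):
    """Split the text of a 'To:' field into a list of recipients.

    Assumes ';' separates recipients and that quotes around an
    address carry no meaning.
    """
    text = ' '.join(line.strip() for line in to_lines)
    if text.startswith('To:'):
        text = text[len('To:'):]
    recipients = []
    for part in text.split(';'):
        part = part.strip().strip("'\"")
        if part:  # skip the empty piece left by a trailing ';'
            recipients.append(part)
    return recipients


to_lines = [
    "To: 'staff@hello.org'; custom@hello.org;",
    "Willim Johnson; John Snow",
]
print(split_recipients(to_lines))
```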