Emails sent from mobile devices are decoded strangely by the email library

Posted 2024-10-01 00:36:00


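I'm using Python's imaplib and email modules to fetch a list of emails from our mail server and then process them. This is the snippet I use to fetch and decode the emails: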

import imaplib
import email

# Connect to server
box = imaplib.IMAP4(CSMTP_SERVER)
box.login(CSMTP_USERNAME, CSMTP_PASSWORD)

# List inbox
box.select('INBOX')

# Retrieve email list ID's matching search patterns
# Return from search is this:
# ('OK', ['1 2 3 4 5 6 7 8 9 10 11 12 13 14'])
data = box.search(None, 'ALL')[1]
for num in data[0].split():

    # Retrieve message headers and body
    headers = email.message_from_string(box.fetch(num, '(RFC822)')[1][0][1])
    body = headers.get_payload()
    if not isinstance(body, str):
        body = headers.get_payload()[0].get_payload()

    print headers, body

This works like a charm when the email is sent from Hotmail or Gmail, but whenever it is sent from, for example, the default Android mail app, the message looks like this:

=?utf-8?B?RndkOiBDYXBzaGFyZTogaW1wb3J0aW5nIGZyb20gUGhvdG9z?
U2VudCBmcm9tIG15IEhUQwoKLS0tLS0gRm9yd2FyZGVkIG1lc3NhZ2UgLS0tLS0KRnJvbTogIkFs
ZXhhbmRlciBBdnRhbnNraSIgPGFsZXhAYXZ0YW5za2kuY29tPgpUbzogIlBlam1hbiBNYWtoZmki
IDxwakBtYWtoZmkuY29tPgpTdWJqZWN0OiBDYXBzaGFyZTogaW1wb3J0aW5nIGZyb20gUGhvdG9z
CkRhdGU6IFdlZCwgU2VwIDEwLCAyMDE0IDk6MDYgUE0KCkhpIFBlam1hbiwKCkkgd2FzIHBsYXlp
bmcgd2l0aCBDYXBzaGFyZSB0b2RheSBhbmQgZm91bmQgc29tZXRoaW5nIG1pc3NpbmcuIEkgZ3Vl
c3MgeW91CmhhdmUgcGxhbnMgZm9yIGl0LCBidXQgaXQgZG9lc24ndCBodXJ0IHRvIG1lbnRpb24g
aXQsIGp1c3Qgb24gY2FzZS4uLgoKV2hlbiBpbXBvcnRpbmcgcGhvdG9zLCBJIGhhdmUgdGhlIG9w
dGlvbiB0byBlaXRoZXIgZ2V0IG9uZSBvZiB0aGUgaW1hZ2VzCnRoYXQgYXJlIGRvd25sb2FkZWQg
b24gbXkgcGhvbmUsIG9yIHRvIHRha2UgYSBuZXcgcGljdHVyZS92aWRlby4gV2hhdCdzCm1pc3Np
bmcgaXMgYWJpbGl0eSB0byBnZXQgcGhvdG9zIGZyb20gbXkwcyBJJ3ZlIHVzZWQgZG9uJ3Qg
Y2FyZSB3aGVyZSB0aGUgcGhvdG8gaXMgbG9jYXRlZCBhbmQgYWxsCnBpY3R1cmVzIGFyZSBlcXVh
bGx5IGFjY2Vzc2libGUgKG9yIG1heWJlIHRoaXMgYXBwbGllcyBvbmx5IHRvIEdvb2dsZQphcHBz
PykuCgpOb3QgaW1wb3J0YW50LCBubyBpZGVhIGlmIGl0IGlzIGp1c3QgYSBsaW5lIG9yIHR3byBm
aXggb3Igc29tZXRoaW5nIG1vcmUKY29tcGxpY2F0ZWQuCgpUYWtlIGNhcmUsCgotIEFsZXgsIGJl
dGEgdGVzdGVyLCBRQSB2b2x1bnRlZXIsIGFuZCBzZW5pb3IgcGVza3kgc3RpY2tsZXI=

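I got this while sending emails from my mobile device. I doubt the device itself has anything to do with it; it seems more like some emails don't build proper headers according to RFC 822. Either way, I need to handle this somehow so that I can retrieve every email.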

Any hints would be appreciated. Thanks in advance.


2 Answers

This is a MIME message. It is not specified in RFC 822 but in the newer RFCs 2045 through 2047.

The vast majority of modern email uses MIME to some extent, so you should definitely support it.
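A likely cause of the symptom in the question: `get_payload()` called without `decode=True` returns the part exactly as transmitted, so a part sent with `Content-Transfer-Encoding: base64` comes back still encoded. Below is a minimal sketch of MIME-aware body extraction, written for Python 3 (the question's code is Python 2). The helper name `extract_plain_text`, the `utf-8` fallback charset, and the sample message (built from the first line of the question's body) are illustrative assumptions, not part of the original answer.

```python
import email


def extract_plain_text(raw_bytes):
    """Return the decoded text of every text/plain part in a raw message."""
    msg = email.message_from_bytes(raw_bytes)
    texts = []
    # walk() yields the message itself first, then any nested MIME parts
    for part in msg.walk():
        if part.get_content_type() != "text/plain":
            continue  # skips multipart containers, HTML, attachments, ...
        # decode=True undoes the Content-Transfer-Encoding (base64 or
        # quoted-printable) and returns bytes
        payload = part.get_payload(decode=True)
        charset = part.get_content_charset() or "utf-8"  # assumed fallback
        texts.append(payload.decode(charset, errors="replace"))
    return texts


# Hypothetical sample: a single-part message with a base64-encoded body
raw = (
    b"MIME-Version: 1.0\r\n"
    b"Content-Type: text/plain; charset=utf-8\r\n"
    b"Content-Transfer-Encoding: base64\r\n"
    b"\r\n"
    b"U2VudCBmcm9tIG15IEhUQw==\r\n"
)
print(extract_plain_text(raw))  # ['Sent from my HTC']
```

With IMAP, the raw bytes would be the `[1][0][1]` element of the `fetch(num, '(RFC822)')` result, as in the question's loop.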

Particularly relevant to this message is RFC 2047, which specifies encoded-words. There is a good overview on Wikipedia, which I'll partially transcribe:

The form is: "=?charset?encoding?encoded text?=".

encoding can be either "Q" denoting Q-encoding that is similar to the quoted-printable encoding, or "B" denoting base64 encoding.

So this particular message contains Base64-encoded (B) text in the utf-8 charset. The actual message starts after B?, not on the second line.
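For headers such as the subject, Python's standard library already parses the `=?charset?encoding?encoded text?=` form via `email.header.decode_header`. A short Python 3 sketch; the `?=` terminator on the sample value is assumed from the RFC 2047 form quoted above.

```python
from email.header import decode_header, make_header

subject = "=?utf-8?B?RndkOiBDYXBzaGFyZTogaW1wb3J0aW5nIGZyb20gUGhvdG9z?="

# decode_header splits the value into (bytes, charset) chunks
chunks = decode_header(subject)
print(chunks)  # [(b'Fwd: Capshare: importing from Photos', 'utf-8')]

# make_header + str() reassembles the chunks into one Unicode string
decoded = str(make_header(chunks))
print(decoded)  # Fwd: Capshare: importing from Photos
```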

Here is some simple Python code to handle all of this:

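import base64

if body.startswith("=?"):
    # Encoded-word layout: =?charset?encoding?encoded text?=
    i1 = body.index("?")            # "?" that opens the charset
    i2 = body.index("?", i1 + 1)    # "?" that closes the charset
    i3 = i2 + 2                     # "?" after the encoding letter
    charset = body[i1 + 1:i2]
    assert body[i2:i3].upper() == "?B"  # Q format is not handled here
    encoded = body[i3 + 1:]
    if encoded.endswith("?="):      # drop the encoded-word terminator
        encoded = encoded[:-2]
    body = base64.b64decode(encoded).decode(charset)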
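The snippet above only handles a value that starts with a single B-encoded word. A slightly more general sketch, assuming Python 3, finds every encoded-word with a regular expression and also decodes the Q form; the names `ENCODED_WORD`, `_decode_one`, and `decode_encoded_words` are hypothetical.

```python
import base64
import binascii
import re

# Matches one RFC 2047 encoded-word: =?charset?B-or-Q?encoded text?=
ENCODED_WORD = re.compile(r"=\?([^?]+)\?([BbQq])\?([^?]*)\?=")


def _decode_one(match):
    charset, encoding, text = match.groups()
    if encoding.upper() == "B":
        raw = base64.b64decode(text)
    else:
        # Q-encoding is quoted-printable-like; header=True maps "_" to space
        raw = binascii.a2b_qp(text, header=True)
    return raw.decode(charset)


def decode_encoded_words(value):
    """Replace every encoded-word in value with its decoded text."""
    return ENCODED_WORD.sub(_decode_one, value)


print(decode_encoded_words("=?utf-8?Q?caf=C3=A9_au_lait?="))  # café au lait
```

This sketch does not apply the RFC 2047 rule that whitespace between two adjacent encoded-words is dropped; `email.header.decode_header` does handle that case.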

The strange encoding is base64:

>>> import base64
>>> base64.decodestring('RndkOiBDYXBzaGFyZTogaW1wb3J0aW5nIGZyb20gUGhvdG9z?').decode('utf8')
u'Fwd: Capshare: importing from Photos'
>>> base64.decodestring('''U2VudCBmcm9tIG15IEhUQwoKLS0tLS0gRm9yd2FyZGVkIG1lc3NhZ2UgLS0tLS0KRnJvbTogIkFs
... ZXhhbmRlciBBdnRhbnNraSIgPGFsZXhAYXZ0YW5za2kuY29tPgpUbzogIlBlam1hbiBNYWtoZmki
... IDxwakBtYWtoZmkuY29tPgpTdWJqZWN0OiBDYXBzaGFyZTogaW1wb3J0aW5nIGZyb20gUGhvdG9z
... CkRhdGU6IFdlZCwgU2VwIDEwLCAyMDE0IDk6MDYgUE0KCkhpIFBlam1hbiwKCkkgd2FzIHBsYXlp
... bmcgd2l0aCBDYXBzaGFyZSB0b2RheSBhbmQgZm91bmQgc29tZXRoaW5nIG1pc3NpbmcuIEkgZ3Vl
... c3MgeW91CmhhdmUgcGxhbnMgZm9yIGl0LCBidXQgaXQgZG9lc24ndCBodXJ0IHRvIG1lbnRpb24g
... aXQsIGp1c3Qgb24gY2FzZS4uLgoKV2hlbiBpbXBvcnRpbmcgcGhvdG9zLCBJIGhhdmUgdGhlIG9w
... dGlvbiB0byBlaXRoZXIgZ2V0IG9uZSBvZiB0aGUgaW1hZ2VzCnRoYXQgYXJlIGRvd25sb2FkZWQg
... b24gbXkgcGhvbmUsIG9yIHRvIHRha2UgYSBuZXcgcGljdHVyZS92aWRlby4gV2hhdCdzCm1pc3Np
... bmcgaXMgYWJpbGl0eSB0byBnZXQgcGhvdG9zIGZyb20gbXkgYWxidW1zIG9yIGJhY2tlZC11cCBw
... aG90b3MgdGhhdAphcmUgbm90IHBoeXNpY2FsbHkgc3RvcmVkIG9uIHRoZSBkZXZpY2UgLSBmb3Ig
... ZXhhbXBsZSB0aG9zZSBvbiBHb29nbGUKZHJpdmUuIE1vcbmQgYWxsCnBpY3R1cmVzIGFyZSBlcXVh
... bGx5IGFjY2Vzc2libGUgKG9yIG1heWJlIHRoaXMgYXBwbGllcyBvbmx5IHRvIEdvb2dsZQphcHBz
... PykuCgpOb3QgaW1wb3J0YW50LCBubyBpZGVhIGlmIGl0IGlzIGp1c3QgYSBsaW5lIG9yIHR3byBm
... aXggb3Igc29tZXRoaW5nIG1vcmUKY29tcGxpY2F0ZWQuCgpUYWtlIGNhcmUsCgotIEFsZXgsIGJl
... dGEgdGVzdGVyLCBRQSB2b2x1bnRlZXIsIGFuZCBzZW5pb3IgcGVza3kgc3RpY2tsZXI=''').decode('utf8')
u'Sent from my HTC\n\n----- Forwarded message -----\n....
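The session above is Python 2: `base64.decodestring` was removed in Python 3.9, and its replacement is `base64.decodebytes`. A Python 3 sketch of the same decoding, using only the first two lines of the question's body for brevity:

```python
import base64

# First two lines of the base64 body quoted in the question
body_b64 = (
    b"U2VudCBmcm9tIG15IEhUQwoKLS0tLS0gRm9yd2FyZGVkIG1lc3NhZ2UgLS0tLS0KRnJvbTogIkFs\n"
    b"ZXhhbmRlciBBdnRhbnNraSIgPGFsZXhAYXZ0YW5za2kuY29tPgpUbzogIlBlam1hbiBNYWtoZmki\n"
)

# decodebytes() takes bytes and skips the embedded line breaks
text = base64.decodebytes(body_b64).decode("utf-8")
print(text)
```

The result starts with `Sent from my HTC`, followed by the forwarded-message separator and the `From:`/`To:` lines.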
