How do I get plain text out of a BlackBerry 10 email?

1 answer

User

#1 · Posted 2024-09-21 01:16:07

Use Python and XPath to extract the text from the HTML:

#!/usr/bin/python3
import urllib.request
import quopri
import lxml.html

# The actual test fragments are here
raw_url = 'https://gist.github.com/Supermathie/7866658/raw/80e4abd4226b916a54b224677af7fda881d0937f/sample+1'
raw_url_no_sig = 'https://gist.github.com/Supermathie/7866658/raw/df354d6b8f3176c3d8bdb89b2961bb0ccc78520c/sample+2'

def get_divs(url):
    # Download the raw HTML body, which is quoted-printable encoded
    email_body_raw = urllib.request.urlopen(url).read()
    # Decode quoted-printable (e.g. "=3D" back to "=")
    email_body = quopri.decodestring(email_body_raw)
    email_xml = lxml.html.document_fromstring(email_body)
    # Select every <div> sibling that precedes the signature placeholder
    email_divs = email_xml.xpath('//div[@id="_signaturePlaceholder"]/preceding-sibling::div')
    return email_divs

# Print the text of each selected div on its own line
print('\n'.join([str(node.text_content() or "") for node in get_divs(raw_url)]))
print('\n'.join([str(node.text_content() or "") for node in get_divs(raw_url_no_sig)]))

For the two test cases, this prints:

Let's remember that the information in the article was filtered through no less than two people who don't fully speak tech. I think I can translate it back:
«The FBI crafted a custom piece of malware targeting Mo, designed to snoop his activities . A link was emailed to Mo in a spear phishing attack in an attempt to get hin to download and install the malware from the FBI's monitored servers.
The attempt failed; the software was downloaded but never executed in a manner enabling the software to send back information to the FBI.»
Nothing too special. I wonder if Mo had the balls to submit the software to Sophos etc. for malware analysis. :)
M.

and

Test email
No signature
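
If lxml is not installed, the same idea can be roughly approximated with the standard library alone. The sketch below is not the method above: it swaps the XPath query for a small html.parser subclass, and it only handles the simple case where the body divs and the placeholder are not nested inside another div. The names SAMPLE, BodyDivText and body_lines are made up for illustration, and the sample markup is a guess at what a BlackBerry 10 body looks like, not a captured message.

```python
import quopri
from html.parser import HTMLParser

# Hypothetical quoted-printable fragment; real BlackBerry 10 markup may differ.
SAMPLE = b"""<html><body>
<div style=3D"font-size: 11pt;">Test email</div>
<div>No signa=
ture</div>
<div id=3D"_signaturePlaceholder"></div>
<div>Sent from my BlackBerry 10 smartphone.</div>
</body></html>"""


class BodyDivText(HTMLParser):
    """Collect the text of each outermost <div> seen before the placeholder."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.lines = []     # one string per collected div
        self.depth = 0      # current <div> nesting depth
        self.done = False   # True once the signature placeholder is seen

    def handle_starttag(self, tag, attrs):
        if tag != "div" or self.done:
            return
        if dict(attrs).get("id") == "_signaturePlaceholder":
            self.done = True
            return
        if self.depth == 0:
            self.lines.append("")   # start a new output line
        self.depth += 1

    def handle_endtag(self, tag):
        if tag == "div" and not self.done and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth > 0 and not self.done:
            self.lines[-1] += data


def body_lines(raw_bytes, encoding="utf-8"):
    # Undo quoted-printable first ("=3D" -> "=", soft line breaks removed)
    html = quopri.decodestring(raw_bytes).decode(encoding, errors="replace")
    parser = BodyDivText()
    parser.feed(html)
    parser.close()
    # Like the XPath query, return nothing when there is no placeholder
    return parser.lines if parser.done else []


print("\n".join(body_lines(SAMPLE)))
```

As with the XPath version, everything from the placeholder div onward is dropped, and a message with no placeholder yields no lines at all.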
