Unable to get the email address when parsing the page

<tr><td>Email:</td><td width="10"></td><td><script>var ylhrfq = "ypr";var bdnd = "ail";var byil = "st.c";var bwdbdf = "age@";var dqiex = ".c";var pner = "om";var qkfow = "gm";var azzl = "ie";var hgcr = "n.pl";var link = byil + ylhrfq + azzl + hgcr + bwdbdf + qkfow + bdnd + dqiex + pner;var text = link;document.write('<a href="mailto:'+link+'" />'+text+'</a>');</script></td></tr>

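2 answers

User

#1 · Edited 2024-09-30 10:27:35

The email address appears to be hidden in the raw HTML and is generated by JavaScript code. Using Python 2 with requests, js2py, and BeautifulSoup4, I was able to get the correct email address. I hope this is what you want.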

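import bs4
import requests
import js2py
from HTMLParser import HTMLParser  # Python 2 module name

# Download the raw HTML of the listing page.
html = requests.get('http://findyourvacationhome.com/find.php?property=5068927').content
soup = bs4.BeautifulSoup(html, 'html.parser')
# Locate the inline <script> that builds the address (position is specific to this page layout).
raw_script = soup.find_all('table')[6].find_all('tr')[2].find_all('td')[2].script.contents[0]

# Drop the document.write call so that only the string concatenation is evaluated.
# The literal must match the page's script text exactly, including whitespace.
script = raw_script.replace("""var text = link;document.write('<a href="mailto:'+link+'"  />'+text+'</a>');""", """""")
result = js2py.eval_js(script)
# Decode any HTML entities left in the evaluated string.
htmlparser = HTMLParser()
result = htmlparser.unescape(result)

print(result)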

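I did it in four steps:

Fetch the page's HTML with requests
Parse the HTML with BeautifulSoup4 and extract the JavaScript code that generates the email address
Execute the JavaScript code with js2py and take the result
Unescape HTML entities in the resulting string with HTMLParser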
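
If installing a JavaScript engine is not an option, the same idea can be sketched in Python 3 with regular expressions alone. This is only a rough sketch, not the method used above: it assumes the script consists solely of simple `var name = "text";` assignments followed by one `var link = a + b + ...;` concatenation, as in the question. The function name `deobfuscate_email` and the sample script are made up for illustration.

```python
import html
import re

# Matches simple string assignments such as: var qkfow = "gm";
ASSIGNMENT = re.compile(r'var\s+(\w+)\s*=\s*"([^"]*)"\s*;')
# Matches the concatenation that builds the address: var link = a + b + c;
LINK = re.compile(r'var\s+link\s*=\s*([\w\s+]+?)\s*;')


def deobfuscate_email(script):
    """Rebuild the address from the string fragments in an inline script.

    Hypothetical helper: it only handles the simple pattern described above.
    """
    fragments = dict(ASSIGNMENT.findall(script))
    match = LINK.search(script)
    if match is None:
        raise ValueError("no 'var link = ...' concatenation found")
    # Split the right-hand side on '+' to recover the variable order.
    names = [name.strip() for name in match.group(1).split("+")]
    # html.unescape is the Python 3 counterpart of HTMLParser().unescape.
    return html.unescape("".join(fragments[name] for name in names))


# Made-up sample in the same shape as the script in the question.
sample = 'var a = "user";var b = "ple.com";var c = "@exam";var link = a + c + b;var text = link;'
print(deobfuscate_email(sample))  # prints user@example.com
```

The text of the inline script can still be obtained with requests and BeautifulSoup4 as in the answer above; only the js2py step is replaced. A page that obfuscates the address differently would need a real JavaScript engine again.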

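User

#2 · Edited 2024-09-30 10:27:35

You need the parsed (rendered) HTML. The page source itself contains only a placeholder and the script. In PowerShell, I would run the following commands to get the email address: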

$t = Invoke-WebRequest -Uri "http://findyourvacationhome.com/find.php?property=5068927"
$t.Links | Where-Object { $_.href -match 'mailto' } | Select-Object -ExpandProperty outertext
