BeautifulSoup returns the markup instead of the email address

Posted 2024-09-30 02:22:14


I've been trying different ways to get the email address from here, using the following code:

email_pattern = 'a[href^=mailto]'
for email in soup.select(email_pattern):
  print(email)

However, when I run this I get the whole tag:

<a href="mailto:emailname61@yahoo.com?subject=?"><span class="ui_icon email _3ZW3afUk"></span><span class="_2saB_OSe">Email</span><span class="ui_icon external-link-no-box _2OpUzCuO"></span></a>

I only want to get "emailname61@yahoo.com".

I've also tried:

email_pattern = 'a[href^=mailto]'
for email in soup.select(email_pattern.text):
  print(email)

email_pattern = 'a[href^=mailto]'
for email in soup.select(email_pattern):
  print(email.text)

But I only get "Email" or nothing at all.

How can I get the email address itself?


2 answers

Could you try this pattern?

 email_pattern = r"(^[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+$)"
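One caveat with the pattern above: because of the ^ and $ anchors it only matches a string that consists of nothing but the address, so it will not match the raw href value "mailto:...?subject=?". A minimal sketch, assuming you apply the regex to the href string with re.search and drop the anchors (the href literal below is copied from the question):

```python
import re

# href value copied from the question's <a> tag
href = "mailto:emailname61@yahoo.com?subject=?"

# Same character classes as the pattern above, but without ^ and $,
# because the address sits in the middle of the href rather than alone.
email_pattern = r"[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+"

match = re.search(email_pattern, href)
email = match.group(0) if match else None  # None if no address was found
print(email)
```

In the BeautifulSoup loop, href would be email['href'] for each selected tag.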

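The email address is in the href attribute, not in the tag's text, so you have to read email['href'] (email.text only gives you the visible label "Email").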

Then you have to strip mailto: and ?subject=? from that value.


Minimal working example:

from bs4 import BeautifulSoup

html = '<a href="mailto:emailname61@yahoo.com?subject=?"><span class="ui_icon email _3ZW3afUk"></span><span class="_2saB_OSe">Email</span><span class="ui_icon external-link-no-box _2OpUzCuO"></span></a>'

soup = BeautifulSoup(html, 'html.parser')

email_pattern = 'a[href^=mailto]'
for email in soup.select(email_pattern):
    data = email['href']
    data = data.split('?')[0]  # remove `?subject=?`
    data = data.replace('mailto:', '')  # remove `mailto:`
    print(data)
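As an alternative to split('?') and replace('mailto:', ''), the standard library's urllib.parse.urlsplit can take the mailto: URL apart: the address lands in .path and subject=? lands in .query. This is a sketch; address_from_mailto is a hypothetical helper name, and for a link listing several comma-separated recipients it returns them all as one string.

```python
from urllib.parse import unquote, urlsplit


def address_from_mailto(href):
    """Return the address part of a mailto: link, or None for other schemes."""
    parts = urlsplit(href)
    if parts.scheme.lower() != "mailto":
        return None
    # For mailto: URLs the address is in .path; "?subject=..." goes to .query.
    # unquote() decodes percent-escapes such as %40 -> @.
    return unquote(parts.path)


print(address_from_mailto("mailto:emailname61@yahoo.com?subject=?"))
```

Inside the loop above it would be used as print(address_from_mailto(email['href'])).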
