Extracting href in Python

Posted 2024-09-29 23:25:21


I'm trying to extract the href values with the code below. The soup looks like this:
</div>
 </div>
 </article>
 </div>
 <div class="listing">
 <article class="listing-item image-left" itemscope="" itemtype="https://schema.org/NewsArticle">
 <div class="listing-image image-container">
 <a class="image page-link" href="/mundo/venezuela/entrevista-con-el-representante-para-los-migrantes-venezolanos-eduardo-stein-425664">
 <img alt="" src="/files/image_184_123/uploads/2019/10/22/5daf22f15ed09.jpeg"/>
 </a>
 </div>
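A minimal offline sketch of what goes wrong, using a trimmed copy of the snippet above with its tags closed. It assumes BeautifulSoup 4 is installed and uses Python's built-in "html.parser" instead of lxml so no extra parser is needed. The point it shows: the href attribute belongs to the <a> element, so asking the wrapping <div> for it returns None, while looking one level down finds it.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the markup from the question, with the tags properly closed.
html = """
<div class="listing">
  <article class="listing-item image-left">
    <div class="listing-image image-container">
      <a class="image page-link" href="/mundo/venezuela/entrevista-con-el-representante-para-los-migrantes-venezolanos-eduardo-stein-425664">
        <img alt="" src="/files/image_184_123/uploads/2019/10/22/5daf22f15ed09.jpeg"/>
      </a>
    </div>
  </article>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# class_ matches any one of the element's classes ("listing-image" here).
div = soup.find("div", class_="listing-image")

# The <div> has no href attribute, so .get() returns None.
print(div.get("href"))  # None

# The href is on the <a> nested inside the <div>.
link = div.find("a", href=True)
print(link["href"])
```

This is why looping over find_all('div') and calling tag.get('href') prints None for every element.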

import requests
from bs4 import BeautifulSoup

url = "https://www.eltiempo.com/buscar?q=migrantes+venezolanos"

# Getting the webpage, creating a Response object.
response = requests.get(url)

# Extracting the source code of the page.
data = response.text

# Passing the source code to BeautifulSoup to create a BeautifulSoup object for it.
soup = BeautifulSoup(data, 'lxml')

# Extracting all the <a> tags into a list.
tags = soup.find_all('div')

# Extracting URLs from the attribute href in the <a> tags.
for tag in tags:
    print(tag.get('href'))

Can anyone help me? All the examples I found online have the href right next to the a tag, which makes it easier to extract.

Thanks


1 answer
User
Reply #1 · Posted 2024-09-29 23:25:21

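You probably want soup.find_all('a') instead of soup.find_all('div'): the href attribute sits on the <a> tag, not on the <div> that wraps it. Keep data = response.text, since a requests Response object has no .html attribute. If you only want the <a> tags that actually have an href, use soup.find_all('a', href=True) (see the question "BeautifulSoup getting href").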
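A quick offline check of the difference between find_all('a') and find_all('a', href=True). The markup is a made-up example (one link with an href, one without), and the "html.parser" parser is used so only BeautifulSoup 4 is required.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: one <a> with an href and one without.
html = '<a href="/mundo/venezuela/">Venezuela</a><a class="more">Ver más</a>'
soup = BeautifulSoup(html, "html.parser")

all_links = soup.find_all("a")             # both <a> tags
with_href = soup.find_all("a", href=True)  # only the <a> that has an href

print(len(all_links), len(with_href))      # 2 1

# .get() returns None for a missing attribute; a["href"] would raise KeyError.
print([a.get("href") for a in all_links])  # ['/mundo/venezuela/', None]
print([a["href"] for a in with_href])      # ['/mundo/venezuela/']
```

Filtering with href=True is what makes tag['href'] safe inside the loop.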

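import requests
from bs4 import BeautifulSoup

url = "https://www.eltiempo.com/buscar?q=migrantes+venezolanos"

response = requests.get(url)

# response.text holds the HTML source; a requests Response has no .html attribute.
data = response.text
soup = BeautifulSoup(data, 'lxml')

# Only <a> tags that actually carry an href, so tag['href'] cannot raise KeyError.
tags = soup.find_all('a', href=True)
for tag in tags:
    print(tag['href'])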
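The hrefs in the question's snippet are relative paths ("/mundo/..."), so they need to be resolved against the page URL before they can be requested. A minimal sketch, assuming the result links live under div.listing-image as in the snippet (the live page markup may differ); listing_links is a hypothetical helper name, and urljoin comes from the standard library.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def listing_links(html, base_url):
    """Return absolute URLs for <a href> links inside div.listing-image.

    The CSS selector is an assumption based on the question's snippet.
    """
    soup = BeautifulSoup(html, "html.parser")
    links = []
    for a in soup.select("div.listing-image a[href]"):
        # Resolve relative paths such as "/mundo/..." against the page URL.
        links.append(urljoin(base_url, a["href"]))
    return links


sample = """
<div class="listing-image image-container">
  <a class="image page-link" href="/mundo/venezuela/entrevista-con-el-representante-para-los-migrantes-venezolanos-eduardo-stein-425664">
    <img alt="" src="/files/image_184_123/uploads/2019/10/22/5daf22f15ed09.jpeg"/>
  </a>
</div>
"""

print(listing_links(sample, "https://www.eltiempo.com/buscar?q=migrantes+venezolanos"))
```

With the answer's code, the same helper would be called as listing_links(response.text, url). Scoping the selector to div.listing-image skips navigation and footer links that find_all('a') would also return.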
