Extracting href in Python

While the soup looks like this:

</div> </div> </article> </div>
<div class="listing">
<article class="listing-item image-left" itemscope="" itemtype="https://schema.org/NewsArticle">
<div class="listing-image image-container">
<a class="image page-link" href="/mundo/venezuela/entrevista-con-el-representante-para-los-migrantes-venezolanos-eduardo-stein-425664">
<img alt="" src="/files/image_184_123/uploads/2019/10/22/5daf22f15ed09.jpeg"/>
</a>
</div>

My code:

import requests
from bs4 import BeautifulSoup

url = "https://www.eltiempo.com/buscar?q=migrantes+venezolanos"
# Getting the webpage, creating a Response object.
response = requests.get(url)
# Extracting the source code of the page.
data = response.text
# Passing the source code to BeautifulSoup to create a BeautifulSoup object for it.
soup = BeautifulSoup(data, 'lxml')
# Extracting all the <a> tags into a list.
tags = soup.find_all('div')
# Extracting URLs from the attribute href in the <a> tags.
for tag in tags:
    print(tag.get('href'))

1 answer

User

#1 · Posted on 2024-09-29 23:25:21

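You probably want soup.find_all('a') rather than soup.find_all('div'): the href attribute is on the <a> tags, and <div> tags don't have one, so tag.get('href') returns None for each of them. Keep data = response.text; a plain requests Response object has no .html attribute. If you only need <a> tags that actually have an href, you can also use soup.find_all('a', href=True) (see BeautifulSoup getting href).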
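A minimal offline sketch of the difference, run against the HTML fragment from the question. The fragment is trimmed, and the second <a> (with no href) is a hypothetical addition for contrast. It uses Python's built-in "html.parser" so lxml is not required:

```python
from bs4 import BeautifulSoup

# Fragment adapted from the question; the second <a> (no href) is a hypothetical addition.
html = """
<div class="listing">
  <article class="listing-item image-left">
    <div class="listing-image image-container">
      <a class="image page-link" href="/mundo/venezuela/entrevista-con-el-representante-para-los-migrantes-venezolanos-eduardo-stein-425664">
        <img alt="" src="/files/image_184_123/uploads/2019/10/22/5daf22f15ed09.jpeg"/>
      </a>
      <a class="placeholder">no link here</a>
    </div>
  </article>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# <div> tags carry no href attribute, so .get('href') returns None for each one.
div_hrefs = [div.get("href") for div in soup.find_all("div")]

# href=True keeps only the <a> tags that actually have an href attribute.
link_hrefs = [a["href"] for a in soup.find_all("a", href=True)]

print(div_hrefs)  # [None, None]
print(link_hrefs)
```

Without href=True, a["href"] on the second anchor would raise KeyError, while a.get("href") would return None for it.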

import requests
from bs4 import BeautifulSoup

url = "https://www.eltiempo.com/buscar?q=migrantes+venezolanos"

response = requests.get(url)

data = response.text  # the page source as a string
soup = BeautifulSoup(data, 'lxml')
tags = soup.find_all('a', href=True)  # only <a> tags that have an href
for tag in tags:
    print(tag['href'])
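The hrefs on this page appear to be relative (for example /mundo/venezuela/... in the question's snippet), so the loop above prints paths rather than full URLs. One way to resolve them is the standard library's urllib.parse.urljoin. This is a sketch, assuming the search page URL as the base (response.url would also work); absolutize is a hypothetical helper name:

```python
from urllib.parse import urljoin

BASE_URL = "https://www.eltiempo.com/buscar?q=migrantes+venezolanos"


def absolutize(base, hrefs):
    """Hypothetical helper: resolve each href against base.

    Relative paths are joined onto the base's scheme and host;
    hrefs that are already absolute pass through unchanged.
    """
    return [urljoin(base, href) for href in hrefs]


hrefs = [
    "/mundo/venezuela/entrevista-con-el-representante-para-los-migrantes-venezolanos-eduardo-stein-425664",
    "https://example.com/already-absolute",
]
absolute = absolutize(BASE_URL, hrefs)
for link in absolute:
    print(link)
```

Because the relative href starts with "/", urljoin replaces both the path and the query of the base URL, keeping only https://www.eltiempo.com.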
