How do I loop over divs and get the text in paragraph tags using only BeautifulSoup and Python?

Posted 2024-05-02 02:58:30


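I am crawling a web page with BeautifulSoup and Python and want to extract only the text inside the site's paragraph tags. This is the page I want to crawl. I want all of the text from all of the paragraph tags.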

Thanks in advance.


1 answer
User
#1 · posted 2024-05-02 02:58:30

Always treat Selenium as a last resort, to save resources:

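from selenium import webdriver
from selenium.webdriver.common.by import By

url = 'https://www.who.int/csr/disease/coronavirus_infections/faq_dec12/en/'
driver = webdriver.Chrome()
try:
  driver.get(url)
  # find_element_by_id was removed in Selenium 4; use find_element(By.ID, ...)
  div_text = driver.find_element(By.ID, 'primary').text
  with open('website_content.txt', 'w', encoding='utf-8') as f:
    f.write(div_text)
except Exception as e:
  print(e)
finally:
  # quit() ends the browser session and closes every window
  driver.quit()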

You can achieve the same result with requests and Beautiful Soup:

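import requests as rq
from bs4 import BeautifulSoup

url = 'https://www.who.int/csr/disease/coronavirus_infections/faq_dec12/en/'
response = rq.get(url)
if response.status_code == 200:
  soup = BeautifulSoup(response.text, 'html.parser')
  # find() returns None when no div has id="primary"
  primary = soup.find('div', {'id': 'primary'})
  if primary is not None:
    with open('website_content.txt', 'w', encoding='utf-8') as f:
      f.write(primary.text)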
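
The snippets above write out the text of the whole div, while the question asks for the text of each paragraph tag. A minimal sketch of that step follows, assuming the paragraphs live inside the element with id "primary" as in the code above. The sample markup, the helper name paragraph_texts, and its container_id parameter are made up for illustration; in practice you would pass response.text instead of the sample string.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup standing in for the downloaded page (response.text)
SAMPLE_HTML = """
<div id="primary">
  <h1>FAQ</h1>
  <p>First paragraph.</p>
  <div class="inner"><p>Second <b>nested</b> paragraph.</p></div>
</div>
<div id="sidebar"><p>Unrelated sidebar text.</p></div>
"""


def paragraph_texts(html, container_id="primary"):
    """Return the text of every <p> found inside the element with the given id."""
    soup = BeautifulSoup(html, "html.parser")
    container = soup.find(id=container_id)
    if container is None:
        # No such element on the page: nothing to extract
        return []
    # find_all("p") searches all descendants, so nested paragraphs are included
    return [p.get_text(" ", strip=True) for p in container.find_all("p")]


paragraphs = paragraph_texts(SAMPLE_HTML)
for text in paragraphs:
    print(text)
```

Restricting the search to one container keeps paragraphs from sidebars or footers out of the result. If you really want every paragraph on the page, calling soup.find_all("p") on the whole document should work the same way.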
