How do I loop over divs and get the text inside paragraph tags using only BeautifulSoup and Python? - Q&A - Python中文网

Posted 2024-05-02 02:58:30

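I am crawling a web page with BeautifulSoup and Python, and I want to extract only the text inside the site's paragraph tags. This is the page I want to crawl. I want all of the text from all of the paragraph tags.

Thanks in advance.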

1 answer

User

#1 · Posted 2024-05-02 02:58:30

To save resources, always treat Selenium as a last resort. If you do need it:

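from selenium import webdriver
from selenium.webdriver.common.by import By

url = 'https://www.who.int/csr/disease/coronavirus_infections/faq_dec12/en/'
driver = webdriver.Chrome()
try:
  driver.get(url)
  # find_element_by_id was removed in Selenium 4; use find_element(By.ID, ...)
  div_text = driver.find_element(By.ID, 'primary').text
  with open('website_content.txt', 'w', encoding='utf-8') as f:
    f.write(div_text)
except Exception as e:
  print(e)
finally:
  # quit() ends the whole browser session; close() only closes the current window
  driver.quit()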

You can get the same result with requests and BeautifulSoup:

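import requests as rq
from bs4 import BeautifulSoup

url = 'https://www.who.int/csr/disease/coronavirus_infections/faq_dec12/en/'
response = rq.get(url, timeout=10)
if response.status_code == 200:
  soup = BeautifulSoup(response.text, 'html.parser')
  div = soup.find('div', {'id': 'primary'})
  if div is not None:  # find() returns None when no element matches
    with open('website_content.txt', 'w', encoding='utf-8') as f:
      f.write(div.text)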
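
Both snippets above write out all of the text in the div, not just the paragraphs. If you only want the text inside the `<p>` tags, which is what the question asks for, something like the following might work. This is only a sketch: `paragraph_texts` and `SAMPLE_HTML` are names made up for illustration, and the sample markup is an assumption, not the structure of the real page. It runs on an inline string so you can try it without a network request; in practice you would pass in `response.text`.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration only; the real page will differ.
SAMPLE_HTML = """
<div id="primary">
  <p>First paragraph.</p>
  <div class="inner"><p>Second <b>paragraph</b> here</p></div>
  <span>Not a paragraph.</span>
</div>
<div id="sidebar"><p>Sidebar note.</p></div>
"""


def paragraph_texts(html, div_id=None):
    """Return the text of every <p> found inside the matching <div> elements."""
    soup = BeautifulSoup(html, "html.parser")
    # With div_id, look only at that div; otherwise visit every div on the page.
    divs = soup.find_all("div", id=div_id) if div_id else soup.find_all("div")
    texts = []
    seen = set()
    for div in divs:
        for p in div.find_all("p"):
            # A <p> inside nested divs is reached once per enclosing div; skip repeats.
            if id(p) in seen:
                continue
            seen.add(id(p))
            text = p.get_text(" ", strip=True)
            if text:
                texts.append(text)
    return texts


if __name__ == "__main__":
    for line in paragraph_texts(SAMPLE_HTML, div_id="primary"):
        print(line)
```

Leaving out `div_id` loops over every div on the page instead of just one. The `seen` set is there because `find_all("p")` searches all descendants, so a paragraph inside nested divs would otherwise show up twice.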
