Extracting information with Beautiful Soup

Posted 2024-06-23 18:48:46

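I am trying to extract a chemical's name, its occurrence/use, and the date it was added, using Beautiful Soup. Here is an example of one chemical from the list: https://oehha.ca.gov/chemicals/abiraterone-acetate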

Can anyone help me? Thank you very much!

The output I want is:

Abiraterone acetate from L253
<h1 class="title" id="page-title"><span class="ca-gov-icon-arrow-down"></span> Abiraterone acetate </h1>

A CYP17 inhibitor indicated in combination with prednisone for the treatment of patients with metastatic castration-resistant prostate cancer
from L265
<h3 class="label-above">Occurence(s)/Use(s)</h3><p>A CYP17 inhibitor indicated in combination with prednisone for the treatment of patients with metastatic castration-resistant prostate cancer.</p>

02/02/2016 from L266
<h3 class="label-above">Date Added</h3><span class="date-display-single" property="dc:date" datatype="xsd:dateTime" content="2016-02-02T00:00:00-08:00">02/02/2016</span>

1 answer
User
#1 · Posted 2024-06-23 18:48:46

Note that the site is protected by an Incapsula firewall, which is meant to block bots and browser automation.

With Selenium we can get the following to work:

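from selenium import webdriver
from bs4 import BeautifulSoup

url = 'https://oehha.ca.gov/chemicals/abiraterone-acetate'

# Load the page in a real browser session and grab the rendered HTML.
browser = webdriver.Firefox()
try:
    browser.get(url)  # get() returns None, so there is nothing to assign
    source = browser.page_source
finally:
    browser.quit()  # always end the driver session, even if loading fails

soup = BeautifulSoup(source, 'html.parser')

# Chemical name: the page's <h1 class="title"> heading.
title = soup.find('h1', {'class': 'title'})
print(title.text.strip())

# Occurrence/use: the first <p> after the label (spelled "Occurence" on the site).
details = soup.find(string='Occurence(s)/Use(s)').find_next('p').contents[0]
print(details)

# Date added: the visible text of the date span.
date = soup.find('span', {'class': 'date-display-single'})
print(date.text)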

Output:

Abiraterone acetate
A CYP17 inhibitor indicated in combination with prednisone for the treatment of patients with metastatic castration-resistant prostate cancer.
02/02/2016
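
The same lookups can be tried without a browser by running them on the HTML fragments quoted in the question. The sketch below is only an illustration under that assumption: `SAMPLE_HTML` and the helper `extract_chemical` are hypothetical names, not part of the original answer, and the live page may differ from these fragments. It returns None for any field that is missing instead of raising, and reads the date from the span's `content` attribute, which holds an ISO 8601 timestamp.

```python
from datetime import datetime
from bs4 import BeautifulSoup

# Fragments copied from the question; a real page would come from browser.page_source.
SAMPLE_HTML = """
<h1 class="title" id="page-title"><span class="ca-gov-icon-arrow-down"></span> Abiraterone acetate </h1>
<h3 class="label-above">Occurence(s)/Use(s)</h3><p>A CYP17 inhibitor indicated in combination with prednisone for the treatment of patients with metastatic castration-resistant prostate cancer.</p>
<h3 class="label-above">Date Added</h3><span class="date-display-single" property="dc:date" datatype="xsd:dateTime" content="2016-02-02T00:00:00-08:00">02/02/2016</span>
"""


def extract_chemical(html):
    """Hypothetical helper: return name, use and date added (None where missing)."""
    soup = BeautifulSoup(html, "html.parser")

    title = soup.find("h1", class_="title")
    name = title.get_text(strip=True) if title else None

    # The label is spelled "Occurence(s)/Use(s)" in the page markup itself.
    label = soup.find("h3", string="Occurence(s)/Use(s)")
    paragraph = label.find_next("p") if label else None
    use = paragraph.get_text(strip=True) if paragraph else None

    # Parse the machine-readable timestamp rather than the visible MM/DD/YYYY text.
    span = soup.find("span", class_="date-display-single")
    added = None
    if span is not None and span.has_attr("content"):
        added = datetime.fromisoformat(span["content"]).date()

    return {"name": name, "use": use, "date_added": added}


record = extract_chemical(SAMPLE_HTML)
print(record["name"])
print(record["use"])
print(record["date_added"])
```

Returning a dictionary keeps the three fields together, which is convenient if the same function is later applied to the page source of every chemical in the list.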
