Why do I get the field itself but not its value?

Posted 2024-10-01 00:27:35


So this is my first attempt at web scraping with BeautifulSoup and Python. The page I am trying to scrape is at: http://vesselregister.dnvgl.com/VesselRegister/vesseldetails.html?vesselid=34172

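# assumed imports (the original snippet did not show them)
from urllib.request import urlopen as request
from bs4 import BeautifulSoup as soup

client = request('http://vesselregister.dnvgl.com/VesselRegister/vesseldetails.html?vesselid=34172')
page_html = client.read()
client.close()
page_soup = soup(page_html, 'html.parser')

# find the div whose data-bind attribute is exactly "text: name"
identification = page_soup.find('div', {'data-bind': 'text: name'})
print(identification.text)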

When I do this, I only get an empty string. If I print the identification variable itself, I get:

<div class="col-xs-7" data-bind="text: name"></div>

This is the line of HTML whose value I am trying to get. On the page itself, this tag shows the value A LEBLANC.
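To illustrate why .text comes back empty: the markup the server sends contains only the empty placeholder, and (judging by the data-bind attribute) the name is inserted later by JavaScript running in the browser, which urlopen never executes. A minimal stdlib-only sketch; BoundTextExtractor and bound_text are hypothetical helper names, and the `rendered` string is a made-up stand-in for the DOM after the script has run:

```python
from html.parser import HTMLParser


class BoundTextExtractor(HTMLParser):
    """Collect the text inside every <div data-bind="text: name"> element."""

    def __init__(self):
        super().__init__()
        self.depth = 0   # > 0 while we are inside a matching div
        self.texts = []  # one entry per matching div

    def handle_starttag(self, tag, attrs):
        if self.depth:
            if tag == 'div':
                self.depth += 1  # track nested divs so we close correctly
        elif tag == 'div' and dict(attrs).get('data-bind') == 'text: name':
            self.depth = 1
            self.texts.append('')

    def handle_endtag(self, tag):
        if self.depth and tag == 'div':
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.texts[-1] += data


def bound_text(html):
    parser = BoundTextExtractor()
    parser.feed(html)
    parser.close()
    return parser.texts


# What the server sends: the binding placeholder is empty.
served = '<div class="col-xs-7" data-bind="text: name"></div>'
# Hypothetical DOM after the page's JavaScript has filled the div in.
rendered = '<div class="col-xs-7" data-bind="text: name">A LEBLANC</div>'

print(bound_text(served))    # ['']
print(bound_text(rendered))  # ['A LEBLANC']
```

Any parser that only sees the served HTML, BeautifulSoup included, will find the div but no text in it.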


2 answers

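There are several ways to achieve this. The name is not in the HTML the server sends: data-bind="text: name" is Knockout.js binding syntax, which suggests the text is filled in by JavaScript after the page loads, and urlopen does not run JavaScript. So let a real browser (here Chrome driven by Selenium) render the page first, then parse driver.page_source. In my script I used a CSS selector, which is easy to understand and unlikely to break unless the site's HTML structure changes significantly. Try this: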

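from selenium import webdriver
from bs4 import BeautifulSoup

# let a real browser load the page so its JavaScript fills in the fields
driver = webdriver.Chrome()
driver.get('http://vesselregister.dnvgl.com/VesselRegister/vesseldetails.html?vesselid=34172')
# page_source is the current DOM as the browser sees it, not the raw server HTML
soup = BeautifulSoup(driver.page_source, "lxml")
driver.quit()
# [data-bind$='name'] matches elements whose data-bind value ends with "name";
# [0] takes the first such element in document order
item_name = soup.select("[data-bind$='name']")[0].text
print(item_name)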

Result:

A LEBLANC
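For reference, `[data-bind$='name']` is a CSS attribute selector meaning "the data-bind value ends with name", so it can match more than one field, and `[0]` simply keeps the first match. A small sketch using BeautifulSoup with the built-in html.parser; apart from the first div, the markup (including the text: owner.name binding) is hypothetical:

```python
from bs4 import BeautifulSoup

# Only the first div mirrors the real page; the others are made up.
html = """
<div data-bind="text: name">A LEBLANC</div>
<div data-bind="text: owner.name">Example Owner</div>
<div data-bind="text: yearBuilt">1970</div>
"""
soup = BeautifulSoup(html, "html.parser")

# [attr$='x']: attribute value ends with "x"
ends_with = [el.text for el in soup.select("[data-bind$='name']")]
# [attr='x']: attribute value is exactly "x"
exact = [el.text for el in soup.select("[data-bind='text: name']")]

print(ends_with)  # ['A LEBLANC', 'Example Owner']
print(exact)      # ['A LEBLANC']
```

If the real page ever gains another binding ending in "name" before the vessel name, the `[0]` lookup would silently return the wrong field; the exact-match selector avoids that.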

By the way, the lookup you started with also works once the page has been rendered by the browser:

from selenium import webdriver
from bs4 import BeautifulSoup

# same setup as above: render the page in Chrome, then parse the DOM
driver = webdriver.Chrome()
driver.get('http://vesselregister.dnvgl.com/VesselRegister/vesseldetails.html?vesselid=34172')
soup = BeautifulSoup(driver.page_source, "lxml")
driver.quit()
# exact match on the attribute value, as in the question
item_name = soup.find('div', {'data-bind': 'text: name'}).text
print(item_name)
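With find(), an attribute dict is compared against the whole attribute value, so {'data-bind': 'text: name'} only matches that exact string, and find() returns None (rather than raising) when nothing matches, which is where an AttributeError on .text would come from. A sketch on hypothetical markup; the text:name and text: imo bindings are made up:

```python
from bs4 import BeautifulSoup

# Hypothetical variations of the binding; only the second mirrors the real page.
html = """
<div data-bind="text:name">no space</div>
<div data-bind="text: name">A LEBLANC</div>
"""
soup = BeautifulSoup(html, "html.parser")

# The whole attribute string must match, so "text:name" (no space) is skipped.
div = soup.find("div", {"data-bind": "text: name"})
print(div.text)  # A LEBLANC

# No match gives None, so check before touching .text.
missing = soup.find("div", {"data-bind": "text: imo"})
print(missing is None)  # True
```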

You can try the following code:

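from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()

# get() navigates the browser and returns None, so there is nothing to assign
driver.get('http://vesselregister.dnvgl.com/VesselRegister/vesseldetails.html?vesselid=34172')

# find_element_by_xpath was removed in Selenium 4; use find_element(By.XPATH, ...)
name_div = driver.find_element(By.XPATH, '//*[@id="identificationCollapse"]/div/div/div/div[1]/div[1]/div[2]')

print(name_div.text)
driver.quit()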

Output:

A LEBLANC
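The XPath above locates the element purely by position, so it breaks if a wrapper div is added or fields are reordered. A sketch comparing a positional path with one keyed on the data-bind attribute, using the limited XPath support in the standard library's ElementTree on a hypothetical, simplified, well-formed fragment (the real page nests deeper). In Selenium the attribute-based form would be driver.find_element(By.XPATH, "//div[@data-bind='text: name']").

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified stand-in for the identification section.
fragment = """
<div id="identificationCollapse">
  <div class="row">
    <div class="col-xs-5">Name</div>
    <div class="col-xs-7" data-bind="text: name">A LEBLANC</div>
  </div>
</div>
"""
root = ET.fromstring(fragment.strip())

# Positional path: depends on the exact nesting and order of the divs.
by_position = root.find("./div[1]/div[2]")
# Attribute predicate: depends only on the data-bind value.
by_attribute = root.find(".//div[@data-bind='text: name']")

print(by_position.text)   # A LEBLANC
print(by_attribute.text)  # A LEBLANC
```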
