BeautifulSoup does not return child elements

Posted 2024-09-26 22:13:40


I've tried countless different approaches, but I can't figure out why BeautifulSoup behaves so unpredictably.

I just want to copy a table into a data frame. The table has about 280 rows.

Here is the URL:

https://www.brilliantearth.com/design-your-own-engagement-ring/?sid=3755106&dc=

Here is the part of my code that doesn't work:

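import requests
from bs4 import BeautifulSoup

# req_headers was not shown in the question; a browser-like User-Agent is assumed here
req_headers = {"User-Agent": "Mozilla/5.0"}

with requests.Session() as s:
    url = "https://www.brilliantearth.com/design-your-own-engagement-ring/?sid=3755106&dc="
    r = s.get(url, headers=req_headers)

# parse the downloaded HTML and look for the results table
soup = BeautifulSoup(r.content, 'lxml')
rows = soup.find_all("div", {"id": "diamonds_search_table"})
print(rows)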

Here is where the table is located on the page:

[Screenshot: the diamonds table on the page]

What can I try next?


2 answers

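The data is loaded dynamically via JavaScript, so it is not part of the HTML that requests downloads. You can replicate the page's own data request with the requests module instead.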
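A quick way to confirm this is to count the result rows inside the diamonds_search_table container of whatever HTML you have. The sketch below assumes the rows are div elements with the classes "inner item", as in the rendered HTML that Selenium returns further down; count_result_rows is a hypothetical helper name.

```python
from bs4 import BeautifulSoup


def count_result_rows(html):
    """Count row elements inside #diamonds_search_table (0 if the container is missing)."""
    soup = BeautifulSoup(html, 'html.parser')
    container = soup.find('div', id='diamonds_search_table')
    if container is None:
        return 0
    # assumption: each result row is a div.inner.item somewhere inside the container
    return len(container.select('div.inner.item'))
```

If count_result_rows(r.text) returns 0 for the response from the question's code, the container is present but its rows are filled in by JavaScript after the page loads.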

For example, you can query the JSON endpoint that returns the search results directly:

import json
import requests


search_parameters = {
    'shapes': "Round",
    'cuts': "Fair,Good,Very Good,Ideal,Super Ideal",
    'colors': "J,I,H,G,F,E,D",
    'clarities': "SI2,SI1,VS2,VS1,VVS2,VVS1,IF,FL",
    'polishes': "Good,Very Good,Excellent",
    'symmetries': "Good,Very Good,Excellent",
    'fluorescences': "Very Strong,Strong,Medium,Faint,None",
    'min_carat': "0.25",
    'max_carat': "11.58",
    'min_table': "50.00",
    'max_table': "86.00",
    'min_depth': "46.20",
    'max_depth': "629.00",
    'min_price': "420",
    'max_price': "1258930",
    'stock_number': "",
    'row': "0",
    'page': "1",
    'requestedDataSize': "200",
    'order_by': "price",
    'order_method': "asc",
    'currency': "$",
    'has_v360_video': "",
    'dedicated': "",
    'sid': "",
    'min_ratio': "1.00",
    'max_ratio': "2.75",
    'shipping_day': "",
    'MIN_PRICE': "420",
    'MAX_PRICE': "1258930",
    'MIN_CARAT': "0.25",
    'MAX_CARAT': "11.58",
    'MIN_TABLE': "45",
    'MAX_TABLE': "86",
    'MIN_DEPTH': "46.2",
    'MAX_DEPTH': "629",
}

data = requests.get('https://www.brilliantearth.com/loose-diamonds/list/', params=search_parameters).json()

# uncomment this to print all data:
# print(json.dumps(data, indent=4))

for d in data['diamonds']:
    print('{:<30} {:<15} {}'.format(d['title'], d['cut'], d['price']))

Prints:

0.30 Carat Round Diamond       Very Good       420
0.30 Carat Round Diamond       Very Good       420
0.30 Carat Round Diamond       Ideal           430
0.30 Carat Round Diamond       Ideal           430
0.30 Carat Round Diamond       Good            430
0.30 Carat Round Diamond       Ideal           430
0.30 Carat Round Diamond       Very Good       430
0.25 Carat Round Diamond       Super Ideal     430
0.30 Carat Round Diamond       Very Good       430
0.32 Carat Round Diamond       Ideal           430

... and so on.
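Because requestedDataSize is "200" and the question mentions about 280 rows, a single request may not return everything. The sketch below loops over pages until a page comes back shorter than the page size. It assumes the endpoint pages through results via the page parameter (the row parameter might also be involved; check the request in your browser's network tab), and fetch_all_diamonds and fetch_page are hypothetical names. The fetch function is passed in so the paging logic can be tried without network access.

```python
from typing import Callable, Dict, List


def fetch_all_diamonds(fetch_page: Callable[[int], Dict], page_size: int,
                       max_pages: int = 50) -> List[Dict]:
    """Collect data['diamonds'] from successive pages.

    fetch_page(page) is a hypothetical callable that returns the decoded JSON
    for one page number, starting at 1.
    """
    all_rows: List[Dict] = []
    for page in range(1, max_pages + 1):
        batch = fetch_page(page).get('diamonds', [])
        all_rows.extend(batch)
        # assumption: a short (or empty) page means there is nothing left to fetch
        if len(batch) < page_size:
            break
    return all_rows
```

With the answer's search_parameters, fetch_page could be lambda p: requests.get('https://www.brilliantearth.com/loose-diamonds/list/', params=dict(search_parameters, page=str(p))).json(). Passing the returned list to pandas.DataFrame then gives one data frame with every row.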

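Alternatively, you can use Selenium to load the page in a real browser, so that the JavaScript runs, and then parse the resulting HTML. You can try: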

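from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

driver = webdriver.Firefox()
driver.get('https://www.brilliantearth.com/design-your-own-engagement-ring/?sid=3755106&dc=')

# wait (up to 20 s) until the JavaScript has rendered at least one row of the table
WebDriverWait(driver, 20).until(
    EC.presence_of_element_located((By.CSS_SELECTOR, "#diamonds_search_table .inner.item"))
)

html = driver.page_source
driver.quit()
soup = BeautifulSoup(html, 'lxml')

rows = soup.find_all("div", {"id": "diamonds_search_table"})
print(rows)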

This returns the table with its rows, as shown below:

[<div class="search-table" id="diamonds_search_table" style="position: relative; height: 34000px;">
<div class="inner item" data-have="true" data-position="0" style="position: absolute; width: 100%; height: 34px;top:0px;"><a class="td-n2" href="/rings/cyorings/view_diamond/9361809/?sid=3755106&amp;first=diamond&amp;show_diamond_tab=true"></a><table border="0" cellpadding="0" cellspacing="0" class="table-striped table-hover search-result-table" width="100%"><tbody><tr class="search-item"><td data-id="9361809" onclick="dtl.stop_jump();" scope="col" width="7%"><div class="checkbox checkbox-ty4"><label><input class="hidden"/><span class="sr-only">checkbox</span><i class="icons-checkbox"></i></label></div></td><td scope="col" width="9%">Round</td><td scope="col" width="9%">0.30</td><td scope="col" width="8%">H</td><td scope="col" width="8%">SI2</td><td scope="col" width="12%">Very Good</td><td scope="col" width="8%">GIA</td><td scope="col" width="12%">Botswana Sort</td><td class="width_ratio_hide" scope="col" width="8%">1</td><td scope="col" width="10%">$420</td><td scope="col" width="7%"><span class="view">View</span></td></tr></tbody></table></div><div class="inner item" data-have="true" data-position="34" style="position: absolute; width: 100%; height: 34px;top:34px;"><a class="td-n2" href="/rings/cyorings/view_diamond/9391074/?sid=3755106&amp;first=diamond&amp;show_diamond_tab=true"></a><table border="0" cellpadding="0" cellspacing="0" class="table-striped table-hover search-result-table" width="100%"><tbody><tr class="search-item"><td data-id="9391074"


and so on...........]
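To get from this HTML to the data frame the question asks for, the child rows of diamonds_search_table can be turned into records. This is a sketch that assumes the markup seen in the output above (each row is a div.inner.item holding a tr.search-item, with a checkbox cell first and a "View" cell last). The column names in COLUMNS are hypothetical, guessed from the sample values because the header row is not shown, and parse_diamond_rows is an illustrative name.

```python
from bs4 import BeautifulSoup

# Hypothetical column names inferred from the sample row
# (Round, 0.30, H, SI2, Very Good, GIA, Botswana Sort, 1, $420);
# check them against the table header on the live page.
COLUMNS = ['shape', 'carat', 'color', 'clarity', 'cut',
           'report', 'origin', 'ratio', 'price']


def parse_diamond_rows(html):
    """Return one dict per div.inner.item row of #diamonds_search_table."""
    soup = BeautifulSoup(html, 'html.parser')
    records = []
    for item in soup.select('#diamonds_search_table > div.inner.item'):
        cells = [td.get_text(strip=True) for td in item.select('tr.search-item > td')]
        # drop the leading checkbox cell and the trailing "View" cell
        values = cells[1:-1]
        if len(values) != len(COLUMNS):
            continue  # skip rows whose layout does not match the assumed columns
        record = dict(zip(COLUMNS, values))
        # the checkbox cell carries the diamond's id in its data-id attribute
        id_cell = item.select_one('td[data-id]')
        record['id'] = id_cell['data-id'] if id_cell is not None else None
        records.append(record)
    return records
```

Using the html variable from the Selenium answer, pandas.DataFrame(parse_diamond_rows(html)) then produces one row per diamond.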
