How can I loop through the dropdown menu on this page to scrape the specs and price of each product?

Posted 2024-10-04 11:28:09

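Hi, I'm fairly new to Python and web scraping. I'm trying to extract data for each product option in the dropdown menu on this page (https://www.jmesales.com/kuriyama-3-4-in-brass-quick-couplings/). I believe the page doesn't rely on JavaScript, and I would prefer to use requests and BeautifulSoup rather than a webdriver. I have code that gets the name and the value attribute of each option, but I'm not sure how to access the pricing and spec data associated with each option. Here is my code: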

from bs4 import BeautifulSoup
import requests

url = 'https://www.jmesales.com/kuriyama-3-4-in-brass-quick-couplings/'

headers = {"User-Agent":"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.97 Safari/537.36"}
res = requests.get(url, headers=headers)

soup = BeautifulSoup(res.text,'lxml')

options = [item['value'] for item in soup.select('#attribute_select_42800 option')]

for option in options:
    print(option)

I'd like to access the price and related data for each option. Any help would be greatly appreciated.


3 answers

Hope this helps:

from pyautogui import typewrite

amount_of_options = 4 # Amount of options in the menu
typewrite(['enter']) # Press Enter to open the dropdown menu (assumes it already has keyboard focus)

for i in range(amount_of_options):
    typewrite(['tab']) # Each tab will navigate to the next option in the menu

Try something like this:

from bs4 import BeautifulSoup
import requests

url = 'https://www.jmesales.com/kuriyama-3-4-in-brass-quick-couplings/'
s = requests.Session()
headers = {"User-Agent":"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.97 Safari/537.36"}
res = s.get(url, headers=headers)

soup = BeautifulSoup(res.text,'lxml')

# Collect a (value, label) pair for every <option> in the dropdown.
options = [(item['value'], item.text) for item in soup.select('#attribute_select_42800 option')]

# Read the product ID from the form input whose name starts with "product_id".
product_id = soup.select_one('input[name^="product_id"]').get('value')

# options[1:] skips the first entry in the dropdown.
for item_num, item_name in options[1:]:
    data = {'action': 'add', 'attribute[42800]': item_num, 'product_id': product_id, 'qty[]': '1'}
    # POST the selected option; the response is JSON that includes the price.
    product = s.post('https://www.jmesales.com/remote/v1/product-attributes/53564', data=data).json()
    price = product['data']['price']['without_tax']['formatted']

    print(f'Item name: {item_name} Item price: {price}')

Output:

Item name: Part A Female NPT x Male Adapter Item price: $6.30
Item name: Part B Female Coupler x Male NPT Item price: $13.80
Item name: Part C Female Coupler x Hose Shank Item price: $11.50
Item name: Part D Female Coupler x Female NPT Item price: $12.80
Item name: Part E Male Adapter x Hose Shank Item price: $8.50
Item name: Part F Male NPT x Male Adapter Item price: $7.30
Item name: Dust Cap Item price: $11.00
Item name: Dust Plug Item price: $8.10
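
The loop above indexes straight into the JSON response, so an option whose response lacks one of those keys would raise a KeyError and stop the run. Below is a minimal sketch of a more defensive lookup. The helper name `extract_price` and the sample payload are made up for illustration; only the `data` / `price` / `without_tax` / `formatted` path comes from the code above, and a real response may well contain other fields.

```python
def extract_price(payload):
    """Return the formatted price string, or None if any level is missing."""
    node = payload
    for key in ("data", "price", "without_tax", "formatted"):
        # Stop as soon as the expected nesting is not there.
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


# Hypothetical payload shaped like the path used above (not a captured response).
sample = {"data": {"price": {"without_tax": {"formatted": "$6.30"}}}}

print(extract_price(sample))        # $6.30
print(extract_price({"data": {}}))  # None
```

In the loop, `price = extract_price(product)` could then replace the chained indexing, with a `None` result logged or skipped as preferred.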

The code above only works for the specific URL you gave. To handle multiple URLs, you can parse the IDs out of each page:

import re

from bs4 import BeautifulSoup
import requests

url = 'https://www.jmesales.com/dixon-brass-female-ght-x-female-npt-adapter-lead-free/'
s = requests.Session()
headers = {"User-Agent":"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.97 Safari/537.36"}
res = s.get(url, headers=headers)

soup = BeautifulSoup(res.text,'lxml')

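# Pull the numeric attribute ID out of the dropdown's name, e.g. "attribute[42800]" -> "42800".
# Assumes the dropdown is the <select> carrying the classes form-select and form-select--small.
attrid = re.findall(r'\[(\d+)\]', soup.select_one('.form-select.form-select--small').get('name'))[0]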

# Collect a (value, label) pair for every <option> in the dropdown.
options = [(item['value'], item.text) for item in soup.select(f'#attribute_select_{attrid} option')]

# Read the product ID from the form input whose name starts with "product_id".
product_id = soup.select_one('input[name^="product_id"]').get('value')

# options[1:] skips the first entry in the dropdown.
for item_num, item_name in options[1:]:
    data = {'action': 'add', f'attribute[{attrid}]': item_num, 'product_id': product_id, 'qty[]': '1'}
    product = s.post(f'https://www.jmesales.com/remote/v1/product-attributes/{product_id}', data=data).json()
    price = product['data']['price']['without_tax']['formatted']

    print(f'Item name: {item_name} Item price: {price}')
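
To make the request in the loop above easier to reuse across product pages, its pieces can be factored into small pure functions. This is only a sketch, not the site's documented API: the function names are hypothetical, and the form fields and endpoint path are simply copied from the code above, so they may stop working if the site changes.

```python
import re

BASE_URL = "https://www.jmesales.com"


def parse_attribute_id(select_name):
    """Extract the digits from a name such as 'attribute[42800]'; None if absent."""
    match = re.search(r"\[(\d+)\]", select_name or "")
    return match.group(1) if match else None


def build_payload(attr_id, option_value, product_id, qty=1):
    """Form fields posted for one dropdown option (same keys as the loop above)."""
    return {
        "action": "add",
        f"attribute[{attr_id}]": option_value,
        "product_id": product_id,
        "qty[]": str(qty),
    }


def attributes_endpoint(product_id):
    """URL of the product-attributes endpoint for one product."""
    return f"{BASE_URL}/remote/v1/product-attributes/{product_id}"


# 42800 and 53564 appear in the first script; "101" is a made-up option value.
attr_id = parse_attribute_id("attribute[42800]")
print(attr_id)
print(build_payload(attr_id, "101", "53564"))
print(attributes_endpoint("53564"))
```

With these helpers, the loop body reduces to `s.post(attributes_endpoint(pid), data=build_payload(attrid, item_num, pid))`, where `pid` stands for the product ID scraped from the page.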

This isn't the answer you were looking for, but for web scraping I would recommend Selenium:

https://selenium-python.readthedocs.io/

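Once the browser is open, you can do anything you want. What I would do is look up the XPaths of the elements and find a pattern in them to iterate over.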
