Scraping drop-down menu values with Python BeautifulSoup

Posted 2024-10-02 10:33:56


I have looked through most of the existing posts but could not find an answer to my small problem.

This is the drop-down list I want to scrape:

<div class="input-box">
    <select name="super_attribute[138]" id="attribute138" class="required-entry super-attribute select form-control" onchange="notifyMe(this.value, this.options[this.selectedIndex].innerHTML);">
        <option value="">Choose an Option...</option>
        <option value="17" price="0">M (in stock) </option>
        <option value="18" price="0">L (out of stock) </option>
        <option value="15" price="0">XL (in stock) </option>
        <option value="52" price="0">XXL (in stock) </option>
    </select>
</div>

My Python code is:

items = soup.select('option[value]')
values = [item.get('value') for item in items]
textvalues = [item.text for item in items]

print(textvalues)

The output is: ['select', 'In-Stock', 'Out-Stock', 'In-Stock', 'In-Stock', 'In-Stock']

What I need are the other values as well (SizeValue and SizeName): 17 & M / 18 & L / 15 & XL / 52 & XXL

If I remove .text, I get the following output:

   <option value="">select</option>, <option value="200@#-(In-Stock)@#-https://store.alsabihmarine.com/index.php/diving-equipments/wetsuits/camouflage-hooded-suits-220.html@#-">(In-Stock)</option>, <option value="201@#-(Out-Stock)@#-https://store.alsabihmarine.com/index.php/diving-equipments/wetsuits/camouflage-hooded-suits-220.html@#-">(Out-Stock)</option>, <option value="202@#-(In-Stock)@#-https://store.alsabihmarine.com/index.php/diving-equipments/wetsuits/camouflage-hooded-suits-220.html@#-">(In-Stock)</option>, <option value="203@#-(In-Stock)@#-https://store.alsabihmarine.com/index.php/diving-equipments/wetsuits/camouflage-hooded-suits-220.html@#-">(In-Stock)</option>

Thanks in advance for your help.


2 Answers

This is simple: just add a + and call item.text inside the list comprehension.

Instead of:

values = [item.get('value') for item in items]

Use:

values = [item.get('value') + item.get_text(strip=True) for item in items[1:]]
print(values)
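
As a self-contained sketch of the approach above, the snippet below parses the HTML from the question (pasted in as a string, with the `onchange` and `class` attributes trimmed, rather than fetched from the site) and pairs each option's value with its label. The names `html`, `pairs`, and `sizes` are made up for illustration, and skipping the placeholder by its empty `value` instead of slicing with `items[1:]` is a swapped-in variation, not the answer's exact method.

```python
from bs4 import BeautifulSoup

# HTML copied from the question; on a real page it would come from a response body
html = """
<div class="input-box">
    <select name="super_attribute[138]" id="attribute138">
        <option value="">Choose an Option...</option>
        <option value="17" price="0">M (in stock) </option>
        <option value="18" price="0">L (out of stock) </option>
        <option value="15" price="0">XL (in stock) </option>
        <option value="52" price="0">XXL (in stock) </option>
    </select>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# Keep (value, label) pairs, skipping the placeholder option whose value is empty
pairs = [
    (item["value"], item.get_text(strip=True))
    for item in soup.select("option[value]")
    if item["value"]
]

# The size name is the first whitespace-separated token of the label, e.g. "M"
sizes = [(value, label.split()[0]) for value, label in pairs]
print(sizes)  # [('17', 'M'), ('18', 'L'), ('15', 'XL'), ('52', 'XXL')]
```

Keeping the value and the label as a tuple, rather than concatenating them with +, makes it easier to use SizeValue and SizeName separately later.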

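Edit: The data is loaded dynamically, and requests does not execute JavaScript, so the rendered options never appear in the HTML it returns. However, the data is available in the page source in JSON format. You can extract it with a regular expression using the re module: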

import json
import re
import requests


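url = "https://store.alsabihmarine.com/index.php/diving-equipments/wetsuits/camouflage-hooded-suits-220.html"
# Use the decoded text; str() on the raw bytes would give a "b'...'" repr with escaped characters
html = requests.get(url).text

# The product configuration is embedded as a JSON argument to Product.Config(...)
regex_pattern = re.compile(r"Product\.Config\((\{.*?\})\);", re.DOTALL)
data = json.loads(regex_pattern.search(html).group(1))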

print(
    [
        product["id"] + product["label"]
        for product in data["attributes"]["138"]["options"]
    ]
)

Output:

['17M (in stock) ', '18L (out of stock) ', '15XL (in stock) ', '52XXL (in stock) ']
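
The regex extraction can also be tried offline. The sketch below runs the same kind of pattern against a made-up, heavily shortened stand-in for the page source (`sample_html` is hypothetical and is not the real page), then splits each label into a size name and a stock status. The `attributes` / `138` / `options` structure and the `id` and `label` keys are taken from the answer's code; everything else is an assumption for illustration.

```python
import json
import re

# Hypothetical, heavily shortened stand-in for the page source (not the real page)
sample_html = """
<script type="text/javascript">
    var spConfig = new Product.Config({"attributes": {"138": {"options": [
        {"id": "17", "label": "M (in stock) "},
        {"id": "18", "label": "L (out of stock) "}
    ]}}});
</script>
"""

# Capture the JSON object passed to Product.Config(...); DOTALL lets it span lines
pattern = re.compile(r"Product\.Config\((\{.*?\})\);", re.DOTALL)
match = pattern.search(sample_html)
if match is None:
    raise ValueError("Product.Config(...) not found in the page source")

data = json.loads(match.group(1))

# Split each label into the size name and the stock status in parentheses
sizes = []
for option in data["attributes"]["138"]["options"]:
    name, _, status = option["label"].strip().partition(" (")
    sizes.append((option["id"], name, status.rstrip(")")))

print(sizes)  # [('17', 'M', 'in stock'), ('18', 'L', 'out of stock')]
```

Checking the match for None before calling group(1) gives a clearer error if the page layout changes and the pattern no longer matches.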

Thanks for your reply. Yes, I tried that, but the problem is that the output is different, as shown below:

['200@#-(In-Stock)@#-https://store.alsabihmarine.com/index.php/diving-equipments/wetsuits/camouflage-hooded-suits-220.html@#-(In-Stock)', '201@#-(Out-Stock)@#-https://store.alsabihmarine.com/index.php/diving-equipments/wetsuits/camouflage-hooded-suits-220.html@#-(Out-Stock)', '202@#-(In-Stock)@#-https://store.alsabihmarine.com/index.php/diving-equipments/wetsuits/camouflage-hooded-suits-220.html@#-(In-Stock)', '203@#-(In-Stock)@#-https://store.alsabihmarine.com/index.php/diving-equipments/wetsuits/camouflage-hooded-suits-220.html@#-(In-Stock)']
