Iterating over List[] and building a dict only uses the first list element

Posted 2024-09-27 02:24:09

Male | A programmer who enjoys writing Python code.

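I am using Python and Selenium to extract data from an HTML page. I select a <ul> element that has several <li> children containing the data I want. However, when I iterate over the List[WebElement], query each element with .find_element_by_xpath(), and build a dict from the .text values of the <div>s, I only ever get the .text values of the divs in the first <li>.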

I have stripped the Python and HTML code down as far as I could:

<html>
<head>
</head>
<body>
    <ul id="listUl">
        <li id="item1">
            <div>
                <div class="content">
                    <div class="titel">
                        <div class="item_titel">Hello World</div>
                    </div>      
                    <div class="key">
                        <div class="item_key">HELLO_WORLD</div>
                    </div>
                </div>
            </div>
        </li>
        <li id="item2">
            <div>
                <div class="content">
                    <div class="titel">
                        <div class="item_titel">Merry Christmas</div>
                    </div>      
                    <div class="key">
                        <div class="item_key">MERRY_CHRISTMAS</div>
                    </div>
                </div>
            </div>
        </li>                                                       
    </ul>
</body>
</html>

from typing import List
from selenium import webdriver
from selenium.webdriver.remote.webelement import WebElement

path: str = "file:///C:/Users/<username>/Desktop/main3.html"
list_block = "//ul[@id='listUl']"
list_elements = "//li"

driver = webdriver.Firefox()
driver.get(path)

def get_data(list_item: WebElement) -> dict:
    return {
        'id': list_item.find_element_by_xpath("//div[@class='item_key']").text,
        'titel': list_item.find_element_by_xpath("//div[@class='item_titel']").text
    }

block_we: WebElement = driver.find_element_by_xpath(list_block)
result: List[dict] = []
block: WebElement = block_we
li_list: List[WebElement] = block.find_elements_by_xpath(list_elements)
for item in li_list:
    result.append(get_data(item))

print(result)   #[{'id': 'HELLO_WORLD', 'titel': 'Hello World'}, {'id': 'HELLO_WORLD', 'titel': 'Hello World'}]

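I found posts like this one: Filling a python dictionary in for loop returns same values. So I thought that perhaps I was not creating a new dict and the first entry was being reused each time. I therefore created a separate variable for each entry: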

item1 = {   # item1: {'id': 'HELLO_WORLD', 'titel': 'Hello World'}
    'id': li_list[0].find_element_by_xpath("//div[@class='item_key']").text,
    'titel': li_list[0].find_element_by_xpath("//div[@class='item_titel']").text
}
item1_text = li_list[0].text    #item1_text: 'Hello World\nHELLO_WORLD'
item2 = {   # item2: {'id': 'HELLO_WORLD', 'titel': 'Hello World'}
    'id': li_list[1].find_element_by_xpath("//div[@class='item_key']").text,
    'titel': li_list[1].find_element_by_xpath("//div[@class='item_titel']").text
}
item2_text = li_list[1].text    # item2_text: 'Merry Christmas\nMERRY_CHRISTMAS'

Can anyone tell me what mistake I am making?

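Edit: To make sure the XPath itself was not the problem, I changed the relative expressions //div[@class='item_key'] and //div[@class='item_titel'] to the absolute ones //div/div/div[1]/div and //div/div/div[2]/div, and added text and html entries to the result of get_data: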

def get_data(list_item: WebElement) -> dict:
    return {
        'id': list_item.find_element_by_xpath("//div/div/div[1]/div").text,
        # 'id': list_item.find_element_by_xpath("//div[@class='item_key']").text,
        'titel': list_item.find_element_by_xpath("//div/div/div[2]/div").text,
        # 'titel': list_item.find_element_by_xpath("//div[@class='item_titel']").text,
        'text': list_item.text,
        'html': list_item.get_attribute("innerHTML").replace('\t', '').replace('\n', '')
    }

Output:

[
    {
        'id': 'Hello World', 
        'titel': 'HELLO_WORLD', 
        'text': 'Hello World\nHELLO_WORLD', 
        'html': '<div><div class="content"><div class="titel"><div class="item_titel">Hello World</div></div><div class="key"><div class="item_key">HELLO_WORLD</div></div></div></div>'
    }, 
    {
        'id': 'Hello World', 
        'titel': 'HELLO_WORLD', 
        'text': 'Merry Christmas\nMERRY_CHRISTMAS', 
        'html': '<div><div class="content"><div class="titel"><div class="item_titel">Merry Christmas</div></div><div class="key"><div class="item_key">MERRY_CHRISTMAS</div></div></div></div>'
    }
]

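However, as soon as my List[WebElement] contains more than one <li> element, the queries .find_element_by_xpath("//div/div/div[1]/div") and .find_element_by_xpath("//div/div/div[2]/div") only return the values for element 0. Even if I use block.find_elements_by_xpath(list_elements)[0] or block.find_elements_by_xpath(list_elements)[1], or call get_data with index 1 only, .find_element_by_xpath() returns the values of the first element, whether the XPath is absolute or relative.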

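When I change the HTML file so that only the second <li> remains, .find_element_by_xpath() returns the titel and key of the (formerly) second, now first, element. When I swap the two elements so that the (formerly) first one comes after the second, the result of get_data is flipped as well (id and titel show element #1 and only #1).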

1 Answer

User

#1 · Posted 2024-09-27 02:24:09

Solved.

The problem was that I forgot to put a . at the start of the XPath queries in the get_data() function. This post describes the same problem: Iterating through elements get repeating result on Selenium on Python
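For reference, a minimal sketch of get_data() with the leading dot added. It keeps the find_element_by_xpath() call style used in the question and assumes the same HTML structure; newer Selenium releases drop the find_element_by_* helpers in favour of find_element(By.XPATH, ...), so the call may need adapting there.

```python
from typing import Dict


def get_data(list_item) -> Dict[str, str]:
    """Read key and title from one <li>, searching only inside that element."""
    return {
        # ".//" anchors the search at list_item instead of the document root
        'id': list_item.find_element_by_xpath(".//div[@class='item_key']").text,
        'titel': list_item.find_element_by_xpath(".//div[@class='item_titel']").text,
    }
```

With this change, the existing loop over li_list should yield the values of each <li> in turn rather than repeating the first one.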

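Without the leading ., the XPath is evaluated from the top of the DOM and always returns the same item, regardless of which element the query is called on.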
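The difference between searching from the document root and searching from the current element can be reproduced without a browser. The sketch below uses Python's built-in xml.etree.ElementTree as a stand-in for Selenium (an assumption made purely for illustration) and a simplified version of the question's markup. ElementTree supports only a subset of XPath, so the document-wide search is written as an explicit search from the root.

```python
import xml.etree.ElementTree as ET

# Simplified version of the question's markup (nesting reduced for brevity)
MARKUP = """
<ul id="listUl">
  <li id="item1">
    <div class="item_titel">Hello World</div>
    <div class="item_key">HELLO_WORLD</div>
  </li>
  <li id="item2">
    <div class="item_titel">Merry Christmas</div>
    <div class="item_key">MERRY_CHRISTMAS</div>
  </li>
</ul>
""".strip()

root = ET.fromstring(MARKUP)
items = root.findall("./li")

# Searching from the root ignores which <li> the loop is on,
# so every iteration sees the first match in the document.
from_root = [root.find(".//div[@class='item_key']").text for _ in items]

# Searching from each <li> only looks at that element's descendants.
from_item = [li.find(".//div[@class='item_key']").text for li in items]

print(from_root)  # ['HELLO_WORLD', 'HELLO_WORLD']
print(from_item)  # ['HELLO_WORLD', 'MERRY_CHRISTMAS']
```

The first list mirrors the repeated values seen in the question; the second mirrors what the query with a leading . returns.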

Thanks, everyone!
