Scrapy XPath result is not what I expected

Posted 2024-09-25 08:37:04


I'm trying to get the value of a preceding tag. Here is what I'm doing:

Structure of the HTML page:

...
<tr class="destaque no-hover">
    <td class="periodo" colspan="6">2020.1</td>
</tr>
<tr class="linhaPar">
    <td>Text1</td>
    <td align="center">01</td>
    <td align="right">312h</td>
    <td align="center">3T12</td>
</tr>
<tr class="linhaImpar">
    <td>Text2</td>
    <td align="center">01</td>
    <td align="right">12h</td>
    <td align="center">5M12</td>
</tr>
...
<tr class="destaque no-hover">
    <td class="periodo" colspan="6">2016.1</td>
</tr>
<tr class="linhaPar">
    <td>Text7</td>
    <td align="center">01</td>
    <td align="right">2h</td>
    <td align="center">2N12</td>
</tr>
<tr class="linhaImpar">
    <td>Text8</td>
    <td align="center">01</td>
    <td align="right">32h</td>
    <td align="center">4T12</td>
</tr>
...
<tr class="destaque no-hover">
    <td class="periodo" colspan="6">2014.2</td>
</tr>
<tr class="linhaPar">
    <td>TextN-1</td>
    <td align="center">01</td>
    <td align="right">2h</td>
    <td align="center">2N12</td>
</tr>
<tr class="linhaImpar">
    <td>TextN</td>
    <td align="center">01</td>
    <td align="right">32h</td>
    <td align="center">4T12</td>
</tr>

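So I'm trying to get the information from each of these tr rows with class="linhaPar" or class="linhaImpar":

for i in response.xpath('//tr[@class="linhaPar" or @class="linhaImpar"]'):
    _aux = i.xpath('./td[1]')

However, I also need the matching td[@class="periodo"] for each row, so I turned to XPath for that: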

# I tried this, but it returns a list of every matching element, not just the closest one, which is what I want
    _p = _aux.xpath('./preceding::tr[td[@class="periodo"]]')

# I also tried this, but it doesn't work
    _p = _aux.xpath('./preceding::tr[td[@class="periodo"] and position()=1]')

Solved

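Maybe I wasn't clear enough when I asked this question. The periodo changes after a varying number of tr rows. Every approach I found returned either a list of possible results or nothing. To solve this, I went with a solution that includes the "destaque no-hover" rows in the for loop XPath:
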
_p = ""
for i in response.xpath('//tr[@class="linhaPar" or @class="linhaImpar" or @class="destaque no-hover"]'):
    # Check whether this is the row that holds the period
    if 'destaque no-hover' == i.xpath('./@class').get():
        _p = i.xpath('./td/text()').get()
        continue  # remember the period and move on to the next row
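
For anyone who wants to try the idea outside a spider, here is a minimal sketch of the same "remember the last period" loop. It assumes the standard library's xml.etree.ElementTree in place of Scrapy's response.xpath, and it runs on a trimmed, hypothetical sample of the table above. The names SAMPLE and rows_with_period are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed sample shaped like the table in the question.
# It is wrapped in <table> so that it parses as well-formed XML.
SAMPLE = """
<table>
  <tr class="destaque no-hover"><td class="periodo" colspan="6">2020.1</td></tr>
  <tr class="linhaPar"><td>Text1</td><td align="center">01</td></tr>
  <tr class="linhaImpar"><td>Text2</td><td align="center">01</td></tr>
  <tr class="destaque no-hover"><td class="periodo" colspan="6">2016.1</td></tr>
  <tr class="linhaPar"><td>Text7</td><td align="center">01</td></tr>
</table>
"""


def rows_with_period(markup):
    """Walk the rows in document order, carrying the last period seen."""
    root = ET.fromstring(markup.strip())
    period = ""
    pairs = []
    for tr in root.iter("tr"):
        first_td = tr.find("td")
        if first_td is None:
            continue  # row without cells, nothing to read
        css_class = tr.get("class", "")
        if css_class == "destaque no-hover":
            period = first_td.text  # header row: remember its period
            continue
        if css_class in ("linhaPar", "linhaImpar"):
            pairs.append((period, first_td.text))  # data row: pair it with the period
    return pairs


result = rows_with_period(SAMPLE)
print(result)
```

Because the rows are visited in document order, each data row is paired with the period header that most recently preceded it, so no backward-looking XPath is needed.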

2 answers

With this XPath selecting the context nodes:

'//tr[@class="linhaPar" or @class="linhaImpar" or td[@class="periodo"]]' 

Assuming you want to store this in _p (one periodo for each tr context node):

['2020.1'], ['2020.1'], ['2020.1'], ['2020.1']

Use:

./preceding::td[@class="periodo"][1]
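
The effect of the trailing [1] can be checked with a small sketch. It assumes lxml is installed (Scrapy selectors are built on it) and uses a trimmed, hypothetical sample of the table. The preceding axis counts positions backwards from the context node, so [1] applied after the class predicate picks the nearest matching td, while omitting it returns every earlier match.

```python
from lxml import html

# Hypothetical, trimmed sample shaped like the table in the question.
SAMPLE = """
<table>
  <tr class="destaque no-hover"><td class="periodo" colspan="6">2020.1</td></tr>
  <tr class="linhaPar"><td>Text1</td></tr>
  <tr class="linhaImpar"><td>Text2</td></tr>
  <tr class="destaque no-hover"><td class="periodo" colspan="6">2016.1</td></tr>
  <tr class="linhaPar"><td>Text7</td></tr>
</table>
"""

tree = html.fromstring(SAMPLE.strip())
rows = tree.xpath('//tr[@class="linhaPar" or @class="linhaImpar"]')

# Without [1]: every periodo cell that appears before the row.
every_match = [r.xpath('./preceding::td[@class="periodo"]/text()') for r in rows]

# With [1]: only the closest periodo cell before the row.
nearest = [r.xpath('./preceding::td[@class="periodo"][1]/text()') for r in rows]

print(every_match)  # [['2020.1'], ['2020.1'], ['2020.1', '2016.1']]
print(nearest)      # [['2020.1'], ['2020.1'], ['2016.1']]
```

The same expressions should behave the same way on a Scrapy selector, since only the XPath string differs from the question's code.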

Assuming you want to store this in _p (one periodo per group of rows):

['2020.1'], [], ['2020.2'], []

Use:

./preceding-sibling::tr[1]/td[1][@class="periodo"]
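
A quick way to see where the empty results come from is the sketch below. It again assumes lxml and a hypothetical trimmed sample. Only the row that directly follows a periodo header finds it through preceding-sibling::tr[1]; later rows in the same group get an empty list.

```python
from lxml import html

# Hypothetical, trimmed sample: two periods, two data rows each.
SAMPLE = """
<table>
  <tr class="destaque no-hover"><td class="periodo" colspan="6">2020.1</td></tr>
  <tr class="linhaPar"><td>Text1</td></tr>
  <tr class="linhaImpar"><td>Text2</td></tr>
  <tr class="destaque no-hover"><td class="periodo" colspan="6">2016.1</td></tr>
  <tr class="linhaPar"><td>Text7</td></tr>
  <tr class="linhaImpar"><td>Text8</td></tr>
</table>
"""

tree = html.fromstring(SAMPLE.strip())
rows = tree.xpath('//tr[@class="linhaPar" or @class="linhaImpar"]')

# Look only at the row immediately before each data row.
per_group = [
    r.xpath('./preceding-sibling::tr[1]/td[1][@class="periodo"]/text()')
    for r in rows
]

print(per_group)  # [['2020.1'], [], ['2016.1'], []]
```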

If you need to drop the empty elements from the resulting list, apply filter afterwards.
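
As a minimal sketch of that clean-up step, the snippet below takes the per-group result shown above as a literal list (it is not produced by a real crawl). The variable names are illustrative only.

```python
# Per-group result as shown above: rows that do not directly follow a period are empty.
per_group = [['2020.1'], [], ['2020.2'], []]

# filter(None, ...) keeps only truthy items, which drops the empty lists.
non_empty = list(filter(None, per_group))

# Optionally flatten the one-element lists into plain strings.
periods = [value for group in non_empty for value in group]

print(non_empty)  # [['2020.1'], ['2020.2']]
print(periods)    # ['2020.1', '2020.2']
```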

For the second case, as @Gilles Quenot mentioned, you can also change the context node to:

//tr[@class="linhaPar" or @class="linhaImpar" or @class="destaque no-hover"]

and fill the lists with:

_aux = ./td[1][not(@class="periodo")]
_p = ./td[1][@class="periodo"]

Or:

_aux = ./td[1][not(starts-with(text(),"2020."))]
_p = ./td[1][starts-with(text(),"2020.")]
