Get the text of the second element using XPath?

User

#1 · edited 2024-09-30 22:09:50

I'm not sure what the problem is...

>>> d = """<span class='python'>
...   <a>google</a>
...   <a>chrome</a>
... </span>"""
>>> from lxml import etree
>>> d = etree.HTML(d)
>>> d.xpath('.//span[@class="python"]/a[2]/text()')
['chrome']
>>>

User

#2 · edited 2024-09-30 22:09:50

I tried this but it doesn't work.
t = item.findtext('.//span[@class="python"]//a[2]')

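This is a FAQ about the // abbreviation.

.//a[2] means: select all a descendants of the current node that are the second a child of their parent. So it may select more than one element, or none at all, depending on the concrete XML document.

Put simply, the [] operator has higher precedence than //.

If you want just one node (the second) out of all the nodes selected, you have to use parentheses to force the precedence you want: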

(.//a)[2]

This really does select the second a descendant of the current node.

For the actual expression used in the question, change it to:

(.//span[@class="python"]//a)[2]

or, to get the text node directly, to:

(.//span[@class="python"]//a)[2]/text()
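
To see the precedence difference in practice, here is a minimal sketch with lxml. The markup is made up for illustration (the two a elements sit under different parents) and is not the asker's real page. As far as I know, find() and findtext() accept only the limited ElementPath subset, which has no parenthesized expressions, so the sketch calls xpath() instead.

```python
from lxml import etree

# Hypothetical markup: each <a> is the first <a> child of its own parent.
html = """<span class='python'>
  <span><a>google</a></span>
  <a>chrome</a>
</span>"""
root = etree.HTML(html)

# [2] binds to the a step: "an a that is the 2nd a child of its parent".
per_parent = root.xpath('.//span[@class="python"]//a[2]/text()')

# Parentheses collect every matching a first, then [2] indexes that node-set.
overall = root.xpath('(.//span[@class="python"]//a)[2]/text()')

print(per_parent)  # []
print(overall)     # ['chrome']
```

With the flat markup from post #1 both expressions return ['chrome'], which is why the simplified example looked fine.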

User

#3 · edited 2024-09-30 22:09:50

From the comments:

or the simplification of the actual HTML I posted is too simple

You're right. So what does .//span[@class="python"]//a[2] mean? It expands to:

self::node()
 /descendant-or-self::node()
  /child::span[attribute::class="python"]
   /descendant-or-self::node()
    /child::a[position()=2]

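In the end it selects only a elements that are the second a child of their parent (fn:position() here counts along the child axis). So nothing is selected if your document looks like this: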

<span class='python'> 
  <span> 
    <span> 
      <img></img> 
      <a>google</a><!-- This is the first "a" child of its parent --> 
    </span> 
    <a>chrome</a><!-- This is also the first "a" child of its parent --> 
  </span> 
</span>

If you want the second a among all the descendants, use:

descendant::span[@class="python"]/descendant::a[2]
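
A quick way to check this is to run the expressions against the nested document above. The following is only a sketch, assuming lxml and etree.HTML as in post #1; the variable names are mine.

```python
from lxml import etree

# The nested document from this answer (comments dropped, <img/> self-closed).
doc = etree.HTML("""<span class='python'>
  <span>
    <span>
      <img/>
      <a>google</a>
    </span>
    <a>chrome</a>
  </span>
</span>""")

# Abbreviated form and its full expansion: position() counts on the child axis.
abbreviated = doc.xpath('.//span[@class="python"]//a[2]')
expanded = doc.xpath(
    'self::node()/descendant-or-self::node()'
    '/child::span[attribute::class="python"]'
    '/descendant-or-self::node()/child::a[position()=2]'
)

# descendant::a[2] counts along the descendant axis instead.
second = doc.xpath('descendant::span[@class="python"]/descendant::a[2]/text()')

print(abbreviated, expanded)  # [] []
print(second)                 # ['chrome']
```

Note that descendant::a[2] is evaluated once per matching span, so a page with several span elements of class python can still return more than one node.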
