XPath: select the tag containing a specific string and all of its following siblings, until another specific string appears in a tag

Article 1. To approve the master plan on development of tourism in Northern Central Vietnam with the following principal contents: 1. Development viewpoints To realize general viewpoints of the strategy for and master plan on development of Vietnam’s tourism through 2020. PRIME MINISTER: Nguyen Tan Dung PRIME MINISTER Article 2. ................. PRIME MINISTER: Nguyen Tan Dung

[ 'Article 1.' , 'To approve the master plan on development of tourism in Northern Central Vietnam with the following principal contents: ', '1. Development viewpoints' , 'To realize general viewpoints of the strategy for and master plan on development of Vietnam’s tourism through 2020.' , 'PRIME MINISTER: Nguyen Tan Dung', 'PRIME MINISTER' ]

3 Answers

User

#1 · edited 2024-10-06 12:37:40

This XPath expression:

//p[descendant-or-self::p and (following-sibling::p/descendant::b)]

should give the expected output, at least on the HTML that was posted.

User

#2 · edited 2024-10-06 12:37:40

Here is an XPath that matches the exact requirement in the OP:

//span[normalize-space(.)='Article 1.']/ancestor::p|//p[//span[normalize-space(.)='Article 1.']]/following::*[count(following-sibling::p/span/b[normalize-space(.)='PRIME MINISTER'])=1]


User

#3 · edited 2024-10-06 12:37:40

"Until" and "between" queries are surprisingly difficult in XPath, even in versions later than 1.0.

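Working down from the latest version: in XPath 3.1 you can do the following (the << and >> operators require each operand to be a single node, so this assumes $first and $last each select exactly one paragraph):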

let $first := p[contains(., 'Article 1')],
    $last := p[contains(., 'PRIME MINISTER')]
return ($first, p[. >> $first and . << $last], $last)

XPath 2.0 has no let, but for works as well; it just reads a little oddly.

In 1.0, however, (a) we cannot bind variables, and (b) we do not have the << and >> operators, which makes it considerably harder.

The simplest expression is probably

p[(.|preceding-sibling::p)[contains(., 'Article 1')] and 
  (.|following-sibling::p)[contains(., 'PRIME MINISTER')]]

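Unfortunately, without a very smart optimizer, this can be very inefficient on a large input document: each of the two contains() tests will be executed about N^2/2 times, where N is the number of paragraphs. If you are restricted to XPath 1.0, you are better off using XPath to find the "start" and "end" nodes, and then using the host language to find all the nodes in between.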
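As a rough illustration of the host-language approach, here is a minimal Python sketch that walks the sibling paragraphs once, so each paragraph is tested a constant number of times instead of about N/2 times. Everything in it is an assumption made for illustration: the SAMPLE markup is invented (the HTML from the question is not shown here), the helper names text_of and paragraphs_between are hypothetical, and the end of the range is taken to be the first paragraph whose whole text is exactly 'PRIME MINISTER', which is one reading of the expected output. It uses only the standard library's xml.etree.ElementTree, whose XPath subset has no contains(), so the start and end tests are done in Python rather than in XPath.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup, assumed for illustration only.
SAMPLE = """<div>
<p>Preamble</p>
<p><span><b>Article 1.</b></span></p>
<p><span>To approve the master plan.</span></p>
<p><span>PRIME MINISTER: Nguyen Tan Dung</span></p>
<p><span><b>PRIME MINISTER</b></span></p>
<p><span><b>Article 2.</b></span></p>
<p><span>PRIME MINISTER: Nguyen Tan Dung</span></p>
</div>"""


def text_of(elem):
    """Return the element's full text content with whitespace normalized."""
    return " ".join("".join(elem.itertext()).split())


def paragraphs_between(parent, start_marker, end_text):
    """Collect the text of child <p> elements in a single pass.

    Starts at the first <p> whose text contains start_marker and stops
    after the first <p> (from there on) whose text equals end_text.
    If no end paragraph is found, everything from the start onward is kept.
    """
    collected = []
    started = False
    for p in parent.findall("p"):
        text = text_of(p)
        if not started:
            if start_marker not in text:
                continue  # still before the start paragraph
            started = True
        collected.append(text)
        if text == end_text:
            break  # end paragraph reached; ignore later siblings
    return collected


root = ET.fromstring(SAMPLE)
result = paragraphs_between(root, "Article 1", "PRIME MINISTER")
print(result)
```

With a library that supports full XPath 1.0, the start and end paragraphs could be located with XPath expressions instead and only the walk between them done in the host language; the single pass above is the part that avoids the quadratic cost.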
