<p>I am very new to XPath. I want to extract some information from a website of laws and regulations, and for now I only want to:</p>
<ol>
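<li>Find the tag that contains the string "Article 1"</li>
<li>Starting from the tag found in (1), get that tag and everything after it, up to and including the first tag that contains another string, "PRIME", inside a <code><b></code> tag.</li>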
</ol>
<pre><code><p>
<b> <span> Article 1. </span> </b>
<span>
To approve the master plan on development
of tourism in Northern Central Vietnam
with the following principal contents:
</span>
</p>
<p>
<span>
1. Development viewpoints
</span>
</p>
<p>
<span>To realize general viewpoints of the strategy for and master plan on development of Vietnam’s tourism through 2020.
</span>
</p>
<p>
<span>PRIME MINISTER: Nguyen Tan Dung</span>
</p>
<p>
<span>
<b> PRIME MINISTER </b>
</span>
</p>
<p>
<b> <span> Article 2. </span> </b>
<span>
.................
</span>
</p>
<p>
<span> PRIME MINISTER: Nguyen Tan Dung</span>
</p>
</code></pre>
<p>For the expected output, I should get a list like this:</p>
<pre><code>[
'Article 1.' ,
'To approve the master plan on development of tourism in Northern
Central Vietnam with the following principal contents: ',
'1. Development viewpoints' ,
'To realize general viewpoints of the strategy for and master plan on
development of Vietnam’s tourism through 2020.' ,
'PRIME MINISTER: Nguyen Tan Dung',
'PRIME MINISTER'
]
</code></pre>
<p>The first item in the list is "Article 1." and the last item is the "PRIME MINISTER" that sits inside the <code><b></code> tag.</p>
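<p>To make the two steps concrete, here is a minimal sketch of the logic I have in mind. It is not a working XPath solution. It assumes the markup is well-formed enough to parse with the standard library's <code>xml.etree.ElementTree</code>, which real pages often are not, so an HTML parser such as lxml would probably be needed there. The wrapping <code>div</code> root, the helper names <code>clean</code> and <code>extract_section</code>, and the shortened "Article 2" paragraph are my own placeholders.</p>

```python
import xml.etree.ElementTree as ET

# Sample markup from above, wrapped in a hypothetical <div> root so that
# it parses as a single XML document (assumption: the input is well-formed).
SAMPLE = """<div>
<p><b> <span> Article 1. </span> </b>
<span>
To approve the master plan on development
of tourism in Northern Central Vietnam
with the following principal contents:
</span></p>
<p><span>
1. Development viewpoints
</span></p>
<p><span>To realize general viewpoints of the strategy for and master plan on development of Vietnam’s tourism through 2020.
</span></p>
<p><span>PRIME MINISTER: Nguyen Tan Dung</span></p>
<p><span><b> PRIME MINISTER </b></span></p>
<p><b> <span> Article 2. </span> </b><span> ... </span></p>
<p><span> PRIME MINISTER: Nguyen Tan Dung</span></p>
</div>"""


def clean(text):
    """Collapse runs of whitespace (including newlines) into single spaces."""
    return " ".join(text.split())


def extract_section(root, start="Article 1", stop="PRIME"):
    """Collect text from the first <p> containing `start` through the first
    <p> at or after it that has a <b> descendant containing `stop`."""
    result = []
    collecting = False
    for p in root.iter("p"):  # <p> elements in document order
        if not collecting:
            # Step 1: skip paragraphs until one contains the start string.
            if start not in "".join(p.itertext()):
                continue
            collecting = True
        # One list item per non-blank text node inside the paragraph.
        result.extend(t for t in map(clean, p.itertext()) if t)
        # Step 2: stop once a <b> inside this paragraph contains the stop string.
        if any(stop in "".join(b.itertext()) for b in p.iter("b")):
            break
    return result


items = extract_section(ET.fromstring(SAMPLE))
for item in items:
    print(repr(item))
```

<p>On the sample this yields the six items of the expected list, apart from whitespace. Note that a plain substring test for "Article 1" would also match "Article 10" or "Article 11", so matching "Article 1." with the trailing dot may be safer on a full document.</p>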