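I have HTML like the snippet below. I want to get the text inside each <span class="zzAggregateRatingStat">
element. For the example given below, I should get 3 and 5.
For this task I am using Python 2.7 and lxml.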
<div class="pp-meta-review">
<span class="zrvwidget" style="">
<span g:inline="true" g:type="NumUsersFoundThisHelpful" g:hideonnoratings="true" g:entity.annotation.groups="maps" g:entity.annotation.id="http://maps.google.com/?q=Central+Kia+of+Irving++(972)+659-2204+loc:+1600+East+Airport+Freeway,+Irving,+TX+75062&gl=US&sll=32.83624,-96.92526" g:entity.annotation.author="AIe9_BH8MR-1JD_4BhwsKrGCazUyU5siqCtjchckDcg5BAl5rOLd9nvhJJDTrtjL-xFI8D42bD_7">
<span class="zzNumUsersFoundThisHelpfulActive" zzlabel="helpful">
<span>
<span class="zzAggregateRatingStat">3</span>
</span>
<span>
<span> </span>
out of
<span> </span>
</span>
<span>
<span class="zzAggregateRatingStat">5</span>
</span>
<span>
<span> </span>
people found this review helpful.
</span>
</span>
</span>
</span>
</div>
This is clearly documented at the lxml website.
The following code works for your input:
It prints:
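I prefer lxml's xpath to cssselector, although both can do the job.
ChrisP's example prints 3, but if you run it on the actual input, it raises an error. ChrisP's code can be changed to use lxml.html.fromstring, which is a more lenient parser than the one it currently uses. If this change is made, it prints 3.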
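The approach discussed in the answers (a lenient parse with lxml.html.fromstring followed by an XPath query on the class attribute) could look roughly like the sketch below. The markup is a trimmed copy of the snippet from the question with the long g:* attributes shortened, and the helper name rating_stats is made up for illustration; it is not taken from any of the original answers.

```python
import lxml.html

# Trimmed copy of the question's markup; attribute values are shortened
# for readability (an assumption, not the full original input).
HTML = '''
<div class="pp-meta-review">
  <span class="zrvwidget" style="">
    <span g:inline="true" g:type="NumUsersFoundThisHelpful" g:entity.annotation.id="http://maps.google.com/?q=Central+Kia&gl=US">
      <span class="zzNumUsersFoundThisHelpfulActive" zzlabel="helpful">
        <span><span class="zzAggregateRatingStat">3</span></span>
        <span> out of </span>
        <span><span class="zzAggregateRatingStat">5</span></span>
        <span> people found this review helpful.</span>
      </span>
    </span>
  </span>
</div>
'''


def rating_stats(markup):
    """Return the text of every span with class zzAggregateRatingStat.

    rating_stats is a hypothetical helper name used only in this sketch.
    """
    # The HTML parser tolerates the unescaped '&' and the g:* attributes.
    root = lxml.html.fromstring(markup)
    spans = root.xpath('.//span[@class="zzAggregateRatingStat"]')
    return [span.text_content().strip() for span in spans]


print(rating_stats(HTML))
```

The XPath test matches the class attribute exactly, so it would miss elements that carry additional classes; a contains()-based predicate or a CSS selector may be a better fit in that case.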