I'm trying to extract the title from the following HTML:
<FONT COLOR=#5FA505><B>Claim:</B></FONT> Coed makes unintentionally risqué remark about professor's "little quizzies."
<BR><BR>
<CENTER><IMG SRC="/images/content-divider.gif"></CENTER>
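I tried using:
[code omitted]
but the title (Coed makes unintentionally ...) isn't actually inside any tag, so I can't get at it. Is there a way to get the content when it isn't wrapped in a <p> or any other kind of tag?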
EDIT: //font[b = "Claim:"]/following-sibling::text()
works, but it also grabs and displays the bottom part of the HTML:
<FONT COLOR=#5FA505 FACE=""><B>Origins:</B></FONT> Print references to the "little quizzies" tale date to 1962, but the tale itself has been around since the early 1950s. It continues to surface among college students to this day. Similar to a number of other college legends
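Assuming you know in advance that the Claim: text is present, locate the font tag by the text of its b child element and take the text sibling that follows it. Demo from the Scrapy shell: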
[code omitted]
Note that, ideally, these join and strip calls should be replaced by appropriate input or output processors used in Item Loaders.
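On the follow-up problem from the EDIT (the Origins: text being picked up as well): in XPath, `following-sibling::text()` matches every later text sibling, so one option is to keep only the first one with `following-sibling::text()[1]`. The sketch below illustrates the same "first text node right after the labelled font tag" idea using only the standard library's `html.parser`, so it runs without Scrapy. It is not the Scrapy API, and the names `LabelTextParser`, `text_after_label` and `SAMPLE_HTML` are made up for this example; the sample markup is a shortened copy of the HTML from the question.

```python
from html.parser import HTMLParser

# Shortened copy of the markup from the question (assumed representative).
SAMPLE_HTML = """<FONT COLOR=#5FA505><B>Claim:</B></FONT> Coed makes unintentionally risqué remark about professor's "little quizzies."
<BR><BR>
<CENTER><IMG SRC="/images/content-divider.gif"></CENTER>
<FONT COLOR=#5FA505 FACE=""><B>Origins:</B></FONT> Print references to the "little quizzies" tale date to 1962, but the tale itself has been around since the early 1950s.
"""


class LabelTextParser(HTMLParser):
    """Capture the bare text that directly follows <font><b>LABEL</b></font>."""

    def __init__(self, label):
        super().__init__(convert_charrefs=True)
        self.label = label
        self.in_font = False     # inside a <font> element
        self.in_b = False        # inside a <b> element
        self.label_seen = False  # the current <font> contained <b>LABEL</b>
        self.capturing = False   # we are right after the matching </font>
        self.done = False        # the text node has already been captured
        self.parts = []          # captured text fragments

    def _stop_capture(self):
        # Any tag boundary ends the text node we are interested in.
        if self.capturing:
            self.capturing = False
            self.done = True

    def handle_starttag(self, tag, attrs):
        self._stop_capture()
        if tag == "font":        # html.parser lowercases tag names
            self.in_font = True
        elif tag == "b":
            self.in_b = True

    def handle_endtag(self, tag):
        self._stop_capture()
        if tag == "b":
            self.in_b = False
        elif tag == "font":
            if self.label_seen and not self.done:
                self.capturing = True
            self.in_font = False
            self.label_seen = False

    def handle_data(self, data):
        if self.capturing:
            self.parts.append(data)
        elif self.in_font and self.in_b and data.strip() == self.label:
            self.label_seen = True


def text_after_label(html, label):
    """Return the stripped text immediately after the labelled <font> tag."""
    parser = LabelTextParser(label)
    parser.feed(html)
    parser.close()  # flush any text still buffered at the end of the input
    return "".join(parser.parts).strip()


claim = text_after_label(SAMPLE_HTML, "Claim:")
print(claim)
```

Capturing stops at the first tag after the label (here the `<BR>`), which is why the Origins: paragraph is left out. Calling `text_after_label(SAMPLE_HTML, "Origins:")` picks up that paragraph instead.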