Problem with XPath/regex in a Scrapy spider

Posted 2024-10-02 04:33:26


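I am trying to extract a product id from the onclick attribute inside the "previous sibling", which is a ul tag (id="ShowProductImages").

The number I want to extract is the one right after ?pid=, for example:

…list/ViewAll?pid=234565&image=206…

Here is what I am trying to extract from: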

<ul id="ShowProductImages" class="imageView">
    <li><a href="" target="_blank" onClick="javascript:initWindow('http://products.example.com/products/list/ViewAll?pid=234565&amp;image=754550',520,520,100,220);return false;"><img src="http://content.example.com/assets/images/products/j458jk.jpg" width="200" height="150" alt="Product image description here" border="0"></a></li>        
</ul>

<div class="description">
    Description here...
</div>

I am using XPath to select the onclick attribute and a regular expression to extract the id:

[code snippet not preserved in the source]

Any suggestions? I am not sure where I went wrong.

Thanks in advance for your help.


1 Answer

Answer 1 · Posted 2024-10-02 04:33:26

I suggest trying it this way: select starting from the ul, and test for its <div class="description"> sibling in a predicate:

sel.xpath("""//ul[following-sibling::div[@class="description"]]
                 [@id="ShowProductImages"]
                 /li/a[1]/@onclick""").re(r'(?:pid=)(\d+)')

I changed your regular expression so that it only matches digits.
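If it helps to check the regex step on its own, outside of Scrapy, here is a minimal sketch using only the standard library. The onclick string is copied from the HTML in the question; the helper names pid_by_regex and pid_by_query_string are made up for illustration and are not part of Scrapy. The second helper is an alternative to the regex, not what the answer above uses: it pulls the URL out of the JavaScript call and reads pid from the query string.

```python
import html
import re
from urllib.parse import urlparse, parse_qs

# onclick value copied from the question's HTML (entity-encoded, as in the page source)
ONCLICK = (
    "javascript:initWindow('http://products.example.com/products/list/"
    "ViewAll?pid=234565&amp;image=754550',520,520,100,220);return false;"
)


def pid_by_regex(onclick):
    """Return the digits that follow 'pid=', or None if there is no match."""
    match = re.search(r"pid=(\d+)", onclick)
    return match.group(1) if match else None


def pid_by_query_string(onclick):
    """Alternative (hypothetical helper): extract the quoted URL, then parse its query string."""
    url_match = re.search(r"'(https?://[^']+)'", onclick)
    if url_match is None:
        return None
    url = html.unescape(url_match.group(1))  # turn &amp; back into &
    values = parse_qs(urlparse(url).query).get("pid")
    return values[0] if values else None


print(pid_by_regex(ONCLICK))
print(pid_by_query_string(ONCLICK))
```

Both helpers return the id as a string, which is also what .re() gives you in Scrapy, so convert with int() if you need a number.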
