Scrapy: getting an XPath attribute with getall()

Posted 2024-09-27 04:19:58

I am using Scrapy to build a list of IDs (the list will later be used in URLs to fetch more data):

def parse(self, response):
    for a in response.xpath('//a[@class="imageLink"]').getall():  
        print(a)
        item = NgaItem1()
        item["itemId"] = a.attrib["assetid"]
        yield item

I believe I am selecting the DOM elements correctly, because print(a) returns the following for each element I am interested in:

<a class="imageLink" id="assetLink_A_148957" assetid="148957" assettype="A" rel=""><img style="max-width:128px;max-height:128px;" class="mainThumbImage imageDraggable" alt="" title="George Catlin - The White Cloud, Head Chief of the Iowas - 1844/1845 - Painting" rel="" offset="" onmousedown="
                                                        noclear = 1; noclear=0;
                                                " id="grid-item_A_148957" assetid="148957" src="https://images.nga.gov//assets/thumbnails/497/7/5a7e73ae456e734fe2eaf4a0a71f0e3d.jpg"></a>

All I need is the assetid, 148957. The error I get is: 'str' object has no attribute 'attrib'

1 answer

User

#1 · Posted 2024-09-27 04:19:58

This is not an ideal answer, but I ended up using string manipulation. Some IDs are 5 digits and some are 6, so I did some cleanup in Excel afterwards.

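def parse(self, response):
    # .getall() returns each matching <a> tag as a plain HTML string
    for a in response.xpath('//a[@class="imageLink"]').getall():
        # index of the first 'assetid' in the string (the <a> attribute in the markup above)
        start = a.find('assetid')
        item = NgaItem1()
        # skip the 9 characters of 'assetid="' and take the next 6
        item["itemId"] = a[start+9:start+15]
        print(item["itemId"])
        yield item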
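The slice `a[start+9:start+15]` always takes six characters, so a five-digit ID picks up the closing quote, which is presumably what the Excel cleanup removed. If the string-based approach is kept, a regular expression can capture however many digits there are. The sketch below uses only the standard library. The helper name `extract_asset_id` and the two sample strings are hypothetical, shaped like the `.getall()` output shown in the question.

```python
import re

# Matches assetid="..." and captures the digits, however many there are.
ASSETID_RE = re.compile(r'assetid="(\d+)"')


def extract_asset_id(anchor_html):
    """Return the first assetid value found in the serialized tag, or None."""
    match = ASSETID_RE.search(anchor_html)
    return match.group(1) if match else None


# Hypothetical sample strings, shaped like the output of .getall().
six_digits = '<a class="imageLink" id="assetLink_A_148957" assetid="148957" assettype="A" rel=""></a>'
five_digits = '<a class="imageLink" id="assetLink_A_48957" assetid="48957" assettype="A" rel=""></a>'

print(extract_asset_id(six_digits))
print(extract_asset_id(five_digits))
```

Inside `parse`, the slice would then become `item["itemId"] = extract_asset_id(a)`, with a check for `None` if some links might lack the attribute.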
