When I crawl with Scrapy, no errors appear, but nothing is crawled

from openbl.items import OpenblItem
import scrapy
import time

class OpenblSpider(scrapy.Spider):
    name='openbl'
    start_url=['http://www.openbl.org/lists/base_1days.txt']

    def parse(self, response):
        #get the content within 'pre', select the 1st element to get the content string.
        #split the space of the content
        content=response.xpath('/pre/text').extract()[0].split()
        # This for loop is used to get the num of element in list content
        # after which the elements of the list are the IPs we desire.
        for i in range(0,len(content)):
            if content[i]=='ip':
                i+=1
                break
            else:
                pass
        # construct a new list content_data for putting IPs in.
        content_data=[]
        # This for loop put useful data(IPs) into the new list above.
        for x in range(i,len(content)):
            content_data.append(content(i))
        for cont in content_data:
            item=OpenblItem()
            item['name']=cont
            item['date']=time.strftime('%Y-%m-%d %H:%M:%S',time.localtime(time.time()))
            item['type']='other'
            yield item

1 answer

User

#1 · Posted 2024-10-02 00:37:48

First, there are a few problems with this code.

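The attribute should be named start_urls, not start_url. Scrapy builds its initial requests from start_urls, so otherwise I think it won't crawl anything.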

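The error you would get next is because this is a plain text file: response.body is plain text, with no tags. The XPath therefore matches nothing, and indexing [0] into the empty result raises an index-out-of-range exception. You can simply treat the response as plain text and extract the information with a regular expression, by splitting on newlines (\n), and so on.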
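To illustrate the plain-text approach, here is a minimal sketch that splits a body into lines and keeps the ones that look like IPv4 addresses. The sample body, the "#" comment convention, and the names SAMPLE_BODY, IPV4_LINE and extract_ips are all assumptions made for the example; the real file layout may differ.

```python
import re

# Hypothetical sample of the response body; the real file layout is an assumption.
SAMPLE_BODY = """\
# source ip
# example header line
192.0.2.10
198.51.100.7
not-an-ip
203.0.113.255
"""

# Loose dotted-quad pattern: it checks the shape only, not that each octet is <= 255.
IPV4_LINE = re.compile(r"^(?:\d{1,3}\.){3}\d{1,3}$")


def extract_ips(body_text):
    """Return the lines of body_text that look like dotted-quad IPv4 addresses."""
    ips = []
    for line in body_text.splitlines():
        line = line.strip()
        # Skip blank lines and lines assumed to be comments or headers.
        if not line or line.startswith("#"):
            continue
        if IPV4_LINE.match(line):
            ips.append(line)
    return ips
```

Inside the spider's parse method this would probably be called as extract_ips(response.text), yielding one item per returned address.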

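Also, don't reuse the loop variable after the loop like this. It feels wrong. If you want the index of the first occurrence of something in a list, there is the index() function. Or see: What is the best way to get the first item from an iterable matching a condition?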
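Both suggestions can be sketched roughly as follows. The helper names items_after_marker and first_matching are hypothetical, and the "ip" marker is taken from the question's code; returning an empty list when the marker is missing is a design choice for the example, not something the question specifies.

```python
def items_after_marker(tokens, marker="ip"):
    """Return the tokens that follow the first occurrence of marker."""
    try:
        # list.index raises ValueError when the marker is not present.
        start = tokens.index(marker) + 1
    except ValueError:
        return []
    return tokens[start:]


def first_matching(iterable, predicate, default=None):
    """Return the first element satisfying predicate, or default if none does."""
    return next((x for x in iterable if predicate(x)), default)
```

For example, items_after_marker(["#", "source", "ip", "192.0.2.10"]) gives ["192.0.2.10"], which replaces both for loops from the question with a single slice.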
