Trying to run a Scrapy spider from a script in another location

from twisted.internet import reactor
from scrapy.crawler import Crawler
from scrapy.settings import Settings
from scrapy import log
from GetAdUrlsFromIndex.spiders.GetAdUrls_spider import getadurls

spider = getadurls(domain='website.com')
crawler = Crawler(Settings())
crawler.configure()
crawler.crawl(spider)
crawler.start()
log.start()
reactor.run() # the script will block here

1 Answer

User

#1 · Posted on 2024-06-25 07:01:41

If you have {} in both C:\Python27\Scripts\GetAdUrlsFromIndex_project\GetAdUrlsFromIndex and C:\Python27\Scripts\GetAdUrlsFromIndex_project\GetAdUrlsFromIndex\spiders, then try modifying your script like this:

import sys
from twisted.internet import reactor
from scrapy.crawler import Crawler
from scrapy.settings import Settings
from scrapy import log

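# Add the project root to the module search path so that the
# GetAdUrlsFromIndex package can be imported from any working directory
sys.path.append('C:/Python27/Scripts/GetAdUrlsFromIndex_project')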
from GetAdUrlsFromIndex.spiders.GetAdUrls_spider import getadurls

spider = getadurls(domain='website.com')
crawler = Crawler(Settings())
crawler.configure()
crawler.crawl(spider)
crawler.start()
log.start()
reactor.run() # the script will block here
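
To see why the sys.path.append line matters, here is a minimal sketch that does not involve Scrapy. It builds a throwaway project in a temporary directory and shows that a nested module only becomes importable once the project root is on sys.path. The names demo_project, demo_pkg and demo_spider are hypothetical, and the sketch creates __init__.py files so that the directories are regular packages (an assumption of this example, not something taken from the question).

```python
import importlib
import os
import sys
import tempfile

# Hypothetical layout: <tmp>/demo_project/demo_pkg/spiders/demo_spider.py
project_root = os.path.join(tempfile.mkdtemp(), "demo_project")
package_dir = os.path.join(project_root, "demo_pkg")
spiders_dir = os.path.join(package_dir, "spiders")
os.makedirs(spiders_dir)

# Mark both directories as regular packages (assumption of this sketch)
for pkg_dir in (package_dir, spiders_dir):
    open(os.path.join(pkg_dir, "__init__.py"), "w").close()

# A stand-in for the spider module
with open(os.path.join(spiders_dir, "demo_spider.py"), "w") as f:
    f.write("name = 'demo'\n")


def can_import(module_name):
    """Return True if the module can be imported, False otherwise."""
    try:
        importlib.import_module(module_name)
        return True
    except ImportError:
        return False


# The project root is not on sys.path yet, so the import fails
before = can_import("demo_pkg.spiders.demo_spider")

# Same idea as the answer: put the project root on the search path
sys.path.append(project_root)
importlib.invalidate_caches()
after = can_import("demo_pkg.spiders.demo_spider")

print(before, after)
```

The path appended is the directory that contains the top-level package, not the package directory itself, which mirrors the answer appending GetAdUrlsFromIndex_project rather than GetAdUrlsFromIndex.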
