Running a Scrapy spider as a script does not return all the data, but the same spider run from a project does

Script version, run with CrawlerProcess:

    import scrapy
    import json
    from scrapy.crawler import CrawlerProcess


    class MySpider(scrapy.Spider):
        name = "target"
        start_urls = ['https://www.target.com/p/madden-nfl-22-xbox-one-series-x/-/A-83744898#lnk=sametab']

        def parse(self, response):
            jsData = json.loads(response.xpath('//script[@type="application/ld+json"]//text()').extract_first())
            NAME_SELECTOR = jsData['@graph'][0]
            yield {
                'name': NAME_SELECTOR,
            }


    process = CrawlerProcess()
    process.crawl(MySpider)
    process.start()

Output from the script version (the 'offers' object has no 'price'):

    ...'offers': {'@type': 'Offer', 'priceCurrency': 'USD', 'availability': 'InStock', 'availableDeliveryMethod': 'ParcelService', 'potentialAction': {'@type': 'BuyAction'}, 'url': 'https://www.target.com/p/madden-nfl-22-xbox-one-series-x/-/A-83744898'}}}

Project version (the same spider inside a Scrapy project):

    import scrapy
    import json


    class targetSpider(scrapy.Spider):
        name = "target"
        start_urls = ['https://www.target.com/p/madden-nfl-22-xbox-one-series-x/-/A-83744898#lnk=sametab']

        def parse(self, response):
            jsData = json.loads(response.xpath('//script[@type="application/ld+json"]//text()').extract_first())
            test = jsData['@graph'][0]
            yield {
                'test': test
            }

Output from the project version (the 'offers' object includes 'price': '59.99'):

    ...'offers': {'@type': 'Offer', 'price': '59.99', 'priceCurrency': 'USD', 'availability': 'PreOrder', 'availableDeliveryMethod': 'ParcelService', 'potentialAction': {'@type': 'BuyAction'}, 'url': 'https://www.target.com/p/madden-nfl-22-xbox-one-series-x/-/A-83744898'}}}

1 Answer

Anonymous user

#1 · Posted 2024-06-24 12:22:33

This is a JavaScript issue. Content such as 'price': '59.99' is loaded by JavaScript, and Scrapy's downloader does not execute JavaScript by default.
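One way to confirm whether the price is actually present in the HTML that Scrapy downloaded (with no JavaScript executed) is to parse the same ld+json script and look for it. This is a minimal sketch: `find_offer` and `price_in_static_html` are hypothetical helper names, and it only assumes the offer sits somewhere under an "offers" key, as in the two outputs in the question.

```python
import json
from typing import Any, Optional


def find_offer(node: Any) -> Optional[dict]:
    """Depth-first search for the first dict stored under an "offers" key."""
    if isinstance(node, dict):
        offers = node.get("offers")
        if isinstance(offers, dict):
            return offers
        children = list(node.values())
    elif isinstance(node, list):
        children = node
    else:
        return None
    for child in children:
        found = find_offer(child)
        if found is not None:
            return found
    return None


def price_in_static_html(ld_json_text: str) -> Optional[str]:
    """Return the offer's price from raw ld+json text, or None if it is absent."""
    offer = find_offer(json.loads(ld_json_text))
    if offer is None:
        return None
    return offer.get("price")
```

Inside `parse()`, calling `price_in_static_html(response.xpath('//script[@type="application/ld+json"]//text()').get())` and getting `None` means the raw response had no price, so it was either filled in later by JavaScript or never sent to that request.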

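Possible causes of your problem:

One of your spiders has an external downloader middleware enabled (such as Selenium, Splash, or Playwright) and the other does not.
The script that starts with CrawlerProcess() is not run from the project root directory, so settings.py is not loaded.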

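Update: Sorry, I forgot that when you use CrawlerProcess() you have to load the project settings manually; see "Run scrapy from a script" in the Scrapy documentation.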
