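<p>I have a problem scraping a subpage through a link that I collect on the main page.</p>
<p>Every comic has its own page, so I tried to open each item's page and scrape the price from it.</p>
<p>This is the spider:</p>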
<pre class="lang-py prettyprint-override"><code>import scrapy
from scrapy.loader import ItemLoader

# ComicscraperItem comes from the project's items module (import not shown in the original post).


class PaniniSpider(scrapy.Spider):
    name = "spiderP"
    start_urls = ["http://comics.panini.it/store/pub_ita_it/magazines.html"]

    def parse(self, response):
        # Get all the &lt;a&gt; tags
        for sel in response.xpath("//div[@class='list-group']//h3/a"):
            l = ItemLoader(item=ComicscraperItem(), selector=sel)
            l.add_xpath('title', './text()')
            l.add_xpath('link', './@href')
            # Request the item's own page and hand the loader to the callback
            request = scrapy.Request(sel.xpath('./@href').extract_first(), callback=self.parse_isbn, dont_filter=True)
            request.meta['l'] = l
            yield request

    def parse_isbn(self, response):
        l = response.meta['l']
        l.add_xpath('price', "//p[@class='special-price']//span/text()")
        return l.load_item()
</code></pre>
<p>The problem is with the links. The output looks like this:</p>
<pre class="lang-sh prettyprint-override"><code>{"title": "Spider-Man 14", "link": ["http://comics.panini.it/store/pub_ita_it/mmmsm014isbn-it-marvel-masterworks-spider-man-marvel-masterworks-spider.html"], "price": ["\n \u20ac\u00a022,50 ", "\n \u20ac\u00a076,50 ", "\n \u20ac\u00a022,50 ", "\n \u20ac\u00a022,50 ", "\n \u20ac\u00a022,50 ", "\n \u20ac\u00a018,00
{"title": "Avenger di John Byrne", "link": ["http://comics.panini.it/store/pub_ita_it/momae005isbn-it-omnibus-avengers-epic-collecti-marvel-omnibus-avengers-by.html"], "price": ["\n \u20ac\u00a022,50 ", "\n \u20ac\u00a076,50 ", "\n \u20ac\u00a022,50
</code></pre>
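<p>In short, each request seems to receive the links of every item, so the price is not a single value but a list of all the prices.</p>
<p>How can I pass only the link of the relevant item and store the price for each item?</p>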