<p>As @eLRuLL mentioned, the pagination div id is generated dynamically.</p>
<p>You need something that renders the page's JavaScript, such as a headless browser, or the <a href="https://docs.scrapy.org/en/latest/topics/dynamic-content.html#pre-rendering-javascript" rel="nofollow noreferrer">Scrapy-recommended</a> Splash.</p>
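<pre><code>from scrapy_splash import SplashRequest
...
for p in data['product_page']:
    # Fetch each product page through Splash so its JavaScript is rendered
    yield SplashRequest(p,
                        callback=self.parse_product,
                        endpoint='render.html',
                        args={'wait': 0.5})  # seconds to wait after the page loads
</code></pre>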
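<p>For <code>SplashRequest</code> to work, the project also has to enable the scrapy-splash middlewares. Below is a minimal <code>settings.py</code> sketch based on the scrapy-splash README. The Splash address is an assumption (a local instance on port 8050), so adjust it to wherever your Splash server actually runs.</p>

```python
# settings.py (sketch): wire scrapy-splash into a Scrapy project.
# Assumption: a Splash instance is listening locally on port 8050.
SPLASH_URL = "http://localhost:8050"

# Middlewares and priorities as listed in the scrapy-splash README.
DOWNLOADER_MIDDLEWARES = {
    "scrapy_splash.SplashCookiesMiddleware": 723,
    "scrapy_splash.SplashMiddleware": 725,
    "scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware": 810,
}

SPIDER_MIDDLEWARES = {
    "scrapy_splash.SplashDeduplicateArgsMiddleware": 100,
}

# Makes request deduplication aware of Splash arguments.
DUPEFILTER_CLASS = "scrapy_splash.SplashAwareDupeFilter"
```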
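<p>With selectorlib you can use an XPath selector that matches any div whose id contains "pagination-next", so the dynamic part of the id no longer matters.</p>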
<p><a href="https://selectorlib.readthedocs.io/en/latest/usage.html#xpath-default-blank" rel="nofollow noreferrer">https://selectorlib.readthedocs.io/en/latest/usage.html#xpath-default-blank</a></p>
<p>The URL selector YAML file:</p>
<pre><code>product_page:
    css: a.a-size-base
    multiple: true
    type: Link
next:
    xpath: '//div[contains(@id, "pagination-next")]//li[@class="a-last"]/a/@href'
    type: Link
</code></pre>
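<p>To make it clearer what that XPath matches, here is a rough standard-library sketch that mimics it on a made-up HTML snippet. It is only an illustration of the matching rule, not how selectorlib works internally. The sample markup, the numeric id suffix, the class <code>NextLinkFinder</code>, and the helper <code>find_next_url</code> are all hypothetical.</p>

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Hypothetical markup: the numeric suffix of the div id changes between loads.
SAMPLE_HTML = """
<div id="pagination-next-1234567890">
  <ul>
    <li class="a-normal"><a href="/s?page=1">1</a></li>
    <li class="a-last"><a href="/s?page=2">Next</a></li>
  </ul>
</div>
"""

# Elements that never get a closing tag, so they must not stay on the stack.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class NextLinkFinder(HTMLParser):
    """Collects hrefs that roughly correspond to
    //div[contains(@id, "pagination-next")]//li[@class="a-last"]/a/@href
    """

    def __init__(self):
        super().__init__()
        self.stack = []  # one (tag, attrs) tuple per currently open element
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and "href" in attrs and self._parent_matches():
            self.hrefs.append(attrs["href"])
        if tag not in VOID_TAGS:
            self.stack.append((tag, attrs))

    def handle_endtag(self, tag):
        # Pop back to the most recent open element with this tag name, if any.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break

    def _parent_matches(self):
        if not self.stack:
            return False
        # The direct parent must be <li class="a-last"> ...
        parent_tag, parent_attrs = self.stack[-1]
        if parent_tag != "li" or parent_attrs.get("class") != "a-last":
            return False
        # ... nested anywhere inside a div whose id contains the fixed substring.
        return any(
            t == "div" and "pagination-next" in (a.get("id") or "")
            for t, a in self.stack[:-1]
        )


def find_next_url(html, base_url):
    """Return the absolute URL of the next page, or None on the last page."""
    finder = NextLinkFinder()
    finder.feed(html)
    return urljoin(base_url, finder.hrefs[0]) if finder.hrefs else None
```

<p>Because the check is a substring test on the id, the link is found whatever suffix the page generates, and a page without a "next" link simply yields <code>None</code>, which is a natural stop condition for the crawl.</p>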