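<p>It really depends on how you need to scrape the site, and on what data you want to get and how.</p>
<p>Here is an example of following pagination on eBay with <code>Scrapy</code>+<code>Selenium</code>:</p>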
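<pre><code>import scrapy
from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.common.by import By


class ProductSpider(scrapy.Spider):
    name = "product_spider"
    allowed_domains = ['ebay.com']
    start_urls = ['http://www.ebay.com/sch/i.html?_odkw=books&_osacat=0&_trksid=p2045573.m570.l1313.TR0.TRC0.Xpython&_nkw=python&_sacat=0&_from=R40']

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # one browser instance, reused for every page of results
        self.driver = webdriver.Firefox()

    def parse(self, response):
        self.driver.get(response.url)

        while True:
            # get the data from the current page and write it to scrapy items

            try:
                # the "next page" link is missing on the last page of results
                next_link = self.driver.find_element(By.XPATH, '//td[@class="pagn-next"]/a')
                next_link.click()
            except NoSuchElementException:
                break

        # quit() closes every window and ends the WebDriver session
        self.driver.quit()
</code></pre>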
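<p>The placeholder comment in the loop above is where the actual scraping happens. One way to fill it in, sketched here under assumptions, is to hand the HTML that Selenium has rendered (<code>driver.page_source</code>) to a Scrapy <code>Selector</code> and pull fields out with XPath. The <code>extract_items</code> helper and the <code>li.item</code> / <code>h3</code> / <code>span.price</code> markup it expects are hypothetical. eBay's real markup is different, so adjust the expressions to the page you are scraping.</p>
<pre><code>from scrapy.selector import Selector


def extract_items(page_source):
    """Turn HTML rendered by Selenium into a list of plain item dicts.

    Hypothetical helper: the XPath expressions below are placeholders
    and must be adapted to the markup of the target page.
    """
    sel = Selector(text=page_source)
    items = []
    # one result row per matching <li class="item"> element (assumed markup)
    for row in sel.xpath('//li[@class="item"]'):
        items.append({
            # get(default='') avoids None when a field is missing from a row
            'title': row.xpath('.//h3/text()').get(default='').strip(),
            'price': row.xpath('.//span[@class="price"]/text()').get(default='').strip(),
        })
    return items
</code></pre>
<p>Inside <code>parse</code>, the placeholder comment could then become <code>yield from extract_items(self.driver.page_source)</code>, which turns <code>parse</code> into a generator that Scrapy consumes as usual. Scrapy accepts plain dicts as items; define an <code>Item</code> subclass instead if you want declared fields.</p>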
<p>Here are some examples of “selenium spiders”:</p>
<ul>
<li><a href="https://stackoverflow.com/questions/10648644/executing-javascript-submit-form-functions-using-scrapy-in-python">Executing Javascript Submit form functions using scrapy in python</a></li>
<li><a href="https://gist.github.com/cheekybastard/4944914" rel="noreferrer">https://gist.github.com/cheekybastard/4944914</a></li>
<li><a href="https://gist.github.com/irfani/1045108" rel="noreferrer">https://gist.github.com/irfani/1045108</a></li>
<li><a href="http://snipplr.com/view/66998/" rel="noreferrer">http://snipplr.com/view/66998/</a></li>
</ul>
<hr/>
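<p>There is also an alternative to combining <code>Selenium</code> with <code>Scrapy</code>. In some cases, the <a href="https://github.com/scrapinghub/scrapy-splash" rel="noreferrer"><code>scrapy-splash</code> middleware</a> is enough to handle the dynamic parts of a page. A real-world usage example:</p>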
<ul>
<li><a href="https://stackoverflow.com/questions/30345623/scraping-dynamic-content-using-python-scrapy">Scraping dynamic content using python-Scrapy</a></li>
</ul>
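<p>To try the <code>scrapy-splash</code> route, the middleware has to be enabled in the project's <code>settings.py</code>. The fragment below follows the settings listed in the scrapy-splash README as I understand them. It assumes a Splash instance is already running locally on port 8050 (for example from the <code>scrapinghub/splash</code> Docker image), so change <code>SPLASH_URL</code> to wherever yours actually runs.</p>
<pre><code># settings.py (fragment): enable scrapy-splash
# Assumption: Splash is reachable at this address; adjust as needed.
SPLASH_URL = 'http://localhost:8050'

DOWNLOADER_MIDDLEWARES = {
    # cookie handling must run before the Splash middleware itself
    'scrapy_splash.SplashCookiesMiddleware': 723,
    'scrapy_splash.SplashMiddleware': 725,
    # decompression has to come after the Splash middleware
    'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware': 810,
}

SPIDER_MIDDLEWARES = {
    # avoids storing duplicate Splash arguments for every request
    'scrapy_splash.SplashDeduplicateArgsMiddleware': 100,
}

# Splash-aware duplicate filter and HTTP cache storage
DUPEFILTER_CLASS = 'scrapy_splash.SplashAwareDupeFilter'
HTTPCACHE_STORAGE = 'scrapy_splash.SplashAwareFSCacheStorage'
</code></pre>
<p>With these settings in place, a spider issues <code>scrapy_splash.SplashRequest(url, self.parse, args={'wait': 0.5})</code> instead of a plain <code>scrapy.Request</code>, and <code>parse</code> receives the page after Splash has executed its JavaScript. The <code>wait</code> value is only an example; tune it to how long the page needs to render.</p>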