<p>We can use the following approach:</p>
<pre><code>from scrapy import Request

request = Request(url="http://example.com")
request.meta['proxy'] = "http://host:port"  # full proxy URL, e.g. "http://1.2.3.4:8080"
yield request
</code></pre>
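<p>Since <code>meta['proxy']</code> is set per request, a common pattern is to rotate through a pool of proxies as requests are built. The sketch below illustrates that rotation with plain dicts standing in for Scrapy requests; the proxy addresses and the <code>build_request</code> helper are hypothetical, not part of Scrapy's API.</p>
<pre><code>from itertools import cycle

# Hypothetical proxy pool; substitute your own proxy URLs.
PROXIES = ["http://10.0.0.1:8080", "http://10.0.0.2:8080", "http://10.0.0.3:8080"]
proxy_pool = cycle(PROXIES)

def build_request(url, pool):
    """Model a request as a dict, attaching the next proxy in rotation via meta."""
    return {"url": url, "meta": {"proxy": next(pool)}}

requests = [build_request(f"http://example.com/page/{i}", proxy_pool) for i in range(4)]
</code></pre>
<p>With three proxies and four requests, the fourth request wraps back to the first proxy.</p>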
<p>If you want to use a proxy from the very first request:</p>
<p>Add the following as a field on the spider class:</p>
<pre><code>class MySpider(scrapy.Spider):
    name = "examplespider"
    allowed_domains = ["somewebsite.com"]
    start_urls = ['http://somewebsite.com/']
    custom_settings = {
        'HTTPPROXY_ENABLED': True
    }
</code></pre>
<p>Then use the <code>start_requests()</code> method, as shown below:</p>
<pre><code>    def start_requests(self):
        urls = ['http://example.com']
        for url in urls:
            proxy = 'http://host:port'  # your proxy URL
            yield scrapy.Request(url=url, callback=self.parse, meta={'proxy': proxy})

    def parse(self, response):
        item = StatusCheckerItem()
        item['url'] = response.url
        return item
</code></pre>
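<p>Setting <code>HTTPPROXY_ENABLED</code> activates Scrapy's built-in <code>HttpProxyMiddleware</code>, which honours an explicit <code>meta['proxy']</code> and otherwise falls back to standard proxy environment variables such as <code>http_proxy</code>. The helper below is a simplified, illustrative sketch of that fallback order, not Scrapy's actual implementation.</p>
<pre><code># Illustrative sketch of the proxy-resolution order, assuming an explicit
# meta['proxy'] takes precedence over the http_proxy environment variable.
def resolve_proxy(meta, environ):
    """Return the proxy to use: meta['proxy'] if set, else the http_proxy env var."""
    if "proxy" in meta:
        return meta["proxy"]
    return environ.get("http_proxy")  # None if no proxy is configured anywhere
</code></pre>
<p>In other words, a per-request proxy set in <code>start_requests()</code> always wins over any environment-level configuration.</p>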