<p>Consider the following code snippet:</p>
<pre><code>urls = ['http://domain1.com', 'http://domain1.com/page1', 'http://domain2.com']
crawl_for_urls = {}
for url in urls:
    # base_url() is assumed to return the domain part of the URL
    domain = base_url(url)
    if domain not in crawl_for_urls:
        # first URL seen for this domain: remember it and crawl it
        crawl_for_urls[domain] = url
        crawl(url)
</code></pre>
<p><code>crawl()</code> is then called only once for each unique domain.</p>
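<p>The snippet above leaves <code>base_url()</code> undefined. As a minimal sketch, assuming it should return just the host part of the URL, it could be written with <code>urlsplit</code> from the standard library. The name and exact behaviour are assumptions, not part of the original code.</p>

```python
from urllib.parse import urlsplit


def base_url(url):
    """Hypothetical helper: return the host part of a URL, e.g. 'domain1.com'."""
    # hostname is None when the URL has no network location, so fall back to ''
    return urlsplit(url).hostname or ''


print(base_url('http://domain1.com/page1'))
```

<p>If <code>www.domain1.com</code> and <code>domain1.com</code> should count as the same domain, the helper would need extra normalisation.</p>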
<p>Alternatively, you can use:</p>
<pre><code>urls = ['http://domain1.com', 'http://domain1.com/page1', 'http://domain2.com']
crawl_for_urls = {}
for url in urls:
    domain = base_url(url)
    if domain not in crawl_for_urls:
        # new domain: start its URL list and crawl this first URL
        crawl_for_urls[domain] = [url]
        crawl(url)
    else:
        # known domain: just record the URL without crawling again
        crawl_for_urls[domain].append(url)
</code></pre>
<p>This way you can group the URLs by domain and still call <code>crawl()</code> only once per unique domain.</p>
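<p>For reference, here is a self-contained sketch of the grouping approach that can be run as-is. It swaps the plain dict for <code>collections.defaultdict</code>, and both <code>base_url()</code> and <code>crawl()</code> are hypothetical stand-ins (the real crawler is not shown in the answer), so treat it as an illustration rather than the exact implementation.</p>

```python
from collections import defaultdict
from urllib.parse import urlsplit


def base_url(url):
    # Hypothetical helper: the host part of the URL, e.g. 'domain1.com'
    return urlsplit(url).hostname or ''


crawled = []


def crawl(url):
    # Stand-in for the real crawler: only records which URLs were requested
    crawled.append(url)


def crawl_unique_domains(urls):
    """Group URLs by domain and crawl only the first URL of each domain."""
    urls_by_domain = defaultdict(list)
    for url in urls:
        domain = base_url(url)
        # A membership test does not create the key in a defaultdict
        if domain not in urls_by_domain:
            crawl(url)
        urls_by_domain[domain].append(url)
    return dict(urls_by_domain)


urls = ['http://domain1.com', 'http://domain1.com/page1', 'http://domain2.com']
grouped = crawl_unique_domains(urls)
print(grouped)
print(crawled)
```

<p>Here <code>grouped</code> maps each domain to all of its URLs, while <code>crawled</code> holds only the first URL seen for each domain.</p>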