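<p>Thanks in advance for your help. I'm new to Python and am trying to figure out how to use the threading module to scrape URLs from the NY Daily News site. I put together the code below, and the script does scrape, but it doesn't seem any faster than before, so I'm not sure the threading is actually happening. Can you tell me whether it is? Is there something I can write that would let me know? Any other tips about threading?</p>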
<p>Thank you.</p>
<pre><code>from bs4 import BeautifulSoup, SoupStrainer
import urllib2
import os
import io
import threading

def fetch_url():
    for i in xrange(15500, 6100, -1):
        page = urllib2.urlopen("http://www.nydailynews.com/search-results/search-results-7.113?kw=&tfq=&afq=&page={}&sortOrder=Relevance&selecturl=site&q=the&sfq=&dtfq=seven_years".format(i))
        soup = BeautifulSoup(page.read())
        snippet = soup.find_all('h2')
        for h2 in snippet:
            for link in h2.find_all('a'):
                logfile.write("http://www.nydailynews.com" + link.get('href') + "\n")
        print "finished another url from page {}".format(i)

with open("dailynewsurls.txt", 'a') as logfile:
    threads = threading.Thread(target=fetch_url())
    threads.start()
</code></pre>
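<p>One way to see which thread the work runs on is to record <code>threading.current_thread().name</code> inside the function. The following is only a minimal sketch, not a rewrite of the scraper: the helper <code>record_thread_name</code>, the <code>results</code> dict and the thread name <code>worker-1</code> are hypothetical names chosen for illustration, and no network requests are made. It contrasts <code>target=func()</code>, which calls the function immediately in the calling thread, with <code>target=func</code> plus <code>args</code>, which hands the function to the new thread.</p>

```python
import threading


def record_thread_name(results, label):
    # Store the name of whichever thread actually executes this function.
    results[label] = threading.current_thread().name


results = {}

# Calling the function inside the Thread(...) expression runs it right away
# in the calling thread; the Thread only receives its return value (None).
called = threading.Thread(target=record_thread_name(results, "called"))
called.start()
called.join()

# Passing the function object and its arguments separately lets the new
# thread do the work.
passed = threading.Thread(target=record_thread_name,
                          args=(results, "passed"),
                          name="worker-1")
passed.start()
passed.join()

print(results["called"])  # name of the thread that built the Thread object
print(results["passed"])  # name given to the worker thread
```

<p>If both names printed are the same, the work is not being handed to a separate thread. Note also that a single worker thread would not be expected to finish a sequential loop any sooner than the main thread would; any speed-up would likely require splitting the page range across several threads.</p>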