<p>You can use multiple threads to run several similar requests in parallel:</p>
<pre><code>try:
    import Queue  # Python 2
except ImportError:
    import queue as Queue  # Python 3 renamed the module
import threading
import time

import requests

exit_flag = 0  # set to 1 by the main thread once the queue is drained


class RequestThread(threading.Thread):
    def __init__(self, thread_id, name, q):
        threading.Thread.__init__(self)
        self.thread_id = thread_id
        self.name = name
        self.q = q

    def run(self):
        print("Starting {0:s}".format(self.name))
        process_data(self.name, self.q)
        print("Exiting {0:s}".format(self.name))


def process_data(thread_name, q):
    # Keep pulling URLs from the queue until the main thread sets exit_flag
    while not exit_flag:
        queue_lock.acquire()
        if not q.empty():
            data = q.get()
            queue_lock.release()
            print("{0:s} processing {1:s}".format(thread_name, data))
            response = requests.get(data)
            print(response)
        else:
            # Nothing to do yet: release the lock and wait a moment
            queue_lock.release()
            time.sleep(1)


thread_list = ["Thread-1", "Thread-2", "Thread-3"]
request_list = [
    "https://api.github.com/events",
    "http://api.plos.org/search?q=title:THREAD",
    "http://api.plos.org/search?q=title:DNA",
    "http://api.plos.org/search?q=title:PYTHON",
    "http://api.plos.org/search?q=title:JAVA"
]
queue_lock = threading.Lock()
work_queue = Queue.Queue(10)
threads = []
thread_id = 1

# Create new threads
for t_name in thread_list:
    thread = RequestThread(thread_id, t_name, work_queue)
    thread.start()
    threads.append(thread)
    thread_id += 1

# Fill the queue
queue_lock.acquire()
for word in request_list:
    work_queue.put(word)
queue_lock.release()

# Wait for queue to empty
while not work_queue.empty():
    pass

# Notify threads it's time to exit
exit_flag = 1

# Wait for all threads to complete
for t in threads:
    t.join()
print("Exiting Main Thread")
</code></pre>
<p>Output from a run with three threads is shown in the timing comparison below.</p>
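<p>I am no multithreading expert, but here is a short explanation:</p>
<p><strong>1. Queue</strong></p>
<p>The <a href="https://docs.python.org/2/library/queue.html" rel="nofollow noreferrer">Queue</a> module lets you create a new queue object that can hold a specific number of items. The following methods control the queue:</p>
<ul>
<li><strong>get()</strong>: removes an item from the queue and returns it.</li>
<li><strong>put()</strong>: adds an item to the queue.</li>
<li><strong>qsize()</strong>: returns the number of items currently in the queue.</li>
<li><strong>empty()</strong>: returns True if the queue is empty, otherwise False.</li>
<li><strong>full()</strong>: returns True if the queue is full, otherwise False.</li>
</ul>
<p>From my limited experience with multithreading, a queue is very useful for keeping track of the data that still has to be processed. I have run into cases where several threads ended up doing the same work, or where every thread but one had exited. A queue helps me control the shared data that is waiting to be processed.</p>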
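<p>To make the queue methods above concrete, here is a minimal sketch. It assumes Python 3, where the module is named <code>queue</code> (on Python 2 it is <code>Queue</code>); the item values and variable names are made up for illustration.</p>

```python
import queue  # Python 3 name; on Python 2 the module is called Queue

q = queue.Queue(maxsize=2)   # this queue holds at most two items

q.put("first")               # add items at the back
q.put("second")

size_when_full = q.qsize()   # number of items currently queued
was_full = q.full()          # True, because maxsize has been reached

first_item = q.get()         # items come out in the order they went in
second_item = q.get()

is_empty = q.empty()         # True once everything has been taken out

print(size_when_full, was_full, first_item, second_item, is_empty)
```

<p>Keep in mind that with several threads running, the answers from <code>qsize()</code>, <code>empty()</code> and <code>full()</code> can be out of date a moment later, because another thread may have changed the queue in between.</p>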
<p><strong>2. Locking</strong></p>
<p>Python's threading module includes an easy-to-use <a href="https://docs.python.org/2/library/threading.html#lock-objects" rel="nofollow noreferrer">locking mechanism</a> that lets you synchronize threads. You create a new lock by calling <code>Lock()</code>, which returns the new lock object.</p>
<blockquote>
<p>A primitive lock is in one of two states, “locked” or “unlocked”. It
is created in the unlocked state. It has two basic methods, acquire()
and release(). When the state is unlocked, acquire() changes the state
to locked and returns immediately. When the state is locked, acquire()
blocks until a call to release() in another thread changes it to
unlocked, then the acquire() call resets it to locked and returns. The
release() method should only be called in the locked state; it changes
the state to unlocked and returns immediately. If an attempt is made
to release an unlocked lock, a ThreadError will be raised.</p>
</blockquote>
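<p>In more human terms: a lock is the most basic synchronization mechanism that the threading module provides. At any given time a lock is either held by a single thread or held by no thread at all. If a thread tries to acquire a lock that another thread already holds, the thread that is trying to acquire it is suspended until the lock is released.</p>
<p>Locks are typically used to synchronize access to a shared resource. Create one Lock object for each shared resource. When you need to access the resource, call acquire to take the lock (this waits, if necessary, until the lock has been released), and call release when you are done.</p>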
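<p>The acquire/release pattern described above can be sketched as follows. This is only an illustration: the shared counter, the <code>add_many</code> helper and the thread count are assumptions of mine, not part of the request script.</p>

```python
import threading

totals = {"count": 0}          # the shared resource
count_lock = threading.Lock()  # one lock guarding that resource


def add_many(n):
    """Increment the shared counter n times, one locked step at a time."""
    for _ in range(n):
        count_lock.acquire()      # blocks while another thread holds the lock
        try:
            totals["count"] += 1  # only one thread at a time gets here
        finally:
            count_lock.release()  # always release, even if an error occurs


workers = [threading.Thread(target=add_many, args=(10000,)) for _ in range(4)]
for w in workers:
    w.start()
for w in workers:
    w.join()

print(totals["count"])  # 40000
```

<p>Writing <code>with count_lock:</code> around the increment is an equivalent and shorter way to acquire the lock and release it again.</p>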
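<p><strong>3. Thread</strong></p>
<p>To implement a new thread with the threading module, you have to do the following:</p>
<ul>
<li>Define a new subclass of the Thread class.</li>
<li>Override the <code>__init__(self [,args])</code> method to add extra arguments.</li>
<li>Then override the <code>run(self [,args])</code> method to implement what the thread should do when it is started.</li>
</ul>
<p>Once you have created the new Thread subclass, you can create an instance of it and start a new thread by calling start(), which in turn calls the run() method. The main methods are:</p>
<ul>
<li><strong>run()</strong>: the entry point of the thread.</li>
<li><strong>start()</strong>: starts the thread by calling the run method.</li>
<li><strong>join([time])</strong>: waits for the thread to terminate.</li>
<li><strong>isAlive()</strong>: checks whether the thread is still executing (spelled <code>is_alive()</code> in Python 3).</li>
<li><strong>getName()</strong>: returns the name of the thread.</li>
<li><strong>setName()</strong>: sets the name of the thread.</li>
</ul>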
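<p>The steps above can be sketched with a very small Thread subclass. The class name <code>SquareThread</code> and its <code>result</code> attribute are invented for this example, and the code assumes Python 3.</p>

```python
import threading


class SquareThread(threading.Thread):
    """Hypothetical example thread that squares one number."""

    def __init__(self, number):
        super().__init__()        # let Thread set itself up first
        self.number = number      # the extra argument added by the subclass
        self.result = None

    def run(self):
        # Executed in the new thread once start() has been called
        self.result = self.number * self.number


threads = [SquareThread(n) for n in (1, 2, 3)]
for t in threads:
    t.start()                     # start() arranges for run() to be called
for t in threads:
    t.join()                      # wait until the thread has finished

results = [t.result for t in threads]
print(results)  # [1, 4, 9]
```

<p>Storing the result on the thread object is just one simple way to get a value back after <code>join()</code>; the request script prints its responses instead.</p>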
<h2>Is it really faster?</h2>
<p>Using a single thread:</p>
<pre><code>$ time python single.py
Processing request url: https://api.github.com/events
<Response [200]>
Processing request url: http://api.plos.org/search?q=title:THREAD
<Response [200]>
Processing request url: http://api.plos.org/search?q=title:DNA
<Response [200]>
Processing request url: http://api.plos.org/search?q=title:PYTHON
<Response [200]>
Processing request url: http://api.plos.org/search?q=title:JAVA
<Response [200]>
Exiting Main Thread
real 0m22.310s
user 0m0.096s
sys 0m0.022s
</code></pre>
<p>Using 3 threads:</p>
<pre><code>Starting Thread-1
Starting Thread-2
Starting Thread-3
Thread-3 processing https://api.github.com/events
Thread-1 processing http://api.plos.org/search?q=title:THREAD
Thread-2 processing http://api.plos.org/search?q=title:DNA
<Response [200]>
<Response [200]>
<Response [200]>
Thread-1 processing http://api.plos.org/search?q=title:PYTHON
Thread-2 processing http://api.plos.org/search?q=title:JAVA
Exiting Thread-3
<Response [200]>
<Response [200]>
Exiting Thread-1
Exiting Thread-2
Exiting Main Thread
real 0m11.726s
user 0m6.692s
sys 0m0.028s
</code></pre>
<p>Using 5 threads:</p>
<pre><code>$ time python multi.py
Starting Thread-1
Starting Thread-2
Starting Thread-3
Starting Thread-4
Starting Thread-5
Thread-5 processing https://api.github.com/events
Thread-1 processing http://api.plos.org/search?q=title:THREAD
Thread-2 processing http://api.plos.org/search?q=title:DNA
Thread-3 processing http://api.plos.org/search?q=title:PYTHONThread-4 processing http://api.plos.org/search?q=title:JAVA
<Response [200]>
<Response [200]>
<Response [200]>
<Response [200]>
<Response [200]>
Exiting Thread-5
Exiting Thread-4
Exiting Thread-2
Exiting Thread-3
Exiting Thread-1
Exiting Main Thread
real 0m6.446s
user 0m1.104s
sys 0m0.029s
</code></pre>
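<p>With 5 threads the run is about 3.5 times faster than with a single thread (6.4 s versus 22.3 s). And these are just 5 dummy requests; imagine a much larger batch of data.</p>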
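<p>If you want to see the effect without hitting a real API, the following sketch replaces each request with a short sleep. The <code>fake_request</code> function, the 0.2 second delay and the placeholder URLs are assumptions for illustration only. The idea is that a thread waiting on the network (or sleeping) does not block the other threads, so the waiting times overlap.</p>

```python
import threading
import time


def fake_request(url, delay=0.2):
    """Hypothetical stand-in for requests.get(url): it only waits."""
    time.sleep(delay)


urls = ["url-{0}".format(i) for i in range(5)]  # placeholder names, not real URLs

# One request after another
start = time.perf_counter()
for url in urls:
    fake_request(url)
serial_seconds = time.perf_counter() - start

# One thread per request, so the waits overlap
start = time.perf_counter()
workers = [threading.Thread(target=fake_request, args=(url,)) for url in urls]
for w in workers:
    w.start()
for w in workers:
    w.join()
threaded_seconds = time.perf_counter() - start

print("serial:   {0:.2f}s".format(serial_seconds))
print("threaded: {0:.2f}s".format(threaded_seconds))
```

<p>The exact numbers depend on your machine, but the threaded run should take roughly as long as one delay rather than five.</p>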
<p>Please note: I only tested this under Python 2.7. On Python 3.x some minor adjustments may be needed.</p>
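<p>For Python 3, one possible variation is sketched below. It is not the script I timed above: it swaps the exit flag and the busy-wait loop for <code>Queue.join()</code>, <code>task_done()</code> and a <code>None</code> sentinel per worker. The <code>fetch</code> function is a hypothetical stand-in for <code>requests.get</code> so that the sketch runs without network access, and the short URL names are placeholders.</p>

```python
import queue
import threading


def fetch(url):
    """Hypothetical stand-in for requests.get(url)."""
    return "fetched " + url


def worker(q, results, results_lock):
    while True:
        url = q.get()              # blocks until an item is available
        if url is None:            # sentinel: no more work for this thread
            q.task_done()
            break
        response = fetch(url)
        with results_lock:         # protect the shared results list
            results.append(response)
        q.task_done()              # tell the queue this item is finished


work_queue = queue.Queue()
results = []
results_lock = threading.Lock()

threads = [
    threading.Thread(target=worker, args=(work_queue, results, results_lock))
    for _ in range(3)
]
for t in threads:
    t.start()

for url in ["a", "b", "c", "d", "e"]:
    work_queue.put(url)

work_queue.join()                  # wait until every queued item is processed

for _ in threads:                  # one sentinel per worker thread
    work_queue.put(None)
for t in threads:
    t.join()

print(sorted(results))
```

<p>Because <code>get()</code> blocks until an item arrives, the workers do not need to poll with <code>empty()</code> and <code>time.sleep()</code>, and the main thread does not spin while it waits.</p>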