How to get more speed when using multi-threading in Python

import threading
import time
import urllib
import urllib2


class Post:

    def __init__(self, website, data, mode):
        self.website = website
        self.data = data

        #mode is either "Simple"(Simple POST) or "Multiple"(Multi-thread POST)
        self.mode = mode

    def post(self):

        #post data
        req = urllib2.Request(self.website)
        open_url = urllib2.urlopen(req, self.data)

        if self.mode == "Multiple":
            time.sleep(0.001)

        #read HTMLData
        HTMLData = open_url.read()

        print "OK"

if __name__ == "__main__":

    current_post = Post("http://forum.xda-developers.com/login.php",
                        "vb_login_username=test&vb_login_password&securitytoken=guest&do=login",
                        "Simple")

    #save the time before post data
    origin_time = time.time()

    if(current_post.mode == "Multiple"):

        #multithreading POST
        for i in range(0, 10):
           thread = threading.Thread(target = current_post.post)
           thread.start()
           thread.join()

        #calculate the time interval
        time_interval = time.time() - origin_time
        print time_interval

    if(current_post.mode == "Simple"):

        #simple POST
        for i in range(0, 10):
            current_post.post()

        #calculate the time interval
        time_interval = time.time() - origin_time
        print time_interval

3 Answers

User

#1 · edited 2024-05-04 10:25:23

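In many cases, Python's threading does not improve execution speed very well, and sometimes it makes things worse. For details, see David Beazley's PyCon2010 presentation on the Global Interpreter Lock / PyCon2010 GIL slides. The presentation is very informative, and I highly recommend it to anyone considering threading.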

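Even though David Beazley's talk explains that network traffic improves the scheduling of Python's threading module, you should use the multiprocessing module. I added it as an option in your code (see the bottom of my answer).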

Running this on one of my older machines (Python 2.6.6):

current_post.mode == "Process"  (multiprocessing)  --> 0.2609 seconds
current_post.mode == "Multiple" (threading)        --> 0.3947 seconds
current_post.mode == "Simple"   (serial execution) --> 1.650 seconds

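I agree with TokenMacGuy's comment; the numbers above already include moving the .join() into a separate loop. As you can see, Python's multiprocessing is noticeably faster than threading in this test.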

from multiprocessing import Process
import threading
import time
import urllib
import urllib2


class Post:

    def __init__(self, website, data, mode):
        self.website = website
        self.data = data

        #mode is either:
        #   "Simple"      (Simple POST)
        #   "Multiple"    (Multi-thread POST)
        #   "Process"     (Multiprocessing)
        self.mode = mode
        self.run_job()

    def post(self):

        #post data
        req = urllib2.Request(self.website)
        open_url = urllib2.urlopen(req, self.data)

        if self.mode == "Multiple":
            time.sleep(0.001)

        #read HTMLData
        HTMLData = open_url.read()

        #print "OK"

    def run_job(self):
        """This was refactored from the OP's code"""
        origin_time = time.time()
        if(self.mode == "Multiple"):

            #multithreading POST
            threads = list()
            for i in range(0, 10):
               thread = threading.Thread(target = self.post)
               thread.start()
               threads.append(thread)
            for thread in threads:
               thread.join()
            #calculate the time interval
            time_interval = time.time() - origin_time
            print "mode - {0}: {1}".format(self.mode, time_interval)

        if(self.mode == "Process"):

            #multiprocessing POST
            processes = list()
            for i in range(0, 10):
               process = Process(target=self.post)
               process.start()
               processes.append(process)
            for process in processes:
               process.join()
            #calculate the time interval
            time_interval = time.time() - origin_time
            print "mode - {0}: {1}".format(self.mode, time_interval)

        if(self.mode == "Simple"):

            #simple POST
            for i in range(0, 10):
                self.post()
            #calculate the time interval
            time_interval = time.time() - origin_time
            print "mode - {0}: {1}".format(self.mode, time_interval)
        return time_interval

if __name__ == "__main__":

    for method in ["Process", "Multiple", "Simple"]:
        Post("http://forum.xda-developers.com/login.php", 
            "vb_login_username=test&vb_login_password&securitytoken=guest&do=login",
            method
            )
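
The three branches of run_job differ only in how the ten calls are launched, so the timing logic can be shared. The following is a minimal Python 3 sketch of that idea, not the code that produced the numbers above: fake_post, run_mode, and the 0.02-second delay are hypothetical stand-ins, and time.sleep replaces the real HTTP POST so the example runs without network access.

```python
import multiprocessing
import threading
import time


def fake_post(delay=0.02):
    """Hypothetical stand-in for Post.post(): sleeps instead of doing HTTP I/O."""
    time.sleep(delay)


def run_mode(mode, count=10, delay=0.02):
    """Run fake_post `count` times in the given mode and return elapsed seconds."""
    start = time.time()
    if mode == "Simple":
        # Serial execution: each call finishes before the next one begins.
        for _ in range(count):
            fake_post(delay)
    elif mode in ("Multiple", "Process"):
        # Thread and Process expose the same start()/join() interface.
        if mode == "Multiple":
            worker_cls = threading.Thread
        else:
            worker_cls = multiprocessing.Process
        workers = [worker_cls(target=fake_post, args=(delay,))
                   for _ in range(count)]
        # Start every worker first, then wait for all of them.
        for worker in workers:
            worker.start()
        for worker in workers:
            worker.join()
    else:
        raise ValueError("unknown mode: {0}".format(mode))
    return time.time() - start
```

Calling run_mode("Simple"), run_mode("Multiple"), and run_mode("Process") returns the elapsed time for each mode. On platforms where multiprocessing starts a fresh interpreter, the "Process" call belongs under an `if __name__ == "__main__":` guard. With a simulated wait, the gap between threads and processes may look quite different from the measurements above.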

User

#2 · edited 2024-05-04 10:25:23

The biggest thing you are doing wrong, and the one that hurts your throughput the most, is the way you call thread.start() and thread.join():

for i in range(0, 10):
   thread = threading.Thread(target = current_post.post)
   thread.start()
   thread.join()

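Each time through the loop, you create a thread, start it, and then wait for it to finish before moving on to the next thread. Nothing ever runs concurrently!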

What you should do instead is:

threads = []

# start all of the threads
for i in range(0, 10):
   thread = threading.Thread(target = current_post.post)
   thread.start()
   threads.append(thread)

# now wait for them all to finish
for thread in threads:
   thread.join()
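
To see the difference without hitting a real server, here is a small Python 3 sketch in which time.sleep stands in for the network wait. The names fake_post, join_inside_loop, and join_after_loop are made up for this illustration, and the 0.05-second delay is arbitrary.

```python
import threading
import time


def fake_post(delay):
    """Hypothetical stand-in for current_post.post(): waits instead of doing I/O."""
    time.sleep(delay)


def join_inside_loop(count, delay):
    """Start and join each thread in the same loop pass, so the waits add up."""
    start = time.time()
    for _ in range(count):
        thread = threading.Thread(target=fake_post, args=(delay,))
        thread.start()
        thread.join()  # blocks until this thread is done
    return time.time() - start


def join_after_loop(count, delay):
    """Start every thread first, then join them all, so the waits overlap."""
    start = time.time()
    threads = []
    for _ in range(count):
        thread = threading.Thread(target=fake_post, args=(delay,))
        thread.start()
        threads.append(thread)
    for thread in threads:
        thread.join()
    return time.time() - start


if __name__ == "__main__":
    print("join inside loop: {0:.3f}s".format(join_inside_loop(10, 0.05)))
    print("join after loop:  {0:.3f}s".format(join_after_loop(10, 0.05)))
```

The first figure should come out close to the sum of all the waits and the second close to a single wait, though the exact numbers depend on the machine.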

User
#3 · edited 2024-05-04 10:25:23

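Keep in mind that the only case where multi-threading can "increase speed" in Python is when you have operations, like this one, that are heavily I/O bound. Otherwise multi-threading does not increase "speed", because it cannot run on more than one CPU (no, even if you have multiple cores, Python doesn't work that way). You should use multi-threading when you want two things to be done at the same time, not when you want two things to run in parallel (i.e. two processes running separately).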

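Now, what you are actually doing won't speed up any single DNS lookup, but it does allow multiple requests to be sent while waiting for the results of others. Be careful how many you send at once, though, or you will only make response times worse than they already are.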
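
One way to cap how many requests are in flight is a fixed-size thread pool. The following is a rough Python 3 sketch using concurrent.futures from the standard library, which this answer does not itself mention. InFlightCounter, fake_request, fetch_all, and the limit of 4 workers are assumptions for illustration, and time.sleep again replaces the real request.

```python
from concurrent.futures import ThreadPoolExecutor
import threading
import time


class InFlightCounter:
    """Hypothetical helper that records the peak number of simultaneous requests."""

    def __init__(self):
        self._lock = threading.Lock()
        self.current = 0
        self.peak = 0

    def __enter__(self):
        with self._lock:
            self.current += 1
            self.peak = max(self.peak, self.current)
        return self

    def __exit__(self, *exc_info):
        with self._lock:
            self.current -= 1
        return False


def fake_request(index, counter, delay=0.02):
    """Stand-in for one HTTP request: sleeps while counted as in flight."""
    with counter:
        time.sleep(delay)
    return index


def fetch_all(count, max_workers):
    """Run `count` fake requests with at most `max_workers` running at once."""
    counter = InFlightCounter()
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the order of the inputs.
        results = list(pool.map(lambda i: fake_request(i, counter), range(count)))
    return results, counter.peak


if __name__ == "__main__":
    results, peak = fetch_all(count=12, max_workers=4)
    print("completed:", len(results), "peak in flight:", peak)
```

Raising max_workers lets more waits overlap, but past some point the server or the network becomes the bottleneck, so the limit is worth tuning by measurement rather than guessing.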

Also, please stop using urllib2 and use Requests instead: http://docs.python-requests.org
