Multithreading / optimizing Python requests?

Posted 2024-10-06 07:54:26


I am trying to optimize this code. It currently completes 340 requests in 10 minutes, but I need it to handle 1800 requests in 30 minutes, and the Amazon API allows me roughly one request per second. Can I use multithreading to increase the number of requests I get through?

However, I currently read the complete data set in the main function. Should I now split it up, and how do I work out how many items each thread should take? (A splitting sketch follows the code below.)
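For scale: 1800 requests in 30 minutes works out to exactly one request per second (1800 requests / 1800 s), while 340 requests in 10 minutes is only about 0.57 requests per second, so the serial loop is presumably losing the rest of each second to network latency rather than to the API's rate limit.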

# Imports needed by the snippet (this is Python 2 code: print statements, urllib.quote).
import base64
import csv
import hmac
import threading
import time
import urllib
from hashlib import sha256

import requests
from bs4 import BeautifulSoup  # the original may have used the older BeautifulSoup 3 package

# AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY are assumed to be defined elsewhere.

def newhmac():
    # Fresh HMAC object for each signature so state does not accumulate across requests.
    return hmac.new(AWS_SECRET_ACCESS_KEY, digestmod=sha256)

def getSignedUrl(params):
    mac = newhmac()  # renamed from "hmac" so it does not shadow the hmac module
    action = 'GET'
    server = "webservices.amazon.com"
    path = "/onca/xml"

    params['Version'] = '2013-08-01'
    params['AWSAccessKeyId'] = AWS_ACCESS_KEY_ID
    params['Service'] = 'AWSECommerceService'
    params['Timestamp'] = time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime())

    key_values = [(urllib.quote(k), urllib.quote(v)) for k,v in params.items()]
    key_values.sort()
    paramstring = '&'.join(['%s=%s' % (k, v) for k, v in key_values])
    urlstring = "http://" + server + path + "?" + \
        ('&'.join(['%s=%s' % (k, v) for k, v in key_values]))
    mac.update(action + "\n" + server + "\n" + path + "\n" + paramstring)
    urlstring = urlstring + "&Signature=" + \
        urllib.quote(base64.encodestring(mac.digest()).strip())
    return urlstring

def readData():
    data = []
    with open("ASIN.csv") as f:
        reader = csv.reader(f)
        for row in reader:
            data.append(row[0])
    return data

def writeData(data):
    with open("data.csv", "a") as f:
        writer = csv.writer(f)
        writer.writerows(data)

def main():
    data = readData()
    filtData = []
    i = 0
    count = 0
    while i < len(data):  # iterate over the whole list so the final partial batch is not dropped
        if count % 4 == 0:
            time.sleep(1)  # crude throttle: pause for one second every four requests
        asins = ','.join(data[i:i+10])  # ItemLookup accepts up to 10 ASINs per request
        params = {'ResponseGroup':'OfferFull,Offers',
                 'AssociateTag':'4chin-20',
                 'Operation':'ItemLookup',
                 'IdType':'ASIN',
                 'ItemId':asins}
        url = getSignedUrl(params)
        resp = requests.get(url)
        responseSoup=BeautifulSoup(resp.text)

        quantity = ['' if product.amount is None else product.amount.text for product in responseSoup.findAll("offersummary")]
        price = ['' if product.lowestnewprice is None else product.lowestnewprice.formattedprice.text for product in responseSoup.findAll("offersummary")]
        prime = ['' if product.iseligibleforprime is None else product.iseligibleforprime.text for product in responseSoup("offer")]


        for zz in zip(asins.split(","), price,quantity,prime):
            print zz
            filtData.append(zz)

        print i, len(filtData)
        i+=10
        count +=1
    writeData(filtData)


threading.Timer(1.0, main).start()  # schedules a single run of main() one second from now
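As a rough take on the splitting question above, the usual approach is to cut the ASIN list into one chunk per worker thread and let each thread run the existing request/parse loop over its own chunk. The sketch below is illustrative only: chunk_evenly and num_threads are hypothetical names, and the list comes from the readData() above.

def chunk_evenly(items, num_chunks):
    # Hypothetical helper: split items into num_chunks lists whose sizes differ by at most one.
    size, remainder = divmod(len(items), num_chunks)
    chunks, start = [], 0
    for i in range(num_chunks):
        end = start + size + (1 if i < remainder else 0)
        chunks.append(items[start:end])
        start = end
    return chunks

# Example: with num_threads = 6, a list of 1800 ASINs gives each thread 300 ASINs,
# i.e. 30 ItemLookup calls of 10 ASINs each.
num_threads = 6
chunks = chunk_evenly(readData(), num_threads)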

1 Answer

Answered by a forum user · 2024-10-06 07:54:26

If you are using Python 3.2 or later, you can use the concurrent.futures library to run tasks across multiple threads with very little code. For example, the snippet below simulates 10 URL-parsing jobs running in parallel; each one takes 1 second, so running them synchronously would take 10 seconds, but with a thread pool of 10 workers the whole batch should finish in roughly 1 second:

import time
from concurrent.futures import ThreadPoolExecutor

def parse_url(url):
    time.sleep(1)  # stand-in for one second of network / parsing work
    print(url)
    return "done."

st = time.time()
with ThreadPoolExecutor(max_workers=10) as executor:
    # submit() schedules each call and returns a Future immediately; the
    # with-block waits for all workers to finish before continuing.
    for i in range(10):
        future = executor.submit(parse_url, "http://google.com/%s" % i)

print("total time: %s"%(time.time() - st))

Output: all ten URLs print almost at once, followed by a line reporting a total time of roughly one second.
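To carry this pattern over to the batched ItemLookup calls in the question, each 10-ASIN batch can be submitted as its own job, with a small shared throttle keeping the whole pool near one request per second. The following is a minimal sketch under those assumptions (Python 3): fetch_batch, fetch_all and RATE_LIMIT_SECONDS are hypothetical names, and getSignedUrl is the signing helper from the question.

import threading
import time
from concurrent.futures import ThreadPoolExecutor

import requests

RATE_LIMIT_SECONDS = 1.0      # hypothetical: aim for ~1 request per second overall
_rate_lock = threading.Lock()
_last_start = [0.0]

def _throttle():
    # Shared throttle: keep request starts at least RATE_LIMIT_SECONDS apart,
    # no matter which worker thread is issuing the request.
    with _rate_lock:
        wait = RATE_LIMIT_SECONDS - (time.time() - _last_start[0])
        if wait > 0:
            time.sleep(wait)
        _last_start[0] = time.time()

def fetch_batch(asins):
    # asins is a list of up to 10 ASIN strings for one ItemLookup call.
    _throttle()
    params = {'ResponseGroup': 'OfferFull,Offers',
              'AssociateTag': '4chin-20',
              'Operation': 'ItemLookup',
              'IdType': 'ASIN',
              'ItemId': ','.join(asins)}
    url = getSignedUrl(params)    # the signing helper from the question
    resp = requests.get(url)
    return resp.text              # the BeautifulSoup parsing from main() would go here

def fetch_all(data, workers=4):
    batches = [data[i:i + 10] for i in range(0, len(data), 10)]
    with ThreadPoolExecutor(max_workers=workers) as executor:
        return list(executor.map(fetch_batch, batches))

With network latency overlapped across the workers, throughput is then bounded by the throttle itself, which matches the one-request-per-second budget mentioned in the question.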
