<p>You can try the approach below; as long as the server can handle it too, it will let you fire off a large number of requests in parallel quite easily.</p>
<pre><code># thread_map is just a wrapper around concurrent.futures.ThreadPoolExecutor with a nice tqdm progress bar!
from tqdm.contrib.concurrent import thread_map, process_map  # for multi-threading, multi-processing respectively

def chunk_list(lst, size):
    """
    From SO;
    Yield successive chunks of length `size` from the list.
    """
    for i in range(0, len(lst), size):
        yield lst[i:i + size]

for idx, my_chunk in enumerate(chunk_list(huge_list, size=2**12)):
    for response in thread_map(which_func_to_call, my_chunk, max_workers=your_cpu_cores + 6):
        # which_func_to_call -> your function that makes the request and returns e.g. the parsed JSON response
        # do something with the response now..
        # make sure to cache the chunk results as well
        ...
</code></pre>
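<p>For reference, here is a minimal sketch of what <code>which_func_to_call</code> might look like, assuming a JSON HTTP API queried via the <code>requests</code> library (the URL and the <code>id</code> parameter here are hypothetical placeholders):</p>
<pre><code>import requests

BASE_URL = "https://api.example.com/data"  # hypothetical endpoint

def which_func_to_call(item):
    """Fetch one item and return the parsed JSON response (None on failure)."""
    try:
        resp = requests.get(BASE_URL, params={"id": item}, timeout=30)
        resp.raise_for_status()
        return resp.json()
    except requests.RequestException:
        # return None so one failed request doesn't kill the whole chunk
        return None
</code></pre>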
<p>Edit 1:</p>
<pre><code>from functools import partial

startdate = "*-150d"
enddate = "*"
# freeze the extra keyword arguments so the wrapped function takes a single positional argument
my_new_func = partial(which_func_to_call, startdate=startdate, enddate=enddate)
</code></pre>
<p>We can now pass this function in instead; note that <code>my_new_func</code> now takes a single argument.</p>
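<p>For example, it drops straight into the earlier loop (a sketch reusing the <code>huge_list</code> / <code>your_cpu_cores</code> placeholders from above):</p>
<pre><code>for idx, my_chunk in enumerate(chunk_list(huge_list, size=2**12)):
    # each call is my_new_func(item) == which_func_to_call(item, startdate=startdate, enddate=enddate)
    responses = thread_map(my_new_func, my_chunk, max_workers=your_cpu_cores + 6)
</code></pre>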
<p>Edit 2:</p>
<p>For caching, I'd suggest using the <code>csv</code> module and writing the responses you want to a csv file, rather than using pandas etc.; alternatively you can dump the JSON responses as needed. Sample code for a JSON/dict-like response would look like this:</p>
<pre><code>import csv
import os

with open(OUTPUT_FILE_NAME, "a+", newline="") as csvfile:
    # fieldnames = [your_headers_list]
    writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
    # make sure you write the header only once, since the file is opened in append mode: writer.writeheader()
    for idx, my_chunk in enumerate(chunk_list(huge_list, size=CHUNK_SIZE)):
        for response in thread_map(
            my_partial_wrapped_func, my_chunk, max_workers=min(32, os.cpu_count() + 6)
        ):
            # .......
            # .......
            writer.writerow(row_of_the_csv_as_a_dict_with_fieldnames_as_keys)
</code></pre>
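<p>One way to honor the "write the header only once" note above is to check whether the output file is empty before writing it; a small sketch, assuming the same <code>OUTPUT_FILE_NAME</code> and <code>fieldnames</code>:</p>
<pre><code>import csv
import os

# header is needed only if the file doesn't exist yet or is still empty
write_header = not os.path.exists(OUTPUT_FILE_NAME) or os.path.getsize(OUTPUT_FILE_NAME) == 0

with open(OUTPUT_FILE_NAME, "a+", newline="") as csvfile:
    writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
    if write_header:
        writer.writeheader()  # only on the first run, while the file is empty
    # ... chunked thread_map loop with writer.writerow(...) as above ...
</code></pre>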