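<p>This is what I ended up doing. I believe this solution is better than <a href="https://stackoverflow.com/a/65240594/1014237">https://stackoverflow.com/a/65240594/1014237</a> because it avoids <code>random.shuffle</code>, so it never stores too many elements (i.e., it stores only up to <code>batch_length</code> random indexes instead of up to <code>max(gen_lens)</code>). The work of generating random indexes is done only when it is needed.</p>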
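<pre><code>import random

def get_random_element(data, data_length):
    # Lazy Fisher-Yates shuffle: each step swaps one randomly chosen
    # element into position pos and yields it, so data is shuffled in
    # place one element at a time, only as the caller asks for more.
    pos = data_length
    while pos > 0:
        idx = random.randrange(pos)  # random position in data[0:pos]
        pos -= 1
        if idx != pos:
            data[pos], data[idx] = data[idx], data[pos]
        yield data[pos]

def get_random_idx_generator(n):
    # Create a generator of random indexes, n long
    return get_random_element(list(range(n)), n)
</code></pre>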
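<p>To illustrate the on-demand behaviour, here is a minimal self-contained sketch. The name <code>lazy_shuffled_indexes</code> and the sizes 10 and 4 are made up for this example; it restates the same idea compactly rather than reusing the functions above.</p>

```python
import itertools
import random


def lazy_shuffled_indexes(n):
    """Yield 0..n-1 in random order, one swap per item (hypothetical helper)."""
    data = list(range(n))  # the plain index list is still built up front
    pos = n
    while pos > 0:
        idx = random.randrange(pos)  # pick among the not-yet-yielded items
        pos -= 1
        data[pos], data[idx] = data[idx], data[pos]
        yield data[pos]


gen = lazy_shuffled_indexes(10)
first_batch = list(itertools.islice(gen, 4))  # only 4 random draws happen here
rest = list(gen)                              # the remaining 6 are drawn now

print(first_batch, rest)
```

<p>Because the generator keeps its state between calls, consecutive batches never repeat an index, and together they form a full permutation of <code>range(n)</code>.</p>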
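<p>I use <code>itertools.islice</code> to consume from this generator, so that at any given moment I store only as many random indexes as I need. The function also uses each index together with the lengths of the data lists to work out which list the data has to be read from.</p>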
<pre><code>import ipaddress
import itertools
from typing import Iterator, Union

# Yield a batch_size long list of random IPs, using the random idx generator
def get_randomized_ips_batch(ipnetworks_list, ipnetworks_list_lens,
                             random_idx_generator, batch_size=1024,
                             as_int=False) -> Iterator[Union[ipaddress.IPv4Address, int]]:
    random_indexes_batch = list(itertools.islice(random_idx_generator, batch_size))
    # Figure out which ipnetwork_list our index is pointing to and yield it
    for idx in random_indexes_batch:
        cumulative_len = 0
        gen_idx = 0
        for ipnetwork_len in ipnetworks_list_lens:
            if idx - cumulative_len >= ipnetwork_len:
                # idx lies beyond this network, so skip past it
                cumulative_len += ipnetwork_len
                gen_idx += 1
                continue
            # idx - cumulative_len is the offset inside this network
            addr = ipnetworks_list[gen_idx][idx - cumulative_len]
            yield int(addr) if as_int else addr
            break
</code></pre>
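<p>The index lookup can also be done with a binary search instead of a linear scan. This is not what the code above does; it is a possible variant, sketched under assumptions. The helper names <code>locate</code> and <code>address_at</code> are hypothetical, and the two networks are documentation ranges picked only for illustration.</p>

```python
import bisect
import ipaddress
import itertools

# Illustrative inputs (assumed): two small documentation networks
networks = [
    ipaddress.ip_network("192.0.2.0/30"),     # 4 addresses
    ipaddress.ip_network("198.51.100.0/29"),  # 8 addresses
]
lens = [net.num_addresses for net in networks]
ends = list(itertools.accumulate(lens))  # running totals: [4, 12]


def locate(idx):
    """Map a global index to (network position, offset inside that network)."""
    gen_idx = bisect.bisect_right(ends, idx)  # first network whose end is > idx
    start = ends[gen_idx] - lens[gen_idx]     # global index of its first address
    return gen_idx, idx - start


def address_at(idx):
    """Return the address that the global index points to."""
    gen_idx, offset = locate(idx)
    return networks[gen_idx][offset]


print(address_at(0), address_at(4), address_at(11))
```

<p>With many networks this replaces the per-index scan over <code>ipnetworks_list_lens</code> with an O(log n) search, at the cost of precomputing the running totals once.</p>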