<p>Regarding your latest reply, may I suggest that you not worry too much about collisions and take an approach like this:</p>
<pre><code>import random
import string

string_set = set()
# Digits plus uppercase letters give 36 possible characters
chars = string.digits + string.ascii_uppercase
# An 8-character string has 36^8 possibilities,
# which is about 2.8 trillion (2,821,109,907,456)
N = 8

def gen_str():
    return ''.join(random.choices(chars, k=N))

# Let's create some strings
for i in range(10000000):
    # Use a separate name here: reassigning chars would shrink the alphabet
    candidate = gen_str()
    # A Python set is a hash table, so membership tests are O(1) on average
    # (they do not slow down as the set grows)
    while candidate in string_set:
        # Regenerate the string on collision.
        # The likelihood is n / 36^8, where n is the number of elements in the set so far.
        # Until you exceed 1 billion elements this is a non-issue.
        candidate = gen_str()
    print("adding to set: ", candidate)
    string_set.add(candidate)
</code></pre>
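<p>As a rough check on the collision likelihood mentioned in the comments, the arithmetic can be worked out directly. This is only a sketch: the names <code>TOTAL</code> and <code>collision_probability</code> are made up for illustration, and it assumes every string is drawn uniformly from the 36-character alphabet.</p>

```python
ALPHABET_SIZE = 36  # 10 digits + 26 uppercase letters
N = 8               # string length

# Number of distinct 8-character strings
TOTAL = ALPHABET_SIZE ** N

def collision_probability(n_stored):
    """Chance that one fresh random string is already among n_stored strings."""
    return n_stored / TOTAL

# After 10 million stored strings, a single draw rarely collides
p = collision_probability(10_000_000)
print(f"{TOTAL:,} possible strings, collision chance per draw: {p:.2e}")
```

<p>Even at 1 billion stored strings the per-draw chance stays well under 0.1%, so the retry loop should almost never run more than once.</p>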
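<p>The main drawback of this approach is that the set has to be stored somewhere (pickled and unpickled), which could double the storage used for each 8-character string kept in the database. It also becomes impossible if your set grows too large for system memory. In that case, you could look at the "shelve" module, which provides a dict-like database object that you can use in much the same way as a set by ignoring the value stored under each key.</p>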
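<p>A minimal sketch of the shelve idea might look like the following. The file location, the helper name <code>add_unique</code>, and the choice to store <code>None</code> as a dummy value are all assumptions for illustration, not something prescribed by the module.</p>

```python
import os
import random
import shelve
import string
import tempfile

ALPHABET = string.digits + string.ascii_uppercase
N = 8

def gen_str():
    return ''.join(random.choices(ALPHABET, k=N))

def add_unique(db):
    """Generate a string not yet in the shelf, record it, and return it."""
    candidate = gen_str()
    while candidate in db:   # key lookup goes to the on-disk database
        candidate = gen_str()
    db[candidate] = None     # the value is ignored; only the key matters
    return candidate

# Hypothetical location; a real application would pick a persistent path
db_path = os.path.join(tempfile.mkdtemp(), "seen_strings")

with shelve.open(db_path) as db:
    generated = [add_unique(db) for _ in range(1000)]

# Reopening the shelf shows that the keys survived on disk
with shelve.open(db_path) as db:
    all_present = all(s in db for s in generated)
```

<p>Each lookup now touches disk, so this is likely to be slower than an in-memory set, but the keys no longer need to fit in RAM.</p>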