Optimizing a random word-splitting function in Python

def random_multisplitter(word):
    from numpy import mod
    spw = []
    length = len(word)
    rand = random_int(word)
    if rand == length:  # probability of not splitting
        return [word]
    else:
        div = mod(rand, (length + 1))  # defining division points
        bound = length - div
        spw.append(div)
        while div != 0:
            rand = random_int(word)
            div = mod(rand, (bound + 1))
            bound = bound - div
            spw.append(div)
    result = spw
    b = 0
    points = []
    for x in range(len(result) - 1):  # calculating splitting points
        b = b + result[x]
        points.append(b)
    xy = 0
    t = []
    for i in points:
        t.append(word[xy:i])
        xy = i
    if word[xy:len(word)] != '':
        t.append(word[xy:len(word)])
    if type(t) != list:
        return [t]
    return t

1 answer

User

#1 · posted 2024-05-19 05:21:49

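I have no idea what you are doing there, but the results are definitely not all equally likely, which is what your code is supposed to achieve. So the code does not work, and StackOverflow may actually be the right place for this, even if you did not know it.
How do I know your code does not work? The Law of Large Numbers! It looked suspicious, so I generated a million samples with your function and got this distribution:

Note that the y-axis scale is logarithmic, so those estimated probabilities vary a lot!

So here is some code that is faster and actually produces equally likely results: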

import random


def random_multisplitter(word):
    # add's bits will tell whether a char shall be added to last substring or
    # be the beginning of its own substring
    add = random.randint(0, 2**len(word) - 1)

    # append 0 to make sure first char is start of first substring
    add <<= 1

    res = []
    for char in word:
        # see if last bit is 1
        if add & 1:
            res[-1] += char
        else:
            res.append(char)
        # shift to next bit
        add >>= 1

    return res

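This is what 布尔克恩赫特 suggested in the comments and, believe it or not, I had the same idea an hour before they posted their comment, but I had no time to write this answer.
Anyway, here are the estimated probabilities for this function: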

They all cluster around 1/64 = 0.015625 (green line), which indicates that the probability distribution is uniform.
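
The check described above can be sketched roughly as follows: draw many samples, count how often each distinct split occurs, and compare the frequencies with 1/2^(n-1). The test word "abcdefg" (7 letters, hence 64 possible splits), the sample count, the fixed seed and the helper name `estimate_split_frequencies` are assumptions made for illustration; they are not taken from the original post.

```python
import random
from collections import Counter


def random_multisplitter(word):
    # Same bitmask approach as in the answer above.
    add = random.randint(0, 2**len(word) - 1)
    add <<= 1  # lowest bit is 0: the first char always starts a substring
    res = []
    for char in word:
        if add & 1:
            res[-1] += char   # bit set: extend the last substring
        else:
            res.append(char)  # bit clear: start a new substring
        add >>= 1
    return res


def estimate_split_frequencies(word, samples=100000, seed=0):
    # Hypothetical helper: empirical frequency of each distinct split.
    random.seed(seed)  # fixed seed so the run is repeatable
    counts = Counter(tuple(random_multisplitter(word)) for _ in range(samples))
    return {split: n / float(samples) for split, n in counts.items()}


freqs = estimate_split_frequencies("abcdefg")
expected = 1.0 / 2 ** (len("abcdefg") - 1)  # 1/64 for a 7-letter word
print(len(freqs), min(freqs.values()), max(freqs.values()), expected)
```

If the distribution is uniform, the smallest and largest estimated frequencies should both sit close to the expected value, and the spread should shrink as the sample count grows.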

On my machine with Python 2.7, this function is timed at 4.56 µs, versus 20.1 µs for your function.
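
A timing like this could be reproduced with the standard `timeit` module. This is only a sketch: the test word and the number of runs are assumptions, and the absolute numbers will differ by machine and Python version.

```python
import random
import timeit


def random_multisplitter(word):
    # Bitmask splitter from the answer above, repeated so the snippet runs alone.
    add = random.randint(0, 2**len(word) - 1)
    add <<= 1
    res = []
    for char in word:
        if add & 1:
            res[-1] += char
        else:
            res.append(char)
        add >>= 1
    return res


word = "abcdefg"  # assumed test word
runs = 100000     # assumed number of repetitions
total = timeit.timeit(lambda: random_multisplitter(word), number=runs)
per_call_us = total / runs * 1e6  # average time per call in microseconds
print("%.2f us per call" % per_call_us)
```

The lambda adds a small constant overhead per call, so this slightly overestimates the function's own cost; for comparing two implementations on the same machine that overhead cancels out.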
