下面是我编写的实现^{Jenkins hash function
来生成32 bit hash value
的数据。程序似乎遵循了算法,但偏离了大约20%。我的数据集由200000多个唯一记录组成,而程序输出约160000个唯一记录。请帮助我理解我所犯的错误。哈希函数按Bob Jerkins' website实现。在
import numpy as np
from jenkinshash import jhash
class PCSA():
def __init__(self, nmap, maxlength):
self.nmap = nmap
self.maxlength = maxlength
self.bitmap = np.zeros((nmap, maxlength), dtype=np.int)
def count(self, data):
hashedValue = jhash(data)
indexAlpha = hashedValue % self.nmap
ix = hashedValue / self.nmap
ix = bin(ix)[2:][::-1]
indexBeta = ix.find("1") #find index of lsb
if self.bitmap[indexAlpha, indexBeta] == 0:
self.bitmap[indexAlpha, indexBeta] = 1
def getCardinality(self):
sumIx = 0
for row in range(self.nmap):
sumIx += np.where(self.bitmap[row, :] == 0)[0][0]
A = sumIx / self.nmap
cardinality = self.nmap * (2 ** A)/ MAGIC_CONST
return cardinality
如果在Python2中运行这个函数,那么计算A的除法可能会导致A变为整数。在
如果是这样,您可以尝试更改:
到
^{pr2}$相关问题 更多 >
编程相关推荐