hash函数为什么要选择对素数求余

in #cn7 years ago

常用的hash函数是选一个数m取模(余数),这个数在课本中推荐m是素数,

但是经常见到选择m=2^n,因为对2^n求余数更快,并认为在key分布均匀的情况下,

key%m也是在[0,m-1]区间均匀分布的。但实际上,key%m的分布同m是有关的。

证明如下:

key%m = key - xm,即key减掉m的某个倍数x,剩下比m小的部分就是key除以m的余数。

显然,x等于key/m的整数部分,以floor(key/m)表示。

假设key和m有公约数g,即key=ag, m=bg,

则 key - xm = key - floor(key/m)m = key - floor(a/b)m。

由于0 <= a/b <= a,所以floor(a/b)只有a+1中取值可能,

从而推导出key%m也只有a+1中取值可能。a+1个球放在m个盒子里面,显然不可能做到均匀。


由此可知,一组均匀分布的key,其中同m公约数为1的那部分,

余数后在[0,m-1]上还是均匀分布的,但同m公约数不为1的那部分,

余数在[0, m-1]上就不是均匀分布的了。把m选为素数,

正是为了让所有key同m的公约数都为1,从而保证余数的均匀分布,降低冲突率。

Sort:  

Congratulations @baiyunping333! You received a personal award!

1 Year on Steemit

Click here to view your Board

Do not miss the last post from @steemitboard:

SteemWhales has officially moved to SteemitBoard Ranking
SteemitBoard - Witness Update

Support SteemitBoard's project! Vote for its witness and get one more award!

Congratulations @baiyunping333! You received a personal award!

Happy Birthday! - You are on the Steem blockchain for 2 years!

You can view your badges on your Steem Board and compare to others on the Steem Ranking

Vote for @Steemitboard as a witness to get one more award and increased upvotes!