11. An Excellent Cardinality-Counting Algorithm: HyperLogLog
Prologue
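In real-world development you may run into this question: when you need to count the unique visits to a large website, which data type should you use?
If we use a Redis set, tens of millions of visits per day quickly become a serious problem. These counts cannot simply be cleared, because operations staff may want to look at them at any time, so the space the statistics occupy keeps growing and eventually exceeds what we can afford to store.
For example, suppose we use the IP address to identify a unique visitor, so every distinct IP has to be stored. An IPv4 address written as text needs up to 15 bytes, for example 110.110.110.110. With ten million distinct IPs that is 15 bytes * 10,000,000, roughly 143 MB, and that covers only one page. With 10,000 such pages we would need more than 1 TB to store the data. As IPv6 spreads, the figure only gets larger. A set is therefore not a workable way to store this, and we need a different data type for the job: HyperLogLog, the subject of this article.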
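The arithmetic above can be checked in a few lines. This is only a rough sketch: it assumes each IPv4 address is stored as a 15-character string at one byte per character, and it ignores the per-element overhead a real Redis set would add, so actual usage would be higher. The variable names are illustrative.

```python
# Back-of-the-envelope storage estimate for exact unique-IP counting.
BYTES_PER_IP = 15          # "110.110.110.110" is 15 characters long
UNIQUE_IPS = 10_000_000    # distinct visitors on one page
PAGES = 10_000             # number of pages tracked

per_page_bytes = BYTES_PER_IP * UNIQUE_IPS
per_page_mib = per_page_bytes / 1024 ** 2
total_tib = per_page_bytes * PAGES / 1024 ** 4

print(f"one page: {per_page_mib:.0f} MiB")
print(f"{PAGES} pages: {total_tib:.2f} TiB")
```

The "143 MB" and "more than 1 TB" figures correspond to binary units (MiB and TiB).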
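HyperLogLog (HLL below) is a data structure added in Redis 2.8.9. It provides high-performance cardinality (distinct-count) statistics; its drawback is a very small error rate.
HLL has the following characteristics:
It counts enormous amounts of data with very little memory: 12 KB is enough to count up to 2^64 elements;
The count is approximate, but the overall error rate is low, with a standard error of 0.81%;
In the general algorithm the error can be reduced by using more registers (buckets); Redis fixes this number, which is where the 0.81% figure comes from.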
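The 12 KB and 0.81% figures are related. As a sketch, assume the dense Redis layout of 16384 registers of 6 bits each, and the standard-error approximation 1.04/sqrt(m) from the HyperLogLog paper, where m is the number of registers. The constant names below are illustrative.

```python
import math

REGISTERS = 16384        # 2**14 registers (assumed Redis dense layout)
BITS_PER_REGISTER = 6    # each register holds a small integer

memory_bytes = REGISTERS * BITS_PER_REGISTER // 8
std_error = 1.04 / math.sqrt(REGISTERS)   # approximation from the paper

print(memory_bytes)           # 12288 bytes, i.e. 12 KB
print(f"{std_error:.2%}")     # 0.81%
```

Quadrupling the number of registers would halve the error at four times the memory, which is the trade-off behind the fixed size.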
HLL has only three commands, but all of them are very practical. Let's look at each in turn.
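Adding elements
pfadd key element1 element2 ... Several elements can be added at once. The command returns 1 if the HLL's internal state was altered and 0 otherwise.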
127.0.0.1:6379> pfadd hll1 mea
(integer) 1
127.0.0.1:6379> pfadd hll1 kano nana
(integer) 1
127.0.0.1:6379> pfadd hll1 mea
(integer) 0
127.0.0.1:6379>
Counting distinct elements
pfcount key1 key2 ... Several HLL structures can be counted at once, in which case the result is the cardinality of their union.
127.0.0.1:6379> pfcount hll1
(integer) 3 # 3 distinct elements
127.0.0.1:6379>
Merging several HLL structures into a new one
pfmerge destkey sourcekey1 sourcekey2 ... merges sourcekey1, sourcekey2, ... into destkey. The source keys are left unchanged.
127.0.0.1:6379> pfadd hll1 mea kano nana
(integer) 1
127.0.0.1:6379> pfadd hll2 mea kano yume
(integer) 1
127.0.0.1:6379> pfmerge hll hll1 hll2
OK
127.0.0.1:6379> pfcount hll
(integer) 4
127.0.0.1:6379>
When we need to combine the visit data of two or more pages of the same kind, we can use pfmerge.
Using HLL from Python
import redis
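# Assumes a reachable Redis server; the host is the author's example address.
# decode_responses expects a boolean, so pass True rather than "utf-8".
client = redis.Redis(host="47.94.174.89", decode_responses=True)
# 1. pfadd key element1 element2 ...
client.pfadd("HLL1", "a", "b", "c")
client.pfadd("HLL2", "b", "c", "d")
# 2. pfcount key1 key2 ... (cardinality of the union)
print(client.pfcount("HLL1", "HLL2")) # 4
# 3. pfmerge destkey sourcekey1 sourcekey2 ...
client.pfmerge("HLL", "HLL1", "HLL2")
print(client.pfcount("HLL")) # 4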
How the HyperLogLog algorithm works
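The HyperLogLog algorithm comes from the paper "HyperLogLog: the analysis of a near-optimal cardinality estimation algorithm". To understand how HLL works, we start with the Bernoulli trial. A Bernoulli trial is a random experiment repeated independently under identical conditions that has only two possible outcomes: the event happens or it does not. If the experiment is repeated independently n times, the series is called an n-fold Bernoulli trial, or a Bernoulli scheme. The classic and most intuitive example is tossing a coin: every toss is independent, and the current toss is not affected by the previous one.
Note: a single Bernoulli trial does not tell us much. When we repeat the trial many times and record how many succeed and how many fail, things become interesting, because the accumulated record carries a lot of useful information.
By the law of large numbers, if the probability of an event is constant, the observed frequency of the event gets closer to that probability as the number of trials grows. Take coin tossing again: suppose you toss a coin four times and get heads every time (which can certainly happen). Should we conclude that the probability of heads is 100%? Obviously not. The law of large numbers tells us that if you toss the coin enough times, the number of heads divided by the total number of tosses approaches one half.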
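The coin-toss argument is easy to try out. The following is a small simulation sketch; the function name heads_frequency and the fixed seed are illustrative choices, and the seed is there only so that repeated runs print the same numbers.

```python
import random


def heads_frequency(flips, seed=0):
    """Return the fraction of heads in `flips` simulated fair coin tosses."""
    rng = random.Random(seed)  # fixed seed only to make runs repeatable
    heads = sum(rng.random() < 0.5 for _ in range(flips))
    return heads / flips


# With few tosses the frequency can be far from 0.5;
# with many tosses it settles close to 0.5.
for n in (4, 100, 10_000, 1_000_000):
    print(n, heads_frequency(n))
```

With only four tosses the result can be anywhere between 0 and 1, while the last line should be very close to 0.5.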
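The reason for bringing this up is that Redis does not estimate in the simple way described above, because the law of large numbers gives large errors when the amount of data is small. To address this, HLL introduces bucketing and the harmonic mean, which bring the estimate closer to the true value.
Bucketing means dividing the data evenly into m parts, estimating within each part, averaging those estimates and multiplying by m. This damps errors caused by chance and improves the accuracy of the estimate. Put simply, one batch of data is split into many, and one round of calculation becomes many rounds.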
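The harmonic mean is a refinement used in place of the ordinary (arithmetic) mean: it is the reciprocal of the arithmetic mean of the reciprocals.
For example, Xiao Ming's monthly salary is 1,000 yuan and Xiao Wang's is 100,000 yuan. The arithmetic mean puts their average salary at (1000+100000)/2 = 50500 yuan, which clearly misrepresents Xiao Ming. The harmonic mean gives 2/(1/1000+1/100000) ≈ 1980 yuan, which is far less distorted by the single large value.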
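The two averages in the salary example can be computed with Python's standard statistics module. This is just a check of the arithmetic; the list name salaries is illustrative.

```python
from statistics import harmonic_mean, mean

salaries = [1000, 100000]  # the two monthly salaries from the example

print(mean(salaries))                      # 50500
print(round(harmonic_mean(salaries), 2))   # 1980.2
```

The harmonic mean stays near the smaller value, which is why HLL uses it to keep a few unusually large bucket values from dominating the estimate.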
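Putting this together, inserting a value into an HLL in Redis amounts to hashing the value, treating the hash as a binary string, and recording it in one of many buckets. This lets a very small amount of space summarize a very large amount of data, and a count is produced quickly by examining the buckets. That is the basic principle of HLL. For a deeper treatment of the algorithm and its derivation, see the original paper: http://algo.inria.fr/flajolet/Publications/FlFuGaMe07.pdf.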
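To make the hashing and bucketing steps concrete, here is a toy HyperLogLog that follows the basic estimator in the paper. It is a sketch under stated assumptions, not Redis's implementation: the class name TinyHLL is hypothetical, SHA-1 truncated to 64 bits stands in for a fast 64-bit hash, registers are a plain Python list, and only the paper's small-range correction is applied.

```python
import hashlib
import math


class TinyHLL:
    """Toy HyperLogLog (illustrative only, not Redis's implementation)."""

    def __init__(self, p=14):
        self.p = p                    # bits of the hash used to pick a bucket
        self.m = 1 << p               # number of buckets (registers)
        self.registers = [0] * self.m

    @staticmethod
    def _hash64(value):
        # Assumption: truncated SHA-1 stands in for a fast 64-bit hash.
        digest = hashlib.sha1(str(value).encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    def add(self, value):
        h = self._hash64(value)
        rest_bits = 64 - self.p
        index = h >> rest_bits                # top p bits choose the bucket
        rest = h & ((1 << rest_bits) - 1)     # remaining bits
        # Rank: position of the leftmost 1 bit in `rest`
        # (rest_bits + 1 when all remaining bits are zero).
        rank = rest_bits - rest.bit_length() + 1
        if rank > self.registers[index]:
            self.registers[index] = rank

    def merge(self, other):
        # Union of two sketches: keep the larger rank in every bucket.
        if self.p != other.p:
            raise ValueError("sketches must use the same number of buckets")
        self.registers = [max(a, b)
                          for a, b in zip(self.registers, other.registers)]

    def count(self):
        m = self.m
        alpha = 0.7213 / (1 + 1.079 / m)      # bias constant for m >= 128
        # Harmonic-mean style combination of the per-bucket values.
        estimate = alpha * m * m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if estimate <= 2.5 * m and zeros:
            # Small-range correction (linear counting).
            estimate = m * math.log(m / zeros)
        return round(estimate)


hll = TinyHLL()
for i in range(100_000):
    hll.add(f"user-{i}")
    hll.add(f"user-{i}")   # duplicates never change the registers
print(hll.count())          # an estimate near 100000, not an exact count
```

Each register only remembers the largest rank it has seen, which is why memory stays constant no matter how many elements are added, and why merging two sketches reduces to a per-bucket maximum.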
Summary
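When large volumes of data need to be counted, the ordinary set type no longer meets our needs. In that case we can use HyperLogLog, available since Redis 2.8.9. Its advantage is that 12 KB of space is enough to count up to 2^64 elements; its drawback is a standard error of 0.81%. HyperLogLog provides three operations: pfadd adds elements, pfcount counts distinct elements, and pfmerge merges HLL structures.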