map source code analysis
Posted by lkness
Introduction to map
map is Go's built-in, widely used hash table. It is safe to read from multiple goroutines concurrently, but as soon as one goroutine writes while any other goroutine reads or writes, the runtime crashes the program.
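A minimal demo of that failure mode (the runtime detects the conflicting writers via the hashWriting flag discussed below and aborts with an unrecoverable fatal error, not a panic):

package main

import "sync"

func main() {
	m := make(map[int]int)
	var wg sync.WaitGroup
	for g := 0; g < 2; g++ {
		wg.Add(1)
		go func(g int) {
			defer wg.Done()
			for i := 0; i < 10000; i++ {
				m[i] = g // two unsynchronized writers
			}
		}(g)
	}
	wg.Wait()
	// Almost always dies with: fatal error: concurrent map writes
}

Guard shared maps with a sync.Mutex or sync.RWMutex, or use sync.Map for the read-mostly cases it is designed for.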
Source walkthrough (go1.19.3: src/runtime/map.go)
map structure definitions
type hmap struct {
	// Note: the format of the hmap is also encoded in cmd/compile/internal/reflectdata/reflect.go.
	// Make sure this stays in sync with the compiler's definition.
	count     int // # live cells == size of map. Must be first (used by len() builtin)
	flags     uint8
	B         uint8  // log_2 of # of buckets (can hold up to loadFactor * 2^B items)
	noverflow uint16 // approximate number of overflow buckets; see incrnoverflow for details
	hash0     uint32 // hash seed

	buckets    unsafe.Pointer // array of 2^B Buckets. may be nil if count==0.
	oldbuckets unsafe.Pointer // previous bucket array of half the size, non-nil only when growing
	nevacuate  uintptr        // progress counter for evacuation (buckets less than this have been evacuated)

	extra *mapextra // optional fields
}
// mapextra holds fields that are not present on all maps.
type mapextra struct {
	// If both key and elem do not contain pointers and are inline, then we mark bucket
	// type as containing no pointers. This avoids scanning such maps.
	// However, bmap.overflow is a pointer. In order to keep overflow buckets
	// alive, we store pointers to all overflow buckets in hmap.extra.overflow and hmap.extra.oldoverflow.
	// overflow and oldoverflow are only used if key and elem do not contain pointers.
	// overflow contains overflow buckets for hmap.buckets.
	// oldoverflow contains overflow buckets for hmap.oldbuckets.
	// The indirection allows to store a pointer to the slice in hiter.
	overflow    *[]*bmap
	oldoverflow *[]*bmap

	// nextOverflow holds a pointer to a free overflow bucket.
	nextOverflow *bmap
}
// A bucket for a Go map.
type bmap struct {
	// tophash generally contains the top byte of the hash value
	// for each key in this bucket. If tophash[0] < minTopHash,
	// tophash[0] is a bucket evacuation state instead.
	tophash [bucketCnt]uint8
	// Followed by bucketCnt keys and then bucketCnt elems.
	// NOTE: packing all the keys together and then all the elems together makes the
	// code a bit more complicated than alternating key/elem/key/elem/... but it allows
	// us to eliminate padding which would be needed for, e.g., map[int64]int8.
	// Followed by an overflow pointer.
}
hmap
hmap is the top-level map header. A key's hash, taken modulo the length of the bucket array buckets (a power of two, so a simple mask), selects a bucket; colliding keys are then found by walking the chain of overflow buckets hanging off that bucket.
hmap.buckets
hmap stores its keys in the bucket array buckets: hashing a key and taking the result modulo the array length yields the index of the bucket holding that key. When the element count grows to the point where the array length hurts lookup efficiency, an incremental growth is triggered that doubles the array length. Each array element is a bmap.
bmap
bmap resolves hash collisions: one bmap holds up to 8 key-value pairs that hash to the same bucket. When those 8 slots fill up, the bmap's trailing overflow pointer links to a freshly created bmap, so collision handling is effectively open addressing within a bucket combined with chaining across overflow buckets.
Each bmap is one contiguous block of memory laid out as sizeof(uint8)*8 + keysize*8 + elemsize*8 + sizeof(ptr): 8 uint8 slots holding the high byte of each key's hash (tophash), then the 8 keys, then the 8 values, and finally the overflow pointer. (Keys and values are stored inline; only when the key or value type is too large does the runtime store pointers to separately allocated copies instead, see t.indirectkey()/t.indirectelem() below.) All access is plain address arithmetic on the bmap pointer: when the tophash byte at slot i matches, the key lives at offset dataOffset+i*keysize and its value at dataOffset+8*keysize+i*elemsize, and only then is the full key compared.
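A small self-contained sketch of that arithmetic, not runtime code: the sizes below are hypothetical values for a map[int64]int8, mirroring the add(unsafe.Pointer(b), dataOffset+i*uintptr(t.keysize)) expressions in the lookup code later.

package main

import "fmt"

const bucketCnt = 8 // slots per bucket, as in runtime/map.go

func main() {
	var (
		dataOffset uintptr = 8 // keys start right after tophash [8]uint8
		keySize    uintptr = 8 // sizeof(int64)
		elemSize   uintptr = 1 // sizeof(int8)
		i          uintptr = 3 // slot index within the bucket
	)
	keyOff := dataOffset + i*keySize
	elemOff := dataOffset + bucketCnt*keySize + i*elemSize
	fmt.Println(keyOff, elemOff) // 32 75
}

Grouping all keys before all values is what lets map[int64]int8 pack tightly: alternating 8-byte keys with 1-byte values would waste 7 bytes of padding per pair.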
mapextra
overflow: pointers to every overflow bucket of the current bucket array. It is only used when neither the key nor the value type contains pointers: in that case the bucket type is marked pointer-free so the GC never scans it, yet bmap.overflow is still a real pointer, so these slice entries are what keep the overflow buckets alive.
oldoverflow: the same, but for the old bucket array during growth, again keeping those overflow buckets from being collected.
nextOverflow: preallocated spare overflow buckets. Whenever makeBucketArray builds a bucket array (at map creation or when doubling), it allocates a few extra buckets up front and parks them on nextOverflow, so inserts can grab an overflow bucket cheaply instead of allocating one on the spot.
Growth
Incremental (doubling) growth
Incremental growth guards against a bucket array that is too short, which would make each bucket's overflow chain long and lookups slow. Triggering it merely doubles the logical array length (h.B++), hangs the old array on oldbuckets, and returns; no keys are moved at that point. The new array is twice the old one's size, and the real migration is carried out piecemeal by subsequent writes (inserts and deletes): each write evacuates the old bucket(s) its key maps to. Spreading evacuation across writes avoids the latency spike of rehashing everything at once, and lookups are unaffected: a read simply checks the growth state and, if its bucket has not been evacuated yet, masks the hash with the old, half-size length and reads from the old bucket.
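One detail worth making concrete: in a doubling grow, every key of an old bucket can only land in one of exactly two new buckets, decided by a single extra hash bit; evacuate() in runtime/map.go calls these the x (low) and y (high) destinations. A minimal sketch, not runtime code:

package main

import "fmt"

// destinations returns the two possible new buckets for keys that
// lived in old bucket `old`, after the array doubled to 2^B buckets.
func destinations(old uintptr, B uint8) (x, y uintptr) {
	return old, old + uintptr(1)<<(B-1)
}

func main() {
	// The array doubled from 4 to 8 buckets (B is now 3): keys from
	// old bucket 1 move to new bucket 1 or 5, depending on bit 2 of
	// each key's hash.
	fmt.Println(destinations(1, 3)) // 1 5
}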
Same-size growth
Same-size growth guards against the opposite problem, typically caused by many deletes: long overflow chains full of empty slots. The bucket array keeps its length; the key-value pairs are simply rewritten compactly into a fresh array of the same size, shortening the chains. As with doubling growth, triggering it only marks the map as growing and moves nothing; each subsequent write (insert or delete) evacuates the bucket its key falls into, again avoiding a full-rehash latency spike. A sketch of the trigger condition follows.
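The trigger, as defined in runtime/map.go: "too many" means roughly as many overflow buckets as regular buckets, with B capped so the uint16 counter and the shift stay in range.

func tooManyOverflowBuckets(noverflow uint16, B uint8) bool {
	if B > 15 {
		B = 15
	}
	return noverflow >= uint16(1)<<(B&15)
}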
hmap fields in detail
count: the current number of elements; len() reads this directly.
flags: state flags, e.g. hashWriting while a write is in progress, plus iterator flags.
B: log2 of the bucket array length, so the array holds 2^B buckets. It is used (via bucketMask, sketched after this list) to pick the bucket a key falls into and to evaluate the load factor; every doubling growth is a B++.
noverflow: an approximate total of overflow buckets across all chains, used to decide same-size growth. For B >= 16 it is a sampled estimate rather than an exact count, which keeps the uint16 counter usable for huge maps (see incrnoverflow).
hash0: the per-map random hash seed, mixed into every key hash so bucket distribution differs between map instances.
buckets: the current bucket array.
oldbuckets: the old bucket array while growing; nil means no growth is in progress.
nevacuate: the evacuation progress counter; every old bucket with an index below it has already been evacuated.
extra: see the mapextra description above.
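The small helpers these fields rely on, slightly simplified from runtime/map.go (the real bucketShift additionally masks the shift amount so the compiler can elide overflow checks):

func bucketShift(b uint8) uintptr { return uintptr(1) << b }    // number of buckets, 2^B
func bucketMask(b uint8) uintptr  { return bucketShift(b) - 1 } // index mask for hash & mask

// tophash keeps the top byte of a key's hash. Values below minTopHash
// are reserved as slot/evacuation markers, so real hashes are bumped up.
func tophash(hash uintptr) uint8 {
	top := uint8(hash >> (goarch.PtrSize*8 - 8))
	if top < minTopHash {
		top += minTopHash
	}
	return top
}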
Creating a map: makemap
func makemap(t *maptype, hint int, h *hmap) *hmap {
	mem, overflow := math.MulUintptr(uintptr(hint), t.bucket.size)
	if overflow || mem > maxAlloc {
		hint = 0
	}

	// initialize Hmap
	if h == nil {
		h = new(hmap)
	}
	h.hash0 = fastrand()

	// Find the size parameter B which will hold the requested # of elements.
	// For hint < 0 overLoadFactor returns false since hint < bucketCnt.
	B := uint8(0)
	for overLoadFactor(hint, B) {
		B++
	}
	h.B = B

	// allocate initial hash table
	// if B == 0, the buckets field is allocated lazily later (in mapassign)
	// If hint is large zeroing this memory could take a while.
	if h.B != 0 {
		var nextOverflow *bmap
		h.buckets, nextOverflow = makeBucketArray(t, h.B, nil)
		if nextOverflow != nil {
			h.extra = new(mapextra)
			h.extra.nextOverflow = nextOverflow
		}
	}

	return h
}
1. Validate the requested size: if hint elements' worth of buckets would overflow or exceed maxAlloc, fall back to hint = 0.
2. Pick the initial B: keep incrementing B while overLoadFactor reports that hint elements would exceed the 6.5-per-bucket load factor for 2^B buckets (see the sketch below).
3. If B > 0, call makeBucketArray to allocate the 2^B buckets eagerly; makeBucketArray may also preallocate overflow buckets, which are parked in extra.nextOverflow for later inserts. With B == 0, the bucket array is allocated lazily by the first mapassign.
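The load-factor check from step 2, as in runtime/map.go (13/2 = 6.5 average elements per bucket; a map still fitting in one bucket never counts as overloaded; bucketShift is the helper sketched earlier):

const (
	bucketCnt     = 8  // slots per bucket
	loadFactorNum = 13 // loadFactorNum/loadFactorDen = 6.5
	loadFactorDen = 2
)

// overLoadFactor reports whether count elements placed in 2^B buckets
// would exceed the load factor.
func overLoadFactor(count int, B uint8) bool {
	return count > bucketCnt && uintptr(count) > loadFactorNum*(bucketShift(B)/loadFactorDen)
}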
Looking up a key: mapaccess1
func mapaccess1(t *maptype, h *hmap, key unsafe.Pointer) unsafe.Pointer {
	if raceenabled && h != nil {
		callerpc := getcallerpc()
		pc := abi.FuncPCABIInternal(mapaccess1)
		racereadpc(unsafe.Pointer(h), callerpc, pc)
		raceReadObjectPC(t.key, key, callerpc, pc)
	}
	if msanenabled && h != nil {
		msanread(key, t.key.size)
	}
	if asanenabled && h != nil {
		asanread(key, t.key.size)
	}
	if h == nil || h.count == 0 {
		if t.hashMightPanic() {
			t.hasher(key, 0) // see issue 23734
		}
		return unsafe.Pointer(&zeroVal[0])
	}
	if h.flags&hashWriting != 0 {
		fatal("concurrent map read and map write")
	}
	hash := t.hasher(key, uintptr(h.hash0))
	m := bucketMask(h.B)
	b := (*bmap)(add(h.buckets, (hash&m)*uintptr(t.bucketsize)))
	if c := h.oldbuckets; c != nil {
		if !h.sameSizeGrow() {
			// There used to be half as many buckets; mask down one more power of two.
			m >>= 1
		}
		oldb := (*bmap)(add(c, (hash&m)*uintptr(t.bucketsize)))
		if !evacuated(oldb) {
			b = oldb
		}
	}
	top := tophash(hash)
bucketloop:
	for ; b != nil; b = b.overflow(t) {
		for i := uintptr(0); i < bucketCnt; i++ {
			if b.tophash[i] != top {
				if b.tophash[i] == emptyRest {
					break bucketloop
				}
				continue
			}
			k := add(unsafe.Pointer(b), dataOffset+i*uintptr(t.keysize))
			if t.indirectkey() {
				k = *((*unsafe.Pointer)(k))
			}
			if t.key.equal(key, k) {
				e := add(unsafe.Pointer(b), dataOffset+bucketCnt*uintptr(t.keysize)+i*uintptr(t.elemsize))
				if t.indirectelem() {
					e = *((*unsafe.Pointer)(e))
				}
				return e
			}
		}
	}
	return unsafe.Pointer(&zeroVal[0])
}
1. If the map's flags already contain hashWriting, the program is killed on the spot; fatal, unlike panic, cannot be recovered and skips deferred handlers entirely.
2. Hash the key with the map's seed and mask the hash with the bucket array length to find the key's bucket (the hasher is supplied by the runtime for every comparable key type; for struct keys it walks the field values to build up the hash).
3. If the map is growing (h.oldbuckets != nil), check the key's old bucket: if it has not been evacuated yet, read from it instead (for a doubling grow, the mask is shifted down one bit to index the old, half-size array).
4. Walk the bucket and its overflow chain: compare tophash bytes first, compare the full key only on a tophash match, and return a pointer to the value on success; otherwise return a pointer to the shared zero value.
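At the language level this is why a missing key yields a zero value rather than an error: the one-result form compiles to mapaccess1 (or a fast-path variant for common key types) and the comma-ok form to mapaccess2.

package main

import "fmt"

func main() {
	m := map[string]int{"a": 1}
	v := m["missing"]      // mapaccess1: points at zeroVal, so v == 0
	v2, ok := m["missing"] // mapaccess2 additionally reports presence
	fmt.Println(v, v2, ok) // 0 0 false
}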
Writing a key/value: mapassign
func reflect_mapassign(t *maptype, h *hmap, key unsafe.Pointer, elem unsafe.Pointer) {
	p := mapassign(t, h, key)
	typedmemmove(t.elem, p, elem)
}

// Like mapaccess, but allocates a slot for the key if it is not present in the map.
func mapassign(t *maptype, h *hmap, key unsafe.Pointer) unsafe.Pointer {
	if h == nil {
		panic(plainError("assignment to entry in nil map"))
	}
	if raceenabled {
		callerpc := getcallerpc()
		pc := abi.FuncPCABIInternal(mapassign)
		racewritepc(unsafe.Pointer(h), callerpc, pc)
		raceReadObjectPC(t.key, key, callerpc, pc)
	}
	if msanenabled {
		msanread(key, t.key.size)
	}
	if asanenabled {
		asanread(key, t.key.size)
	}
	if h.flags&hashWriting != 0 {
		fatal("concurrent map writes")
	}
	hash := t.hasher(key, uintptr(h.hash0))

	// Set hashWriting after calling t.hasher, since t.hasher may panic,
	// in which case we have not actually done a write.
	h.flags ^= hashWriting

	if h.buckets == nil {
		h.buckets = newobject(t.bucket) // newarray(t.bucket, 1)
	}

again:
	bucket := hash & bucketMask(h.B)
	if h.growing() {
		growWork(t, h, bucket)
	}
	b := (*bmap)(add(h.buckets, bucket*uintptr(t.bucketsize)))
	top := tophash(hash)

	var inserti *uint8
	var insertk unsafe.Pointer
	var elem unsafe.Pointer
bucketloop:
	for {
		for i := uintptr(0); i < bucketCnt; i++ {
			if b.tophash[i] != top {
				if isEmpty(b.tophash[i]) && inserti == nil {
					inserti = &b.tophash[i]
					insertk = add(unsafe.Pointer(b), dataOffset+i*uintptr(t.keysize))
					elem = add(unsafe.Pointer(b), dataOffset+bucketCnt*uintptr(t.keysize)+i*uintptr(t.elemsize))
				}
				if b.tophash[i] == emptyRest {
					break bucketloop
				}
				continue
			}
			k := add(unsafe.Pointer(b), dataOffset+i*uintptr(t.keysize))
			if t.indirectkey() {
				k = *((*unsafe.Pointer)(k))
			}
			if !t.key.equal(key, k) {
				continue
			}
			// already have a mapping for key. Update it.
			if t.needkeyupdate() {
				typedmemmove(t.key, k, key)
			}
			elem = add(unsafe.Pointer(b), dataOffset+bucketCnt*uintptr(t.keysize)+i*uintptr(t.elemsize))
			goto done
		}
		ovf := b.overflow(t)
		if ovf == nil {
			break
		}
		b = ovf
	}
	// (rest of the function: start a grow and retry if over the load
	// factor or carrying too many overflow buckets; otherwise claim the
	// remembered slot, allocating a new overflow bucket if the chain was
	// full; store the key, h.count++; then the done: label clears
	// hashWriting and returns the elem pointer)
}
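Summarizing the flow, including the part elided at the end of the listing:
1. Assigning to a nil map panics; a write racing with another write is caught via hashWriting and killed with fatal("concurrent map writes"), mirroring the read path.
2. hashWriting is set only after hashing, since a panicking hasher means no write actually happened; a lazily created map gets its first bucket here.
3. If a grow is in progress, growWork first evacuates the bucket the key maps to, which is how writes push migration forward.
4. The bucket chain is scanned: the first empty slot is remembered as the insertion point, and if the key already exists its value slot is returned (updating the stored key if the type requires it).
5. For a new key: if the load factor is exceeded or there are too many overflow buckets, hashGrow starts a grow and the scan restarts (goto again); if the whole chain was full, a new overflow bucket is taken, from extra.nextOverflow when one is preallocated; then tophash and the key are written and h.count++.
6. Finally hashWriting is cleared and the pointer to the value slot is returned; the caller's generated code writes the value through it.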