Go map source code analysis

Posted by lkness


Introduction to map

map is Go's built-in, widely used hash-table type. It is safe for concurrent reads, but as soon as one goroutine writes while others access the map, the runtime crashes the program.
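To illustrate the usual workaround, here is a minimal sketch of guarding a map with sync.RWMutex so readers share the lock and writers take it exclusively. The SafeCounter type, its fields, and NewSafeCounter are illustrative names, not part of the runtime:

```go
package main

import (
	"fmt"
	"sync"
)

// SafeCounter wraps a plain map with an RWMutex. Without the lock,
// concurrent writes make the runtime call fatal ("concurrent map
// writes"), which terminates the process and cannot be recovered.
type SafeCounter struct {
	mu sync.RWMutex
	m  map[string]int
}

func NewSafeCounter() *SafeCounter {
	return &SafeCounter{m: make(map[string]int)}
}

// Inc takes the write lock before mutating the map.
func (c *SafeCounter) Inc(key string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.m[key]++
}

// Get takes the read lock, so many readers can proceed in parallel.
func (c *SafeCounter) Get(key string) int {
	c.mu.RLock()
	defer c.mu.RUnlock()
	return c.m[key]
}

func main() {
	c := NewSafeCounter()
	var wg sync.WaitGroup
	for i := 0; i < 100; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			c.Inc("k")
		}()
	}
	wg.Wait()
	fmt.Println(c.Get("k")) // 100
}
```

For write-heavy concurrent workloads, sync.Map or sharded locks are common alternatives to a single RWMutex.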

Source code analysis (go1.19.3: src/runtime/map.go)

map structure definition

type hmap struct {
	// Note: the format of the hmap is also encoded in cmd/compile/internal/reflectdata/reflect.go.
	// Make sure this stays in sync with the compiler's definition.
	count     int // # live cells == size of map.  Must be first (used by len() builtin)
	flags     uint8
	B         uint8  // log_2 of # of buckets (can hold up to loadFactor * 2^B items)
	noverflow uint16 // approximate number of overflow buckets; see incrnoverflow for details
	hash0     uint32 // hash seed

	buckets    unsafe.Pointer // array of 2^B Buckets. may be nil if count==0.
	oldbuckets unsafe.Pointer // previous bucket array of half the size, non-nil only when growing
	nevacuate  uintptr        // progress counter for evacuation (buckets less than this have been evacuated)

	extra *mapextra // optional fields
}

// mapextra holds fields that are not present on all maps.
type mapextra struct {
	// If both key and elem do not contain pointers and are inline, then we mark bucket
	// type as containing no pointers. This avoids scanning such maps.
	// However, bmap.overflow is a pointer. In order to keep overflow buckets
	// alive, we store pointers to all overflow buckets in hmap.extra.overflow and hmap.extra.oldoverflow.
	// overflow and oldoverflow are only used if key and elem do not contain pointers.
	// overflow contains overflow buckets for hmap.buckets.
	// oldoverflow contains overflow buckets for hmap.oldbuckets.
	// The indirection allows to store a pointer to the slice in hiter.
	overflow    *[]*bmap
	oldoverflow *[]*bmap

	// nextOverflow holds a pointer to a free overflow bucket.
	nextOverflow *bmap
}

// A bucket for a Go map.
type bmap struct {
	// tophash generally contains the top byte of the hash value
	// for each key in this bucket. If tophash[0] < minTopHash,
	// tophash[0] is a bucket evacuation state instead.
	tophash [bucketCnt]uint8
	// Followed by bucketCnt keys and then bucketCnt elems.
	// NOTE: packing all the keys together and then all the elems together makes the
	// code a bit more complicated than alternating key/elem/key/elem/... but it allows
	// us to eliminate padding which would be needed for, e.g., map[int64]int8.
	// Followed by an overflow pointer.
}

hmap is the underlying structure of a map. A key's hash, taken modulo the length of the bucket array buckets, selects a bucket; the overflow-bucket chain hanging off that bucket is then walked to find keys that collided. The layout looks like this:

(Diagram: a bucket array of 2^B buckets, bucket 1 … bucket 2^B; each bucket stores 8 kv pairs in a flat array and has a next pointer chaining to overflow bucket 1 → overflow bucket 2 → … → overflow bucket n, each overflow bucket likewise holding 8 kv pairs.)

hmap.buckets

hmap records keys via the array buckets: hashing a key and taking the result modulo the array length yields the index where the key lives. When there are so many elements that the array length starts to hurt lookup efficiency, an incremental (doubling) grow is triggered and the array length doubles. Each array element is a bmap, described next.
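Because the bucket count is always a power of two, the runtime replaces the modulo with a bit mask. A standalone sketch of that index calculation — bucketMask matches the runtime helper's name, while bucketIndex is an illustrative wrapper:

```go
package main

import "fmt"

// bucketMask mirrors the runtime helper: with 2^B buckets,
// the mask is 2^B - 1.
func bucketMask(B uint8) uintptr {
	return (uintptr(1) << B) - 1
}

// bucketIndex shows how a hash is mapped to a bucket index:
// masking keeps only the low B bits, equivalent to hash % 2^B.
func bucketIndex(hash uintptr, B uint8) uintptr {
	return hash & bucketMask(B)
}

func main() {
	// With B = 3 there are 8 buckets; only the low 3 bits matter.
	fmt.Println(bucketIndex(0xABCD, 3)) // 0xABCD & 7 = 5
}
```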

bmap

bmap is what resolves hash collisions: one bmap stores up to 8 key/value pairs that hash into the same bucket. When those 8 cells are full, the bmap's trailing overflow pointer links to a newly created bmap, so collisions are handled by probing the 8 cells within a bucket (akin to open addressing) plus chaining across overflow buckets.
Each bmap is a single contiguous block of memory laid out as sizeof(uint8)*8 + keysize*8 + elemsize*8 + sizeof(ptr): 8 uint8 cells hold the high byte of each key's hash (tophash); then the 8 keys are stored back to back; then the 8 values; and a final pointer holds the overflow link. (Keys and values are stored inline; only when the type is too large does the runtime store pointers to separately allocated copies instead.) All access is pointer arithmetic off the bmap address: during a lookup, when tophash cell i matches, the address is advanced past the tophash array by i*keysize to reach the candidate key for the full comparison, and past all 8 keys to reach the corresponding value.
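The offset arithmetic can be sketched as below. keyOffset and elemOffset are hypothetical helper names, and dataOffset is simplified to the 8 bytes of the tophash array (the real runtime value also accounts for struct alignment):

```go
package main

import "fmt"

const bucketCnt = 8 // key/value pairs per bucket, as in runtime/map.go

// keyOffset computes the byte offset of key i inside a bmap:
// skip the tophash array (dataOffset), then i keys.
func keyOffset(dataOffset, keySize, i uintptr) uintptr {
	return dataOffset + i*keySize
}

// elemOffset computes the byte offset of value i: skip the tophash
// array and all 8 keys, then i values.
func elemOffset(dataOffset, keySize, elemSize, i uintptr) uintptr {
	return dataOffset + bucketCnt*keySize + i*elemSize
}

func main() {
	// e.g. map[int64]int64: 8-byte keys and values, dataOffset = 8.
	fmt.Println(keyOffset(8, 8, 3))     // 8 + 3*8 = 32
	fmt.Println(elemOffset(8, 8, 8, 3)) // 8 + 8*8 + 3*8 = 96
}
```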

mapextra

overflow: stores pointers to every overflow bucket hanging off the current bucket array. When neither the key nor the value type contains pointers, buckets are marked pointer-free so the GC does not scan them; but bmap's overflow link is still a real pointer, so every overflow bucket is appended here to keep it reachable and alive.
oldoverflow: same idea as overflow, but holding the old bucket array's overflow buckets during an incremental grow, preventing the GC from collecting them.
nextOverflow: a preallocated overflow bucket for heavy write traffic. Whenever makeBucketArray builds a new bucket array (for example when the array doubles during a grow), it preallocates the overflow buckets that are likely to be needed and parks them in nextOverflow, so inserts can grab one quickly.

Growing

Incremental (doubling) growth

Incremental growth prevents the bucket array from staying too short, which would make every bucket's overflow chain long and lookups slow. When triggered, the bucket array length is doubled (hmap.B++), the doubling is recorded, and the trigger logic returns without migrating any keys. The new bucket array is twice the old one's length, and the actual evacuation is driven by subsequent write operations (inserts and deletes): each write migrates the old bucket(s) that the written key maps to. Spreading the per-bucket work across writes avoids the latency spike a full stop-the-world rehash would cause, and lookups are unaffected: based on the grow state, the key's hash is masked with either the old or the new array length, so the right bucket is always found.
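During a doubling grow, a key that lived in old bucket i can only end up in new bucket i or i + 2^oldB, decided by the hash bit that just became significant. A standalone sketch of that split — newBucket is an illustrative function, not runtime code:

```go
package main

import "fmt"

// newBucket returns the index a key evacuates to when the bucket
// array doubles from 2^oldB to 2^(oldB+1) buckets: the old index,
// or the old index plus the old array length, depending on the
// newly significant hash bit.
func newBucket(hash uintptr, oldB uint8) uintptr {
	oldIndex := hash & ((uintptr(1) << oldB) - 1)
	if hash&(uintptr(1)<<oldB) != 0 {
		return oldIndex + (uintptr(1) << oldB) // "high" half of the new array
	}
	return oldIndex // same index in the new array
}

func main() {
	// oldB = 2 (4 buckets). Hash 0b0110 was in bucket 2; bit 2 is
	// set, so it evacuates to bucket 2 + 4 = 6.
	fmt.Println(newBucket(0b0110, 2)) // 6
	// Hash 0b0010: bit 2 is clear, so it stays at index 2.
	fmt.Println(newBucket(0b0010, 2)) // 2
}
```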

Same-size growth

Same-size growth handles maps that have seen many deletes, leaving the overflow chains behind each index full of empty cells. The bucket array length stays the same; the kv pairs on each chain are simply repacked compactly. When triggered, the map is only marked as undergoing a same-size grow and, again, no keys are migrated immediately: each subsequent write (insert or delete) evacuates the bucket its key maps to. This likewise avoids the performance jitter of an all-at-once rehash.

hmap fields

count: the number of elements currently in the map
flags: state flags for the map (for example hashWriting)
B: log2 of the bucket-array length; 2^B is the current number of buckets. It is used both to map a key's hash to a bucket index and to decide whether a doubling grow is needed; each doubling grow is a B++
noverflow: the total number of overflow buckets across all chains, used to decide whether a same-size grow is needed. Once B is large (16 and up), it becomes an approximation rather than an exact count: incrnoverflow then increments it only probabilistically, so the counter stays meaningful for huge maps without constantly triggering same-size grows
hash0: the random hash seed mixed into every key hash
buckets: the current bucket array
oldbuckets: the old bucket array during a grow; nil means the map is not growing
nevacuate: evacuation progress marker; every bucket with an index below it has already been migrated
extra: see the mapextra structure described above
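The two grow triggers governed by B and noverflow can be sketched by reproducing the runtime's two predicates. Names and constants follow go1.19's map.go, but this is a standalone approximation, not the runtime itself:

```go
package main

import "fmt"

const (
	bucketCnt     = 8  // key/value pairs per bucket
	loadFactorNum = 13 // the runtime expresses the 6.5 load factor as 13/2
	loadFactorDen = 2
)

// overLoadFactor mirrors the doubling-grow trigger: more than
// 6.5 entries per bucket on average.
func overLoadFactor(count int, B uint8) bool {
	return count > bucketCnt &&
		uintptr(count) > loadFactorNum*((uintptr(1)<<B)/loadFactorDen)
}

// tooManyOverflowBuckets mirrors the same-size-grow trigger:
// roughly as many overflow buckets as regular buckets, with the
// threshold capped at 2^15.
func tooManyOverflowBuckets(noverflow uint16, B uint8) bool {
	if B > 15 {
		B = 15
	}
	return noverflow >= uint16(1)<<(B&15)
}

func main() {
	fmt.Println(overLoadFactor(53, 3))        // 53 > 6.5*8 = 52 -> true
	fmt.Println(overLoadFactor(52, 3))        // exactly at the limit -> false
	fmt.Println(tooManyOverflowBuckets(8, 3)) // 8 >= 2^3 -> true
}
```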

Creating a map

func makemap(t *maptype, hint int, h *hmap) *hmap {
	mem, overflow := math.MulUintptr(uintptr(hint), t.bucket.size)
	if overflow || mem > maxAlloc {
		hint = 0
	}

	// initialize Hmap
	if h == nil {
		h = new(hmap)
	}
	h.hash0 = fastrand()

	// Find the size parameter B which will hold the requested # of elements.
	// For hint < 0 overLoadFactor returns false since hint < bucketCnt.
	B := uint8(0)
	for overLoadFactor(hint, B) {
		B++
	}
	h.B = B

	// allocate initial hash table
	// if B == 0, the buckets field is allocated lazily later (in mapassign)
	// If hint is large zeroing this memory could take a while.
	if h.B != 0 {
		var nextOverflow *bmap
		h.buckets, nextOverflow = makeBucketArray(t, h.B, nil)
		if nextOverflow != nil {
			h.extra = new(mapextra)
			h.extra.nextOverflow = nextOverflow
		}
	}

	return h
}

1. Validate the requested size: if hint * bucket size overflows or exceeds maxAlloc, fall back to hint = 0.
2. Use overLoadFactor (more than 6.5 elements per bucket on average, i.e. count > 6.5 * 2^B) to pick the initial B: keep incrementing B until 2^B buckets can hold hint elements within the load factor.
3. If B > 0, call makeBucketArray to allocate the 2^B buckets up front; makeBucketArray may also preallocate overflow buckets, which are parked in h.extra.nextOverflow for quick use during inserts. (If B == 0, the bucket array is allocated lazily in mapassign.)
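Step 2 can be sketched as a standalone loop. pickB is a hypothetical name for the inline loop in makemap, and overLoadFactor is reproduced here so the sketch is self-contained:

```go
package main

import "fmt"

const bucketCnt = 8

// overLoadFactor reports whether count elements exceed the 6.5
// average load factor for 2^B buckets (6.5 written as 13/2).
func overLoadFactor(count int, B uint8) bool {
	return count > bucketCnt && uintptr(count) > 13*((uintptr(1)<<B)/2)
}

// pickB reproduces makemap's loop: the smallest B such that
// hint <= 6.5 * 2^B, so the initial array needs no immediate grow.
func pickB(hint int) uint8 {
	B := uint8(0)
	for overLoadFactor(hint, B) {
		B++
	}
	return B
}

func main() {
	fmt.Println(pickB(8))   // 0: up to bucketCnt elements fit in one bucket
	fmt.Println(pickB(100)) // 4: 6.5 * 2^4 = 104 >= 100
}
```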

Looking up a key

func mapaccess1(t *maptype, h *hmap, key unsafe.Pointer) unsafe.Pointer {
	if raceenabled && h != nil {
		callerpc := getcallerpc()
		pc := abi.FuncPCABIInternal(mapaccess1)
		racereadpc(unsafe.Pointer(h), callerpc, pc)
		raceReadObjectPC(t.key, key, callerpc, pc)
	}
	if msanenabled && h != nil {
		msanread(key, t.key.size)
	}
	if asanenabled && h != nil {
		asanread(key, t.key.size)
	}
	if h == nil || h.count == 0 {
		if t.hashMightPanic() {
			t.hasher(key, 0) // see issue 23734
		}
		return unsafe.Pointer(&zeroVal[0])
	}
	if h.flags&hashWriting != 0 {
		fatal("concurrent map read and map write")
	}
	hash := t.hasher(key, uintptr(h.hash0))
	m := bucketMask(h.B)
	b := (*bmap)(add(h.buckets, (hash&m)*uintptr(t.bucketsize)))
	if c := h.oldbuckets; c != nil {
		if !h.sameSizeGrow() {
			// There used to be half as many buckets; mask down one more power of two.
			m >>= 1
		}
		oldb := (*bmap)(add(c, (hash&m)*uintptr(t.bucketsize)))
		if !evacuated(oldb) {
			b = oldb
		}
	}
	top := tophash(hash)
bucketloop:
	for ; b != nil; b = b.overflow(t) {
		for i := uintptr(0); i < bucketCnt; i++ {
			if b.tophash[i] != top {
				if b.tophash[i] == emptyRest {
					break bucketloop
				}
				continue
			}
			k := add(unsafe.Pointer(b), dataOffset+i*uintptr(t.keysize))
			if t.indirectkey() {
				k = *((*unsafe.Pointer)(k))
			}
			if t.key.equal(key, k) {
				e := add(unsafe.Pointer(b), dataOffset+bucketCnt*uintptr(t.keysize)+i*uintptr(t.elemsize))
				if t.indirectelem() {
					e = *((*unsafe.Pointer)(e))
				}
				return e
			}
		}
	}
	return unsafe.Pointer(&zeroVal[0])
}

1. If the map's flags contain hashWriting, the program is terminated. fatal here differs from panic: it cannot be recovered and terminates the process outright.
2. The key's hash is computed and masked with the bucket-array length to locate its bucket. The hasher is built by the compiler/runtime for each key type and supports all comparable types; for struct keys it folds every field into the hash.
3. If the map is in the middle of a grow (h.oldbuckets != nil), the bucket is re-resolved: for a doubling grow the mask is halved to index the old array, and the old bucket is used if it has not been evacuated yet.
4. The bucket and every overflow bucket chained after it are scanned; when a key equal to the query is found, its value is returned, otherwise the zero value is returned.
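The tophash comparison in step 4 relies on a one-byte summary of the hash that is checked before the expensive full-key comparison. This standalone sketch assumes a 64-bit platform and uses the minTopHash constant from go1.19's map.go (values below it are reserved for cell states such as emptyRest and the evacuation markers):

```go
package main

import "fmt"

const minTopHash = 5 // tophash values below this encode cell states

// tophash derives the one-byte summary compared before the full key:
// the top 8 bits of the hash, bumped past the reserved range so a
// real hash byte is never mistaken for a cell-state marker.
func tophash(hash uintptr) uint8 {
	top := uint8(hash >> (8*8 - 8)) // top byte of a 64-bit hash
	if top < minTopHash {
		top += minTopHash
	}
	return top
}

func main() {
	fmt.Println(tophash(0xAB00000000000000)) // 0xAB = 171, used as-is
	fmt.Println(tophash(0x0100000000000000)) // 1 < 5, bumped to 6
}
```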

Writing a key/value pair

func reflect_mapassign(t *maptype, h *hmap, key unsafe.Pointer, elem unsafe.Pointer) {
	p := mapassign(t, h, key)
	typedmemmove(t.elem, p, elem)
}

// Like mapaccess, but allocates a slot for the key if it is not present in the map.
func mapassign(t *maptype, h *hmap, key unsafe.Pointer) unsafe.Pointer {
	if h == nil {
		panic(plainError("assignment to entry in nil map"))
	}
	if raceenabled {
		callerpc := getcallerpc()
		pc := abi.FuncPCABIInternal(mapassign)
		racewritepc(unsafe.Pointer(h), callerpc, pc)
		raceReadObjectPC(t.key, key, callerpc, pc)
	}
	if msanenabled {
		msanread(key, t.key.size)
	}
	if asanenabled {
		asanread(key, t.key.size)
	}
	if h.flags&hashWriting != 0 {
		fatal("concurrent map writes")
	}
	hash := t.hasher(key, uintptr(h.hash0))

	// Set hashWriting after calling t.hasher, since t.hasher may panic,
	// in which case we have not actually done a write.
	h.flags ^= hashWriting

	if h.buckets == nil {
		h.buckets = newobject(t.bucket) // newarray(t.bucket, 1)
	}

again:
	bucket := hash & bucketMask(h.B)
	if h.growing() {
		growWork(t, h, bucket)
	}
	b := (*bmap)(add(h.buckets, bucket*uintptr(t.bucketsize)))
	top := tophash(hash)

	var inserti *uint8
	var insertk unsafe.Pointer
	var elem unsafe.Pointer
bucketloop:
	for {
		for i := uintptr(0); i < bucketCnt; i++ {
			if b.tophash[i] != top {
				if isEmpty(b.tophash[i]) && inserti == nil {
					// remember the first free cell in case the key is not found
					inserti = &b.tophash[i]
					insertk = add(unsafe.Pointer(b), dataOffset+i*uintptr(t.keysize))
					elem = add(unsafe.Pointer(b), dataOffset+bucketCnt*uintptr(t.keysize)+i*uintptr(t.elemsize))
				}
				if b.tophash[i] == emptyRest {
					break bucketloop
				}
				continue
			}
			// ... (the original excerpt is truncated here; the rest of
			// mapassign compares the full key and updates it if present,
			// triggers hashGrow when the load factor or overflow-bucket
			// count is exceeded, allocates a new cell or overflow bucket
			// for a missing key, clears hashWriting, and returns the
			// element pointer)
		}
	}
}

