Go map source code analysis

Posted by lkness


Introduction to map

map is Go's built-in, widely used hash-table type. It is safe for concurrent reads, but as soon as one goroutine writes while others access the map, the runtime crashes the program.
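To illustrate the usual workaround, here is a minimal sketch of guarding a map with sync.RWMutex so readers share the lock and writers take it exclusively. The SafeCounter type, its fields, and NewSafeCounter are illustrative names, not part of the runtime:

```go
package main

import (
	"fmt"
	"sync"
)

// SafeCounter wraps a plain map with an RWMutex. Without the lock,
// concurrent writes make the runtime call fatal ("concurrent map
// writes"), which terminates the process and cannot be recovered.
type SafeCounter struct {
	mu sync.RWMutex
	m  map[string]int
}

func NewSafeCounter() *SafeCounter {
	return &SafeCounter{m: make(map[string]int)}
}

// Inc takes the write lock before mutating the map.
func (c *SafeCounter) Inc(key string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.m[key]++
}

// Get takes the read lock, so many readers can proceed in parallel.
func (c *SafeCounter) Get(key string) int {
	c.mu.RLock()
	defer c.mu.RUnlock()
	return c.m[key]
}

func main() {
	c := NewSafeCounter()
	var wg sync.WaitGroup
	for i := 0; i < 100; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			c.Inc("k")
		}()
	}
	wg.Wait()
	fmt.Println(c.Get("k")) // 100
}
```

For write-heavy concurrent workloads, sync.Map or sharded locks are common alternatives to a single RWMutex.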

Source code analysis (go1.19.3: src/runtime/map.go)

map structure definition

type hmap struct {
	// Note: the format of the hmap is also encoded in cmd/compile/internal/reflectdata/reflect.go.
	// Make sure this stays in sync with the compiler's definition.
	count     int // # live cells == size of map.  Must be first (used by len() builtin)
	flags     uint8
	B         uint8  // log_2 of # of buckets (can hold up to loadFactor * 2^B items)
	noverflow uint16 // approximate number of overflow buckets; see incrnoverflow for details
	hash0     uint32 // hash seed

	buckets    unsafe.Pointer // array of 2^B Buckets. may be nil if count==0.
	oldbuckets unsafe.Pointer // previous bucket array of half the size, non-nil only when growing
	nevacuate  uintptr        // progress counter for evacuation (buckets less than this have been evacuated)

	extra *mapextra // optional fields
}

// mapextra holds fields that are not present on all maps.
type mapextra struct {
	// If both key and elem do not contain pointers and are inline, then we mark bucket
	// type as containing no pointers. This avoids scanning such maps.
	// However, bmap.overflow is a pointer. In order to keep overflow buckets
	// alive, we store pointers to all overflow buckets in hmap.extra.overflow and hmap.extra.oldoverflow.
	// overflow and oldoverflow are only used if key and elem do not contain pointers.
	// overflow contains overflow buckets for hmap.buckets.
	// oldoverflow contains overflow buckets for hmap.oldbuckets.
	// The indirection allows to store a pointer to the slice in hiter.
	overflow    *[]*bmap
	oldoverflow *[]*bmap

	// nextOverflow holds a pointer to a free overflow bucket.
	nextOverflow *bmap
}

// A bucket for a Go map.
type bmap struct {
	// tophash generally contains the top byte of the hash value
	// for each key in this bucket. If tophash[0] < minTopHash,
	// tophash[0] is a bucket evacuation state instead.
	tophash [bucketCnt]uint8
	// Followed by bucketCnt keys and then bucketCnt elems.
	// NOTE: packing all the keys together and then all the elems together makes the
	// code a bit more complicated than alternating key/elem/key/elem/... but it allows
	// us to eliminate padding which would be needed for, e.g., map[int64]int8.
	// Followed by an overflow pointer.
}

hmap is the underlying structure of a map. A key's hash, taken modulo the length of the bucket array buckets, selects a bucket; the overflow-bucket chain hanging off that bucket is then walked to find keys that collided. The layout looks like this:

(Diagram: a bucket array of 2^B buckets, bucket 1 … bucket 2^B; each bucket stores 8 kv pairs in a flat array and has a next pointer chaining to overflow bucket 1 → overflow bucket 2 → … → overflow bucket n, each overflow bucket likewise holding 8 kv pairs.)

hmap.buckets

hmap records keys via the array buckets: hashing a key and taking the result modulo the array length yields the index where the key lives. When there are so many elements that the array length starts to hurt lookup efficiency, an incremental (doubling) grow is triggered and the array length doubles. Each array element is a bmap, described next.
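Because the bucket count is always a power of two, the runtime replaces the modulo with a bit mask. A standalone sketch of that index calculation — bucketMask matches the runtime helper's name, while bucketIndex is an illustrative wrapper:

```go
package main

import "fmt"

// bucketMask mirrors the runtime helper: with 2^B buckets,
// the mask is 2^B - 1.
func bucketMask(B uint8) uintptr {
	return (uintptr(1) << B) - 1
}

// bucketIndex shows how a hash is mapped to a bucket index:
// masking keeps only the low B bits, equivalent to hash % 2^B.
func bucketIndex(hash uintptr, B uint8) uintptr {
	return hash & bucketMask(B)
}

func main() {
	// With B = 3 there are 8 buckets; only the low 3 bits matter.
	fmt.Println(bucketIndex(0xABCD, 3)) // 0xABCD & 7 = 5
}
```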

bmap

bmap is what resolves hash collisions: one bmap stores up to 8 key/value pairs that hash into the same bucket. When those 8 cells are full, the bmap's trailing overflow pointer links to a newly created bmap, so collisions are handled by probing the 8 cells within a bucket (akin to open addressing) plus chaining across overflow buckets.
Each bmap is a single contiguous block of memory laid out as sizeof(uint8)*8 + keysize*8 + elemsize*8 + sizeof(ptr): 8 uint8 cells hold the high byte of each key's hash (tophash); then the 8 keys are stored back to back; then the 8 values; and a final pointer holds the overflow link. (Keys and values are stored inline; only when the type is too large does the runtime store pointers to separately allocated copies instead.) All access is pointer arithmetic off the bmap address: during a lookup, when tophash cell i matches, the address is advanced past the tophash array by i*keysize to reach the candidate key for the full comparison, and past all 8 keys to reach the corresponding value.
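The offset arithmetic can be sketched as below. keyOffset and elemOffset are hypothetical helper names, and dataOffset is simplified to the 8 bytes of the tophash array (the real runtime value also accounts for struct alignment):

```go
package main

import "fmt"

const bucketCnt = 8 // key/value pairs per bucket, as in runtime/map.go

// keyOffset computes the byte offset of key i inside a bmap:
// skip the tophash array (dataOffset), then i keys.
func keyOffset(dataOffset, keySize, i uintptr) uintptr {
	return dataOffset + i*keySize
}

// elemOffset computes the byte offset of value i: skip the tophash
// array and all 8 keys, then i values.
func elemOffset(dataOffset, keySize, elemSize, i uintptr) uintptr {
	return dataOffset + bucketCnt*keySize + i*elemSize
}

func main() {
	// e.g. map[int64]int64: 8-byte keys and values, dataOffset = 8.
	fmt.Println(keyOffset(8, 8, 3))     // 8 + 3*8 = 32
	fmt.Println(elemOffset(8, 8, 8, 3)) // 8 + 8*8 + 3*8 = 96
}
```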

mapextra

overflow: stores pointers to every overflow bucket hanging off the current bucket array. When neither the key nor the value type contains pointers, buckets are marked pointer-free so the GC does not scan them; but bmap's overflow link is still a real pointer, so every overflow bucket is appended here to keep it reachable and alive.
oldoverflow: same idea as overflow, but holding the old bucket array's overflow buckets during an incremental grow, preventing the GC from collecting them.
nextOverflow: a preallocated overflow bucket for heavy write traffic. Whenever makeBucketArray builds a new bucket array (for example when the array doubles during a grow), it preallocates the overflow buckets that are likely to be needed and parks them in nextOverflow, so inserts can grab one quickly.

Growing

Incremental (doubling) growth

Incremental growth prevents the bucket array from staying too short, which would make every bucket's overflow chain long and lookups slow. When triggered, the bucket array length is doubled (hmap.B++), the doubling is recorded, and the trigger logic returns without migrating any keys. The new bucket array is twice the old one's length, and the actual evacuation is driven by subsequent write operations (inserts and deletes): each write migrates the old bucket(s) that the written key maps to. Spreading the per-bucket work across writes avoids the latency spike a full stop-the-world rehash would cause, and lookups are unaffected: based on the grow state, the key's hash is masked with either the old or the new array length, so the right bucket is always found.
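During a doubling grow, a key that lived in old bucket i can only end up in new bucket i or i + 2^oldB, decided by the hash bit that just became significant. A standalone sketch of that split — newBucket is an illustrative function, not runtime code:

```go
package main

import "fmt"

// newBucket returns the index a key evacuates to when the bucket
// array doubles from 2^oldB to 2^(oldB+1) buckets: the old index,
// or the old index plus the old array length, depending on the
// newly significant hash bit.
func newBucket(hash uintptr, oldB uint8) uintptr {
	oldIndex := hash & ((uintptr(1) << oldB) - 1)
	if hash&(uintptr(1)<<oldB) != 0 {
		return oldIndex + (uintptr(1) << oldB) // "high" half of the new array
	}
	return oldIndex // same index in the new array
}

func main() {
	// oldB = 2 (4 buckets). Hash 0b0110 was in bucket 2; bit 2 is
	// set, so it evacuates to bucket 2 + 4 = 6.
	fmt.Println(newBucket(0b0110, 2)) // 6
	// Hash 0b0010: bit 2 is clear, so it stays at index 2.
	fmt.Println(newBucket(0b0010, 2)) // 2
}
```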

Same-size growth

Same-size growth handles maps that have seen many deletes, leaving the overflow chains behind each index full of empty cells. The bucket array length stays the same; the kv pairs on each chain are simply repacked compactly. When triggered, the map is only marked as undergoing a same-size grow and, again, no keys are migrated immediately: each subsequent write (insert or delete) evacuates the bucket its key maps to. This likewise avoids the performance jitter of an all-at-once rehash.

hmap fields

count: the number of elements currently in the map
flags: state flags for the map (for example hashWriting)
B: log2 of the bucket-array length; 2^B is the current number of buckets. It is used both to map a key's hash to a bucket index and to decide whether a doubling grow is needed; each doubling grow is a B++
noverflow: the total number of overflow buckets across all chains, used to decide whether a same-size grow is needed. Once B is large (16 and up), it becomes an approximation rather than an exact count: incrnoverflow then increments it only probabilistically, so the counter stays meaningful for huge maps without constantly triggering same-size grows
hash0: the random hash seed mixed into every key hash
buckets: the current bucket array
oldbuckets: the old bucket array during a grow; nil means the map is not growing
nevacuate: evacuation progress marker; every bucket with an index below it has already been migrated
extra: see the mapextra structure described above
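The two grow triggers governed by B and noverflow can be sketched by reproducing the runtime's two predicates. Names and constants follow go1.19's map.go, but this is a standalone approximation, not the runtime itself:

```go
package main

import "fmt"

const (
	bucketCnt     = 8  // key/value pairs per bucket
	loadFactorNum = 13 // the runtime expresses the 6.5 load factor as 13/2
	loadFactorDen = 2
)

// overLoadFactor mirrors the doubling-grow trigger: more than
// 6.5 entries per bucket on average.
func overLoadFactor(count int, B uint8) bool {
	return count > bucketCnt &&
		uintptr(count) > loadFactorNum*((uintptr(1)<<B)/loadFactorDen)
}

// tooManyOverflowBuckets mirrors the same-size-grow trigger:
// roughly as many overflow buckets as regular buckets, with the
// threshold capped at 2^15.
func tooManyOverflowBuckets(noverflow uint16, B uint8) bool {
	if B > 15 {
		B = 15
	}
	return noverflow >= uint16(1)<<(B&15)
}

func main() {
	fmt.Println(overLoadFactor(53, 3))        // 53 > 6.5*8 = 52 -> true
	fmt.Println(overLoadFactor(52, 3))        // exactly at the limit -> false
	fmt.Println(tooManyOverflowBuckets(8, 3)) // 8 >= 2^3 -> true
}
```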

Creating a map

func makemap(t *maptype, hint int, h *hmap) *hmap {
	mem, overflow := math.MulUintptr(uintptr(hint), t.bucket.size)
	if overflow || mem > maxAlloc {
		hint = 0
	}

	// initialize Hmap
	if h == nil {
		h = new(hmap)
	}
	h.hash0 = fastrand()

	// Find the size parameter B which will hold the requested # of elements.
	// For hint < 0 overLoadFactor returns false since hint < bucketCnt.
	B := uint8(0)
	for overLoadFactor(hint, B) {
		B++
	}
	h.B = B

	// allocate initial hash table
	// if B == 0, the buckets field is allocated lazily later (in mapassign)
	// If hint is large zeroing this memory could take a while.
	if h.B != 0 {
		var nextOverflow *bmap
		h.buckets, nextOverflow = makeBucketArray(t, h.B, nil)
		if nextOverflow != nil {
			h.extra = new(mapextra)
			h.extra.nextOverflow = nextOverflow
		}
	}

	return h
}

1. Validate the requested size: if hint * bucket size overflows or exceeds maxAlloc, fall back to hint = 0.
2. Use overLoadFactor (more than 6.5 elements per bucket on average, i.e. count > 6.5 * 2^B) to pick the initial B: keep incrementing B until 2^B buckets can hold hint elements within the load factor.
3. If B > 0, call makeBucketArray to allocate the 2^B buckets up front; makeBucketArray may also preallocate overflow buckets, which are parked in h.extra.nextOverflow for quick use during inserts. (If B == 0, the bucket array is allocated lazily in mapassign.)
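Step 2 can be sketched as a standalone loop. pickB is a hypothetical name for the inline loop in makemap, and overLoadFactor is reproduced here so the sketch is self-contained:

```go
package main

import "fmt"

const bucketCnt = 8

// overLoadFactor reports whether count elements exceed the 6.5
// average load factor for 2^B buckets (6.5 written as 13/2).
func overLoadFactor(count int, B uint8) bool {
	return count > bucketCnt && uintptr(count) > 13*((uintptr(1)<<B)/2)
}

// pickB reproduces makemap's loop: the smallest B such that
// hint <= 6.5 * 2^B, so the initial array needs no immediate grow.
func pickB(hint int) uint8 {
	B := uint8(0)
	for overLoadFactor(hint, B) {
		B++
	}
	return B
}

func main() {
	fmt.Println(pickB(8))   // 0: up to bucketCnt elements fit in one bucket
	fmt.Println(pickB(100)) // 4: 6.5 * 2^4 = 104 >= 100
}
```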

Looking up a key

func mapaccess1(t *maptype, h *hmap, key unsafe.Pointer) unsafe.Pointer {
	if raceenabled && h != nil {
		callerpc := getcallerpc()
		pc := abi.FuncPCABIInternal(mapaccess1)
		racereadpc(unsafe.Pointer(h), callerpc, pc)
		raceReadObjectPC(t.key, key, callerpc, pc)
	}
	if msanenabled && h != nil {
		msanread(key, t.key.size)
	}
	if asanenabled && h != nil {
		asanread(key, t.key.size)
	}
	if h == nil || h.count == 0 {
		if t.hashMightPanic() {
			t.hasher(key, 0) // see issue 23734
		}
		return unsafe.Pointer(&zeroVal[0])
	}
	if h.flags&hashWriting != 0 {
		fatal("concurrent map read and map write")
	}
	hash := t.hasher(key, uintptr(h.hash0))
	m := bucketMask(h.B)
	b := (*bmap)(add(h.buckets, (hash&m)*uintptr(t.bucketsize)))
	if c := h.oldbuckets; c != nil {
		if !h.sameSizeGrow() {
			// There used to be half as many buckets; mask down one more power of two.
			m >>= 1
		}
		oldb := (*bmap)(add(c, (hash&m)*uintptr(t.bucketsize)))
		if !evacuated(oldb) {
			b = oldb
		}
	}
	top := tophash(hash)
bucketloop:
	for ; b != nil; b = b.overflow(t) {
		for i := uintptr(0); i < bucketCnt; i++ {
			if b.tophash[i] != top {
				if b.tophash[i] == emptyRest {
					break bucketloop
				}
				continue
			}
			k := add(unsafe.Pointer(b), dataOffset+i*uintptr(t.keysize))
			if t.indirectkey() {
				k = *((*unsafe.Pointer)(k))
			}
			if t.key.equal(key, k) {
				e := add(unsafe.Pointer(b), dataOffset+bucketCnt*uintptr(t.keysize)+i*uintptr(t.elemsize))
				if t.indirectelem() {
					e = *((*unsafe.Pointer)(e))
				}
				return e
			}
		}
	}
	return unsafe.Pointer(&zeroVal[0])
}

1. If the map's flags contain hashWriting, the program is terminated. fatal here differs from panic: it cannot be recovered and terminates the process outright.
2. The key's hash is computed and masked with the bucket-array length to locate its bucket. The hasher is built by the compiler/runtime for each key type and supports all comparable types; for struct keys it folds every field into the hash.
3. If the map is in the middle of a grow (h.oldbuckets != nil), the bucket is re-resolved: for a doubling grow the mask is halved to index the old array, and the old bucket is used if it has not been evacuated yet.
4. The bucket and every overflow bucket chained after it are scanned; when a key equal to the query is found, its value is returned, otherwise the zero value is returned.
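The tophash comparison in step 4 relies on a one-byte summary of the hash that is checked before the expensive full-key comparison. This standalone sketch assumes a 64-bit platform and uses the minTopHash constant from go1.19's map.go (values below it are reserved for cell states such as emptyRest and the evacuation markers):

```go
package main

import "fmt"

const minTopHash = 5 // tophash values below this encode cell states

// tophash derives the one-byte summary compared before the full key:
// the top 8 bits of the hash, bumped past the reserved range so a
// real hash byte is never mistaken for a cell-state marker.
func tophash(hash uintptr) uint8 {
	top := uint8(hash >> (8*8 - 8)) // top byte of a 64-bit hash
	if top < minTopHash {
		top += minTopHash
	}
	return top
}

func main() {
	fmt.Println(tophash(0xAB00000000000000)) // 0xAB = 171, used as-is
	fmt.Println(tophash(0x0100000000000000)) // 1 < 5, bumped to 6
}
```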

Writing a key/value pair

func reflect_mapassign(t *maptype, h *hmap, key unsafe.Pointer, elem unsafe.Pointer) {
	p := mapassign(t, h, key)
	typedmemmove(t.elem, p, elem)
}

// Like mapaccess, but allocates a slot for the key if it is not present in the map.
func mapassign(t *maptype, h *hmap, key unsafe.Pointer) unsafe.Pointer {
	if h == nil {
		panic(plainError("assignment to entry in nil map"))
	}
	if raceenabled {
		callerpc := getcallerpc()
		pc := abi.FuncPCABIInternal(mapassign)
		racewritepc(unsafe.Pointer(h), callerpc, pc)
		raceReadObjectPC(t.key, key, callerpc, pc)
	}
	if msanenabled {
		msanread(key, t.key.size)
	}
	if asanenabled {
		asanread(key, t.key.size)
	}
	if h.flags&hashWriting != 0 {
		fatal("concurrent map writes")
	}
	hash := t.hasher(key, uintptr(h.hash0))

	// Set hashWriting after calling t.hasher, since t.hasher may panic,
	// in which case we have not actually done a write.
	h.flags ^= hashWriting

	if h.buckets == nil {
		h.buckets = newobject(t.bucket) // newarray(t.bucket, 1)
	}

again:
	bucket := hash & bucketMask(h.B)
	if h.growing() {
		growWork(t, h, bucket)
	}
	b := (*bmap)(add(h.buckets, bucket*uintptr(t.bucketsize)))
	top := tophash(hash)

	var inserti *uint8
	var insertk unsafe.Pointer
	var elem unsafe.Pointer
bucketloop:
	for {
		for i := uintptr(0); i < bucketCnt; i++ {
			if b.tophash[i] != top {
				if isEmpty(b.tophash[i]) && inserti == nil {
					// remember the first free cell in case the key is not found
					inserti = &b.tophash[i]
					insertk = add(unsafe.Pointer(b), dataOffset+i*uintptr(t.keysize))
					elem = add(unsafe.Pointer(b), dataOffset+bucketCnt*uintptr(t.keysize)+i*uintptr(t.elemsize))
				}
				if b.tophash[i] == emptyRest {
					break bucketloop
				}
				continue
			}
			// ... (the original excerpt is truncated here; the rest of
			// mapassign compares the full key and updates it if present,
			// triggers hashGrow when the load factor or overflow-bucket
			// count is exceeded, allocates a new cell or overflow bucket
			// for a missing key, clears hashWriting, and returns the
			// element pointer)
		}
	}
}

