HashMap Core Source Code Walkthrough

Posted 2022-01-29 水田如雅



1. Basic structure

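A HashMap is first of all an array. Each slot of the array holds a linked list or, in JDK 1.8 and later, possibly a red-black tree. Before JDK 1.8, HashMap only ever hung a linked list off each slot.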

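The drawback of using linked lists only: if a single slot gets too many hash collisions, locating a key in it becomes slow. PS: searching a linked list takes O(n) time, while searching a red-black tree takes O(log n).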
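To make "hash collision" concrete: distinct keys can share a hashCode. The Strings "Aa" and "BB" both have hashCode 2112, so they always land in the same slot, and HashMap has to fall back on equals() to tell them apart. A small demo (the class and method names are hypothetical):

```java
import java.util.HashMap;
import java.util.Map;

public class CollisionDemo {
    // String.hashCode for two chars is s[0] * 31 + s[1]:
    // "Aa" -> 65 * 31 + 97 = 2112
    // "BB" -> 66 * 31 + 66 = 2112
    static boolean sameHashCode() {
        return "Aa".hashCode() == "BB".hashCode();
    }

    static Map<String, Integer> buildMap() {
        Map<String, Integer> map = new HashMap<>();
        map.put("Aa", 1);
        map.put("BB", 2); // same slot as "Aa"; kept apart by equals()
        return map;
    }

    public static void main(String[] args) {
        System.out.println("Aa".hashCode() + " " + "BB".hashCode()); // prints "2112 2112"
        Map<String, Integer> map = buildMap();
        System.out.println(map.get("Aa") + " " + map.get("BB"));     // prints "1 2"
    }
}
```

With enough keys like these in one slot, every lookup in that slot walks the chain, which is exactly the O(n) case described above.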

2. How the position in the array is located

First, let's look at how HashMap scatters the keys' hashes:

static final int hash(Object key) {
    int h;
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}

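Given a key: if it is null, the hash is simply 0.
Otherwise, take the key's hashCode and XOR its high 16 bits into its low 16 bits.
PS: the array index below is computed from the low bits of the hash only. Without this step, hashCodes that differ only in their high bits would always land in the same slot. XOR is used because it does not bias the result: if one input bit is equally likely to be 0 or 1, so is the output bit (unlike & or |, which push bits toward 0 or 1). That is why this counts as an even scattering.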

After the hash has been spread, the array index is taken with what amounts to a modulo:

i = (n - 1) & hash

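No actual division/remainder is performed here, because the bitwise AND is cheaper. This only works because n, the table length, is always a power of two (not merely a multiple of 2). In that case n - 1 is a mask of all-1 low bits, and (n - 1) & hash equals the non-negative remainder of hash divided by n.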
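A quick check of the mask-equals-modulo claim, including negative hashes. `Math.floorMod` is used rather than `%` because `%` can return a negative remainder. The class name and sample values are illustrative only.

```java
public class MaskVsMod {
    // For a power-of-two n, masking with (n - 1) keeps the low log2(n) bits,
    // which is exactly the non-negative remainder of hash / n.
    static boolean maskEqualsMod(int hash, int n) {
        return ((n - 1) & hash) == Math.floorMod(hash, n);
    }

    static boolean checkAll(int n) {
        int[] samples = {0, 1, 15, 16, 17, 2112, -1, -17, Integer.MAX_VALUE, Integer.MIN_VALUE};
        for (int h : samples) {
            if (!maskEqualsMod(h, n)) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(checkAll(16)); // prints "true"
        // With a length that is not a power of two, the trick breaks:
        System.out.println(((12 - 1) & 17) + " vs " + Math.floorMod(17, 12)); // prints "1 vs 5"
    }
}
```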

3. How an entry is put

/**
 * Implements Map.put and related methods
 *
 * @param hash hash for key
 * @param key the key
 * @param value the value to put
 * @param onlyIfAbsent if true, don't change existing value
 * @param evict if false, the table is in creation mode.
 * @return previous value, or null if none
 */
final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
               boolean evict) {
    Node<K,V>[] tab; Node<K,V> p; int n, i;
    if ((tab = table) == null || (n = tab.length) == 0)
        n = (tab = resize()).length;
    if ((p = tab[i = (n - 1) & hash]) == null)
        tab[i] = newNode(hash, key, value, null);
    else {
        Node<K,V> e; K k;
        if (p.hash == hash &&
            ((k = p.key) == key || (key != null && key.equals(k))))
            e = p;
        else if (p instanceof TreeNode)
            e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
        else {
            for (int binCount = 0; ; ++binCount) {
                if ((e = p.next) == null) {
                    p.next = newNode(hash, key, value, null);
                    if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                        treeifyBin(tab, hash);
                    break;
                }
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    break;
                p = e;
            }
        }
        if (e != null) { // existing mapping for key
            V oldValue = e.value;
            if (!onlyIfAbsent || oldValue == null)
                e.value = value;
            afterNodeAccess(e);
            return oldValue;
        }
    }
    ++modCount;
    if (++size > threshold)
        resize();
    afterNodeInsertion(evict);
    return null;
}

3.1 Initializing the HashMap

Let's take it apart piece by piece:

if ((tab = table) == null || (n = tab.length) == 0)
    n = (tab = resize()).length;

First, check whether table (that is, the bucket array itself) has been initialized yet:

/**
 * The table, initialized on first use, and resized as
 * necessary. When allocated, length is always a power of two.
 * (We also tolerate length zero in some operations to allow
 * bootstrapping mechanics that are currently not needed.)
 */
transient Node<K,V>[] table;

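If not, the array is initialized first. Initialization also lives inside resize(). Setting everything else aside, the initialization-related code is just: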

else {               // zero initial threshold signifies using defaults
    newCap = DEFAULT_INITIAL_CAPACITY;
    newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
}

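It simply assigns the default initial capacity (DEFAULT_INITIAL_CAPACITY = 16) and a threshold (0.75 × 16 = 12), which is the size at which the next resize is triggered.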

3.2 Putting an entry when there is no hash collision

if ((p = tab[i = (n - 1) & hash]) == null)
    tab[i] = newNode(hash, key, value, null);

If the slot at the computed index is empty (no hash collision), a new node is created and placed directly in that slot.

The node created here is not a tree node, just an ordinary linked-list node:

// Create a regular (non-tree) node
Node<K,V> newNode(int hash, K key, V value, Node<K,V> next) {
    return new Node<>(hash, key, value, next);
}

3.3 Putting an entry when there is a collision: the key matches the head node

Node<K,V> e; K k;
if (p.hash == hash &&
    ((k = p.key) == key || (key != null && key.equals(k))))
    e = p;

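The head node's hash and key are compared with the incoming ones. If both match, the key being written is already stored in the slot's head node, so that head node is assigned to e. Here e refers to the existing node for this key; it stays null when a brand-new node is inserted.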

3.4 Putting an entry when there is a collision: the head node is a red-black tree node

else if (p instanceof TreeNode)
    e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);

The head is a tree node, so the entry is inserted into the tree via putTreeVal.

3.5 Putting an entry when there is a collision: the head node starts a linked list

Let's look at the whole block first:

else {
    for (int binCount = 0; ; ++binCount) {
        if ((e = p.next) == null) {
            p.next = newNode(hash, key, value, null);
            if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                treeifyBin(tab, hash);
            break;
        }
        if (e.hash == hash &&
            ((k = e.key) == key || (key != null && key.equals(k))))
            break;
        p = e;
    }
}

if (e != null) { // existing mapping for key
    V oldValue = e.value;
    if (!onlyIfAbsent || oldValue == null)
        e.value = value;
    afterNodeAccess(e);
    return oldValue;
}

First, binCount:

if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
    treeifyBin(tab, hash);
break;

In HashMap:

/**
 * The bin count threshold for using a tree rather than list for a
 * bin.  Bins are converted to trees when adding an element to a
 * bin with at least this many nodes. The value must be greater
 * than 2 and should be at least 8 to mesh with assumptions in
 * tree removal about conversion back to plain bins upon
 * shrinkage.
 */
static final int TREEIFY_THRESHOLD = 8;

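The threshold for turning a list into a tree is 8. binCount counts the nodes already walked past the head. When the new node is appended while binCount >= 7, the bin already held 8 nodes (more than 7) and now holds 9, and treeifyBin is called. Note that treeifyBin only converts the bin into a red-black tree when the table length is at least MIN_TREEIFY_CAPACITY (64); for a smaller table it calls resize() instead.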
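The off-by-one in "binCount >= TREEIFY_THRESHOLD - 1" is easy to misread, so here is a stripped-down simulation of just the list-walking loop. It is a hypothetical sketch: the `Node` type carries no keys, every insert is assumed to be a new key, and the MIN_TREEIFY_CAPACITY check inside treeifyBin is not modelled.

```java
public class TreeifyTrigger {
    static final int TREEIFY_THRESHOLD = 8; // same value as java.util.HashMap

    static final class Node {
        Node next;
    }

    // Mirrors the loop shape in putVal for a key that is not yet in the bin:
    // walk from the head, append at the tail, and report whether the
    // "binCount >= TREEIFY_THRESHOLD - 1" condition holds. Requires head != null.
    static boolean appendAndCheck(Node head) {
        Node p = head, e;
        for (int binCount = 0; ; ++binCount) {
            if ((e = p.next) == null) {
                p.next = new Node();
                return binCount >= TREEIFY_THRESHOLD - 1;
            }
            p = e;
        }
    }

    static int length(Node head) {
        int len = 0;
        for (Node x = head; x != null; x = x.next) len++;
        return len;
    }

    // Start from a single head node and keep appending until the treeify
    // condition fires; return the bin length at that moment.
    static int firstTreeifyLength() {
        Node head = new Node();
        while (!appendAndCheck(head)) {
            // each call appends exactly one node
        }
        return length(head);
    }

    public static void main(String[] args) {
        System.out.println(firstTreeifyLength()); // prints "9"
    }
}
```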

if ((e = p.next) == null) {
    p.next = newNode(hash, key, value, null);
    if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
        treeifyBin(tab, hash);
    break;
}

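Here p is the current node. If it has no next, i.e. nothing is linked after it, the new node is appended right after p.
Note that, unlike versions before JDK 1.8 (which inserted at the head of the list), this is tail insertion: since the list length has to be counted anyway, the code walks from the head node to the tail and appends there.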

When the key being inserted is found in the list:

if (e.hash == hash &&
    ((k = e.key) == key || (key != null && key.equals(k))))
    break;

the loop exits immediately and stops searching.

If the list contains a node whose hash and key both match, its value is updated:

if (e != null) { // existing mapping for key
    V oldValue = e.value;
    if (!onlyIfAbsent || oldValue == null)
        e.value = value;
    afterNodeAccess(e);
    return oldValue;
}

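Once the write is done, there is still the following. This point is only reached when a new node was added, because the update path above has already returned: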

++modCount;
if (++size > threshold)
    resize();
afterNodeInsertion(evict);
return null;

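modCount: how many times the map has been structurally modified. Replacing the value of an existing key returns before this line, so it does not increment modCount. Iterators compare modCount against the value they saw when they were created, which lets them fail fast with a ConcurrentModificationException.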

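In addition, if the number of entries (size) now exceeds threshold (capacity × load factor, the number of entries the current table is allowed to hold), a resize is performed.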
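The difference between a structural change and a value replacement shows up in iteration. The sketch below is hypothetical demo code; keep in mind that the Javadoc describes fail-fast behavior as best-effort, not guaranteed. Adding a new key during iteration bumps modCount and the iterator throws. Overwriting an existing key's value does not touch modCount.

```java
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Map;

public class ModCountDemo {
    // Mutates the map while iterating over its keys; returns true if the
    // iterator detected the change (ConcurrentModificationException).
    static boolean throwsDuringIteration(boolean addNewKey) {
        Map<String, Integer> map = new HashMap<>();
        map.put("a", 1);
        map.put("b", 2);
        try {
            for (String k : map.keySet()) {
                if (addNewKey) {
                    map.put("c", 3);   // structural change: modCount++
                } else {
                    map.put(k, 42);    // replaces a value: modCount unchanged
                }
            }
            return false;
        } catch (ConcurrentModificationException ex) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(throwsDuringIteration(true));  // prints "true"
        System.out.println(throwsDuringIteration(false)); // prints "false"
    }
}
```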

4. How resize works

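resize() runs when the table is first initialized and whenever an insertion pushes size above threshold. It does not run after every insertion.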

4.1 The array is already initialized: double its size

Node<K,V>[] oldTab = table;
int oldCap = (oldTab == null) ? 0 : oldTab.length;
int oldThr = threshold;
int newCap, newThr = 0;
if (oldCap > 0) {
    if (oldCap >= MAXIMUM_CAPACITY) {
        threshold = Integer.MAX_VALUE;
        return oldTab;
    }
    else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
             oldCap >= DEFAULT_INITIAL_CAPACITY)
        newThr = oldThr << 1; // double threshold
}

First, if the old array length is greater than 0, check whether the array has already grown to the maximum capacity.

The maximum capacity is 2^30:

static final int MAXIMUM_CAPACITY = 1 << 30;

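If it is already at the maximum capacity, it is not expanded any further; threshold is set to Integer.MAX_VALUE instead.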

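Otherwise the capacity is doubled. The threshold is doubled as well, but only if the old capacity was at least DEFAULT_INITIAL_CAPACITY (16). If it was smaller, newThr stays 0 and is recomputed later from newCap × loadFactor.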

4.2 A HashMap created with the parameterized constructor

public HashMap(int initialCapacity, float loadFactor) {
    if (initialCapacity < 0)
        throw new IllegalArgumentException("Illegal initial capacity: " +
                                           initialCapacity);
    if (initialCapacity > MAXIMUM_CAPACITY)
        initialCapacity = MAXIMUM_CAPACITY;
    if (loadFactor <= 0 || Float.isNaN(loadFactor))
        throw new IllegalArgumentException("Illegal load factor: " +
                                           loadFactor);
    this.loadFactor = loadFactor;
    this.threshold = tableSizeFor(initialCapacity);
}

When a custom initial capacity and load factor are supplied, look first at tableSizeFor:

static final int tableSizeFor(int cap) {
    int n = cap - 1;
    n |= n >>> 1;
    n |= n >>> 2;
    n |= n >>> 4;
    n |= n >>> 8;
    n |= n >>> 16;
    return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
}

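This takes cap - 1 and copies its highest 1 bit into every lower bit position, so adding 1 gives the smallest power of two that is >= cap. Subtracting 1 first ensures that a cap that is already a power of two maps to itself. The goal, again, is to guarantee that the capacity is a power of two. Since the table does not exist yet, this initial capacity is stored in threshold for now.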
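A copy of the JDK 8 version of tableSizeFor with a few sample inputs shows the rounding. The wrapper class is hypothetical; MAXIMUM_CAPACITY uses the same value as HashMap.

```java
public class TableSizeForDemo {
    static final int MAXIMUM_CAPACITY = 1 << 30; // same value as java.util.HashMap

    // Copy of the bit-smearing tableSizeFor shown above.
    static int tableSizeFor(int cap) {
        int n = cap - 1;
        n |= n >>> 1;   // after these five steps every bit below the
        n |= n >>> 2;   // highest set bit of (cap - 1) is also set,
        n |= n >>> 4;   // e.g. 0b1001 -> 0b1111
        n |= n >>> 8;
        n |= n >>> 16;
        return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
    }

    public static void main(String[] args) {
        int[] caps = {0, 1, 10, 16, 17, 1000};
        for (int cap : caps) {
            System.out.println(cap + " -> " + tableSizeFor(cap));
        }
        // prints, one per line: 0 -> 1, 1 -> 1, 10 -> 16, 16 -> 16, 17 -> 32, 1000 -> 1024
    }
}
```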

That is why resize() contains:

else if (oldThr > 0) // initial capacity was placed in threshold
    newCap = oldThr;

If no initial capacity was specified, the defaults are used here:

else {               // zero initial threshold signifies using defaults
    newCap = DEFAULT_INITIAL_CAPACITY;
    newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
}

Then the threshold is filled in if it has not been set yet (newThr == 0):

if (newThr == 0) {
    float ft = (float)newCap * loadFactor;
    newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ?
              (int)ft : Integer.MAX_VALUE);
}
threshold = newThr;

After that, it gets fairly blunt:

4.3 The core rehashing process

@SuppressWarnings({"rawtypes","unchecked"})
Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];
table = newTab;

table is simply replaced by a freshly allocated (still empty) array.

Below is the core rehashing process:

for (int j = 0; j < oldCap; ++j) {
    Node<K,V> e;
    if ((e = oldTab[j]) != null) {
        oldTab[j] = null;
        if (e.next == null)
            newTab[e.hash & (newCap - 1)] = e;
        else if (e instanceof TreeNode)
            ((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
        else { // preserve order
            Node<K,V> loHead = null, loTail = null;
            Node<K,V> hiHead = null, hiTail = null;
            Node<K,V> next;
            do {
                next = e.next;
                if ((e.hash & oldCap) == 0) {
                    if (loTail == null)
                        loHead = e;
                    else
                        loTail.next = e;
                    loTail = e;
                }
                else {
                    if (hiTail == null)
                        hiHead = e;
                    else
                        hiTail.next = e;
                    hiTail = e;
                }
            } while ((e = next) != null);
            if (loTail != null) {
                loTail.next = null;
                newTab[j] = loHead;
            }
            if (hiTail != null) {
                hiTail.next = null;
                newTab[j + oldCap] = hiHead;
            }
        }
    }
}

Let's go through it piece by piece again.

If the old slot holds only one node, it is moved straight into the new array:

if (e.next == null)
    newTab[e.hash & (newCap - 1)] = e;

Its index in the new array is e.hash & (newCap - 1). There is no need to recompute the hash, because it is stored in the node.
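The same mask also explains the loHead/hiHead loop in the rehashing code. When the table doubles, the new mask has exactly one more bit than the old one, the bit worth oldCap. So a node from old slot j either stays at j (that bit is 0) or moves to j + oldCap (that bit is 1). The class and helper names below are hypothetical. The sketch splits plain int hashes instead of nodes, keeping their relative order as the real loop does.

```java
import java.util.ArrayList;
import java.util.List;

public class LoHiSplit {
    // New index of a node from old slot j after doubling oldCap -> 2 * oldCap.
    static int newIndexViaBit(int hash, int j, int oldCap) {
        return (hash & oldCap) == 0 ? j : j + oldCap;
    }

    // Checks that the one-bit rule matches recomputing with the new mask.
    static boolean agreesWithMask(int hash, int oldCap) {
        int j = hash & (oldCap - 1);           // old index
        int direct = hash & (2 * oldCap - 1);  // index recomputed for newCap
        return direct == newIndexViaBit(hash, j, oldCap);
    }

    // Splits one old slot's hashes into lo and hi lists, preserving order.
    static List<List<Integer>> split(int[] bucketHashes, int oldCap) {
        List<Integer> lo = new ArrayList<>();
        List<Integer> hi = new ArrayList<>();
        for (int h : bucketHashes) {
            if ((h & oldCap) == 0) lo.add(h); else hi.add(h);
        }
        List<List<Integer>> result = new ArrayList<>();
        result.add(lo);
        result.add(hi);
        return result;
    }

    public static void main(String[] args) {
        // 5, 21, 37, 53 all sit in slot 5 of a 16-slot table.
        System.out.println(split(new int[] {5, 21, 37, 53}, 16)); // prints "[[5, 37], [21, 53]]"
        boolean ok = true;
        for (int h = -1000; h <= 1000; h++) ok &= agreesWithMask(h, 16);
        System.out.println(ok);                                    // prints "true"
    }
}
```

This is why the real loop can rebuild both lists in one pass without recomputing any index with the full new mask.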

Next, if the node is a tree node:

else if (e instanceof TreeNode)
    ((TreeNode<K,V>)e).split(this, newTab, j, oldCap);

the split method is called:

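split() moves the nodes of one tree bin from the old array into the new array.

split(HashMap<K,V> map, Node<K,V>[] tab, int index, int bit)
Its four parameters are, in order: the current HashMap, the new array, the index in the old array currently being processed, and the length of the old array.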

final void split(HashMap<K,V> map, Node<K,V>[] tab, int index, int bit)