HashMap Core Code Walkthrough

Posted by 水田如雅

1. Basic structure

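A HashMap is first of all an array, and each slot of that array holds either a linked list or a red-black tree (JDK 1.8 and later). Before JDK 1.8, HashMap only hung a linked list on each slot.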

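The drawback of using only linked lists: too many hash collisions can pile up in a single slot, which makes lookups in that slot slow. P.S. a linked-list lookup is O(n), while a red-black tree lookup is O(log n).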

2. How the position in the array is located

First, let's look at how HashMap scatters the keys:

static final int hash(Object key) {
    int h;
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}

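Given a key, if it is null the method returns the default 0.
Otherwise it takes the key's hashCode and XORs the high 16 bits into the low 16 bits.
P.S. XOR is chosen because it is balanced: for independent input bits, each result bit is 0 or 1 with equal probability, whereas & leans toward 0 and | leans toward 1. Folding in the high bits matters because the index calculation only looks at the low bits of the hash.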
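To make the effect of the spreading step concrete, here is a small sketch. The class name HashSpreadDemo and the helper spread are made up for illustration; the expression inside spread is the one from hash above. It uses two Integer keys that differ only above bit 15 and a table length of 16.

```java
final class HashSpreadDemo {
    // Same expression as HashMap.hash: fold the high 16 bits into the low 16 bits.
    static int spread(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    public static void main(String[] args) {
        int a = 0x10000; // Integer.hashCode() is the int value itself
        int b = 0x20000;
        int mask = 16 - 1; // (n - 1) for a table of length 16

        // Without spreading, both keys land in slot 0.
        System.out.println((a & mask) + " " + (b & mask));                 // 0 0
        // With spreading, the high bits take part and the keys separate.
        System.out.println((spread(a) & mask) + " " + (spread(b) & mask)); // 1 2
    }
}
```

Keys whose hash codes differ only in the high bits would otherwise always collide in a small table; the XOR is a cheap way to reduce that.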

Once the key's hash has been scattered, the array index is obtained with what amounts to a modulo:

i = (n - 1) & hash

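No division or remainder is used here because the bitwise operation is faster. This only works because n is always a power of two: (n - 1) is then a mask of all ones in the low bits, so (n - 1) & hash equals hash modulo n.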
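A quick way to convince yourself of the mask trick is to compare it with Math.floorMod. The sketch below is illustrative only; IndexMaskDemo and indexFor are hypothetical names, not part of HashMap.

```java
final class IndexMaskDemo {
    // The index expression used by HashMap, pulled out into a helper for the demo.
    static int indexFor(int hash, int n) {
        return (n - 1) & hash;
    }

    public static void main(String[] args) {
        int[] hashes = {0, 1, 17, -1, Integer.MIN_VALUE, 123456789};
        for (int h : hashes) {
            // For a power-of-two length the mask agrees with the floor modulo,
            // including for negative hashes.
            System.out.println(indexFor(h, 16) == Math.floorMod(h, 16)); // true
        }
        // With a length that is not a power of two the mask is no longer a modulo:
        System.out.println(indexFor(5, 10)); // 1, because 9 is 1001 in binary
        System.out.println(5 % 10);          // 5
    }
}
```

With length 10 the mask 1001 can only ever produce the indexes 0, 1, 8 and 9, so most slots would stay empty. That is the reason the table length is kept at a power of two.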

3. How an entry is placed

/**
 * Implements Map.put and related methods
 *
 * @param hash hash for key
 * @param key the key
 * @param value the value to put
 * @param onlyIfAbsent if true, don't change existing value
 * @param evict if false, the table is in creation mode.
 * @return previous value, or null if none
 */
final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
               boolean evict) {
    Node<K,V>[] tab; Node<K,V> p; int n, i;
    if ((tab = table) == null || (n = tab.length) == 0)
        n = (tab = resize()).length;
    if ((p = tab[i = (n - 1) & hash]) == null)
        tab[i] = newNode(hash, key, value, null);
    else {
        Node<K,V> e; K k;
        if (p.hash == hash &&
            ((k = p.key) == key || (key != null && key.equals(k))))
            e = p;
        else if (p instanceof TreeNode)
            e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
        else {
            for (int binCount = 0; ; ++binCount) {
                if ((e = p.next) == null) {
                    p.next = newNode(hash, key, value, null);
                    if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                        treeifyBin(tab, hash);
                    break;
                }
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    break;
                p = e;
            }
        }
        if (e != null) { // existing mapping for key
            V oldValue = e.value;
            if (!onlyIfAbsent || oldValue == null)
                e.value = value;
            afterNodeAccess(e);
            return oldValue;
        }
    }
    ++modCount;
    if (++size > threshold)
        resize();
    afterNodeInsertion(evict);
    return null;
}
3.1 Initializing the HashMap

Let's go through it piece by piece:

if ((tab = table) == null || (n = tab.length) == 0)
    n = (tab = resize()).length;

First comes the case where table has not been initialized yet. table is the array itself:

/**
 * The table, initialized on first use, and resized as
 * necessary. When allocated, length is always a power of two.
 * (We also tolerate length zero in some operations to allow
 * bootstrapping mechanics that are currently not needed.)
 */
transient Node<K,V>[] table;

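The array gets initialized first. Initialization also lives inside resize; ignoring everything else for now, here is just the part related to initialization: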

else {               // zero initial threshold signifies using defaults
    newCap = DEFAULT_INITIAL_CAPACITY;
    newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
}

This simply assigns the default initial capacity (16) and a threshold value (0.75 * 16 = 12).

3.2 Placing, when there is no hash collision

if ((p = tab[i = (n - 1) & hash]) == null)
    tab[i] = newNode(hash, key, value, null);

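When the slot at the computed index is empty, i.e. there is no hash collision, a new node is created and placed directly in that slot.

The node created at this point is not a tree node, just an ordinary linked-list node: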

// Create a regular (non-tree) node
Node<K,V> newNode(int hash, K key, V value, Node<K,V> next) {
    return new Node<>(hash, key, value, next);
}

3.3 Placing, with a hash collision: the key is the head node

Node<K,V> e; K k;
if (p.hash == hash &&
    ((k = p.key) == key || (key != null && key.equals(k))))
    e = p;

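Comparing hash and key against the head node shows that the key being written is the one already stored in the head node of the slot, so the head node is assigned to e. Here e stands for the existing node for this key, whose value is overwritten later.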

3.4 Placing, with a hash collision: the head node is a red-black tree node

else if (p instanceof TreeNode)
    e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);

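The head is a tree node, so the entry goes into the tree.

3.5 Placing, with a hash collision: the head node starts a linked list

Here is the whole segment first: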

else {
    for (int binCount = 0; ; ++binCount) {
        if ((e = p.next) == null) {
            p.next = newNode(hash, key, value, null);
            if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                treeifyBin(tab, hash);
            break;
        }
        if (e.hash == hash &&
            ((k = e.key) == key || (key != null && key.equals(k))))
            break;
        p = e;
    }
}
if (e != null) { // existing mapping for key
    V oldValue = e.value;
    if (!onlyIfAbsent || oldValue == null)
        e.value = value;
    afterNodeAccess(e);
    return oldValue;
}

First, binCount:

if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
    treeifyBin(tab, hash);
break;

In HashMap:

/**
 * The bin count threshold for using a tree rather than list for a
 * bin.  Bins are converted to trees when adding an element to a
 * bin with at least this many nodes. The value must be greater
 * than 2 and should be at least 8 to mesh with assumptions in
 * tree removal about conversion back to plain bins upon
 * shrinkage.
 */
static final int TREEIFY_THRESHOLD = 8;

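The threshold for converting a list into a tree is 8. binCount starts at 0 with p on the head node, so binCount >= 7 means the list already held 8 nodes before the new one was appended; in that case treeifyBin is called. Note that treeifyBin itself only builds a tree when the table has at least MIN_TREEIFY_CAPACITY (64) slots; for a smaller table it resizes instead.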
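The off-by-one in the binCount check is easy to misread, so here is a tiny simulation of it. This is only a sketch of the counting, not HashMap code: TreeifyCountDemo and lengthWhenTreeifyFirstCalled are hypothetical names, and the chain is modeled as a plain length instead of real nodes.

```java
final class TreeifyCountDemo {
    static final int TREEIFY_THRESHOLD = 8;

    // Keeps appending to one bin and returns the chain length (after the append)
    // at which the binCount check in putVal first becomes true.
    static int lengthWhenTreeifyFirstCalled() {
        int length = 1; // the head node already occupies the bin
        while (true) {
            // Walking a chain of `length` nodes reaches the tail with binCount == length - 1.
            int binCount = length - 1;
            length++; // the new node is appended at the tail
            if (binCount >= TREEIFY_THRESHOLD - 1) {
                return length;
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(lengthWhenTreeifyFirstCalled()); // 9
    }
}
```

So the call to treeifyBin happens on the insert that makes the chain 9 nodes long, i.e. when the bin already contained 8 nodes.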

if ((e = p.next) == null) {
    p.next = newNode(hash, key, value, null);
    if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
        treeifyBin(tab, hash);
    break;
}

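Here p is the current node. If it has no next, i.e. no further node is linked behind it, the new node goes right after p.
Note that, unlike the versions before JDK 1.8 (which inserted at the head), this is tail insertion: the list length has to be checked on every insert anyway, so the code walks from the head node to the tail node and appends there.

When the key being inserted is found in the list: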
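The walk-to-the-tail behaviour can be sketched with a stripped-down bin. Everything here is a simplification for illustration: TailInsertDemo, its Node class, put and order are made-up names, keys are plain Strings, and hash comparison and treeification are left out.

```java
final class TailInsertDemo {
    static final class Node {
        final String key;
        int value;
        Node next;
        Node(String key, int value) { this.key = key; this.value = value; }
    }

    // Walks from the head: overwrites the value if the key exists,
    // otherwise appends a new node after the current tail.
    static void put(Node head, String key, int value) {
        for (Node p = head; ; p = p.next) {
            if (p.key.equals(key)) {
                p.value = value;
                return;
            }
            if (p.next == null) {
                p.next = new Node(key, value);
                return;
            }
        }
    }

    // Lists the keys in chain order, e.g. "a->b->c".
    static String order(Node head) {
        StringBuilder sb = new StringBuilder(head.key);
        for (Node p = head.next; p != null; p = p.next) {
            sb.append("->").append(p.key);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Node head = new Node("a", 1);
        put(head, "b", 2);
        put(head, "c", 3);
        put(head, "b", 20); // existing key: value replaced, order unchanged
        System.out.println(order(head));     // a->b->c
        System.out.println(head.next.value); // 20
    }
}
```

New keys always end up at the end of the chain, and an existing key keeps its position.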

if (e.hash == hash &&
    ((k = e.key) == key || (key != null && key.equals(k))))
    break;

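The loop exits immediately; there is nothing more to search for.

If the list contains a node with an equal hash and key, its value is updated (unless onlyIfAbsent is set and the old value is non-null):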

if (e != null) { // existing mapping for key
    V oldValue = e.value;
    if (!onlyIfAbsent || oldValue == null)
        e.value = value;
    afterNodeAccess(e);
    return oldValue;
}
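The return value and the onlyIfAbsent flag are visible from the public API: put overwrites and returns the previous value, while putIfAbsent leaves an existing non-null value alone. A short sketch (PutReturnDemo and trace are illustrative names only):

```java
import java.util.HashMap;
import java.util.Map;

final class PutReturnDemo {
    // Returns the results of a few calls, joined by commas.
    static String trace() {
        Map<String, Integer> map = new HashMap<>();
        Integer first = map.put("a", 1);          // no previous mapping
        Integer second = map.put("a", 2);         // overwrites, returns the old value
        Integer third = map.putIfAbsent("a", 3);  // keeps 2, returns the existing value
        return first + "," + second + "," + third + "," + map.get("a");
    }

    public static void main(String[] args) {
        System.out.println(trace()); // null,1,2,2
    }
}
```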

After the whole write has finished, there is still this:

++modCount;
if (++size > threshold)
    resize();
afterNodeInsertion(evict);
return null;

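modCount: the number of times the map has been structurally modified. Overwriting the value of an existing key returns earlier and does not count; iterators use this counter to fail fast.

In addition, if the actual size exceeds threshold, the number of entries the table is allowed to hold, a resize is performed.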
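The effect of modCount can be observed through an iterator. The sketch below uses made-up names (ModCountDemo, throwsCme) and only the public HashMap API: inserting a new key while iterating changes modCount and trips the check, while overwriting an existing key does not.

```java
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Map;

final class ModCountDemo {
    // Puts keyToPut into the map on every step of an iteration over the key set
    // and reports whether the iterator detected a concurrent modification.
    static boolean throwsCme(String keyToPut) {
        Map<String, Integer> map = new HashMap<>();
        map.put("a", 1);
        map.put("b", 2);
        map.put("c", 3);
        try {
            for (String ignored : map.keySet()) {
                map.put(keyToPut, 0);
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(throwsCme("d")); // true: a new key is a structural change
        System.out.println(throwsCme("a")); // false: only a value was replaced
    }
}
```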

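4. How resize works

resize runs on initialization, and after a new element is inserted whenever size then exceeds threshold.

4.1 The array is already initialized: double the capacity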

Node<K,V>[] oldTab = table;
int oldCap = (oldTab == null) ? 0 : oldTab.length;
int oldThr = threshold;
int newCap, newThr = 0;
if (oldCap > 0) {
    if (oldCap >= MAXIMUM_CAPACITY) {
        threshold = Integer.MAX_VALUE;
        return oldTab;
    }
    else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
             oldCap >= DEFAULT_INITIAL_CAPACITY)
        newThr = oldThr << 1; // double threshold
}

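First, if the old array length is greater than 0, check whether the array has already grown to the maximum capacity.

The maximum capacity is 2 to the power of 30: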

static final int MAXIMUM_CAPACITY = 1 << 30;

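If the array is already at the maximum capacity, it does not grow any further.

Otherwise the new capacity is double the old one, and the threshold is doubled as well.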
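The interplay of size, threshold and doubling can be traced with plain arithmetic. The following is a rough model under stated assumptions, not the real resize: it assumes the default capacity 16 and load factor 0.75, at least one inserted key, distinct keys, and no resize triggered by treeifyBin. ResizeThresholdDemo and capacityAfter are hypothetical names.

```java
final class ResizeThresholdDemo {
    static final int DEFAULT_INITIAL_CAPACITY = 16;
    static final float DEFAULT_LOAD_FACTOR = 0.75f;

    // Table length after inserting `entries` distinct keys one by one
    // (assumes entries >= 1, so the table has been allocated).
    static int capacityAfter(int entries) {
        int cap = DEFAULT_INITIAL_CAPACITY;
        int thr = (int) (DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY); // 12
        for (int size = 1; size <= entries; size++) {
            if (size > thr) { // mirrors: if (++size > threshold) resize();
                cap <<= 1;    // double the capacity
                thr <<= 1;    // double the threshold
            }
        }
        return cap;
    }

    public static void main(String[] args) {
        System.out.println(capacityAfter(12)); // 16
        System.out.println(capacityAfter(13)); // 32
        System.out.println(capacityAfter(24)); // 32
        System.out.println(capacityAfter(25)); // 64
    }
}
```

The 13th insert is the first one that pushes size past 12, so that is where the first doubling happens.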

4.2 A HashMap built with the parameterized constructor

public HashMap(int initialCapacity, float loadFactor) {
    if (initialCapacity < 0)
        throw new IllegalArgumentException("Illegal initial capacity: " +
                                           initialCapacity);
    if (initialCapacity > MAXIMUM_CAPACITY)
        initialCapacity = MAXIMUM_CAPACITY;
    if (loadFactor <= 0 || Float.isNaN(loadFactor))
        throw new IllegalArgumentException("Illegal load factor: " +
                                           loadFactor);
    this.loadFactor = loadFactor;
    this.threshold = tableSizeFor(initialCapacity);
}

When a custom initial capacity and load factor are supplied, look at tableSizeFor first:

static final int tableSizeFor(int cap) {
    int n = cap - 1;
    n |= n >>> 1;
    n |= n >>> 2;
    n |= n >>> 4;
    n |= n >>> 8;
    n |= n >>> 16;
    return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
}

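This takes cap - 1, smears its highest 1 bit into every lower bit, and adds 1 at the end, which rounds cap up to the nearest power of two. The point is still to guarantee that the initial capacity is a power of two. That initial capacity is then stored in threshold.

That is why resize contains this branch: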
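A few sample inputs make the rounding visible. The method body below is copied from the snippet above; the wrapper class TableSizeForDemo is only there to make it runnable on its own.

```java
final class TableSizeForDemo {
    static final int MAXIMUM_CAPACITY = 1 << 30;

    // Rounds cap up to the nearest power of two (body as in HashMap.tableSizeFor).
    static int tableSizeFor(int cap) {
        int n = cap - 1;
        n |= n >>> 1;
        n |= n >>> 2;
        n |= n >>> 4;
        n |= n >>> 8;
        n |= n >>> 16;
        return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
    }

    public static void main(String[] args) {
        System.out.println(tableSizeFor(0));  // 1
        System.out.println(tableSizeFor(1));  // 1
        System.out.println(tableSizeFor(10)); // 16
        System.out.println(tableSizeFor(16)); // 16
        System.out.println(tableSizeFor(17)); // 32
    }
}
```

Subtracting 1 first is what keeps an exact power of two such as 16 from being rounded up to 32.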

else if (oldThr > 0) // initial capacity was placed in threshold
    newCap = oldThr;

If no initial capacity was specified, the defaults are used:

else {               // zero initial threshold signifies using defaults
    newCap = DEFAULT_INITIAL_CAPACITY;
    newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
}

Then the threshold gets assigned:

if (newThr == 0) {
    float ft = (float)newCap * loadFactor;
    newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ?
              (int)ft : Integer.MAX_VALUE);
}
threshold = newThr;

What follows is fairly blunt:

4.3 The core adjustment process

@SuppressWarnings({"rawtypes","unchecked"})
Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];
table = newTab;

table simply becomes a freshly allocated array.

Below is the core adjustment process:

for (int j = 0; j < oldCap; ++j) {
    Node<K,V> e;
    if ((e = oldTab[j]) != null) {
        oldTab[j] = null;
        if (e.next == null)
            newTab[e.hash & (newCap - 1)] = e;
        else if (e instanceof TreeNode)
            ((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
        else { // preserve order
            Node<K,V> loHead = null, loTail = null;
            Node<K,V> hiHead = null, hiTail = null;
            Node<K,V> next;
            do {
                next = e.next;
                if ((e.hash & oldCap) == 0) {
                    if (loTail == null)
                        loHead = e;
                    else
                        loTail.next = e;
                    loTail = e;
                }
                else {
                    if (hiTail == null)
                        hiHead = e;
                    else
                        hiTail.next = e;
                    hiTail = e;
                }
            } while ((e = next) != null);
            if (loTail != null) {
                loTail.next = null;
                newTab[j] = loHead;
            }
            if (hiTail != null) {
                hiTail.next = null;
                newTab[j + oldCap] = hiHead;
            }
        }
    }
}

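Again, let's go through it piece by piece.

The old slot holds only a single node, so it is dropped straight into the new array: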

if (e.next == null)
    newTab[e.hash & (newCap - 1)] = e;

The index in the new array is e.hash & (newCap - 1). The hash does not have to be recomputed here, because it is stored in the node.

Moving on, if it is a tree node:
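Because the capacity doubles, the new mask has exactly one more bit than the old one, and that bit is the one tested by (e.hash & oldCap) in the loop above. A node therefore either stays at index j or moves to j + oldCap. The sketch below checks this against the direct mask; ResizeSplitDemo and newIndex are illustrative names only.

```java
final class ResizeSplitDemo {
    // New index derived the way the lo/hi split does it:
    // same slot if the oldCap bit is 0, otherwise shifted by oldCap.
    static int newIndex(int hash, int oldCap) {
        int j = hash & (oldCap - 1); // index in the old table
        return (hash & oldCap) == 0 ? j : j + oldCap;
    }

    public static void main(String[] args) {
        int oldCap = 16;
        int newCap = oldCap << 1;
        int[] hashes = {5, 21, 37, 53}; // all four sit in slot 5 of the old table
        for (int h : hashes) {
            // Both ways of computing the new index agree: 5, 21, 5, 21.
            System.out.println(newIndex(h, oldCap) + " " + (h & (newCap - 1)));
        }
    }
}
```

This is why a list bin can be split into a "lo" list and a "hi" list in a single pass, without recomputing an index for every node.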

else if (e instanceof TreeNode)
    ((TreeNode<K,V>)e).split(this, newTab, j, oldCap);

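It goes through a split method:

 split()
 moves the nodes of a tree bin from the old array into the new array, dividing them between a low bin and a high bin
 
 split(HashMap<K,V> map, Node<K,V>[] tab, int index, int bit)
 The four parameters are:
               the current HashMap object, the new array, the index currently being traversed in the old array, and the length of the old array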
final void split(HashMap<K,V> map, Node<K,V>[] tab, int index, int bit) ...
