Java Collections Framework: HashMap

Posted 2022-11-22 智公博客

I. Overview

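HashMap is probably the Map implementation we use most often. Its main characteristics are:
1. It implements every method of the Map interface and permits null keys and null values.
2. It makes no guarantee about element order: insertion order is not preserved, and the order can change as the capacity grows.
3. It is roughly equivalent to Hashtable, except that HashMap is not synchronized (not thread-safe) and permits nulls.
4. Assuming the hash function disperses keys properly among the buckets, the basic get and put operations run in constant time, O(1).
5. Iteration time is proportional to the capacity (number of buckets) plus the size (number of mappings), so if iteration performance matters, do not set the initial capacity too high or the load factor too low.
6. Two parameters drive performance: initial capacity and load factor. When the number of mappings reaches the product of the capacity and the load factor, the table grows to roughly twice its capacity and all mappings are redistributed into it. Choose the two values together: the default load factor is 0.75; a larger factor improves space usage and reduces the number of resizes, but lengthens the bucket chains and so slows down get and put.
7. Because it is not thread-safe, wrap it with Collections.synchronizedMap when a thread-safe Map is needed.
8. Like ArrayList, HashMap has fail-fast iterators: if the map is structurally modified after an iterator is created (other than through the iterator's own remove method), the iterator throws ConcurrentModificationException.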
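
Characteristics 1 and 7 are easy to check against java.util.HashMap itself. The following is a minimal sketch; the class name OverviewDemo and its helper methods are made up for illustration.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

class OverviewDemo {
    // Builds a map that uses both a null key and a null value.
    static Map<String, String> mapWithNulls() {
        Map<String, String> map = new HashMap<>();
        map.put(null, "value for null key");
        map.put("k", null);
        return map;
    }

    // Wraps a HashMap so that every call is synchronized on the wrapper.
    static Map<String, Integer> threadSafeMap() {
        return Collections.synchronizedMap(new HashMap<String, Integer>());
    }

    public static void main(String[] args) {
        Map<String, String> map = mapWithNulls();
        System.out.println(map.get(null));         // value for null key
        System.out.println(map.containsKey("k"));  // true, although the value is null
        System.out.println(map.get("k"));          // null

        Map<String, Integer> safe = threadSafeMap();
        safe.put("a", 1);
        System.out.println(safe.get("a"));         // 1
    }
}
```

Because get returns null both for a missing key and for a key mapped to null, containsKey is the way to tell the two cases apart.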

II. Source Code Analysis

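1. Fields

The methods analyzed below refer to HashMap's fields all the time, so they are listed here first, together with what each one is for. (The code in this article is the JDK 7 implementation.)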

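// Default initial capacity: 16
static final int DEFAULT_INITIAL_CAPACITY = 1 << 4;

// Maximum capacity: 2^30 = 1073741824
static final int MAXIMUM_CAPACITY = 1 << 30;

// Default load factor: 0.75
static final float DEFAULT_LOAD_FACTOR = 0.75f;

// Shared empty array, used until the table is actually allocated
static final Entry<?,?>[] EMPTY_TABLE = {};

// Entry array that stores the key-value mappings; starts as the empty array, and its length must always be a power of two
transient Entry<K,V>[] table = (Entry<K,V>[]) EMPTY_TABLE;

// Number of key-value mappings held by this HashMap instance
transient int size;

// Resize threshold: once size reaches this value the table is resized; normally capacity * loadFactor
// While table is still the empty array, this field holds the capacity to use for the first allocation
int threshold;

// Load factor
final float loadFactor;

// Modification count: incremented on every structural modification of the map, used for fail-fast detection of concurrent modification
transient int modCount;

// Default capacity threshold for alternative hashing of String keys: once the capacity reaches the configured threshold, alternative hashing is enabled to reduce collisions caused by weak hash code calculation. The value can be set through the system property jdk.map.althashing.threshold
static final int ALTERNATIVE_HASHING_THRESHOLD_DEFAULT = Integer.MAX_VALUE;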

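The code below reads jdk.map.althashing.threshold and stores the result in ALTERNATIVE_HASHING_THRESHOLD.
This static initializer is simple: when the property is not set, it falls back to ALTERNATIVE_HASHING_THRESHOLD_DEFAULT; a value of -1 disables alternative hashing, in which case the threshold is again Integer.MAX_VALUE; any other negative or non-numeric value is rejected with an Error.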

    // Nested holder class; other code refers to the value as Holder.ALTERNATIVE_HASHING_THRESHOLD
    private static class Holder {

        static final int ALTERNATIVE_HASHING_THRESHOLD;

        static {
            String altThreshold = java.security.AccessController.doPrivileged(
                new sun.security.action.GetPropertyAction(
                    "jdk.map.althashing.threshold"));

            int threshold;
            try {
                threshold = (null != altThreshold)
                        ? Integer.parseInt(altThreshold)
                        : ALTERNATIVE_HASHING_THRESHOLD_DEFAULT;

                // disable alternative hashing if -1
                if (threshold == -1) {
                    threshold = Integer.MAX_VALUE;
                }

                if (threshold < 0) {
                    throw new IllegalArgumentException("value must be positive integer.");
                }
            } catch(IllegalArgumentException failed) {
                throw new Error("Illegal value for 'jdk.map.althashing.threshold'", failed);
            }

            ALTERNATIVE_HASHING_THRESHOLD = threshold;
        }
    }

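// Hash seed: a random value that may differ from instance to instance and is mixed into the hash calculation to reduce hash collisions; 0 means alternative hashing is disabled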
transient int hashSeed = 0;

2. Methods

    public HashMap(int initialCapacity, float loadFactor) {
        if (initialCapacity < 0)
            throw new IllegalArgumentException("Illegal initial capacity: " +
                                               initialCapacity);
        if (initialCapacity > MAXIMUM_CAPACITY)
            initialCapacity = MAXIMUM_CAPACITY;
        if (loadFactor <= 0 || Float.isNaN(loadFactor))
            throw new IllegalArgumentException("Illegal load factor: " +
                                               loadFactor);

        this.loadFactor = loadFactor;
        threshold = initialCapacity;
        init();
    }

    void init() {
    }

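After a HashMap is created through the constructor, table is still the empty array; init is an empty hook, so no real initialization has happened yet. One detail to note: threshold is set to the requested initial capacity, not to the product of initialCapacity and loadFactor. While table is still empty, threshold simply carries initialCapacity forward to the first allocation.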

    private static int roundUpToPowerOf2(int number) {
        // assert number >= 0 : "number must be non-negative";
        return number >= MAXIMUM_CAPACITY
                ? MAXIMUM_CAPACITY
                : (number > 1) ? Integer.highestOneBit((number - 1) << 1) : 1;
    }

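This private static method rounds number up to a power of two. Reading the code: when number is greater than or equal to the maximum capacity, it returns the maximum capacity; when number is less than or equal to 1, it returns 1; otherwise it returns twice the value of the highest one bit of (number - 1). In other words, it returns the smallest power of two that is greater than or equal to number, so the result is always a power of two.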
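
A few sample inputs make the rounding rule concrete. The sketch below copies the method body into a standalone class; the class name RoundUpDemo and the chosen sample values are only for illustration.

```java
class RoundUpDemo {
    static final int MAXIMUM_CAPACITY = 1 << 30;

    // Same expression as the method above, copied so it can run on its own.
    static int roundUpToPowerOf2(int number) {
        return number >= MAXIMUM_CAPACITY
                ? MAXIMUM_CAPACITY
                : (number > 1) ? Integer.highestOneBit((number - 1) << 1) : 1;
    }

    public static void main(String[] args) {
        int[] samples = {0, 1, 2, 3, 16, 17, 1000};
        for (int n : samples) {
            // Prints 1, 1, 2, 4, 16, 32 and 1024 for the samples above.
            System.out.println(n + " -> " + roundUpToPowerOf2(n));
        }
    }
}
```

Note that 16 stays 16: a value that is already a power of two is returned unchanged.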

Now for the core put method, which involves capacity growth and hash calculation:

    public V put(K key, V value) {
        if (table == EMPTY_TABLE) {
            inflateTable(threshold);
        }
        if (key == null)
            return putForNullKey(value);
        int hash = hash(key);
        int i = indexFor(hash, table.length);
        for (Entry<K,V> e = table[i]; e != null; e = e.next) {
            Object k;
            if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
                V oldValue = e.value;
                e.value = value;
                e.recordAccess(this);
                return oldValue;
            }
        }

        modCount++;
        addEntry(hash, key, value, i);
        return null;
    }

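When table is still empty (a freshly constructed instance), put first allocates it through inflateTable, sized from threshold; at that point threshold still holds initialCapacity.

The second if shows that a null key is allowed and is handled separately; its implementation is covered below.
Next come the hash calculation, the bucket lookup, and collision handling: if the key already exists, its value is replaced and the old value is returned.
Finally, if the key is new, the key-value pair is stored in table through addEntry.

Here are the implementations of the methods that put calls: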

    private void inflateTable(int toSize) {
        // Find a power of 2 >= toSize
        int capacity = roundUpToPowerOf2(toSize);

        threshold = (int) Math.min(capacity * loadFactor, MAXIMUM_CAPACITY + 1);
        table = new Entry[capacity];
        initHashSeedAsNeeded(capacity);
    }

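This method is effectively the real initializer, because it is the only place where the table array is first created. roundUpToPowerOf2 yields the smallest power of two that is not less than toSize; threshold is then set to capacity * loadFactor (capped at MAXIMUM_CAPACITY + 1), that is, the size at which the next resize is triggered. Once the table array has been created, hashSeed is initialized: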

    final boolean initHashSeedAsNeeded(int capacity) {
        boolean currentAltHashing = hashSeed != 0;
        boolean useAltHashing = sun.misc.VM.isBooted() &&
                (capacity >= Holder.ALTERNATIVE_HASHING_THRESHOLD);
        boolean switching = currentAltHashing ^ useAltHashing;
        if (switching) {
            hashSeed = useAltHashing
                ? sun.misc.Hashing.randomHashSeed(this)
                : 0;
        }
        return switching;
    }

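A random hash seed is generated only when capacity is at least ALTERNATIVE_HASHING_THRESHOLD (and the VM has finished booting) and alternative hashing is not already in use; otherwise the seed stays 0. The method is called only from inflateTable (effectively the initialization) and from resize. Switching to a random hash seed once the capacity of the map passes a certain size is meant to reduce hash collisions.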

Next, the handling of a null key:

    private V putForNullKey(V value) {
        for (Entry<K,V> e = table[0]; e != null; e = e.next) {
            if (e.key == null) {
                V oldValue = e.value;
                e.value = value;
                e.recordAccess(this);
                return oldValue;
            }
        }
        modCount++;
        addEntry(0, null, value, 0);
        return null;
    }

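The null-key path is much like put itself. The main difference is that the hash of a null key is fixed at 0, so there is no index to compute either: the entry always lives in bucket 0. Other keys can land in bucket 0 as well, which is why the loop looks for the entry whose key is null.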

The method that computes the hash:

    final int hash(Object k) {
        int h = hashSeed;
        if (0 != h && k instanceof String) {
            return sun.misc.Hashing.stringHash32((String) k);
        }

        h ^= k.hashCode();

        // This function ensures that hashCodes that differ only by
        // constant multiples at each bit position have a bounded
        // number of collisions (approximately 8 at default load factor).
        h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);
    }

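The method first checks whether the alternative hash seed is enabled; if it is and the key is a String, a dedicated internal method computes the hash. Otherwise the hash is computed by the remaining code, which uses a series of shifts and XORs. For our purposes it is enough to know the goal, which is to reduce the chance that different keys collide; we will not dig into why the algorithm is designed exactly this way.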
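
The effect of the shifts and XORs can be seen with two hash codes that differ only in their high bits. The sketch below assumes hashSeed is 0 (alternative hashing disabled) and copies the two mixing lines into a standalone class; the names SpreadDemo and spread are hypothetical.

```java
class SpreadDemo {
    // The two mixing lines of hash(), applied to a raw hashCode with a seed of 0.
    static int spread(int h) {
        h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);
    }

    // Bucket index for a table whose length is a power of two.
    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    public static void main(String[] args) {
        int a = 0x10000;
        int b = 0x20000;
        // Without mixing, only the low 4 bits count in a table of 16, so both land in bucket 0.
        System.out.println(indexFor(a, 16) + " " + indexFor(b, 16));                 // 0 0
        // After mixing, the high bits influence the low bits and the buckets differ.
        System.out.println(indexFor(spread(a), 16) + " " + indexFor(spread(b), 16)); // 1 2
    }
}
```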

    static int indexFor(int h, int length) {
        // assert Integer.bitCount(length) == 1 : "length must be a non-zero power of 2";
        return h & (length-1);
    }

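indexFor takes the hash and computes which slot (index) of table the key belongs in. The requirement is to map a value into a fixed-length array, with positions ranging from 0 to length-1 and spread as evenly as possible, and the usual tool for that is the modulo operation. The AND trick used here depends on length: the length of table is always a power of two greater than 0, so length-1 always has the form $2^x - 1$, which is all ones in binary (11…11). The result of the AND is therefore always between 0 and length-1, equals the hash modulo length, and is evenly distributed as long as the low bits of the hash are. Combined with the hash calculation above, this effectively reduces hash collisions.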
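
The equivalence between the AND and a modulo can be checked directly. The following sketch compares indexFor with Math.floorMod for a handful of hashes, including negative ones; the class name IndexForDemo and the sample values are chosen only for illustration.

```java
class IndexForDemo {
    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    // True if the AND gives the same result as a non-negative modulo for every sample hash.
    static boolean matchesFloorMod(int length) {
        int[] hashes = {0, 1, 15, 16, 17, 12345, -1, -17, Integer.MIN_VALUE, Integer.MAX_VALUE};
        for (int h : hashes) {
            if (indexFor(h, length) != Math.floorMod(h, length)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(matchesFloorMod(16));  // true: 16 is a power of two
        System.out.println(matchesFloorMod(64));  // true
        System.out.println(matchesFloorMod(10));  // false: 10 is not a power of two
    }
}
```

This is why the table length has to stay a power of two: with any other length the mask would skip some slots entirely.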

Before moving on to addEntry, it is worth looking at the implementation of the key-value Entry:

    static class Entry<K,V> implements Map.Entry<K,V> {
        final K key;
        V value;
        Entry<K,V> next;
        int hash;

        /**
         * Creates new entry.
         */
        Entry(int h, K k, V v, Entry<K,V> n) {
            value = v;
            next = n;
            key = k;
            hash = h;
        }

        public final K getKey() {
            return key;
        }

        public final V getValue() {
            return value;
        }

        public final V setValue(V newValue) {
            V oldValue = value;
            value = newValue;
            return oldValue;
        }

        public final boolean equals(Object o) {
            if (!(o instanceof Map.Entry))
                return false;
            Map.Entry e = (Map.Entry)o;
            Object k1 = getKey();
            Object k2 = e.getKey();
            if (k1 == k2 || (k1 != null && k1.equals(k2))) {
                Object v1 = getValue();
                Object v2 = e.getValue();
                if (v1 == v2 || (v1 != null && v1.equals(v2)))
                    return true;
            }
            return false;
        }

        public final int hashCode() {
            return Objects.hashCode(getKey()) ^ Objects.hashCode(getValue());
        }

        public final String toString() {
            return getKey() + "=" + getValue();
        }

        /**
         * This method is invoked whenever the value in an entry is
         * overwritten by an invocation of put(k,v) for a key k that's already
         * in the HashMap.
         */
        void recordAccess(HashMap<K,V> m) {
        }

        /**
         * This method is invoked whenever the entry is
         * removed from the table.
         */
        void recordRemoval(HashMap<K,V> m) {
        }
    }

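Entry is simple and easy to follow. It records each key-value pair along with the hash of the key. The important part is the next field, which handles collisions (different keys that map to the same bucket): next links the colliding entries into a singly linked list, and we will see below how that list is written and searched. equals treats two entries as equal only when both the key and the value are equal. Two hook methods have empty implementations: recordAccess is called when a value is replaced, and recordRemoval is called when an entry is removed.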
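
The equality rule is the one that the Map.Entry contract specifies, so it can be observed with any Map.Entry implementation. The sketch below uses java.util.AbstractMap.SimpleEntry as a stand-in, since HashMap's own Entry class is not accessible from outside; EntryEqualityDemo and its helpers are hypothetical names.

```java
import java.util.AbstractMap;
import java.util.Map;
import java.util.Objects;

class EntryEqualityDemo {
    static Map.Entry<String, Integer> entry(String key, Integer value) {
        return new AbstractMap.SimpleEntry<>(key, value);
    }

    // Hash code as defined by the Map.Entry contract: key hash XOR value hash.
    static int contractHash(Map.Entry<?, ?> e) {
        return Objects.hashCode(e.getKey()) ^ Objects.hashCode(e.getValue());
    }

    public static void main(String[] args) {
        System.out.println(entry("a", 1).equals(entry("a", 1)));  // true
        System.out.println(entry("a", 1).equals(entry("a", 2)));  // false: values differ
        System.out.println(entry("a", 1).equals(entry("b", 1)));  // false: keys differ
        System.out.println(entry("a", 1).hashCode() == contractHash(entry("a", 1)));  // true
    }
}
```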

Now back to addEntry:

    void addEntry(int hash, K key, V value, int bucketIndex) {
        if ((size >= threshold) && (null != table[bucketIndex])) {
            resize(2 * table.length);
            hash = (null != key) ? hash(key) : 0;
            bucketIndex = indexFor(hash, table.length);
        }

        createEntry(hash, key, value, bucketIndex);
    }

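Every time an Entry is added, the method first decides whether to resize: the current size must be greater than or equal to the threshold, and the target bucket in table must be non-empty. So reaching the threshold is not the only condition; if the target bucket is still empty, the entry is stored without resizing. resize is called with twice the current table length as the target, and once it finishes, the hash and the index of the key being stored have to be recomputed.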
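
The two-part condition can be written as a small predicate. This is only a sketch of the check described above; ResizeConditionDemo and shouldResize are made-up names, and the bucket state is passed in as a boolean instead of being read from a real table.

```java
class ResizeConditionDemo {
    // Mirrors the condition in addEntry: both parts must hold before a resize happens.
    static boolean shouldResize(int size, int threshold, boolean bucketOccupied) {
        return size >= threshold && bucketOccupied;
    }

    public static void main(String[] args) {
        System.out.println(shouldResize(11, 12, true));   // false: below the threshold
        System.out.println(shouldResize(12, 12, false));  // false: the target bucket is empty
        System.out.println(shouldResize(12, 12, true));   // true: threshold reached and bucket in use
    }
}
```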

    void createEntry(int hash, K key, V value, int bucketIndex) {
        Entry<K,V> e = table[bucketIndex];
        table[bucketIndex] = new Entry<>(hash, key, value, e);
        size++;
    }

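createEntry is a straightforward linked-list operation: the new Entry is placed at the head of the bucket, and its next points to the previous head. This is where the next field of Entry comes into play.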
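
Head insertion means that the most recently added entry of a bucket is the first one in its chain. A minimal sketch with a simplified, hypothetical Node class (keys only, a single bucket) shows the resulting order:

```java
class HeadInsertDemo {
    // Simplified stand-in for Entry: just a key and the next link.
    static final class Node {
        final String key;
        final Node next;

        Node(String key, Node next) {
            this.key = key;
            this.next = next;
        }
    }

    // Inserts the keys one by one at the head of a single bucket and returns the chain as text.
    static String chain(String... keys) {
        Node head = null;
        for (String k : keys) {
            head = new Node(k, head);  // same move as createEntry: the new node points at the old head
        }
        StringBuilder sb = new StringBuilder();
        for (Node n = head; n != null; n = n.next) {
            if (sb.length() > 0) {
                sb.append(" -> ");
            }
            sb.append(n.key);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(chain("A", "B", "C"));  // C -> B -> A
    }
}
```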

    void resize(int newCapacity) {
        Entry[] oldTable = table;
        int oldCapacity = oldTable.length;
        if (oldCapacity == MAXIMUM_CAPACITY) {
            threshold = Integer.MAX_VALUE;
            return;
        }

        Entry[] newTable = new Entry[newCapacity];
        transfer(newTable, initHashSeedAsNeeded(newCapacity));
        table = newTable;
        threshold = (int)Math.min(newCapacity * loadFactor, MAXIMUM_CAPACITY + 1);
    }

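resize is triggered only from put and putAll, and grows table once the size reaches the threshold. It first checks the old capacity: if that is already MAXIMUM_CAPACITY, the table cannot grow any further, so threshold is set to Integer.MAX_VALUE and the method returns. Otherwise it creates a new array and decides whether the entries have to be rehashed: if the hash seed hashSeed has not been switched, no rehash is needed, because the hashes of the existing keys do not change.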
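
Both inflateTable and resize compute the new threshold with the same expression. The sketch below isolates that arithmetic for the default load factor; ThresholdDemo and thresholdFor are hypothetical names.

```java
class ThresholdDemo {
    static final int MAXIMUM_CAPACITY = 1 << 30;

    // Same expression as in inflateTable and resize.
    static int thresholdFor(int capacity, float loadFactor) {
        return (int) Math.min(capacity * loadFactor, MAXIMUM_CAPACITY + 1);
    }

    public static void main(String[] args) {
        float loadFactor = 0.75f;
        for (int capacity = 16; capacity <= 128; capacity *= 2) {
            // Prints thresholds 12, 24, 48 and 96.
            System.out.println("capacity " + capacity + " -> threshold "
                    + thresholdFor(capacity, loadFactor));
        }
    }
}
```

With the defaults, a map therefore becomes eligible for its first resize when it holds 12 mappings, and for each later resize when the count doubles.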

    void transfer(Entry[] newTable, boolean rehash) {
        int newCapacity = newTable.length;
        for (Entry<K,V> e : table) {
            while(null != e) {
                Entry<K,V> next = e.next;
                if (rehash) {
                    e.hash = null == e.key ? 0 : hash(e.key);
                }
                int i = indexFor(e.hash, newCapacity);
                e.next = newTable[i];
                newTable[i] = e;
                e = next;
            }
        }
    }

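transfer moves the entries of the old table into newTable. Whether or not a rehash happens, every entry has to be re-indexed, because the index depends on the table length and the new table has a different length. The rest is linked-list manipulation: each entry is inserted at the head of its new bucket.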
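
Because the index is computed with a mask and the capacity doubles, re-indexing has a simple outcome as long as the hash itself is unchanged: an entry either keeps its index or moves up by exactly the old capacity. This follows from the AND in indexFor rather than from anything stated in the source above; the sketch below checks it, with ResizeIndexDemo and its sample hashes chosen only for illustration.

```java
class ResizeIndexDemo {
    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    // True if, after doubling, the entry keeps its index or moves up by exactly oldCapacity.
    static boolean staysOrShifts(int hash, int oldCapacity) {
        int oldIndex = indexFor(hash, oldCapacity);
        int newIndex = indexFor(hash, oldCapacity * 2);
        return newIndex == oldIndex || newIndex == oldIndex + oldCapacity;
    }

    public static void main(String[] args) {
        int oldCapacity = 16;
        int[] hashes = {5, 21, 37, 53};  // already-mixed hashes that all share index 5 in a table of 16
        for (int h : hashes) {
            // Prints 5 -> 5, 5 -> 21, 5 -> 5 and 5 -> 21.
            System.out.println("hash " + h + ": " + indexFor(h, oldCapacity)
                    + " -> " + indexFor(h, oldCapacity * 2));
        }
    }
}
```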

That covers the main methods involved in adding a key-value pair. Next, the lookup methods:

    public V get(Object key) {
        if (key == null)
            return getForNullKey();
        Entry<K,V> entry = getEntry(key);

        return null == entry ? null : entry.getValue();
    }

    final Entry<K,V> getEntry(Object key) {
        if (size == 0) {
            return null;
        }

        int hash = (key == null) ? 0 : hash(key);
        for (Entry<K,V> e = table[indexFor(hash, table.length)];
             e != null;
             e = e.next) {
            Object k;
            if (e.hash == hash &&
                ((k = e.key) == key || (key != null && key.equals(k))))
                return e;
        }
        return null;
    }

As with put, get first computes the hash, then the index, and then walks the linked list of that bucket in order, looking for an entry whose key is equal.
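
The put and get paths can be put together in a deliberately tiny chained table. This is a sketch under simplifying assumptions, not HashMap's code: String keys only, no null keys, a fixed table of 16 buckets, no resizing and no extra hash mixing. TinyChainedMap and Node are hypothetical names.

```java
class TinyChainedMap {
    // Simplified entry: cached hash, key, value, and the link to the next entry in the bucket.
    static final class Node {
        final int hash;
        final String key;
        int value;
        Node next;

        Node(int hash, String key, int value, Node next) {
            this.hash = hash;
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    private final Node[] table = new Node[16];  // fixed power-of-two length

    private static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    // Returns the previous value, or null if the key was not present.
    Integer put(String key, int value) {
        int hash = key.hashCode();
        int i = indexFor(hash, table.length);
        for (Node e = table[i]; e != null; e = e.next) {
            if (e.hash == hash && e.key.equals(key)) {
                int old = e.value;
                e.value = value;
                return old;
            }
        }
        table[i] = new Node(hash, key, value, table[i]);  // head insertion
        return null;
    }

    // Returns the mapped value, or null if the key is not present.
    Integer get(String key) {
        int hash = key.hashCode();
        for (Node e = table[indexFor(hash, table.length)]; e != null; e = e.next) {
            if (e.hash == hash && e.key.equals(key)) {
                return e.value;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        TinyChainedMap map = new TinyChainedMap();
        map.put("Aa", 1);
        map.put("BB", 2);  // "Aa" and "BB" have the same hashCode, so they share a bucket
        System.out.println(map.get("Aa"));  // 1
        System.out.println(map.get("BB"));  // 2
        System.out.println(map.get("zz"));  // null
    }
}
```

The two colliding keys end up in one chain, and the equals check while walking the chain is what tells them apart.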

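Finally, let's look at how HashMap implements its iterators. The unit of a Map is the key-value pair, that is, an Entry, and a Map is iterated through its collection views. HashMap provides not only entrySet() but also keySet() and values(). The iterators implemented internally are EntryIterator, KeyIterator and ValueIterator, and all three extend one base class, HashIterator: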

    private abstract class HashIterator<E> implements Iterator<E> {
        Entry<K,V> next;        // next entry to return
        int expectedModCount;   // For fast-fail
        int index;              // current slot
        Entry<K,V> current;     // current entry

        HashIterator() {
            expectedModCount = modCount;
            if (size > 0) { // advance to first entry
                Entry[] t = table;
                while (index < t.length && (next = t[index++]) == null)
                    ;
            }
        }

        public final boolean hasNext() {
            return next != null;
        }

        final Entry<K,V> nextEntry() {
            if (modCount != expectedModCount)
                throw new ConcurrentModificationException();
            Entry<K,V> e = next;
            if (e == null)
                throw new NoSuchElementException();

            if ((next = e.next) == null) {
                Entry[] t = table;
                while (index < t.length && (next = t[index++]) == null)
                    ;
            }
            current = e;
            return e;
        }

        public void remove() {
            if (current == null)
                throw new IllegalStateException();
            if (modCount != expectedModCount)
                throw new ConcurrentModificationException();
            Object k = current.key;
            current = null;
            HashMap.this.removeEntryForKey(k);
            expectedModCount = modCount;
        }
    }

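HashIterator has four fields:
next: the next entry the iterator will return
current: the current entry
expectedModCount: the modification count the iterator expects, used by the fail-fast mechanism
index: the position in table at which the scan for the following entry resumes
HashIterator has a single no-argument constructor, which initializes these fields when the instance is created: it records expectedModCount, scans table to find the first next entry and its index, and leaves current as null.
As in the other collection classes, nextEntry and remove first compare modCount with expectedModCount to detect a modification made after the iterator was created. nextEntry advances next to the following entry, sets current to the previous next, and returns it. remove takes the key from current, removes the mapping for that key, and then resynchronizes expectedModCount.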
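
The difference between removing through the iterator and modifying the map directly can be observed with java.util.HashMap. This is a minimal single-threaded sketch; FailFastDemo and its helper methods are hypothetical names.

```java
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

class FailFastDemo {
    static Map<String, Integer> sample() {
        Map<String, Integer> map = new HashMap<>();
        map.put("a", 1);
        map.put("b", 2);
        map.put("c", 3);
        return map;
    }

    // Removes every entry through the iterator, which keeps expectedModCount in sync.
    static int removeAllViaIterator(Map<String, Integer> map) {
        int removed = 0;
        Iterator<Map.Entry<String, Integer>> it = map.entrySet().iterator();
        while (it.hasNext()) {
            it.next();
            it.remove();
            removed++;
        }
        return removed;
    }

    // Modifies the map behind the iterator's back and reports whether next() failed fast.
    static boolean failsFastOnDirectPut(Map<String, Integer> map) {
        Iterator<String> it = map.keySet().iterator();
        it.next();
        map.put("new-key", 99);  // structural modification: the key was not present before
        try {
            it.next();
            return false;
        } catch (ConcurrentModificationException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(removeAllViaIterator(sample()));  // 3
        System.out.println(failsFastOnDirectPut(sample()));  // true
    }
}
```

Replacing the value of an existing key is not a structural modification, so it does not trip the check; only additions and removals change modCount.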

The other three iterators, EntryIterator, KeyIterator and ValueIterator, only implement next, each simply returning the relevant part of the entry:

    private final class ValueIterator extends HashIterator<V> {
        public V next() {
            return nextEntry().value;
        }
    }

    private final class KeyIterator extends HashIterator<K> {
        public K next() {
            return nextEntry().getKey();
        }
    }

    private final class EntryIterator extends HashIterator<Map.Entry<K,V>> {
        public Map.Entry<K,V> next() {
            return nextEntry();
        }
    }

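This analysis walked through the implementation of HashMap from four angles: its fields, storing, retrieving, and iterating. The source contains more than this, of course, but the remaining methods are not covered one by one here.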
