HashMap & ConcurrentHashMap

Posted 2021-04-24 Qunar技术沙龙 (Qunar Tech Salon)


HashMap

HashMap is one of the data structures we use most often. As most of us know, removing entries with map.remove() while looping over a HashMap throws a ConcurrentModificationException.

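// Incorrect code
Map<String, String> map = new HashMap<String, String>();
map.put("a", "a");
map.put("s", "s");
for (String s : map.keySet()) {
    if (s.equals("s")) {
        // Structural modification behind the iterator's back.
        // Note: if the removed key happens to be the last one visited,
        // hasNext() returns false first and no exception is thrown.
        map.remove(s);
    }
}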

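Cause: iterating over the key set creates a KeyIterator. When it is created, the iterator copies the map's modCount (the number of structural modifications made so far) into its own expectedModCount field. Calling map.remove() directly increments modCount but leaves the iterator's expectedModCount unchanged. On the next call to next(), the iterator compares the two counters, finds that they differ, and throws ConcurrentModificationException. The check is on modCount, not on the map's size.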

private final class KeyIterator extends HashIterator<K> {
    public K next() {
        return nextEntry().getKey();
    }
}

final Entry<K,V> nextEntry() {
    // Check whether the map was structurally modified after the iterator was created
    if (modCount != expectedModCount)
        throw new ConcurrentModificationException();
    Entry<K,V> e = next;
    if (e == null)
        throw new NoSuchElementException();

    // Current chain exhausted: advance to the next non-empty bucket
    if ((next = e.next) == null) {
        Entry[] t = table;
        while (index < t.length && (next = t[index++]) == null)
            ;
    }
    current = e;
    return e;
}
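The modCount check in nextEntry() can be observed with a small experiment. The sketch below is only an illustration: FailFastDemo and throwsOnDirectRemove are hypothetical names, not part of the JDK. It removes the first key the iterator visits rather than a specific key, because removing the key that happens to be visited last goes undetected (hasNext() returns false before next() gets a chance to compare the counters).

```java
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical demo class, for illustration only.
class FailFastDemo {
    /** Returns true if removing through the map during iteration raised the exception. */
    static boolean throwsOnDirectRemove() {
        Map<String, String> map = new HashMap<>();
        map.put("a", "a");
        map.put("s", "s");
        map.put("x", "x");
        boolean removed = false;
        try {
            for (String key : map.keySet()) {
                if (!removed) {
                    // Remove the first key visited: the map's modCount changes,
                    // the iterator's expectedModCount does not.
                    map.remove(key);
                    removed = true;
                }
            }
            return false;
        } catch (ConcurrentModificationException e) {
            // Raised by the next() call that follows the removal
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(throwsOnDirectRemove());
    }
}
```

Since two more keys remain after the first removal, the following next() call sees the counter mismatch and the method should return true.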

Solutions:

Method 1: modify the HashMap through its Iterator

// Correct code
Iterator<String> it = map.keySet().iterator();
while (it.hasNext()) {
    String s = it.next();
    if (s.equals("s")) {
        // Iterator.remove() keeps expectedModCount in sync with modCount
        it.remove();
    }
}

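Method 2: replace HashMap with ConcurrentHashMap. ConcurrentHashMap guards modifications itself by locking internally (this applies to insertions as well as removals), and its iterators do not throw ConcurrentModificationException.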
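The two methods can be compared side by side. The sketch below uses hypothetical names (SafeRemovalDemo and its helper methods). The removeIf variant is an addition that is not in the original text: it is a Java 8+ shorthand that, for HashMap's key set, removes through the iterator just like Method 1.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical demo class, for illustration only.
class SafeRemovalDemo {
    private static Map<String, String> fill(Map<String, String> map) {
        map.put("a", "a");
        map.put("s", "s");
        return map;
    }

    /** Method 1: remove through the iterator. */
    static Map<String, String> viaIterator() {
        Map<String, String> map = fill(new HashMap<>());
        Iterator<String> it = map.keySet().iterator();
        while (it.hasNext()) {
            if (it.next().equals("s")) {
                it.remove();
            }
        }
        return map;
    }

    /** Java 8+ shorthand (assumption: a Java 8 or later runtime). */
    static Map<String, String> viaRemoveIf() {
        Map<String, String> map = fill(new HashMap<>());
        map.keySet().removeIf(key -> key.equals("s"));
        return map;
    }

    /** Method 2: direct removal is tolerated by ConcurrentHashMap's iterator. */
    static Map<String, String> viaConcurrentHashMap() {
        Map<String, String> map = fill(new ConcurrentHashMap<>());
        for (String key : map.keySet()) {
            if (key.equals("s")) {
                map.remove(key);
            }
        }
        return map;
    }
}
```

All three variants leave only the entry for "a" in the map.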

ConcurrentHashMap

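The ConcurrentHashMap class in the java.util.concurrent package is a thread-safe implementation of Map that offers much better concurrency than a map guarded by a single lock, such as Hashtable. It allows several modifications to proceed at the same time, and the key is lock striping: the data is split into segments, each segment gets its own lock, and while one thread holds the lock of one segment, the data in the other segments stays accessible to other threads. (The source shown in this article is the segment-based implementation from JDK 6; JDK 8 and later no longer use segments.)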
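To make the idea of lock striping concrete, here is a minimal sketch. It is not the JDK implementation: StripedMap, stripeFor and fillFromTwoThreads are hypothetical names, and each stripe is simply a plain HashMap guarded by its own ReentrantLock.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical illustration of lock striping; not how the JDK class is written.
class StripedMap<K, V> {
    private final ReentrantLock[] locks;
    private final Map<K, V>[] stripes;

    @SuppressWarnings("unchecked")
    StripedMap(int stripeCount) {
        locks = new ReentrantLock[stripeCount];
        stripes = (Map<K, V>[]) new Map[stripeCount];
        for (int i = 0; i < stripeCount; i++) {
            locks[i] = new ReentrantLock();
            stripes[i] = new HashMap<>();
        }
    }

    // Pick the stripe from the key's hash (sign bit cleared to avoid negative indexes).
    private int stripeFor(Object key) {
        return (key.hashCode() & 0x7fffffff) % locks.length;
    }

    V put(K key, V value) {
        int i = stripeFor(key);
        locks[i].lock();              // only this stripe is blocked
        try {
            return stripes[i].put(key, value);
        } finally {
            locks[i].unlock();
        }
    }

    V get(Object key) {
        int i = stripeFor(key);
        locks[i].lock();
        try {
            return stripes[i].get(key);
        } finally {
            locks[i].unlock();
        }
    }

    int size() {
        // Whole-map operation: take every lock, always in the same order.
        for (ReentrantLock lock : locks) lock.lock();
        try {
            int sum = 0;
            for (Map<K, V> stripe : stripes) sum += stripe.size();
            return sum;
        } finally {
            for (ReentrantLock lock : locks) lock.unlock();
        }
    }

    /** Two writers fill disjoint key ranges; returns the final size. */
    static int fillFromTwoThreads(int perThread) {
        StripedMap<Integer, Integer> map = new StripedMap<>(16);
        Thread t1 = new Thread(() -> { for (int k = 0; k < perThread; k++) map.put(k, k); });
        Thread t2 = new Thread(() -> { for (int k = perThread; k < 2 * perThread; k++) map.put(k, k); });
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return map.size();
    }
}
```

Unlike this sketch, the real class reads without locking at all, which the get section of this article describes.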

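Data structure

A ConcurrentHashMap consists of an array of Segments, each of which holds an array of HashEntry objects. Segment extends the reentrant lock ReentrantLock and plays the role of the lock in ConcurrentHashMap, while HashEntry stores the key-value data. A Segment is structured much like a HashMap, as an array of linked lists: it contains a HashEntry array, and each HashEntry is a node of a linked list. Each Segment guards the elements of its own HashEntry array, so a thread must first acquire the Segment's lock before modifying the data in that array.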

The get operation of ConcurrentHashMap

public V get(Object key) {
  int hash = hash(key.hashCode());
  return segmentFor(hash).get(key, hash);
}
/**
* Returns the segment that should be used for key with given hash
* @param hash the hash code for the key
* @return the segment
*/    
final Segment<K,V> segmentFor(int hash) {
  return segments[(hash >>> segmentShift) & segmentMask];
}

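The get implementation is simple and efficient. The key's hashCode is first re-hashed; the high bits of the resulting hash select the segment, and the low bits then select the bucket inside that segment's table.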
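The index arithmetic can be tried out in isolation. The sketch below assumes that segmentShift and segmentMask are derived from the concurrency level the way the JDK 6 constructor does it, as far as we recall (round the level up to a power of two, then shift = 32 minus the number of index bits). SegmentIndexDemo and its method names are hypothetical.

```java
// Hypothetical demo class; mirrors the index arithmetic only, not the map itself.
class SegmentIndexDemo {
    final int segmentShift;
    final int segmentMask;

    SegmentIndexDemo(int concurrencyLevel) {
        int sshift = 0;   // number of bits used for the segment index
        int ssize = 1;    // segment count, rounded up to a power of two
        while (ssize < concurrencyLevel) {
            ++sshift;
            ssize <<= 1;
        }
        segmentShift = 32 - sshift;
        segmentMask = ssize - 1;
    }

    /** High bits of the hash choose the segment. */
    int segmentIndex(int hash) {
        return (hash >>> segmentShift) & segmentMask;
    }

    /** Low bits of the hash choose the bucket within a segment's table. */
    static int bucketIndex(int hash, int tableLength) {
        return hash & (tableLength - 1);
    }
}
```

With a concurrency level of 16 this gives segmentShift = 28 and segmentMask = 15, so a hash of 0xA0000003 lands in segment 10 and, for a table of length 16, in bucket 3. Using different ends of the hash for the two indexes keeps keys that share a segment from also clustering in the same bucket.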

V get(Object key, int hash) {
    if (count != 0) { // read-volatile
        HashEntry<K,V> e = getFirst(hash);
        while (e != null) {
            if (e.hash == hash && key.equals(e.key)) {
                V v = e.value;
                if (v != null)
                    return v;
                return readValueUnderLock(e); // recheck
            }
            e = e.next;
        }
    }
    return null;
}

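The get operation needs no lock. Its first step is to read count, which is a volatile field. Every structural modification writes count as its last step, so a reader that sees the new count also sees the structural change that preceded it; this is how get obtains a nearly up-to-date view of the structure. Non-structural updates, that is, changes to a node's value, are also visible because HashEntry's value field is volatile. get then walks the hash chain looking for the node and returns null if it is not found.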

/**
* Segment<K,V>
* The number of elements in this segment's region.
*/
transient volatile int count;
//HashEntry<K,V>
volatile V value;
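The "write count last, read count first" pattern relies on the visibility guarantee of volatile fields. A minimal sketch of that guarantee, using hypothetical names (VolatilePublishDemo, Box, publishAndRead) and a single plain field in place of the hash chain:

```java
// Hypothetical demo class, for illustration only.
class VolatilePublishDemo {
    static final class Box {
        int payload;          // plain field, written before count
        volatile int count;   // volatile field, written last and read first

        void publish(int value) {
            payload = value;
            count = 1;        // write-volatile: publishes the payload write
        }
    }

    /** A reader thread waits for count to change, then reads the payload. */
    static int publishAndRead(int value) {
        Box box = new Box();
        int[] seen = new int[1];
        Thread reader = new Thread(() -> {
            while (box.count == 0) {   // read-volatile
                Thread.yield();
            }
            seen[0] = box.payload;     // sees the value written before count
        });
        reader.start();
        box.publish(value);
        try {
            reader.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen[0];
    }
}
```

Because the reader only touches payload after observing the volatile write to count, the Java memory model guarantees that it sees the published value.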

The remove operation of ConcurrentHashMap

public V remove(Object key) {
    int hash = hash(key.hashCode());
    return segmentFor(hash).remove(key, hash, null);
}

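The operation first locates the segment and then delegates to that segment's remove. When several removals run concurrently, they can proceed at the same time as long as they fall into different segments.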

/**
 * Remove; match on key only if value null, else match both.
 */
V remove(Object key, int hash, Object value) {
    lock();
    try {
        int c = count - 1;
        HashEntry<K,V>[] tab = table;
        int index = hash & (tab.length - 1);
        HashEntry<K,V> first = tab[index];
        HashEntry<K,V> e = first;
        while (e != null && (e.hash != hash || !key.equals(e.key)))
            e = e.next;

        V oldValue = null;
        if (e != null) {
            V v = e.value;
            if (value == null || value.equals(v)) {
                oldValue = v;
                // All entries following removed node can stay
                // in list, but all preceding ones need to be
                // cloned.
                ++modCount;
                HashEntry<K,V> newFirst = e.next;
                for (HashEntry<K,V> p = first; p != e; p = p.next)
                    newFirst = new HashEntry<K,V>(p.key, p.hash,
                                                  newFirst, p.value);
                tab[index] = newFirst;
                count = c; // write-volatile
            }
        }
        return oldValue;
    } finally {
        unlock();
    }
}

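The whole operation runs while holding the segment lock. The first part, up to the end of the while loop, locates the node e to be removed. If no such node exists, the method returns null. Otherwise the nodes in front of e are copied, and the copied chain ends by pointing at the node after e; copying is needed because a HashEntry's next field is final, so the chain cannot be relinked in place. The nodes after e do not need to be copied and are reused.
A remove that actually deletes a node writes count as its last step.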
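The copy-the-prefix step is easier to follow on a stripped-down chain. The sketch below uses a hypothetical Node class with a final next link (keys only, no hashes or values) and applies the same loop as the segment's remove. Note that the copied prefix ends up in reverse order.

```java
// Hypothetical demo class, for illustration only.
class ChainRemoveDemo {
    static final class Node {
        final String key;
        final Node next;   // immutable link, like HashEntry.next in the source above

        Node(String key, Node next) {
            this.key = key;
            this.next = next;
        }
    }

    /** Builds a chain in the given order. */
    static Node chain(String... keys) {
        Node head = null;
        for (int i = keys.length - 1; i >= 0; i--) {
            head = new Node(keys[i], head);
        }
        return head;
    }

    /** Returns the head of a chain without the given key. */
    static Node remove(Node first, String key) {
        Node e = first;
        while (e != null && !e.key.equals(key)) {
            e = e.next;
        }
        if (e == null) {
            return first;              // key not present, chain unchanged
        }
        Node newFirst = e.next;        // nodes after e are reused as-is
        for (Node p = first; p != e; p = p.next) {
            newFirst = new Node(p.key, newFirst);   // nodes before e are cloned
        }
        return newFirst;
    }

    static String render(Node n) {
        StringBuilder sb = new StringBuilder();
        for (; n != null; n = n.next) {
            sb.append(n.key);
            if (n.next != null) sb.append("->");
        }
        return sb.toString();
    }
}
```

Removing C from A->B->C->D yields B->A->D: A and B are fresh copies, while D is the very same node object as before. A reader that was already walking the old chain is unaffected, because the old nodes are never modified.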

The put operation of ConcurrentHashMap

public V put(K key, V value) {
    if (value == null)
        throw new NullPointerException();
    int hash = hash(key.hashCode());
    return segmentFor(hash).put(key, hash, value, false);
}

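ConcurrentHashMap cannot store null values, which is different from HashMap. (Null keys are rejected as well, since key.hashCode() is called on the key.)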
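The difference is easy to check. A small sketch, with NullHandlingDemo and its method names being hypothetical:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical demo class, for illustration only.
class NullHandlingDemo {
    /** Returns true if the map refuses a null value with a NullPointerException. */
    static boolean rejectsNullValue(Map<String, String> map) {
        try {
            map.put("k", null);
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    /** Returns true if the map refuses a null key with a NullPointerException. */
    static boolean rejectsNullKey(Map<String, String> map) {
        try {
            map.put(null, "v");
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(rejectsNullValue(new HashMap<>()));
        System.out.println(rejectsNullValue(new ConcurrentHashMap<>()));
    }
}
```

HashMap accepts both a null key and null values, while ConcurrentHashMap rejects both. One practical consequence is that a null result from ConcurrentHashMap.get always means "no mapping".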

V put(K key, int hash, V value, boolean onlyIfAbsent) {
    lock();
    try {
        int c = count;
        if (c++ > threshold) // ensure capacity
            rehash();
        HashEntry<K,V>[] tab = table;
        int index = hash & (tab.length - 1);
        HashEntry<K,V> first = tab[index];
        HashEntry<K,V> e = first;
        while (e != null && (e.hash != hash || !key.equals(e.key)))
            e = e.next;

        V oldValue;
        if (e != null) {
            oldValue = e.value;
            if (!onlyIfAbsent)
                e.value = value;
        }
        else {
            oldValue = null;
            ++modCount;
            tab[index] = new HashEntry<K,V>(key, hash, first, value);
            count = c; // write-volatile
        }
        return oldValue;
    } finally {
        unlock();
    }
}

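This method also runs while holding the segment lock. It first checks whether a rehash is needed and performs it if so. It then looks for a node with the same key; if one exists, its value is replaced directly (unless onlyIfAbsent is set). Otherwise a new node is created and added at the head of the hash chain; in that case modCount and count must both be updated and, as before, the write to count must be the last step.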
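From the caller's side, the onlyIfAbsent flag corresponds to the difference between put and putIfAbsent on the public API. The sketch below only exercises those two public methods; OnlyIfAbsentDemo is a hypothetical name.

```java
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical demo class, for illustration only.
class OnlyIfAbsentDemo {
    /** Returns the observed return values and the final value for key "k". */
    static String[] run() {
        ConcurrentHashMap<String, String> map = new ConcurrentHashMap<>();
        String first = map.put("k", "v1");            // no previous mapping
        String second = map.putIfAbsent("k", "v2");   // key exists: old value kept
        String kept = map.get("k");
        String third = map.put("k", "v3");            // plain put overwrites
        return new String[] { first, second, kept, third, map.get("k") };
    }

    public static void main(String[] args) {
        System.out.println(String.join(",", java.util.Arrays.toString(run())));
    }
}
```

Both methods return the previous value, but only put replaces it. Since the check and the write happen under the same lock, putIfAbsent is atomic, unlike a separate containsKey followed by put.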

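The size operation of ConcurrentHashMap

All the operations covered so far work within a single Segment, but some ConcurrentHashMap operations span multiple Segments, size being one example. The size operation uses a fairly clever approach that tries to avoid locking all Segments.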

public int size() {
    final Segment<K,V>[] segments = this.segments;
    long sum = 0;
    long check = 0;
    int[] mc = new int[segments.length];
    // Try a few times to get accurate count. On failure due to
    // continuous async changes in table, resort to locking.
    for (int k = 0; k < RETRIES_BEFORE_LOCK; ++k) {
        check = 0;
        sum = 0;
        int mcsum = 0;
        for (int i = 0; i < segments.length; ++i) {
            sum += segments[i].count;
            mcsum += mc[i] = segments[i].modCount;
        }
        if (mcsum != 0) {
            for (int i = 0; i < segments.length; ++i) {
                check += segments[i].count;
                if (mc[i] != segments[i].modCount) {
                    check = -1; // force retry
                    break;
                }
            }
        }
        if (check == sum)
            break;
    }
    if (check != sum) { // Resort to locking all segments
        sum = 0;
        for (int i = 0; i < segments.length; ++i)
            segments[i].lock();
        for (int i = 0; i < segments.length; ++i)
            sum += segments[i].count;
        for (int i = 0; i < segments.length; ++i)
            segments[i].unlock();
    }
    if (sum > Integer.MAX_VALUE)
        return Integer.MAX_VALUE;
    else
        return (int)sum;
}

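As mentioned earlier, each Segment has a modCount field that counts the operations that changed the number of elements in the Segment; this value only ever increases. The size operation traverses the Segments twice: the first pass sums the counts and records each Segment's modCount, and the second pass re-reads the counts and compares each modCount with the recorded one. If they all match, no write happened in between and the result of the first pass is returned. If they differ, the whole procedure is repeated (RETRIES_BEFORE_LOCK attempts in total), and if the results still disagree, all Segments are locked and then counted one by one.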

Iterating over a ConcurrentHashMap

Map<String, String> map = new ConcurrentHashMap<String, String>();
map.put("a", "a");
map.put("s", "s");

for (String s : map.keySet()) {
    if (s.equals("s")) {
        map.remove(s);
    }
}

Why does removing entries from a ConcurrentHashMap while iterating over it not throw ConcurrentModificationException? Because the iterators returned by the keySet(), values() and entrySet() views of ConcurrentHashMap are, as the Javadoc states, "weakly consistent" iterators.

abstract class HashIterator{  
    int nextSegmentIndex;  
    int nextTableIndex;  
    HashEntry<K,V>[] currentTable;  
    HashEntry<K, V> nextEntry;  
    HashEntry<K, V> lastReturned;  
}

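nextSegmentIndex is the index of the segment, nextTableIndex is the index of the hash chain within that segment's table, and currentTable is the table of that segment.
When we call iterator() on the value returned by entrySet(), we get an EntryIterator; calling next() on that EntryIterator eventually ends up in HashIterator.advance().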

// Walk the current chain first, then the remaining slots of the current table,
// and finally the remaining segments, so that every entry is eventually visited.
final void advance() {
    // Continue along the current chain of key-value entries
    if (nextEntry != null && (nextEntry = nextEntry.next) != null)
        return;
    // Scan the remaining slots of the current table
    while (nextTableIndex >= 0) {
        if ((nextEntry = currentTable[nextTableIndex--]) != null)
            return;
    }
    // Move on to the remaining segments
    while (nextSegmentIndex >= 0) {
        Segment<K,V> seg = segments[nextSegmentIndex--];
        if (seg.count != 0) {
            currentTable = seg.table;
            for (int j = currentTable.length - 1; j >= 0; --j) {
                if ((nextEntry = currentTable[j]) != null) {
                    nextTableIndex = j - 1;
                    return;
                }
            }
        }
    }
}

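This method walks the underlying arrays. If the contents of a part that has already been traversed change during iteration, the iterator does not throw ConcurrentModificationException. If the contents of a part that has not been traversed yet change, the change may be reflected in the iteration. This is what weak consistency of the ConcurrentHashMap iterator means.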
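A quick way to see this behaviour is to modify the map in the middle of a loop. In the sketch below (WeakIteratorDemo and removeWhileIterating are hypothetical names) every key except "a" is removed during iteration.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical demo class, for illustration only.
class WeakIteratorDemo {
    /** Removes every key except "a" while iterating; returns the final size. */
    static int removeWhileIterating() {
        Map<String, String> map = new ConcurrentHashMap<>();
        map.put("a", "a");
        map.put("s", "s");
        map.put("x", "x");
        for (String key : map.keySet()) {
            if (!key.equals("a")) {
                map.remove(key);   // no ConcurrentModificationException here
            }
        }
        return map.size();
    }

    public static void main(String[] args) {
        System.out.println(removeWhileIterating());
    }
}
```

The loop completes normally and one entry remains. What is not guaranteed is whether an entry added during the loop would be visited by the same iterator, so code should not depend on seeing, or on not seeing, concurrent insertions.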

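马娅 / Qunar (去哪儿网) Hotel Business Unit

This article is reposted from the company wiki series 《酒店开发组的一千零一夜》 ("One Thousand and One Nights of the Hotel Dev Team"); Qunar colleagues can log in to the wiki to read more articles.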
