A Disaster Caused by setTag, and Some Reflections

Posted 2023-02-16 林克在思考

Today's story is about a disaster on Android caused by careless use of setTag, and some reflections on it.

Background

The app module depends on the IM module through a remote AAR dependency. The app module contains the following code:

View view = findViewById(R.id.view);
view.setTag(R.id.root_position, position);

root_position is declared in the IM module, as follows:

<item type="id" name="root_position"/>

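One day the IM module removed this id declaration, and after we upgraded to the new IM version our build failed.
What to do? We knew that the first parameter of setTag is an int, and we were too lazy to declare an id in the app module. An object's hashCode is also an int, so we wrote the following code: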

private int mTagId = MainActivity.this.hashCode();
...
View view = findViewById(R.id.view);
view.setTag(mTagId, position);

It compiled and ran perfectly, so we shipped it. Then the crash reports started pouring in, although not every user was affected.
The main crash message was:

java.lang.IllegalArgumentException: The key must be an application-specific resource id.

Root Cause Analysis

The message is clear: the key passed to setTag must be an application-specific resource id. The source comment on setTag reads:

Sets a tag associated with this view and a key. A tag can be used
to mark a view in its hierarchy and does not have to be unique within
the hierarchy. Tags can also be used to store data within a view
without resorting to another data structure.

The specified key should be an id declared in the resources of the
application to ensure it is unique (see the <a
href="{@docRoot}guide/topics/resources/more-resources.html#Id">ID resource type</a>).
Keys identified as belonging to
the Android framework or not associated with any package will cause
an {@link IllegalArgumentException} to be thrown.

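Roughly translated, the last paragraph says:
If the key belongs to the Android framework or is not associated with any package, an IllegalArgumentException is thrown.
The comment alone is still not very clear, so let's look at where the code throws IllegalArgumentException: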

if ((key >>> 24) < 2) {
    throw new IllegalArgumentException("The key must be an application-specific "
            + "resource id.");
}

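From the code we can at least see this: if the hashCode generated in our earlier code is less than 2 after the unsigned right shift by 24 bits, this runtime exception is thrown and the app crashes.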
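The condition can be tried out on a plain JVM, without Android. The following is a minimal sketch: the class name TagKeyCheck, the method isValidTagKey, and all three sample values are made up for illustration, and only the comparison itself comes from the setTag source shown above.

```java
// Hypothetical helper that mirrors the key check in View.setTag(int, Object).
final class TagKeyCheck {

    /** Returns true when the top byte of the key is at least 2, i.e. setTag would accept it. */
    static boolean isValidTagKey(int key) {
        return (key >>> 24) >= 2;
    }

    public static void main(String[] args) {
        int appStyleId = 0x7f010000; // made-up value with top byte 0x7f
        int smallHash = 0x01abcdef;  // made-up hash value with top byte 0x01
        int largeHash = 0x0abcdef1;  // made-up hash value with top byte 0x0a

        System.out.println(isValidTagKey(appStyleId)); // true
        System.out.println(isValidTagKey(smallHash));  // false: setTag would throw
        System.out.println(isValidTagKey(largeHash));  // true: accepted, but only by luck
    }
}
```

An arbitrary int such as a hash code passes or fails depending only on its top byte, which is consistent with the crash affecting some users and not others.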

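Digging Deeper

We found the cause of the crash, but that raises two more questions:

Why is the condition (key >>> 24) < 2?
How is hashCode generated, and why did the app not crash when we ran the change locally but crash on users' phones?
Let's look into each question in turn: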

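1. Why is the condition (key >>> 24) < 2?

The error message and the comment already give us a hint: this condition decides whether the key we pass in is an application-specific resource id. So we need to look at how application resource ids are generated.
First, add an ids.xml file under the res/values folder of the app module and declare an id in it: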

<?xml version="1.0" encoding="utf-8" ?>
<resources>
    <item name="root_position" type="id"/>
</resources>

Then build the app module.
After that, an R.java file can be found at the following path:

Sample/app/build/generated/not_namespaced_r_class_sources/debug/processDebugResources/r/cn/codekong/sample/R.java

Here cn/codekong/sample is the package path of my Sample project; adjust it to match your own project.
Searching that file turns up the following code:

package cn.codekong.sample;

public final class R {
    public static final class id {
        public static final int root_position=0x7f070061;
    }
}

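I have left out the unrelated code and kept only what matters here.
Android Studio generates this R.java class for us with the aapt tool. It contains a static nested class id, which declares an int constant for the id we declared in ids.xml. That is why a name declared in XML can be referenced from Java code.

Next, let's check whether this generated id would trigger the crash above, using the following Java code: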

public class Sample {
    public static final int root_position = 0x7f070061;

    public static void main(String[] args) {
        System.out.println((root_position >>> 24) < 2);
    }
}

The result is false, so the id is valid.
Next, let's look at how the int value of an id is composed, and why the condition above can tell whether it is valid.

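Structure of a Resource ID

A resource id is a 4-byte unsigned integer: the highest byte is the Package ID, the next byte is the Type ID, and the lowest two bytes are the Entry ID.

The Package ID acts as a namespace that identifies where a resource comes from. Android defines two namespaces. One is the system resource namespace, whose Package ID is 0x01. For example, you can print the value of this system resource id: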

android.R.id.primary

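Shifting it right by 24 bits (unsigned) gives a Package ID of 1.
The other is the application resource namespace, whose Package ID is 0x7f, as in the custom id above.
Android treats every Package ID in the range [0x01, 0x7f] as valid; anything outside this range is invalid.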

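That answers the first question, why (key >>> 24) < 2 throws: the condition checks the Package ID. The < 2 comparison rejects 0x00, which is not associated with any package, and 0x01, the system resource ids, exactly as the setTag comment quoted earlier describes. Why exclude system resource ids? I could not find an official explanation. My own view is that system resource ids are generally meant for internal use, and the platform does not want our code to depend on them, so that removing one some day does not break our apps.

The Type ID identifies the resource type. Common resource types include anim, color, drawable, layout, menu, raw, string, and so on.
The Entry ID is the order in which a resource appears within its resource type.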
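To make the three fields concrete, here is a small sketch that splits the id from the generated R.java into its parts. The class and method names (ResourceIdParts, packageId, typeId, entryId) are hypothetical; only the byte layout comes from the description above.

```java
// Hypothetical helper that splits a resource id into Package ID, Type ID and Entry ID.
final class ResourceIdParts {

    /** Highest byte: the Package ID. */
    static int packageId(int resId) {
        return resId >>> 24;
    }

    /** Second-highest byte: the Type ID. */
    static int typeId(int resId) {
        return (resId >>> 16) & 0xff;
    }

    /** Lowest two bytes: the Entry ID. */
    static int entryId(int resId) {
        return resId & 0xffff;
    }

    public static void main(String[] args) {
        int rootPosition = 0x7f070061; // value taken from the generated R.java
        System.out.printf("package=0x%02x type=0x%02x entry=0x%04x%n",
                packageId(rootPosition), typeId(rootPosition), entryId(rootPosition));
        // prints: package=0x7f type=0x07 entry=0x0061
    }
}
```

The packageId() expression is the same shift that setTag applies to its key, which is why the check only ever looks at the top byte.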

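2. How is hashCode generated?

As mentioned earlier, after we passed the hashCode to setTag the app ran fine locally, so we shipped it, and it crashed after release. This means the hashCode generated in our local runs satisfied the rule above, while on some users' phones the generated hashCode did not, which caused the crash.

Let's run a local experiment to see how this hashCode varies, using the simplest possible code: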

package cn.codekong.sample;

import android.os.Bundle;
import android.support.v7.app.AppCompatActivity;
import android.util.Log;

public class MainActivity extends AppCompatActivity {

    private static final String TAG = "MainActivity";
    private int mTagId = MainActivity.this.hashCode();

    @Override
    protected void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);
        setContentView(R.layout.activity_main);
        Log.e(TAG, "mTagId = " + mTagId);
    }
}

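You will find that on the same phone, no matter how many times you run it, the result is the same.
On a different phone, the result is a different value.

We tend to assume that hashCode() returns the object's memory address. If that were true, the result should differ on every run on the same device, so things are evidently not that simple.

In Java, the default hashCode of a given object does not change while the application is running, and it serves as an identifier for that object; the specification makes no promise that the value stays the same from one run to the next. In the example above, MainActivity.this nevertheless returned the same hashCode() on every run on the same device, and this simple check suggests the value is not derived from the memory address.

If a class does not override hashCode(), the call goes to Object's hashCode() by default. Here is the hashCode() method of Java's Object class: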

public native int hashCode();

It is a native method. We cannot get the source code of the Oracle JDK, so we cannot read its C++ implementation, but the method's comment gives some relevant information:

As much as is reasonably practical, the hashCode method defined by
class {@code Object} does return distinct integers for distinct
objects. (This is typically implemented by converting the internal
address of the object into an integer, but this implementation
technique is not required by the
Java&trade; programming language.)

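I have quoted only the most important paragraph. Its main point is that distinct objects should get distinct int values; the typical implementation converts the object's memory address to an int, but the Java language does not require this.
So the common belief that hashCode() returns the object's memory address actually depends on how a particular JVM implements it.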
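The behaviour of the default hashCode() can be observed on any desktop JVM. The sketch below uses only standard library calls (Object.hashCode() and System.identityHashCode()); the class name IdentityHashDemo is made up, and the concrete numbers printed on the last line vary between JVMs and runs, so no expected value is given for them.

```java
// Hypothetical demo of the default, identity-based hashCode().
final class IdentityHashDemo {

    public static void main(String[] args) {
        Object a = new Object();
        Object b = new Object();

        // Object does not override hashCode() for itself, so both calls agree.
        System.out.println(a.hashCode() == System.identityHashCode(a)); // true

        // Within one run, the value for a given object does not change.
        int first = a.hashCode();
        System.out.println(first == a.hashCode()); // true

        // Distinct objects usually, but not necessarily, get distinct values.
        System.out.println(a.hashCode() + " vs " + b.hashCode());
    }
}
```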

Let's look at how hashCode() is implemented in OpenJDK 8u60.
Source online: http://hg.openjdk.java.net/jdk8u/jdk8u60/jdk/file/935758609767/src/share/native/java/lang/Object.c
The code is as follows:

static JNINativeMethod methods[] = {
    {"hashCode",    "()I",                    (void *)&JVM_IHashCode},
    {"wait",        "(J)V",                   (void *)&JVM_MonitorWait},
    {"notify",      "()V",                    (void *)&JVM_MonitorNotify},
    {"notifyAll",   "()V",                    (void *)&JVM_MonitorNotifyAll},
    {"clone",       "()Ljava/lang/Object;",   (void *)&JVM_Clone},
};

The table above maps hashCode() to the JVM_IHashCode function. Following it further:
Source online: http://hg.openjdk.java.net/jdk8u/jdk8u60/hotspot/file/37240c1019fd/src/share/vm/prims/jvm.cpp
The code is as follows:

JVM_ENTRY(jint, JVM_IHashCode(JNIEnv* env, jobject handle))
  JVMWrapper("JVM_IHashCode");
  // as implemented in the classic virtual machine; return 0 if object is NULL
  return handle == NULL ? 0 : ObjectSynchronizer::FastHashCode (THREAD, JNIHandles::resolve_non_null(handle)) ;
JVM_END

This in turn calls FastHashCode(). Going one level deeper:
Source online: http://hg.openjdk.java.net/jdk8u/jdk8u60/hotspot/file/37240c1019fd/src/share/vm/runtime/synchronizer.cpp
The code is as follows:

intptr_t ObjectSynchronizer::FastHashCode (Thread * Self, oop obj) {
  if (UseBiasedLocking) {
    // NOTE: many places throughout the JVM do not expect a safepoint
    // to be taken here, in particular most operations on perm gen
    // objects. However, we only ever bias Java instances and all of
    // the call sites of identity_hash that might revoke biases have
    // been checked to make sure they can handle a safepoint. The
    // added check of the bias pattern is to avoid useless calls to
    // thread-local storage.
    if (obj->mark()->has_bias_pattern()) {
      // Box and unbox the raw reference just in case we cause a STW safepoint.
      Handle hobj (Self, obj) ;
      // Relaxing assertion for bug 6320749.
      assert (Universe::verify_in_progress() ||
              !SafepointSynchronize::is_at_safepoint(),
             "biases should not be seen by VM thread here");
      BiasedLocking::revoke_and_rebias(hobj, false, JavaThread::current());
      obj = hobj() ;
      assert(!obj->mark()->has_bias_pattern(), "biases should be revoked by now");
    }
  }

  // hashCode() is a heap mutator ...
  // Relaxing assertion for bug 6320749.
  assert (Universe::verify_in_progress() ||
          !SafepointSynchronize::is_at_safepoint(), "invariant") ;
  assert (Universe::verify_in_progress() ||
          Self->is_Java_thread() , "invariant") ;
  assert (Universe::verify_in_progress() ||
         ((JavaThread *)Self)->thread_state() != _thread_blocked, "invariant") ;

  ObjectMonitor* monitor = NULL;
  markOop temp, test;
  intptr_t hash;
  markOop mark = ReadStableMark (obj);

  // object should remain ineligible for biased locking
  assert (!mark->has_bias_pattern(), "invariant") ;

  if (mark->is_neutral()) {
    hash = mark->hash();              // this is a normal header
    if (hash) {                       // if it has hash, just return it
      return hash;
    }
    hash = get_next_hash(Self, obj);  // allocate a new hash code
    temp = mark->copy_set_hash(hash); // merge the hash code into header
    // use (machine word version) atomic operation to install the hash
    test = (markOop) Atomic::cmpxchg_ptr(temp, obj->mark_addr(), mark);
    if (test == mark) {
      return hash;
    }
    // If atomic operation failed, we must inflate the header
    // into heavy weight monitor. We could add more code here
    // for fast path, but it does not worth the complexity.
  } else if (mark->has_monitor()) {
    monitor = mark->monitor();
    temp = monitor->header();
    assert (temp->is_neutral(), "invariant") ;
    hash = temp->hash();
    if (hash) {
      return hash;
    }
    // Skip to the following code to reduce code size
  } else if (Self->is_lock_owned((address)mark->locker())) {
    temp = mark->displaced_mark_helper(); // this is a lightweight monitor owned
    assert (temp->is_neutral(), "invariant") ;
    hash = temp->hash();              // by current thread, check if the displaced
    if (hash) {                       // header contains hash code
      return hash;
    }
    // WARNING:
    //   The displaced header is strictly immutable.
    // It can NOT be changed in ANY cases. So we have
    // to inflate the header into heavyweight monitor
    // even the current thread owns the lock. The reason
    // is the BasicLock (stack slot) will be asynchronously
    // read by other threads during the inflate() function.
    // Any change to stack may not propagate to other threads
    // correctly.
  }

  // Inflate the monitor to set hash code
  monitor = ObjectSynchronizer::inflate(Self, obj);
  // Load displaced header and check it has hash code
  mark = monitor->header();
  assert (mark->is_neutral(), "invariant") ;
  hash = mark->hash();
  if (hash == 0) {
    hash = get_next_hash(Self, obj);
    temp = mark->copy_set_hash(hash); // merge hash code into header
    assert (temp->is_neutral(), "invariant") ;
    test = (markOop) Atomic::cmpxchg_ptr(temp, monitor, mark);
    if (test != mark) {
      // The only update to the header in the monitor (outside GC)
      // is install the hash code. If someone add new usage of
      // displaced header, please update this code
      hash = test->hash();
      assert (test->is_neutral(), "invariant") ;
      assert (hash != 0, "Trivial unexpected object/monitor header usage.");
    }
  }
  // We finally get the hash
  return hash;
}

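From the comments in the code we learn that hashCode generation follows different strategies depending on the object's state; it is not simply a matter of returning a memory address. We will not go into the details here.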
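The core idea of the neutral-header path, "return the stored hash if there is one, otherwise generate one and install it atomically", can be sketched in a few lines of Java. This is a simplified analogy, not the HotSpot algorithm: LazyHashHeader, hash() and nextHash() are made-up names, an AtomicInteger stands in for the object header, a random number stands in for get_next_hash(), and biased locking and monitor inflation are left out entirely.

```java
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for an object whose header lazily caches its hash code.
final class LazyHashHeader {

    // Plays the role of the header word; 0 means "no hash installed yet".
    private final AtomicInteger header = new AtomicInteger(0);

    int hash() {
        int stored = header.get();
        if (stored != 0) {
            return stored;          // a hash is already installed: return it
        }
        int candidate = nextHash(); // loosely analogous to get_next_hash()
        if (header.compareAndSet(0, candidate)) {
            return candidate;       // this thread installed the hash
        }
        return header.get();        // another thread won the race: use its value
    }

    // Assumption for this sketch: any non-zero random int is an acceptable hash.
    private static int nextHash() {
        int h;
        do {
            h = ThreadLocalRandom.current().nextInt();
        } while (h == 0);           // 0 is reserved for "not yet assigned"
        return h;
    }

    public static void main(String[] args) {
        LazyHashHeader obj = new LazyHashHeader();
        int first = obj.hash();
        System.out.println(first == obj.hash()); // true
    }
}
```

Because the value is generated once and then cached, it stays stable for the lifetime of the object without having to track the object's address.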
Having seen the Java implementation, let's look at how hashCode() is implemented in the Android source code:

public int hashCode() {
    return identityHashCode(this);
}

static int identityHashCode(Object obj) {
    int lockWord = obj.shadow$_monitor_;
    final int lockWordStateMask = 0xC0000000;  // Top 2 bits.
    final int lockWordStateHash = 0x80000000;  // Top 2 bits are value 2 (kStateHash).
    final int lockWordHashMask = 0x0FFFFFFF;  // Low 28 bits.
    if ((lockWord & lockWordStateMask) == lockWordStateHash) {
        return lockWord & lockWordHashMask;
    }
    return identityHashCodeNative(obj);
}

private static native int identityHashCodeNative(Object obj);

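In Android, Object's hashCode() is no longer a native method. It calls identityHashCode() instead. We can set aside the bit manipulation inside that method for now and focus on the last call, identityHashCodeNative(), which is again a native method.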
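The bit manipulation is still worth a quick look, because it hints at why only some users crashed. The sketch below reuses the three masks from identityHashCode() and applies the setTag condition (key >>> 24) < 2 to the result. Everything else is an assumption made for illustration: the names LockWordSketch, cachedHash and wouldCrashSetTag, the sample lock word, and above all the idea that hash values are spread uniformly over the 28-bit range and that the native path is limited to 28 bits as well.

```java
// Hypothetical sketch: decode a lock word and test the cached hash against the setTag check.
final class LockWordSketch {

    static final int STATE_MASK = 0xC0000000; // top 2 bits
    static final int STATE_HASH = 0x80000000; // top 2 bits are value 2 (kStateHash)
    static final int HASH_MASK = 0x0FFFFFFF;  // low 28 bits

    /** Returns the cached hash, or -1 if the lock word is not in the hash state. */
    static int cachedHash(int lockWord) {
        if ((lockWord & STATE_MASK) == STATE_HASH) {
            return lockWord & HASH_MASK;
        }
        return -1;
    }

    /** The condition under which setTag throws, applied to a hash value. */
    static boolean wouldCrashSetTag(int hash) {
        return (hash >>> 24) < 2;
    }

    public static void main(String[] args) {
        int lockWord = 0x80000000 | 0x01234567; // made-up lock word in the hash state
        int hash = cachedHash(lockWord);
        System.out.printf("hash=0x%08x crash=%b%n", hash, wouldCrashSetTag(hash));
        // prints: hash=0x01234567 crash=true

        // Of the 2^28 possible 28-bit values, those below 0x02000000 (2^25) fail the check.
        double fraction = (double) (1 << 25) / (1 << 28);
        System.out.println(fraction); // prints 0.125 (uniform spread is an assumption)
    }
}
```

Under those assumptions roughly one hash in eight would be rejected by setTag, which would fit the observation that the app worked on our own devices and crashed for only part of the user base.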

Taking the Android 9.0 source, we find the following code in the ART runtime:
Source online: http://androidxref.com/9.0.0_r3/xref/art/runtime/native/java_lang_Object.cc#54

static jint Object_identityHashCodeNative(JNIEnv* env, jclass, jobject javaObject) {
  ScopedFastNativeObjectAccess soa(env);
  ObjPtr<mirror::Object> o = soa.Decode<mirror::Object>(javaObject);
  return static_cast<jint>(o->IdentityHashCode());
}

This calls on to IdentityHashCode(), which is where the hashCode is actually generated.
Source online: http://androidxref.com/9.0.0_r3/xref/art/runtime/mirror/object.cc#187
The code is as follows:

int32_t Object::IdentityHashCode() {
  ObjPtr<Object> current_this = this;  // The this pointer may get invalidated by thread suspension.
  while (true) {
    LockWord lw = current_this->GetLockWord(false);
    switch (lw.GetState()) {
      case LockWord::kUnlocked: {
        // Try to compare and swap in a new hash, if we succeed we will return the hash on the next
        // loop iteration.
        LockWord hash_word = LockWord::FromHashCode(GenerateIdentityHashCode(), lw.GCState());
        DCHECK_EQ(hash_word.GetState(), LockWord::kHashCode);
        if (current_this->CasLockWordWeakRelaxed(lw, hash_word)) {
          return hash_word.GetHashCode();
        }
        break;
      }
      case LockWord::kThinLocked: {
        // Inflate the thin lock to a monitor and stick the hash code inside of the monitor. May
        // fail spuriously.
        Thread* self = Thread::Current();
        StackHandleScope<1> hs(self);
        Handle<mirror::Object> h_this(hs.NewHandle(current_this));
        Monitor::InflateThinLocked(self, h_this, lw, GenerateIdentityHashCode());
        // A GC may have occurred when we switched to kBlocked.
        current_this = h_this.Get();
        break;
      }
      case LockWord::kFatLocked: {
        // Already inflated, return the hash stored in the monitor.
        Monitor* monitor = lw.FatLockMonitor();
        DCHECK(monitor != nullptr);
        return monitor->GetHashCode();
      }
      case LockWord::kHashCode: {
        return lw.GetHashCode();
      }
      default: {
        LOG(FATAL) << "Invalid state during hashcode " << lw.GetState();
        break;
      }
    }
  }
  UNREACHABLE();
}

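From the above we can see that Android does not generate hashCode in just one way; it uses different strategies for different states. This matches our earlier experiment: on Android, hashCode() does not simply return the object's memory address either.
I do not know C++ well enough, and cannot step through this code with a debugger, so the analysis stops here for now. Still, we have largely found the answer we were looking for.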

Final Words

One disaster caused by setTag raised a series of questions and led us to dig into them, so something was gained after all.
