String Odds and Ends

Posted 黑白灰

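Introduction:

String is one of the most frequently used classes in everyday Java, so its details are worth knowing. This post walks through the class with the source code at hand.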

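1. Overview

The class diagram is as follows:

As the diagram shows, String is a final class that implements three interfaces: java.io.Serializable, Comparable<String>, and CharSequence.
As a reminder of what final does: on a class it means the class cannot be subclassed; on a variable it means the variable cannot be reassigned once it has been assigned.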
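
The two statements above can be checked with reflection. This is only a small sketch; the class name StringShapeDemo and its helper methods are made up for illustration.

```java
import java.io.Serializable;
import java.lang.reflect.Modifier;

class StringShapeDemo {
    /** True if String is declared final, i.e. it cannot be subclassed. */
    static boolean stringIsFinal() {
        return Modifier.isFinal(String.class.getModifiers());
    }

    /** True if String implements all three interfaces named in the text. */
    static boolean implementsAllThree() {
        return Serializable.class.isAssignableFrom(String.class)
                && Comparable.class.isAssignableFrom(String.class)
                && CharSequence.class.isAssignableFrom(String.class);
    }

    public static void main(String[] args) {
        System.out.println("final class: " + stringIsFinal());
        System.out.println("implements all three: " + implementsAllThree());

        final String s = "a";
        // s = "b";  // would not compile: a final variable cannot be reassigned
        System.out.println(s);
    }
}
```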

2. Fields

/** The value is used for character storage. */
    private final char value[];

    /** Cache the hash code for the string */
    private int hash; // Default to 0
    
    /** use serialVersionUID from JDK 1.0.2 for interoperability */
    private static final long serialVersionUID = -6849794470754667710L;

    /**
     * Class String is special cased within the Serialization Stream Protocol.
     *
     * A String instance is written into an ObjectOutputStream according to
     * <a href="{@docRoot}/../platform/serialization/spec/output.html">
     * Object Serialization Specification, Section 6.2, "Stream Elements"</a>
     */
    private static final ObjectStreamField[] serialPersistentFields =
        new ObjectStreamField[0];

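value: per its comment, value stores the contents of the String. In other words, in the JDK version shown here a String is essentially a char array.
hash: per its comment, this field caches the string's hash code.
serialVersionUID: String implements Serializable, so it supports serialization and deserialization. Java's serialization mechanism checks version consistency at run time through the class's serialVersionUID. During deserialization, the JVM compares the serialVersionUID in the incoming byte stream with that of the local class. If they match, the versions are considered consistent and deserialization proceeds; otherwise an InvalidClassException is thrown.
serialPersistentFields: an array of ObjectStreamField that declares which fields take part in serialization. Each ObjectStreamField carries the metadata of one field, such as its type, type code, and signature. For String the array is empty, because, as the comment above says, String is special-cased in the serialization stream protocol.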
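
As a quick check of the serialVersionUID discussion, the value the runtime associates with String can be read through java.io.ObjectStreamClass. This is a sketch only; StringSerialDemo and stringSerialVersionUid are illustrative names, not part of the JDK.

```java
import java.io.ObjectStreamClass;

class StringSerialDemo {
    /** Reads the serialVersionUID that the serialization runtime uses for String. */
    static long stringSerialVersionUid() {
        return ObjectStreamClass.lookup(String.class).getSerialVersionUID();
    }

    public static void main(String[] args) {
        // Should match the constant declared in the String source.
        System.out.println(stringSerialVersionUid());

        // hashCode() is backed by the cached hash field, so repeated calls agree.
        String s = "hello";
        System.out.println(s.hashCode() == s.hashCode());
    }
}
```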

3. Constructors

/**
     * Initializes a newly created {@code String} object so that it represents
     * an empty character sequence.  Note that use of this constructor is
     * unnecessary since Strings are immutable.
     */
    // Initializes a newly created String object that represents an empty string.
    // Note: there is no need to use this constructor, because String is immutable.
    public String() {
        this.value = new char[0];
    }
    
    /**
     * Initializes a newly created {@code String} object so that it represents
     * the same sequence of characters as the argument; in other words, the
     * newly created string is a copy of the argument string. Unless an
     * explicit copy of {@code original} is needed, use of this constructor is
     * unnecessary since Strings are immutable.
     *
     * @param  original
     *         A {@code String}
     */
    // Builds a String from a String argument; in other words, the new object is a copy
    // of the argument. Unless you really need an explicit copy, there is no need to use
    // this constructor, because String is immutable.
    public String(String original) {
        this.value = original.value;
        this.hash = original.hash;
    }
    
    
    /**
     * Allocates a new {@code String} so that it represents the sequence of
     * characters currently contained in the character array argument. The
     * contents of the character array are copied; subsequent modification of
     * the character array does not affect the newly created string.
     *
     * @param  value
     *         The initial value of the string
     */
    // Builds a String from a char array. The new String holds a copy of the array,
    // so later changes to the array do not affect the String.
    public String(char value[]) {
        this.value = Arrays.copyOf(value, value.length);
    }
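
The copy made by Arrays.copyOf in the constructor above can be observed from the outside. A minimal sketch, with CharArrayCopyDemo as a made-up class name:

```java
class CharArrayCopyDemo {
    /** Builds a String from an array, then mutates the array. */
    static String buildThenMutate() {
        char[] chars = {'a', 'b', 'c'};
        String s = new String(chars);
        chars[0] = 'z';   // changes only the caller's array
        return s;         // unchanged, because the constructor copied the contents
    }

    public static void main(String[] args) {
        System.out.println(buildThenMutate()); // abc
    }
}
```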

    /**
     * Allocates a new {@code String} that contains characters from a subarray
     * of the character array argument. The {@code offset} argument is the
     * index of the first character of the subarray and the {@code count}
     * argument specifies the length of the subarray. The contents of the
     * subarray are copied; subsequent modification of the character array does
     * not affect the newly created string.
     *
     * @param  value
     *         Array that is the source of characters
     *
     * @param  offset
     *         The initial offset
     *
     * @param  count
     *         The length
     *
     * @throws  IndexOutOfBoundsException
     *          If the {@code offset} and {@code count} arguments index
     *          characters outside the bounds of the {@code value} array
     */
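    // Builds a String from the subarray of the char array selected by offset and count.
    // The new String holds a copy of that subarray, so later changes to the array do not
    // affect the String. offset is the start index and count is the length.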
    public String(char value[], int offset, int count) {
        if (offset < 0) {
            throw new StringIndexOutOfBoundsException(offset);
        }
        if (count < 0) {
            throw new StringIndexOutOfBoundsException(count);
        }
        // Note: offset or count might be near -1>>>1.
        if (offset > value.length - count) {
            throw new StringIndexOutOfBoundsException(offset + count);
        }
        this.value = Arrays.copyOfRange(value, offset, offset+count);
    }
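
The comment "offset or count might be near -1>>>1" (that is, near Integer.MAX_VALUE) explains why the check is written as offset > value.length - count rather than offset + count > value.length: the addition can overflow. The sketch below contrasts the two forms; naiveOutOfBounds and safeOutOfBounds are hypothetical helpers written for this illustration, and both assume offset and count were already checked to be non-negative.

```java
class BoundsCheckDemo {
    /** Naive form: offset + count can overflow int and wrap to a negative number. */
    static boolean naiveOutOfBounds(int length, int offset, int count) {
        return offset + count > length;
    }

    /** Form used by the constructor: a subtraction of non-negative ints cannot overflow. */
    static boolean safeOutOfBounds(int length, int offset, int count) {
        return offset > length - count;
    }

    public static void main(String[] args) {
        int length = 5, offset = 1, count = Integer.MAX_VALUE; // -1 >>> 1
        System.out.println(naiveOutOfBounds(length, offset, count)); // false (misses the error)
        System.out.println(safeOutOfBounds(length, offset, count));  // true

        try {
            new String(new char[length], offset, count);
        } catch (IndexOutOfBoundsException e) {
            System.out.println("rejected by the constructor");
        }
    }
}
```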

    /**
     * Allocates a new {@code String} that contains characters from a subarray
     * of the <a href="Character.html#unicode">Unicode code point</a> array
     * argument.  The {@code offset} argument is the index of the first code
     * point of the subarray and the {@code count} argument specifies the
     * length of the subarray.  The contents of the subarray are converted to
     * {@code char}s; subsequent modification of the {@code int} array does not
     * affect the newly created string.
     *
     * @param  codePoints
     *         Array that is the source of Unicode code points
     *
     * @param  offset
     *         The initial offset
     *
     * @param  count
     *         The length
     *
     * @throws  IllegalArgumentException
     *          If any invalid Unicode code point is found in {@code
     *          codePoints}
     *
     * @throws  IndexOutOfBoundsException
     *          If the {@code offset} and {@code count} arguments index
     *          characters outside the bounds of the {@code codePoints} array
     *
     * @since  1.5
     */
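    // Allocates a new String that contains the characters of a subarray of the Unicode
    // code point array argument. offset is the start index and count is the length;
    // later changes to the int array do not affect the new String.
    // About code points: Unicode is a universal character standard, but it only assigns
    // a number (a code point) to each character. How those numbers are stored is defined
    // by encoding forms such as UTF-8, UTF-16, and UTF-32. The example after this
    // constructor illustrates how it is used.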
    public String(int[] codePoints, int offset, int count) {
        if (offset < 0) {
            throw new StringIndexOutOfBoundsException(offset);
        }
        if (count < 0) {
            throw new StringIndexOutOfBoundsException(count);
        }
        // Note: offset or count might be near -1>>>1.
        if (offset > codePoints.length - count) {
            throw new StringIndexOutOfBoundsException(offset + count);
        }

        final int end = offset + count;

        // Pass 1: Compute precise size of char[]
        int n = count;
        for (int i = offset; i < end; i++) {
            int c = codePoints[i];
            if (Character.isBmpCodePoint(c))
                continue;
            else if (Character.isValidCodePoint(c))
                n++;
            else throw new IllegalArgumentException(Integer.toString(c));
        }

        // Pass 2: Allocate and fill in char[]
        final char[] v = new char[n];

        for (int i = offset, j = 0; i < end; i++, j++) {
            int c = codePoints[i];
            if (Character.isBmpCodePoint(c))
                v[j] = (char)c;
            else
                Character.toSurrogates(c, v, j++);
        }

        this.value = v;
    }

    // Example:
    public static void main(String[] args) {
        // The characters of "hello" are the code points U+0068, U+0065, U+006C, U+006C, U+006F.
        // Converted from hexadecimal to decimal, they are 104, 101, 108, 108, 111.
        int[] codePoints = {104, 101, 108, 108, 111};
        String hello = new String(codePoints, 0, 5);
        System.out.println(hello);
        // Output: hello
    }
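
The two passes in the code point constructor only matter for code points outside the Basic Multilingual Plane, which need two chars (a surrogate pair) each. The following sketch uses U+1F600 as such a code point; CodePointDemo is an illustrative name.

```java
class CodePointDemo {
    /** Builds a String from 'a', U+1F600 (outside the BMP), and 'b'. */
    static String build() {
        int[] codePoints = {0x61, 0x1F600, 0x62};
        return new String(codePoints, 0, codePoints.length);
    }

    public static void main(String[] args) {
        String s = build();
        System.out.println(s.length());                             // 4 chars
        System.out.println(s.codePointCount(0, s.length()));        // 3 code points
        System.out.println(Character.isHighSurrogate(s.charAt(1))); // true
        System.out.println(Character.isLowSurrogate(s.charAt(2)));  // true
    }
}
```

Pass 1 counts one extra char for U+1F600 (n++), and pass 2 writes it with Character.toSurrogates, which is why length() and the code point count differ.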
    
    /**
     * Allocates a new {@code String} constructed from a subarray of an array
     * of 8-bit integer values.
     *
     * <p> The {@code offset} argument is the index of the first byte of the
     * subarray, and the {@code count} argument specifies the length of the
     * subarray.
     *
     * <p> Each {@code byte} in the subarray is converted to a {@code char} as
     * specified in the method above.
     *
     * @deprecated This method does not properly convert bytes into characters.
     * As of JDK&nbsp;1.1, the preferred way to do this is via the
     * {@code String} constructors that take a {@link
     * java.nio.charset.Charset}, charset name, or that use the platform's
     * default charset.
     *
     * @param  ascii
     *         The bytes to be converted to characters
     *
     * @param  hibyte
     *         The top 8 bits of each 16-bit Unicode code unit
     *
     * @param  offset
     *         The initial offset
     * @param  count
     *         The length
     *
     * @throws  IndexOutOfBoundsException
     *          If the {@code offset} or {@code count} argument is invalid
     *
     * @see  #String(byte[], int)
     * @see  #String(byte[], int, int, java.lang.String)
     * @see  #String(byte[], int, int, java.nio.charset.Charset)
     * @see  #String(byte[], int, int)
     * @see  #String(byte[], java.lang.String)
     * @see  #String(byte[], java.nio.charset.Charset)
     * @see  #String(byte[])
     */
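    // Builds a new String from a byte array of ASCII codes.
    // Note: this constructor does not convert bytes to characters properly. Since JDK 1.1
    // the preferred way is to use a constructor that takes a charset, or one that uses
    // the platform's default charset.
    // (It is deprecated, so it is not examined further here.)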
    @Deprecated
    public String(byte ascii[], int hibyte, int offset, int count) {
        checkBounds(ascii, offset, count);
        char value[] = new char[count];

        if (hibyte == 0) {
            for (int i = count; i-- > 0;) {
                value[i] = (char)(ascii[i + offset] & 0xff);
            }
        } else {
            hibyte <<= 8;
            for (int i = count; i-- > 0;) {
                value[i] = (char)(hibyte | (ascii[i + offset] & 0xff));
            }
        }
        this.value = value;
    }

    /**
     * Allocates a new {@code String} containing characters constructed from
     * an array of 8-bit integer values. Each character <i>c</i>in the
     * resulting string is constructed from the corresponding component
     * <i>b</i> in the byte array such that:
     *
     * <blockquote><pre>
     *     <b><i>c</i></b> == (char)(((hibyte &amp; 0xff) &lt;&lt; 8)
     *                         | (<b><i>b</i></b> &amp; 0xff))
     * </pre></blockquote>
     *
     * @deprecated  This method does not properly convert bytes into
     * characters.  As of JDK&nbsp;1.1, the preferred way to do this is via the
     * {@code String} constructors that take a {@link
     * java.nio.charset.Charset}, charset name, or that use the platform's
     * default charset.
     *
     * @param  ascii
     *         The bytes to be converted to characters
     *
     * @param  hibyte
     *         The top 8 bits of each 16-bit Unicode code unit
     *
     * @see  #String(byte[], int, int, java.lang.String)
     * @see  #String(byte[], int, int, java.nio.charset.Charset)
     * @see  #String(byte[], int, int)
     * @see  #String(byte[], java.lang.String)
     * @see  #String(byte[], java.nio.charset.Charset)
     * @see  #String(byte[])
     */
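    // Builds a new String from a byte array of ASCII codes, using the whole array:
    // it delegates to the four-argument constructor with offset 0 and count ascii.length.
    // (It is deprecated, so it is not examined further here.)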
    @Deprecated
    public String(byte ascii[], int hibyte) {
        this(ascii, hibyte, 0, ascii.length);
    }

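    // Constructing a String from a StringBuffer or a StringBuilder
    // These two constructors are rarely used, because once you have a StringBuffer or
    // StringBuilder you can simply call its toString() method to get a String.
    // StringBuilder.toString() is the faster of the two, because StringBuffer.toString() is synchronized.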
    /**
     * Allocates a new string that contains the sequence of characters
     * currently contained in the string buffer argument. The contents of the
     * string buffer are copied; subsequent modification of the string buffer
     * does not affect the newly created string.
     *
     * @param  buffer
     *         A {@code StringBuffer}
     */
     
    public String(StringBuffer buffer) {
        synchronized(buffer) {
            this.value = Arrays.copyOf(buffer.getValue(), buffer.length());
        }
    }

    /**
     * Allocates a new string that contains the sequence of characters
     * currently contained in the string builder argument. The contents of the
     * string builder are copied; subsequent modification of the string builder
     * does not affect the newly created string.
     *
     * <p> This constructor is provided to ease migration to {@code
     * StringBuilder}. Obtaining a string from a string builder via the {@code
     * toString} method is likely to run faster and is generally preferred.
     *
     * @param   builder
     *          A {@code StringBuilder}
     *
     * @since  1.5
     */
     
    public String(StringBuilder builder) {
        this.value = Arrays.copyOf(builder.getValue(), builder.length());
    }
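
A short sketch of the two ways to get a String out of a builder. Both produce an independent copy of the contents; BuilderToStringDemo and its helpers are illustrative names.

```java
class BuilderToStringDemo {
    /** Uses the String(StringBuilder) constructor; copies the current contents. */
    static String viaConstructor(StringBuilder sb) {
        return new String(sb);
    }

    /** Uses toString(), the form the Javadoc above recommends. */
    static String viaToString(StringBuilder sb) {
        return sb.toString();
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder("abc");
        String a = viaConstructor(sb);
        String b = viaToString(sb);
        sb.append("def"); // later changes do not affect a or b

        System.out.println(a.equals(b) + " " + a + " " + sb); // true abc abcdef
        System.out.println(new String(new StringBuffer("xyz"))); // xyz
    }
}
```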

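4. Commonly used String methods

  length(): returns the length of the string, counted in char values
  isEmpty(): returns true if the string is empty, that is, length() is 0
  charAt(int index): returns the char at the zero-based index, i.e. the (index+1)-th character
  equals(Object anObject):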

/**
 * Compares this string to the specified object. The result is {@code
 * true} if and only if the argument is not {@code null} and is a {@code
 * String} object that represents the same sequence of characters as this
 * object.
 *
 * @param anObject
 *        The object to compare this {@code String} against
 *
 * @return {@code true} if the given object represents a {@code String}
 *         equivalent to this string, {@code false} otherwise
 *
 * @see #compareTo(String)
 * @see #equalsIgnoreCase(String)
 */
public boolean equals(Object anObject) {
    if (this == anObject) {
        return true;
    }
    if (anObject instanceof String) {
        String anotherString = (String)anObject;
        int n = value.length;
        if (n == anotherString.value.length) {
            char v1[] = value;
            char v2[] = anotherString.value;
            int i = 0;
            while (n-- != 0) {
                if (v1[i] != v2[i])
                    return false;
                i++;
            }
            return true;
        }
    }
    return false;
}

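  If both references point to the same object, the result is true.
  If the argument is not a String, the result is false; otherwise the check continues.
  If the two lengths differ, the result is false; otherwise the check continues.
  The two char arrays are compared element by element from index 0 forward. The first mismatch gives false; if every char matches through the last one, the result is true.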
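
Each branch of equals() can be exercised with a small example. This is a sketch; EqualsDemo and results() are made-up names, and the comments state which branch each call is meant to hit.

```java
class EqualsDemo {
    static boolean[] results() {
        String a = "abc";
        String b = new String("abc");
        Object notAString = new StringBuilder("abc");
        return new boolean[] {
            a.equals(a),          // true: same reference
            a.equals(b),          // true: different objects, same chars
            a == b,               // false: == compares references only
            a.equals(notAString), // false: argument is not a String
            a.equals("abcd"),     // false: lengths differ
            a.equals("abd"),      // false: a char differs
            a.equals(null)        // false: null is not an instance of String
        };
    }

    public static void main(String[] args) {
        for (boolean r : results()) {
            System.out.println(r);
        }
    }
}
```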
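  substring(int beginIndex): returns the substring that starts at beginIndex and runs to the end of the string; substring(int beginIndex, int endIndex) stops just before endIndex
  hashCode(): computes the hash code of the string as s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1], where n is the length of the string and ^ denotes exponentiation; the hash code of the empty string "" is 0
  Equal hash codes do not imply equal strings: with the formula above, different strings can produce the same value
  valueOf(Object obj): converts obj to a String; it returns the text "null" when obj is null and obj.toString() otherwise
  compareTo(String anotherString): compares two strings lexicographically, based on the Unicode value of each char in the two sequences
  If this string precedes the argument, the result is negative; if it follows the argument, the result is positive
  The result is zero exactly when the two strings are equal, that is, when equals() returns true
  At the first position where the chars differ, the result is the difference of those two chars; if one string is a prefix of the other, it is the difference of the lengths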
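
The method summaries above can be tried out directly. The sketch below recomputes the hash formula by hand (manualHash is a hypothetical helper, and MethodsDemo an illustrative class name) and shows one well-known collision, "Aa" and "BB".

```java
class MethodsDemo {
    /** s[0]*31^(n-1) + ... + s[n-1], evaluated with Horner's rule in int arithmetic. */
    static int manualHash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i);
        }
        return h;
    }

    public static void main(String[] args) {
        System.out.println(manualHash("hello") == "hello".hashCode()); // true
        System.out.println("Aa".hashCode() + " " + "BB".hashCode());  // 2112 2112
        System.out.println("".hashCode());                            // 0

        Object nothing = null;
        System.out.println(String.valueOf(nothing));     // null (the four-letter text)
        System.out.println(String.valueOf(42));          // 42

        System.out.println("apple".compareTo("banana")); // -1 ('a' - 'b')
        System.out.println("abc".compareTo("abcd"));     // -1 (3 - 4)
        System.out.println("b".compareTo("a"));          // 1
        System.out.println("abc".compareTo("abc"));      // 0

        System.out.println("hello".substring(2));        // llo
        System.out.println("hello".substring(1, 3));     // el
    }
}
```

Note that the null is passed through an Object variable on purpose: a bare String.valueOf(null) would select the char[] overload and throw a NullPointerException.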

5. The flyweight pattern in String

        String x = "A";
        String y = "A";
        String z = new String("A");
        String h = new String("A");
        System.out.println(x=="A");
        System.out.println(x==y);
        System.out.println(x==z);
        System.out.println(y==z);
        System.out.println(h==z);
        String ab = "AB";
        String aB = "A" + "B";
        System.out.println(ab==aB);
        // Output:
        // true
        // true
        // false
        // false
        // false
        // true

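      This shows that String objects assigned from double-quoted literals follow the flyweight pattern: when the values are the same, they share a single instance.
A String created with new gets newly allocated memory every time.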
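
A few more cases along the same lines, as a sketch. It adds String.intern(), which returns the shared instance for a given value, and contrasts a run-time concatenation with a compile-time constant expression. PoolDemo and results() are illustrative names.

```java
class PoolDemo {
    static boolean[] results() {
        String literal = "AB";
        String onHeap = new String("AB");
        String a = "A";             // ordinary variable
        final String constA = "A";  // constant variable
        return new boolean[] {
            onHeap == literal,          // false: new always allocates a new object
            onHeap.intern() == literal, // true: intern() returns the shared instance
            (a + "B") == literal,       // false: concatenated at run time
            (constA + "B") == literal   // true: folded into "AB" at compile time
        };
    }

    public static void main(String[] args) {
        for (boolean r : results()) {
            System.out.println(r);
        }
    }
}
```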

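6. String in switch

    switch has supported String since JDK 7. Under the hood, the match is decided with String's hashCode() and equals(): the compiled code first switches on the hash code and then confirms the candidate with equals().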
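
The idea can be illustrated with a hand-written approximation. The desugared method below is not the exact output of any compiler; it is a sketch of the hashCode-then-equals scheme, using "Aa" and "BB" because they share the hash code 2112. SwitchDemo and both method names are made up for this example.

```java
class SwitchDemo {
    /** Ordinary switch on a String. */
    static int withSwitch(String s) {
        switch (s) {
            case "Aa": return 1;
            case "BB": return 2;
            default:   return 0;
        }
    }

    /** Approximation of the compiled form: switch on hashCode(), confirm with equals(). */
    static int desugared(String s) {
        int index = -1;
        switch (s.hashCode()) {
            case 2112: // "Aa" and "BB" collide, so equals() must tell them apart
                if (s.equals("Aa")) {
                    index = 0;
                } else if (s.equals("BB")) {
                    index = 1;
                }
                break;
        }
        switch (index) {
            case 0:  return 1;
            case 1:  return 2;
            default: return 0;
        }
    }

    public static void main(String[] args) {
        for (String s : new String[] {"Aa", "BB", "C", ""}) {
            System.out.println(s + " -> " + withSwitch(s) + " " + desugared(s));
        }
    }
}
```

Both forms throw a NullPointerException for a null argument, which is also how a real switch on a String behaves.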

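7. String, StringBuilder, and StringBuffer

  new always creates an object on the heap, whereas a double-quoted literal first checks the constant pool and, if the string is already there, refers to the pooled string
  StringBuilder is not thread-safe; StringBuffer is thread-safe
  When literals are concatenated directly (for example "A" + "B"), the compiler optimizes the expression into a single constant; when the concatenation is indirect (it involves a variable), a StringBuilder is created behind the scenes and append is called on it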
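
The thread-safety difference can be seen with a small experiment: several threads append to one shared StringBuffer, and no append is lost because its methods are synchronized. The sketch below only asserts the StringBuffer result, since the outcome of the same experiment with StringBuilder is not deterministic. BufferThreadsDemo and appendConcurrently are illustrative names.

```java
class BufferThreadsDemo {
    /** Appends from several threads to one StringBuffer and returns its final length. */
    static int appendConcurrently(int threads, int appendsPerThread) {
        StringBuffer shared = new StringBuffer();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < appendsPerThread; i++) {
                    shared.append('x'); // synchronized in StringBuffer
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return shared.length();
    }

    public static void main(String[] args) {
        System.out.println(appendConcurrently(4, 1000)); // 4000

        // In single-threaded code, StringBuilder does the same job without locking.
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 3; i++) {
            sb.append(i).append(',');
        }
        System.out.println(sb); // 0,1,2,
    }
}
```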

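Summary:

  This is a rough walkthrough of the String source code, combined with what I learned from several blog posts. I wrote it as a note to myself, and I hope it helps anyone who needs it.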
