1 answer

  1. # Answer 1

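    A code point can be made up of more than one char, because each char is still only a 16-bit UTF-16 code unit. The index passed to the methods in String is an index into its underlying array char[] value, not an index counted in code points. These String methods check the bounds and then delegate to methods of Character: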

    //Java 8 java.lang.String source code
    public int codePointAt(int index) {
        if ((index < 0) || (index >= value.length)) {
            throw new StringIndexOutOfBoundsException(index);
        }
        return Character.codePointAtImpl(value, index, value.length);
    }
    //...
    public int codePointBefore(int index) {
        int i = index - 1;
        if ((i < 0) || (i >= value.length)) {
            throw new StringIndexOutOfBoundsException(index);
        }
        return Character.codePointBeforeImpl(value, index, 0);
    }
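
    To make the point about char indices concrete, here is a minimal sketch. The class name CharIndexDemo and the sample string are illustrative assumptions and not part of the JDK source quoted above; only the String methods length(), codePointCount() and codePointAt() are real API.

    ```java
    class CharIndexDemo {
        // 'a', then U+1F600 (stored as the surrogate pair D83D DE00), then 'b'
        static final String SAMPLE = "a\uD83D\uDE00b";

        public static void main(String[] args) {
            // length() counts chars, codePointCount() counts code points
            System.out.println(SAMPLE.length());                           // 4
            System.out.println(SAMPLE.codePointCount(0, SAMPLE.length())); // 3

            // Index 1 is the high surrogate, so the pair is combined
            System.out.printf("%X%n", SAMPLE.codePointAt(1)); // 1F600
            // Index 2 is the low surrogate, so it is returned on its own
            System.out.printf("%X%n", SAMPLE.codePointAt(2)); // DE00
        }
    }
    ```

    Index 2 is a valid char index even though no code point starts there, which is why codePointAt(2) returns the bare low surrogate instead of throwing.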
    

    The corresponding methods in Character recognize the multiple char values that belong to a single code point and combine them:

    //Java 8 java.lang.Character source code
    static int codePointAtImpl(char[] a, int index, int limit) {
        char c1 = a[index];
        if (isHighSurrogate(c1) && ++index < limit) {
            char c2 = a[index];
            if (isLowSurrogate(c2)) {
                return toCodePoint(c1, c2);
            }
        }
        return c1;
    }
    //...
    static int codePointBeforeImpl(char[] a, int index, int start) {
        char c2 = a[--index];
        if (isLowSurrogate(c2) && index > start) {
            char c1 = a[--index];
            if (isHighSurrogate(c1)) {
                return toCodePoint(c1, c2);
            }
        }
        return c2;
    }
    

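    This difference matters because index-1 is not always the start of the previous code point. codePointBefore() therefore has to start at index-1 and look backward, whereas codePointAt() has to start at index and look forward.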
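
    The following sketch illustrates the difference on a string that contains a surrogate pair. The class CodePointBeforeDemo and the helper reverseCodePoints are hypothetical names introduced here for illustration; the calls to codePointBefore(), codePointAt(), codePointCount() and Character.charCount() are standard API.

    ```java
    class CodePointBeforeDemo {
        // 'a', then U+1F600 (stored as the surrogate pair D83D DE00), then 'b'
        static final String SAMPLE = "a\uD83D\uDE00b";

        // Hypothetical helper: collects the code points of s from last to first.
        static int[] reverseCodePoints(String s) {
            int[] out = new int[s.codePointCount(0, s.length())];
            int n = 0;
            for (int i = s.length(); i > 0; ) {
                int cp = s.codePointBefore(i);  // looks backward from i-1
                out[n++] = cp;
                i -= Character.charCount(cp);   // step back 1 or 2 chars
            }
            return out;
        }

        public static void main(String[] args) {
            // Index 3 is 'b'; the code point before it spans indices 1 and 2
            System.out.printf("%X%n", SAMPLE.codePointBefore(3)); // 1F600
            // codePointAt(3 - 1) starts on the low surrogate and looks forward
            System.out.printf("%X%n", SAMPLE.codePointAt(3 - 1)); // DE00

            for (int cp : reverseCodePoints(SAMPLE)) {
                System.out.printf("%X ", cp); // 62 1F600 61
            }
            System.out.println();
        }
    }
    ```

    Using codePointAt(index - 1) as a substitute for codePointBefore(index) would return only half of the pair here, which is the case the backward-looking implementation exists to handle.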