How does Java's String.codePointCount work?


Suppose I have this example:

public static void main(String[] args) { System.out.println("This".codePointCount(0, 4)); }

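The output is 4. If I pass 3 instead of 4 as the second argument, the output is 3. So the result basically seems to be

|0 - 3|, or more generally |firstIndex - secondIndex|

I don't understand how the method really works. Can you give an example where the output is different from

|firstIndex - secondIndex|

Thanks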

# Answer 1

From the Javadoc:

Returns the number of Unicode code points in the specified text range of this String. The text range begins at the specified beginIndex and extends to the char at index endIndex - 1. Thus the length (in chars) of the text range is endIndex-beginIndex. Unpaired surrogates within the text range count as one code point each.

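Java uses Unicode to represent text. Unicode assigns every character a number called a "code point". There are different ways to write these numbers as bytes; Java uses UTF-16, where each char is 2 bytes. Unfortunately, Unicode has far more characters than the 65,536 values that fit in 2 bytes.

To solve this, UTF-16 uses 4 bytes (two 2-byte units) for code points with large numbers, that is, anything above U+FFFF. Such a pair of units is called a surrogate pair.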
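As a rough illustration of the surrogate-pair encoding described above, the sketch below encodes one code point that fits in a single char and one that does not. The class name `SurrogatePairDemo` and the helper `encode` are made up for this example; `Character.toChars`, `Character.isHighSurrogate`, `Character.isLowSurrogate` and `Character.toCodePoint` are standard JDK methods.

```java
class SurrogatePairDemo {
    // Hypothetical helper: returns the UTF-16 chars Java uses for one code point.
    static char[] encode(int codePoint) {
        return Character.toChars(codePoint);
    }

    public static void main(String[] args) {
        char[] single = encode('A');      // U+0041 fits in one 16-bit char
        char[] pair = encode(0x1F353);    // U+1F353 (strawberry) needs two chars

        System.out.println(single.length);                       // 1
        System.out.println(pair.length);                         // 2
        System.out.println(Integer.toHexString(pair[0]));        // d83c (high surrogate)
        System.out.println(Integer.toHexString(pair[1]));        // df53 (low surrogate)
        System.out.println(Character.isHighSurrogate(pair[0]));  // true
        System.out.println(Character.isLowSurrogate(pair[1]));   // true

        // The two halves combine back into the original code point.
        System.out.println(Character.toCodePoint(pair[0], pair[1]) == 0x1F353); // true
    }
}
```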

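Annoyingly, Java makes this confusing, because it treats such a 4-byte character as 2 chars.

Example (credits @Pshemo): "🍓🍑". This string has two characters, a strawberry and a peach. Technically it has two code points, one for the strawberry and one for the peach. But if you try it, you will see that Java reports a length of 4, because each of them is a surrogate pair.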
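A minimal sketch of that example, and of a case where the result differs from endIndex - beginIndex as the question asks. The class name `StrawberryPeachDemo` and the constant `FRUIT` are illustrative only; the emoji are written as Unicode escapes so the result does not depend on the source file encoding.

```java
class StrawberryPeachDemo {
    // U+1F353 (strawberry) followed by U+1F351 (peach), each a surrogate pair.
    static final String FRUIT = "\uD83C\uDF53\uD83C\uDF51";

    public static void main(String[] args) {
        // length() counts 16-bit chars, not code points.
        System.out.println(FRUIT.length());                          // 4

        // codePointCount counts each surrogate pair once.
        System.out.println(FRUIT.codePointCount(0, FRUIT.length())); // 2

        // For text without surrogate pairs the two numbers agree,
        // which is why the example in the question prints 4.
        System.out.println("This".codePointCount(0, 4));             // 4
    }
}
```

Here `codePointCount(0, 4)` returns 2 even though endIndex - beginIndex is 4.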

For more information, see https://en.wikipedia.org/wiki/UTF-16, which discusses the surrogate pairs mentioned in the Javadoc.

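# Answer 2

Java uses UTF-16 for its internal character and string representation. In UTF-16, a single Unicode character is represented by one or two 16-bit code units (Java chars).

The number of chars is therefore not always the same as the number of code points.

See: Java notes on Unicode, for example.

Edit

Put the other way round: a single Unicode code point can be made up of more than one 16-bit char.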
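One way to see the difference between chars and code points is to walk a string one code point at a time. The sketch below is only an illustration: the class `CodePointWalk` and the method `codePointsOf` are hypothetical names, while `String.codePointAt` and `Character.charCount` are standard JDK methods. It also shows the Javadoc rule that an unpaired surrogate counts as one code point.

```java
import java.util.ArrayList;
import java.util.List;

class CodePointWalk {
    // Hypothetical helper: collects the code points of s, advancing by
    // one or two chars depending on how many chars each code point uses.
    static List<Integer> codePointsOf(String s) {
        List<Integer> result = new ArrayList<>();
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            result.add(cp);
            i += Character.charCount(cp); // 1 for U+0000..U+FFFF, otherwise 2
        }
        return result;
    }

    public static void main(String[] args) {
        String s = "a\uD83C\uDF53b"; // 'a', strawberry (U+1F353), 'b'
        System.out.println(s.length());              // 4 chars
        System.out.println(codePointsOf(s).size());  // 3 code points

        // A high surrogate with no matching low surrogate after it
        // is counted as one code point on its own.
        String broken = "a\uD83Cb";
        System.out.println(broken.codePointCount(0, broken.length())); // 3
    }
}
```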
