Which UTF-8 encoding do the members of Java's String class use?


The String class has a constructor:

 new String(byte[] bytes, Charset charset)

and a method:

 byte[] getBytes(Charset charset)

Suppose I define my charset as follows:

 Charset charset = Charset.forName("UTF-8");
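A minimal round-trip sketch using exactly the two APIs above with that charset. The class and helper names (`Utf8RoundTrip`, `encode`, `decode`) are illustrative only. U+1F600 is included because it lies outside the Basic Multilingual Plane and is held in a Java `String` as a surrogate pair, so the byte count hints at which variant the encoder produces: 4 bytes for that character in standard UTF-8, versus 6 in CESU-8 or Modified UTF-8.

```java
import java.nio.charset.Charset;

public class Utf8RoundTrip {

    // Charset defined exactly as in the question.
    static final Charset CHARSET = Charset.forName("UTF-8");

    // Encode with String.getBytes(Charset).
    static byte[] encode(String s) {
        return s.getBytes(CHARSET);
    }

    // Decode with the String(byte[], Charset) constructor.
    static String decode(byte[] bytes) {
        return new String(bytes, CHARSET);
    }

    public static void main(String[] args) {
        // U+00E9 (e with acute) takes 2 bytes in UTF-8; U+1F600 is a
        // supplementary character stored in the String as a surrogate pair.
        String original = "caf\u00e9 \uD83D\uDE00";

        byte[] bytes = encode(original);
        String decoded = decode(bytes);

        System.out.println("chars: " + original.length()); // 7 UTF-16 code units
        System.out.println("bytes: " + bytes.length);      // 10 bytes
        System.out.println("round trip ok: " + original.equals(decoded)); // true
    }
}
```

The 10 bytes break down as 3 for "caf", 2 for U+00E9, 1 for the space and 4 for U+1F600.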

Which encoding will I actually be using? More specifically, is it standard UTF-8 (as described in RFC 3629), CESU-8, or Modified UTF-8? (See also the corresponding Wikipedia article.)

If it is not standard UTF-8, is there a library that supports string manipulation in standard UTF-8?

Converters for these UTF-8-derived encodings would also be very welcome.
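For the Modified UTF-8 direction, the JDK itself can serve as a converter: `DataInputStream.readUTF` decodes Modified UTF-8 into a `String`, which can then be re-encoded with the standard charset. A minimal sketch, assuming Java 8 or later, raw Modified UTF-8 input without the 2-byte length prefix that `readUTF` expects, and at most 65535 bytes of input. `ModifiedUtf8Converter` and its methods are hypothetical names, not a library API.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;

public class ModifiedUtf8Converter {

    // Hypothetical helper: decodes raw Modified UTF-8 (no length prefix) by
    // framing it the way DataInputStream.readUTF expects, i.e. with a
    // 2-byte big-endian length in front.
    static String decodeModifiedUtf8(byte[] modified) {
        if (modified.length > 0xFFFF) {
            throw new IllegalArgumentException("readUTF handles at most 65535 bytes");
        }
        byte[] framed = new byte[modified.length + 2];
        framed[0] = (byte) (modified.length >>> 8);
        framed[1] = (byte) modified.length;
        System.arraycopy(modified, 0, framed, 2, modified.length);
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(framed))) {
            return in.readUTF();
        } catch (IOException e) {
            // Malformed input surfaces as UTFDataFormatException, an IOException.
            throw new UncheckedIOException(e);
        }
    }

    // Hypothetical helper: Modified UTF-8 to standard UTF-8 via a String.
    static byte[] modifiedToStandardUtf8(byte[] modified) {
        return decodeModifiedUtf8(modified).getBytes(Charset.forName("UTF-8"));
    }

    public static void main(String[] args) {
        // ED A0 BD ED B8 80: U+1F600 in Modified UTF-8 (two 3-byte surrogates).
        byte[] modified = {
            (byte) 0xED, (byte) 0xA0, (byte) 0xBD,
            (byte) 0xED, (byte) 0xB8, (byte) 0x80
        };
        byte[] standard = modifiedToStandardUtf8(modified);
        System.out.println(standard.length); // 4
    }
}
```

The reverse direction can be sketched the same way with `DataOutputStream.writeUTF`, stripping its 2-byte length prefix from the output.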


# Answer 1

The UTF-8 charset is specified by RFC 2279; the transformation format upon which it is based is specified in Amendment 2 of ISO 10646-1 and is also described in the Unicode Standard.

http://download-llnw.oracle.com/javase/6/docs/api/java/nio/charset/Charset.html
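One way to check this empirically is to compare the bytes that `getBytes` produces with the UTF-8 charset against the Modified UTF-8 that `DataOutputStream.writeUTF` writes. This is a sketch: `Utf8Variants`, `standardUtf8`, `modifiedUtf8` and `hex` are hypothetical helper names, `UncheckedIOException` assumes Java 8 or later, and the 2-byte length prefix that `writeUTF` emits is stripped before comparing.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.util.Arrays;

public class Utf8Variants {

    // Standard UTF-8, using the charset from the question.
    static byte[] standardUtf8(String s) {
        return s.getBytes(Charset.forName("UTF-8"));
    }

    // Modified UTF-8 as written by DataOutputStream.writeUTF, with the
    // 2-byte length prefix removed so only the encoded characters remain.
    static byte[] modifiedUtf8(String s) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeUTF(s);
        } catch (IOException e) {
            // Not expected for an in-memory stream.
            throw new UncheckedIOException(e);
        }
        byte[] withLength = buf.toByteArray();
        return Arrays.copyOfRange(withLength, 2, withLength.length);
    }

    // Formats bytes as space-separated uppercase hex, e.g. "C0 80".
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String nul = "\0";             // U+0000
        String emoji = "\uD83D\uDE00"; // U+1F600 as a surrogate pair

        System.out.println("U+0000  standard: " + hex(standardUtf8(nul)));
        System.out.println("U+0000  modified: " + hex(modifiedUtf8(nul)));
        System.out.println("U+1F600 standard: " + hex(standardUtf8(emoji)));
        System.out.println("U+1F600 modified: " + hex(modifiedUtf8(emoji)));
    }
}
```

Run as written, it should print `00` versus `C0 80` for U+0000, and `F0 9F 98 80` versus `ED A0 BD ED B8 80` for U+1F600. The UTF-8 charset emits the standard 4-byte form; the 6-byte surrogate-pair form is what CESU-8 and Modified UTF-8 use for supplementary characters.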