Converting ASCII to UTF-16 Unicode in Java
I was able to work out how to convert a Unicode string to an ASCII string using the following code. (Credit is in the code.)
//create a string using unicode that says "hello" when printed to console
String unicode = "\u0068" + "\u0065" + "\u006c" + "\u006c" + "\u006f";
System.out.println(unicode);
System.out.println("");
/* Test code for converting unicode to ASCII
* Taken from http://stackoverflow.com/questions/15356716/how-can-i-convert-unicode-string-to-ascii-in-java
* Will be commented out later after tested and implemented.
*/
//String s = "口水雞 hello Ä";
//replace String s with String unicode for conversion
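//needs: import java.text.Normalizer;
String s1 = Normalizer.normalize(unicode, Normalizer.Form.NFKD);
//the pattern must not be wrapped in Pattern.quote(), which would match it as literal text and strip nothing
String regex = "[\\p{InCombiningDiacriticalMarks}\\p{IsLm}\\p{IsSk}]+";
String s2 = new String(s1.replaceAll(regex, "").getBytes("ascii"), "ascii");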
System.out.println(s2);
System.out.println(unicode.length() == s2.length());
//End of Test code that was implemented
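Now for my question, since curiosity has got the better of me. I have tried googling, because my knowledge of Java is not the best.
My question is: is it possible to convert an ASCII string to a UTF format, UTF-16 in particular? (I say UTF-16 because I know how similar UTF-8 is to ASCII, so there is no need to convert from ASCII to UTF-8.)
Thanks in advance.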
# Answer 1
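Java strings use UTF-16 as their internal format, but that detail is not relevant when you work with the String class. You will see a difference only in two cases, one of which is when the String is examined as a byte array (see below). This happens all the time in C, but not in more modern languages that draw a proper distinction between strings and byte arrays (e.g. Java or Python 3.x).
If you want to encode the content as UTF-16 before writing it to a file (or equivalent), you can use: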
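A minimal sketch of that step, assuming the text is encoded with `String.getBytes(StandardCharsets.UTF_16)` and written with `Files.write`. The class name `Utf16FileDemo`, the helper `writeUtf16`, and the temporary file name are placeholders chosen for illustration, not taken from the original answer:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class Utf16FileDemo {
    // Encodes the text as UTF-16 and writes the resulting bytes to the given path.
    public static void writeUtf16(Path target, String text) throws IOException {
        Files.write(target, text.getBytes(StandardCharsets.UTF_16));
    }

    public static void main(String[] args) throws IOException {
        // Placeholder location: a temporary file, so the sketch runs anywhere.
        Path target = Files.createTempFile("hello-utf16", ".txt");
        writeUtf16(target, "hello");
        System.out.println(Files.size(target) + " bytes written to " + target);
    }
}
```

No separate "ASCII to UTF-16" conversion is needed: the String already holds the characters, and the charset only matters at the moment the bytes are produced.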
The resulting file will contain:
This is UTF-16, with the BOM bytes at the start.
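To see the BOM and the two-byte code units directly, the encoded bytes can be dumped as hex. This is only an illustrative sketch: the class `Utf16BytesDemo` and the helper `toHex` are made-up names, and the input is the "hello" string from the question rather than whatever the original answer used.

```java
import java.nio.charset.StandardCharsets;

class Utf16BytesDemo {
    // Formats each byte as two uppercase hex digits, separated by single spaces.
    public static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String text = "hello"; // pure ASCII input
        // One byte per character in ASCII.
        System.out.println(toHex(text.getBytes(StandardCharsets.US_ASCII)));
        // Java's "UTF-16" charset encodes big-endian and prepends the BOM FE FF.
        System.out.println(toHex(text.getBytes(StandardCharsets.UTF_16)));
    }
}
```

With this input the first line prints `68 65 6C 6C 6F` and the second prints `FE FF 00 68 00 65 00 6C 00 6C 00 6F`. If the BOM is not wanted, `StandardCharsets.UTF_16BE` or `StandardCharsets.UTF_16LE` encode without one.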