Java: Why does org.apache.xerces.parsers.SAXParser not skip the BOM in UTF-8 encoded XML?


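I have an XML file encoded in UTF-8. The file starts with a BOM (byte order mark), so during parsing I get org.xml.sax.SAXParseException: Content is not allowed in prolog. I cannot remove these 3 bytes from the file itself, and I cannot load the file into memory and strip them there (the file is large). For performance reasons I am using a SAX parser, and I just want to skip these 3 bytes if they appear before the XML declaration. Should I subclass InputStreamReader for this?

I am new to Java, so please tell me the right way to do it.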

# Answer 2

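// A Reader yields decoded characters, so a correctly decoded BOM always
// appears as the single character U+FEFF, whatever the file's encoding.
// The other patterns only match if the bytes were decoded with a different charset.
private static final char[] UTF32BE = { 0x0000, 0xFEFF };
private static final char[] UTF32LE = { 0xFFFE, 0x0000 };
private static final char[] UTF16BE = { 0xFEFF };
private static final char[] UTF16LE = { 0xFFFE };
// The UTF-8 BOM bytes EF BB BF decode to the single character U+FEFF.
private static final char[] UTF8 = { 0xFEFF };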

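// Consumes the given BOM from the reader if it is present; otherwise rewinds
// to the marked position. The reader must support mark/reset (e.g. BufferedReader).
private static boolean removeBOM(Reader reader, char[] bom) throws Exception {
    int bomLength = bom.length;
    reader.mark(bomLength);
    char[] possibleBOM = new char[bomLength];
    // read() may return fewer characters than requested, or -1 at end of stream
    int charsRead = reader.read(possibleBOM);
    if (charsRead < bomLength) {
        reader.reset();
        return false;
    }
    for (int x = 0; x < bomLength; x++) {
        if (bom[x] != possibleBOM[x]) {
            reader.reset();
            return false;
        }
    }
    return true;
}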

private static void removeBOM(Reader reader) throws Exception {
    if (removeBOM(reader, UTF32BE)) {
        return;
    }
    if (removeBOM(reader, UTF32LE)) {
        return;
    }
    if (removeBOM(reader, UTF16BE)) {
        return;
    }
    if (removeBOM(reader, UTF16LE)) {
        return;
    }
    if (removeBOM(reader, UTF8)) {
        return;
    }
}

Usage:

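// xml can be read from a file, url or string through a stream
// (requires java.nio.charset.StandardCharsets)
URL url = new URL("some xml url");
// Decode explicitly as UTF-8 instead of relying on the platform default charset
BufferedReader bufferedReader = new BufferedReader(new InputStreamReader(url.openStream(), StandardCharsets.UTF_8));
removeBOM(bufferedReader);
// The reader is now positioned after the BOM and can be handed to the parser, e.g. via new InputSource(bufferedReader)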
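
The question asks about skipping the 3 BOM bytes without loading the whole file. As an alternative to the reader-based code, here is a minimal sketch that does the check at the byte level with a PushbackInputStream, so nothing beyond the first 3 bytes is buffered. The class name BomSkipper and the method skipUtf8Bom are hypothetical names chosen for this illustration; they are not part of Xerces or the JDK, and the sketch only handles the UTF-8 BOM.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.helpers.DefaultHandler;

public class BomSkipper {

    /**
     * Hypothetical helper: wraps the stream and consumes a leading
     * UTF-8 BOM (bytes EF BB BF) if one is present. Any other leading
     * bytes are pushed back so the caller sees the stream unchanged.
     */
    public static InputStream skipUtf8Bom(InputStream in) throws IOException {
        PushbackInputStream pushback = new PushbackInputStream(in, 3);
        byte[] head = new byte[3];

        // read() may return fewer bytes than requested, so loop until
        // 3 bytes are collected or the stream ends.
        int n = 0;
        while (n < 3) {
            int r = pushback.read(head, n, 3 - n);
            if (r < 0) {
                break;
            }
            n += r;
        }

        boolean isBom = n == 3
                && (head[0] & 0xFF) == 0xEF
                && (head[1] & 0xFF) == 0xBB
                && (head[2] & 0xFF) == 0xBF;

        // Not a BOM: put back whatever was read.
        if (!isBom && n > 0) {
            pushback.unread(head, 0, n);
        }
        return pushback;
    }

    public static void main(String[] args) throws Exception {
        // Sample document with a leading BOM; a real program would open a FileInputStream instead.
        byte[] xml = "\uFEFF<root><item/></root>".getBytes(StandardCharsets.UTF_8);
        try (InputStream in = skipUtf8Bom(new ByteArrayInputStream(xml))) {
            SAXParserFactory.newInstance().newSAXParser().parse(in, new DefaultHandler());
        }
        System.out.println("parsed");
    }
}
```

Because the wrapper only looks at the first 3 bytes and then streams the rest through, it should suit large files; the returned stream can be passed to the SAX parser in place of the original one.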
