Java regex: case-insensitive matching with German umlauts

I want to find and replace a word in a text, for example "TÜTÜ". Here is the code:

    final String regexX = "TÜTÜ";
    final String string = "dsad dasdasd dasd \n"
            + "dsds\n"
            + " dd \n"
            + "sadsd.sdasd. \n"
            + " universität \n"
            + " blö \n"
            + " Blö\n"
            + " ble\n"
            + "üeee \n"
            + " Wörterbuch \n"
            + "Das gute alte Tütü wird";
    final String subst = "";

    final Pattern pattern = Pattern.compile(regexX, Pattern.MULTILINE | Pattern.CASE_INSENSITIVE);
    final Matcher matcherX = pattern.matcher(string);

    final String result = matcherX.replaceAll(subst);

    System.out.println("Substitution result: " + result);

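The result is that nothing gets replaced. This is actually code copied from regex101. The German word "TÜTÜ" is not recognized either. Is it true that German umlauts are not handled case-insensitively, or is there a way to make this work?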


1 answer

  1. # Answer 1

    final Pattern pattern = Pattern.compile(regexX, Pattern.MULTILINE | Pattern.CASE_INSENSITIVE);
    

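    You need to add Pattern.UNICODE_CASE; otherwise case-insensitive matching only applies to characters in the US-ASCII charset. From the Pattern documentation: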

    Enables Unicode-aware case folding. When this flag is specified then case-insensitive matching, when enabled by the CASE_INSENSITIVE flag, is done in a manner consistent with the Unicode Standard. By default, case-insensitive matching assumes that only characters in the US-ASCII charset are being matched.

    Unicode-aware case folding can also be enabled via the embedded flag expression (?u).

    Specifying this flag may impose a performance penalty.
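To make the difference concrete, here is a minimal sketch that compares the three variants on the last line of the question's input. The class and method names (`UmlautCaseInsensitive`, `replaceAsciiOnly`, `replaceUnicodeAware`, `replaceEmbeddedFlags`) are made up for this illustration; only `Pattern.CASE_INSENSITIVE`, `Pattern.UNICODE_CASE` and the embedded `(?iu)` flags come from the Java API. The umlauts are written as `\u` escapes so the example does not depend on the source file encoding.

```java
import java.util.regex.Pattern;

public class UmlautCaseInsensitive {

    // "TÜTÜ" written with Unicode escapes (\u00DC is Ü).
    static final String REGEX = "T\u00DCT\u00DC";

    // CASE_INSENSITIVE alone: only US-ASCII letters are folded,
    // so "Ü" in the pattern does not match "ü" in the input.
    static String replaceAsciiOnly(String text) {
        return Pattern.compile(REGEX, Pattern.CASE_INSENSITIVE)
                .matcher(text)
                .replaceAll("");
    }

    // CASE_INSENSITIVE | UNICODE_CASE: case folding follows the Unicode Standard.
    static String replaceUnicodeAware(String text) {
        return Pattern.compile(REGEX, Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE)
                .matcher(text)
                .replaceAll("");
    }

    // Same behaviour through embedded flags: (?i) plus (?u).
    static String replaceEmbeddedFlags(String text) {
        return Pattern.compile("(?iu)" + REGEX)
                .matcher(text)
                .replaceAll("");
    }

    public static void main(String[] args) {
        // "Das gute alte Tütü wird" (\u00FC is ü).
        String text = "Das gute alte T\u00FCt\u00FC wird";

        System.out.println("ASCII only:     " + replaceAsciiOnly(text));
        System.out.println("UNICODE_CASE:   " + replaceUnicodeAware(text));
        System.out.println("Embedded (?iu): " + replaceEmbeddedFlags(text));
    }
}
```

With only `CASE_INSENSITIVE` the text should come back unchanged, while the other two variants remove the word. `Pattern.MULTILINE` from the question is left out here because it only changes how `^` and `$` behave and has no effect on case folding.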