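Can you explain this strange Java regex behavior?

I have been using the code below to try to extract different parts from the text I supply.

It should pick out the number, then put anything inside [ brackets or " quotes into groups. Here is the code: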

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Launcher2 {

    /**
     * @param args
     */
    public static void main(String[] args) {
        PrintRegexes("100.000[$₮-45]");
    }
    public static void PrintRegexes(String textToMatch){
        Pattern p = Pattern.compile("(\\[.*?\\]|\".*?\")?.*?(\\d{1,3}(?:,\\d{3})*?(?:\\.\\d+)?).*?(\\[.*?\\]|\".*?\")",Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
        Matcher m = p.matcher(textToMatch);
        if (m.find())
        {
            for(int groups =0;groups<m.groupCount();groups++){
                System.out.println("Group "+groups+" contains "+m.group(groups));
            }
            for(int groups =0;m.find(groups);groups++){ //this will error, but right now, it's the least of my concerns
                System.out.println("Group "+groups+" contains "+m.group(groups));
            }
        }

    }
}

Group 0 contains 100.000[$₮-45]
Group 1 contains null
Group 2 contains 100.000
Group 3 contains [$₮-45]
Group 0 contains 100.000[$₮-45]
Group 1 contains null
Group 2 contains 0.000
Group 3 contains [$₮-45]
Exception in thread "main" java.lang.IndexOutOfBoundsException: No group 4 //don't care about this, I've got bigger strings(fish) to regex(fry) at the moment!
    at java.util.regex.Matcher.group(Unknown Source)
    at Launcher2.PrintRegexes(Launcher2.java:21)
    at Launcher2.main(Launcher2.java:10)

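All groups are identical except group 2, which prints as 100.000 in one pass and as 0.000 in the other.

Why is that?

If I put something both before and after the number, this behavior goes away.

If I put something only in front of it, I get this output: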

Group 0 contains [$₮-45]100.000
Group 1 contains [$₮-45]
Group 2 contains 100.000
Group 3 contains null
Group 0 contains [$₮-45]100.000
Group 1 contains null
Group 2 contains 45
Group 3 contains null

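Even stranger! The strangest part (to me) is that it works on www.debuggex.com

Did I write the pattern wrong? Or does the matcher fail to work out the groups when it is constructed with Matcher m = p.matcher(textToMatch); and does that affect its behavior?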


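2 answers

  1. # Answer 1

    I can see two problems.

    First, you call m.find() several times with the group index as its argument, which does not do what you think it does.
    If you look at the JavaDoc for find(int start), you will see that it resets the matcher and then restarts the search at the given character index of the input. That explains why the matched sequence gets shorter on later iterations.

    Second, you need to loop up to groups <= m.groupCount() to get all the groups, because groupCount() does not count group 0 (the whole match):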

        Pattern p =
                Pattern.compile("(\\[.*?\\]|\".*?\")?.*?(\\d{1,3}(?:,\\d{3})*?(?:\\.\\d+)?).*?(\\[.*?\\]|\".*?\")",
                    Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
        Matcher m = p.matcher(textToMatch);
        if (m.find()) {
            for (int groups = 0; groups <= m.groupCount(); groups++) {
                System.out.println("Group " + groups + " contains " + m.group(groups));
            }
        }
    

    This prints:

    Group 0 contains 100.000[$₮-45]
    Group 1 contains null
    Group 2 contains 100.000
    Group 3 contains [$₮-45]
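
    The effect of find(int start) can be seen in isolation with a minimal sketch like the one below. The class name FindStartDemo and the helper numberFrom are hypothetical names introduced only for illustration, and the input drops the ₮ character to avoid source-encoding problems. The pattern is the one from the question.

    ```java
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    class FindStartDemo {

        // Pattern copied from the question.
        static final Pattern P = Pattern.compile(
                "(\\[.*?\\]|\".*?\")?.*?(\\d{1,3}(?:,\\d{3})*?(?:\\.\\d+)?).*?(\\[.*?\\]|\".*?\")",
                Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

        // Hypothetical helper: group 2 of the first match found at or after index 'start'.
        static String numberFrom(String text, int start) {
            Matcher m = P.matcher(text);
            // find(int) resets the matcher and searches from 'start'; it does not select a group.
            return m.find(start) ? m.group(2) : null;
        }

        public static void main(String[] args) {
            String text = "100.000[$-45]";
            for (int start = 0; start <= 3; start++) {
                System.out.println("find(" + start + ") -> group 2 = " + numberFrom(text, start));
            }
            // find(0) -> group 2 = 100.000
            // find(1) -> group 2 = 00.000
            // find(2) -> group 2 = 0.000
            // find(3) -> group 2 = 000
        }
    }
    ```

    The find(2) line reproduces the 0.000 from the question: the loop there called m.find(2) immediately before m.group(2), so the search had restarted at the third character of the input.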

  2. # Answer 2

    It looks like the problem is in this part: (?:,\\d{3})*?

    I think you need ((?:,\\d{3})*)?
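
    As a rough illustration of what the lazy quantifier does here, the sketch below compares the lazy form from the question with a greedy one on an input that contains thousands separators. The class name LazyVsGreedy, the helper numberGroup, and the sample input are assumptions made for the example. The greedy variant keeps the group non-capturing so that the group numbering stays the same, which differs slightly from the capturing form suggested above. It does not by itself account for the 0.000 output in the question, whose input contains no commas.

    ```java
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    class LazyVsGreedy {

        // Hypothetical helper: builds the question's pattern with the
        // thousands-separator part swapped in, and returns group 2 (the number).
        static String numberGroup(String separatorPart, String text) {
            Pattern p = Pattern.compile(
                    "(\\[.*?\\]|\".*?\")?.*?(\\d{1,3}" + separatorPart
                            + "(?:\\.\\d+)?).*?(\\[.*?\\]|\".*?\")",
                    Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
            Matcher m = p.matcher(text);
            return m.find() ? m.group(2) : null;
        }

        public static void main(String[] args) {
            String text = "1,000,000[x]";
            // Lazy: takes zero repetitions first, and the following .*? absorbs the rest.
            System.out.println(numberGroup("(?:,\\d{3})*?", text)); // prints 1
            // Greedy: consumes every ",ddd" block before moving on.
            System.out.println(numberGroup("(?:,\\d{3})*", text));  // prints 1,000,000
        }
    }
    ```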