Java regex to remove repeated substrings from a string

I am trying to build a regular expression that collapses repeated consecutive substrings in a Java string. For example, given the following input:

The big black dog big black dog is a friendly friendly dog who lives nearby nearby.

I would like to get the following output:

The big black dog is a friendly dog who lives nearby.

This is the code I have so far:

String input = "The big black dog big black dog is a friendly friendly dog who lives nearby nearby.";

Pattern dupPattern = Pattern.compile("((\\b\\w+\\b\\s)+)\\1+", Pattern.CASE_INSENSITIVE);
Matcher matcher = dupPattern.matcher(input);

while (matcher.find()) {
    input = input.replace(matcher.group(), matcher.group(1));
}

This works for every repeated substring except the one at the end of the sentence:

The big black dog is a friendly dog who lives nearby nearby.

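I know that my regular expression requires whitespace after every word in the substring, which means it cannot handle a word that is followed by a period instead of a space. I cannot find a way around this. I have tried rearranging the capture groups and changing the regex to look for either whitespace or a period rather than whitespace alone, but that only works when a period follows every repeated part of the substring ("nearby.nearby.").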

Can anyone point me in the right direction? Ideally, the input to this method would be short paragraphs rather than a single line.
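
The limitation described in the question can be reproduced in isolation. The following is a minimal sketch, not part of the original post: the class name `TrailingSpaceDemo` and the helper `findsRepeat` are made up for illustration. It applies the question's pattern to two short fragments and shows that a repeat is only detected when the second copy is followed by whitespace.

```java
import java.util.regex.Pattern;

public class TrailingSpaceDemo {
    // The pattern from the question: each word in the group must be
    // followed by a whitespace character (\s), and so must each repeat.
    static final Pattern DUP =
            Pattern.compile("((\\b\\w+\\b\\s)+)\\1+", Pattern.CASE_INSENSITIVE);

    // Hypothetical helper: true if the question's pattern finds a repeated run.
    static boolean findsRepeat(String text) {
        return DUP.matcher(text).find();
    }

    public static void main(String[] args) {
        // "friendly " is followed by another "friendly ", so this matches.
        System.out.println(findsRepeat("a friendly friendly dog")); // true
        // The second "nearby" is followed by "." rather than whitespace.
        System.out.println(findsRepeat("lives nearby nearby."));    // false
    }
}
```

Because the whitespace is captured as part of the group, the backreference `\1` demands it again after the repeat, which is why the final "nearby nearby." is left alone.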


2 answers

  1. # Answer 1

    You can use

    input.replaceAll("([ \\w]+)\\1", "$1");
    

    live demo:

    import java.util.regex.Pattern;
    
    class Ideone
    {
        public static void main (String[] args)
        {
            String input = "The big black dog big black dog is a friendly friendly dog who lives nearby nearby.";
    
            // Compile once and reuse the same pattern for testing and replacing.
            Pattern dupPattern = Pattern.compile("([ \\w]+)\\1", Pattern.CASE_INSENSITIVE);
    
            // Match against the current string on every pass, so the loop
            // ends only when no adjacent repeat is left.
            while (dupPattern.matcher(input).find()) {
                input = dupPattern.matcher(input).replaceAll("$1");
            }
            System.out.println(input);
    
        }
    }
    
  2. # Answer 2

    Combining @Thomas Ayoub's answer with @Matt's comment:

    public class Test2 {
        public static void main(String[] args){
            String input = "The big big black dog big black dog is a friendly friendly dog who lives nearby nearby.";
            // \b anchors the repeated run at a word boundary.
            String result = input.replaceAll("\\b([ \\w]+)\\1", "$1");
            // Repeat until a pass changes nothing, so nested repeats
            // such as "big big black dog big black dog" are fully collapsed.
            while(!input.equals(result)){
                input = result;
                result = input.replaceAll("\\b([ \\w]+)\\1", "$1");
            }
            System.out.println(result);
        }
    }
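
One caveat with both answers: `[ \w]+` can match a single letter, so a doubled letter at the start of a word counts as a repeat. For instance, `"a llama".replaceAll("\\b([ \\w]+)\\1", "$1")` yields `a lama`. The sketch below is one possible alternative, not taken from either answer: it only treats whole words as repeatable units. The class name `RepeatedPhraseCollapser` and the method `collapse` are hypothetical, and the pattern assumes the whitespace inside a repeated phrase is identical in both copies.

```java
import java.util.regex.Pattern;

public class RepeatedPhraseCollapser {
    // Group 1 is a phrase of one or more whole words separated by whitespace.
    // A repeat is whitespace followed by the same phrase, ending on a word
    // boundary, so "the theme" is not treated as "the the" + "me".
    private static final Pattern REPEATED_PHRASE = Pattern.compile(
            "\\b(\\w+(?:\\s+\\w+)*)(?:\\s+\\1\\b)+", Pattern.CASE_INSENSITIVE);

    // Replaces each run of repeated phrases with its first copy, and keeps
    // going until a pass changes nothing (needed for nested repeats).
    static String collapse(String text) {
        String previous;
        do {
            previous = text;
            text = REPEATED_PHRASE.matcher(text).replaceAll("$1");
        } while (!text.equals(previous));
        return text;
    }

    public static void main(String[] args) {
        System.out.println(collapse(
                "The big big black dog big black dog is a friendly friendly dog who lives nearby nearby."));
        // prints: The big black dog is a friendly dog who lives nearby.
        System.out.println(collapse("a llama"));
        // prints: a llama
    }
}
```

Since the trailing period is never part of the captured phrase, the repeat at the end of the sentence is handled like any other. The nested quantifiers can backtrack heavily on long inputs, so this is best suited to the short paragraphs mentioned in the question.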