Pattern, Matcher in Java, regex help


I'm trying to remove duplicated consecutive words from a text file, and someone mentioned that I could do something like this:

    Pattern p = Pattern.compile("(\\w+) \\1");
    StringBuilder sb = new StringBuilder(1000);
    int i = 0;
    // lineOfWords is a List<String> holding each line read in from the txt file
    for (String s : lineOfWords) {
        Matcher m = p.matcher(s.toUpperCase());
        // and then do something like
        while (m.find()) {
            // do something here
        }
    }

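I tried looking at m.end() to see whether I could build a new string, or remove the items where the matches are, but after reading the documentation I'm still not sure how it works. For example, as a test case to see how it behaves, I did: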

    if (m.find()) {
        System.out.println(s.substring(i, m.end()));
    }

run against a text file with the following contents: This is an example example test test test.

Why is my output "This is"?

Edit:

Suppose I have an ArrayList, lineOfWords, that holds each line read from a .txt file, and I then create a new ArrayList to hold the modified strings. For example:

    List<String> newString = new ArrayList<String>();
    for (String s : lineOfWords) {
        s = s.replaceAll( code from Kobi here );
        newString.add(s);
    }

But it doesn't give me the new s, it gives me the original s. Is that because of shallow versus deep copying?

2 answers

# Answer 1
Try the following:
```
s = s.replaceAll("\\b(\\w+)\\b(\\s+\\1)+\\b", "$1");
```
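This regex is a bit stronger than yours: it checks for whole words (no partial matches) and removes any number of consecutive repetitions.
The regex captures the first word with \b(\w+)\b, then tries to match whitespace followed by a repetition of that word, one or more times: (\s+\1)+. The final \b prevents a partial match of \1, as in "for formatting".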
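
As a rough sketch of how this could be applied to every line, the example below wraps the replaceAll call in a helper and collects the results in a new list. The class name `DedupeWords`, the helper `dedupe`, and the sample lines are made up for illustration; only the regex comes from the answer. Note that the match is case-sensitive, so "Test test" would not be collapsed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class DedupeWords {
    // Hypothetical helper: collapses a run of a repeated whole word into one occurrence.
    static String dedupe(String line) {
        return line.replaceAll("\\b(\\w+)\\b(\\s+\\1)+\\b", "$1");
    }

    public static void main(String[] args) {
        // Stand-in for the lines read from the text file.
        List<String> lineOfWords = Arrays.asList(
                "This is an example example test test test.",
                "for formatting");

        // Strings are immutable, so replaceAll returns a new String;
        // the result has to be stored somewhere to be kept.
        List<String> cleaned = new ArrayList<>();
        for (String s : lineOfWords) {
            cleaned.add(dedupe(s));
        }

        for (String s : cleaned) {
            System.out.println(s);
        }
        // Prints:
        // This is an example test.
        // for formatting
    }
}
```

The second sample line stays unchanged because the trailing \b rejects the "for" inside "formatting".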
# Answer 2

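The first match is the "is is" inside "This is an example...", so m.end() points to the end of the second "is". I'm not sure why you use i as the start index; try m.start() instead.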
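
To make the offsets concrete, here is a small sketch that runs the question's pattern against the sample line and reports where the first match starts and ends. The class `MatchOffsets` and the method `firstMatch` are illustrative names, not part of the original code.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MatchOffsets {
    // Returns "start-end:text" for the first match of the question's pattern, or "" if none.
    static String firstMatch(String s) {
        Matcher m = Pattern.compile("(\\w+) \\1").matcher(s.toUpperCase());
        if (!m.find()) {
            return "";
        }
        // The offsets refer to the upper-cased copy. They line up with s here
        // because upper-casing this ASCII sample does not change its length.
        return m.start() + "-" + m.end() + ":" + s.substring(m.start(), m.end());
    }

    public static void main(String[] args) {
        String s = "This is an example example test test test.";
        System.out.println(firstMatch(s));      // 2-7:is is
        System.out.println(s.substring(0, 7));  // This is
    }
}
```

With i fixed at 0, substring(i, m.end()) returns everything from the start of the line up to offset 7, which is why the question's output was "This is".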

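To improve the regex, use \b before and after the word to indicate that there should be a word boundary: (\\b\\w+\\b). Otherwise, as you've seen, you will find matches inside words.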
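
The difference can be seen by listing every match with and without the boundaries. This is only a sketch: `BoundaryDemo` and `matches` are hypothetical names, and the input is used as-is rather than upper-cased.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BoundaryDemo {
    // Collects the text of every match of regex in text, in order.
    static List<String> matches(String regex, String text) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String text = "This is an example example test test test.";

        // Without boundaries the "is" inside "This" pairs up with the next "is".
        System.out.println(matches("(\\w+) \\1", text));
        // [is is, example example, test test]

        // With boundaries around the captured word, only whole words can start a match.
        System.out.println(matches("(\\b\\w+\\b) \\1", text));
        // [example example, test test]
    }
}
```

Neither pattern picks up the third "test", since each match consumes exactly two words.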
