Java: My regex search only prints the last match


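I wrote a regular expression to find web URLs in a text (full code below), but when I run the code the console prints only the last URL in the text. I don't understand what is wrong, since I am using a while loop. Please look at the code below and help me correct it. Thanks.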

import java.util.*;
import java.util.regex.*;

public class Main
{
    static String query = "This is a URL http://facebook.com" 
    + " and this is another, http://twitter.com "
    + "this is the last URL http://instagram.com"
    + " all these URLs should be printed after the code execution";

    public static void main(String args[])
    {
        String pattern = "([\\w \\W]*)((http://)([\\w \\W]+)(.com))";
        Pattern p = Pattern.compile(pattern);
        Matcher m = p.matcher(query);

        while(m.find())
        {
             System.out.println(m.group(2));
        }
    }
}

When I run the code above, only http://instagram.com is printed to the console.


5 answers

# Answer 1

I found a different regular expression:

https?:\/\/(www\.)?[-a-zA-Z0-9@:%._\+~#=]{2,256}\.[a-z]{2,6}\b([-a-zA-Z0-9@:%_\+.~#?&//=]*)

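It is written with https in mind, but the `s` is optional (`https?`), so it works in your case too.

I use the following code, which prints all 3 URLs: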

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {

    static String query = "This is a URL http://facebook.com"
            + " and this is another, http://twitter.com "
            + "this is the last URL http://instagram.com"
            + " all these URLs should be printed after the code execution";

    public static void main(String[] args) {
        // Same regex as above, with every backslash doubled for the Java string literal
        String pattern = "https?:\\/\\/(www\\.)?[-a-zA-Z0-9@:%._\\+~#=]{2,256}\\.[a-z]{2,6}\\b([-a-zA-Z0-9@:%_\\+.~#?&//=]*)";
        Pattern p = Pattern.compile(pattern);
        Matcher m = p.matcher(query);

        // group() with no argument returns the whole match
        while (m.find()) {
            System.out.println(m.group());
        }
    }
}

# Answer 2

Perhaps you are looking for this regex:

http://(\w+(?:\.\w+)+)

For example, from this string:

http://ww1.amazon.com and http://npr.org

it extracts

"ww1.amazon.com"
"npr.org"

Here is a detailed breakdown of how it works:

http://      matches literally
( ... )      the main capture group
\w+          one or more word characters (letters, digits, underscore)
(?: ... )    ...followed by a non-capturing group
\.\w+        ...that contains a literal period followed by at least one word character
+            ...repeated one or more times

Hope this helps.
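
As a rough illustration of the breakdown above, here is a minimal Java sketch that applies the same regex and collects capture group 1. The class name `HostExtractor` and the method `extractHosts` are made up for this example and are not part of the answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, named only for this sketch.
class HostExtractor {
    // The regex from this answer; each backslash is doubled for the Java string literal.
    private static final Pattern HOST = Pattern.compile("http://(\\w+(?:\\.\\w+)+)");

    // Returns capture group 1 (the part after "http://") of every match in the text.
    static List<String> extractHosts(String text) {
        List<String> hosts = new ArrayList<>();
        Matcher m = HOST.matcher(text);
        while (m.find()) {
            hosts.add(m.group(1));
        }
        return hosts;
    }

    public static void main(String[] args) {
        // Prints ww1.amazon.com and npr.org, one per line
        for (String host : extractHosts("http://ww1.amazon.com and http://npr.org")) {
            System.out.println(host);
        }
    }
}
```

Use `m.group()` instead of `m.group(1)` if the `http://` prefix should be kept in the result.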

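# Answer 3

I hope this clears it up for you: you are matching too many characters. Your pattern should be as restrictive as possible, because regex quantifiers are greedy and will try to match as much as they can.

Here is my take on your code: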

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {

    static String query = "This is a URL http://facebook.com"
            + " and this is another, http://twitter.com "
            + "this is the last URL http://instagram.com"
            + " all these URLs should be printed after the code execution";

    public static void main(String[] args) {
        // Optional run of W, w or '.', then letters only, then a literal ".com"
        // (the dot is escaped so that it cannot match an arbitrary character)
        String pattern = "(http:[/][/][Ww.]*[a-zA-Z]+\\.com)";
        Pattern p = Pattern.compile(pattern);
        Matcher m = p.matcher(query);

        while (m.find()) {
            System.out.println(m.group(1));
        }
    }
}

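The code above only matches URLs like the ones in your example; if you want to match more, you will need to adjust the pattern to your needs.

A good way to test patterns is http://www.regexpal.com/. You can refine your pattern there until it matches exactly what you want; just remember that in a Java string literal every \ has to be written as \\.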
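
To make the backslash rule concrete, here is a small sketch. It assumes a hypothetical pattern, `\bhttp://\w+\.com\b`, as it might be typed into a regex tester, and shows the same pattern written as a Java string literal. The class name `EscapeDemo` is invented for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical class, used only to illustrate backslash doubling.
class EscapeDemo {
    // In a regex tester:  \bhttp://\w+\.com\b
    // In Java source:     every backslash is written twice
    static final String JAVA_LITERAL = "\\bhttp://\\w+\\.com\\b";

    // Returns every full match of the pattern in the text.
    static List<String> findAll(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile(JAVA_LITERAL).matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        // Printing the literal shows the single-backslash form the regex engine receives
        System.out.println(JAVA_LITERAL);
        System.out.println(findAll("see http://facebook.com and http://twitter.com today"));
    }
}
```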

# Answer 4

I am not sure how reliable this pattern is, but when I run your example it prints all of the URLs:
```
(http://[A-Za-z0-9]+\\.[a-zA-Z]{2,3})
```
You will have to modify it if you run into a URL like this one:
```
http://www.instagram.com
```
because it only captures URLs without "www" correctly; for the URL above it would stop at `http://www.ins`.
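
One possible adjustment, offered only as a sketch, is to allow an optional `www.` prefix with a non-capturing group. The class `WwwOptional` and its members are hypothetical names, and the adjusted pattern is an assumption about what "modifying it" could look like rather than the answer's own fix.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical class comparing the pattern above with an adjusted variant.
class WwwOptional {
    // Pattern from this answer, unchanged
    static final String ORIGINAL = "(http://[A-Za-z0-9]+\\.[a-zA-Z]{2,3})";
    // Assumed variant: (?:www\\.)? optionally consumes a leading "www."
    static final String ADJUSTED = "(http://(?:www\\.)?[A-Za-z0-9]+\\.[a-zA-Z]{2,3})";

    // Returns capture group 1 of every match of the given pattern.
    static List<String> findUrls(String regex, String text) {
        List<String> urls = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(text);
        while (m.find()) {
            urls.add(m.group(1));
        }
        return urls;
    }

    public static void main(String[] args) {
        String text = "http://www.instagram.com and http://twitter.com";
        System.out.println(findUrls(ORIGINAL, text)); // [http://www.ins, http://twitter.com]
        System.out.println(findUrls(ADJUSTED, text)); // [http://www.instagram.com, http://twitter.com]
    }
}
```

The variant still only handles a single label plus a two- or three-letter ending, so hosts with more parts would need a further change.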
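# Answer 5

Your problem is that regex quantifiers (the * and + characters) are greedy, meaning they match as much as possible. You need to use reluctant quantifiers. See the corrected pattern below: it needs only two extra characters, a ? after the * and a ? after the +, which make each of them match as little as possible.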
```
String pattern = "([\\w \\W]*?)((http://)([\\w \\W]+?)(.com))";
```
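
To see the difference, here is a minimal sketch that runs both the greedy pattern from the question and the reluctant pattern above against the question's text. The class `GreedyVsReluctant` and the method `findUrls` are names chosen for this example only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical class comparing the two patterns on the question's input.
class GreedyVsReluctant {
    static final String QUERY = "This is a URL http://facebook.com"
            + " and this is another, http://twitter.com "
            + "this is the last URL http://instagram.com"
            + " all these URLs should be printed after the code execution";

    // Pattern from the question: * and + are greedy
    static final String GREEDY = "([\\w \\W]*)((http://)([\\w \\W]+)(.com))";
    // Pattern from this answer: *? and +? are reluctant
    static final String RELUCTANT = "([\\w \\W]*?)((http://)([\\w \\W]+?)(.com))";

    // Returns capture group 2 (the URL) of every match of the given pattern.
    static List<String> findUrls(String regex, String text) {
        List<String> urls = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(text);
        while (m.find()) {
            urls.add(m.group(2));
        }
        return urls;
    }

    public static void main(String[] args) {
        // Greedy: group 1 swallows everything up to the last "http://", so there is one match
        System.out.println(findUrls(GREEDY, QUERY));    // [http://instagram.com]
        // Reluctant: each find() stops at the nearest URL, so there are three matches
        System.out.println(findUrls(RELUCTANT, QUERY));
    }
}
```

With the greedy pattern the first `find()` consumes the whole text in a single match, which is why the while loop in the question runs only once.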
