Java: my regex search only prints the last match

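I wrote a regex to find web URLs in a text (full code below), but when I run it the console only prints the last URL in the text. I don't know what is wrong, since I am using a while loop. Please take a look at the code below and help me fix it. Thanks.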

import java.util.*;
import java.util.regex.*;

public class Main
{
    static String query = "This is a URL http://facebook.com" 
    + " and this is another, http://twitter.com "
    + "this is the last URL http://instagram.com"
    + " all these URLs should be printed after the code execution";

    public static void main(String args[])
    {
        String pattern = "([\\w \\W]*)((http://)([\\w \\W]+)(.com))";
        Pattern p = Pattern.compile(pattern);
        Matcher m = p.matcher(query);

        while(m.find())
        {
             System.out.println(m.group(2));
        }
    }
}

When I run the code above, only http://instagram.com is printed to the console.


5 answers

  1. # Answer 1

    I found another regular expression elsewhere:

    https?:\/\/(www\.)?[-a-zA-Z0-9@:%._\+~#=]{2,256}\.[a-z]{2,6}\b([-a-zA-Z0-9@:%_\+.~#?&//=]*)
    

    It also matches https (the s is optional), and it works in your case.

    I am using the following code to print all 3 URLs:

    import java.util.regex.*;
    
    public class Main {
    
    static String query = "This is a URL http://facebook.com"
            + " and this is another, http://twitter.com "
            + "this is the last URL http://instagram.com"
            + " all these URLs should be printed after the code execution";
    
    public static void main(String[] args) {
        String pattern = "https?:\\/\\/(www\\.)?[-a-zA-Z0-9@:%._\\+~#=]{2,256}\\.[a-z]{2,6}\\b([-a-zA-Z0-9@:%_\\+.~#?&//=]*)";
        Pattern p = Pattern.compile(pattern);
        Matcher m = p.matcher(query);
    
        while (m.find()) {
            System.out.println(m.group());
        }
      }
    }
    
  2. # Answer 2

    Perhaps you are looking for this regex:

    http://(\w+(?:\.\w+)+)
    

    For example, from this string:

    http://ww1.amazon.com and http://npr.org
    

    it extracts

    "ww1.amazon.com"
    "npr.org"
    

    Here is a breakdown of how it works:

    http://      is literal
    ( ... )      is the main capture group
    \w+          find one or more word characters (letters, digits, underscore)
    (?: ... )    ...followed by a non-capturing group
    \.\w+        ...that contains a literal period followed by at least one word character
    +            repeated one or more times
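
    As a rough sketch of how this pattern could be used from Java (the class name HostExtractor and the method extractHosts are made-up names for illustration, not part of the answer): the backslashes are doubled inside the string literal, and group(1) returns the main capture group, i.e. the host without the http:// prefix.

    ```java
    import java.util.ArrayList;
    import java.util.List;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    // Hypothetical helper class, used only to illustrate the pattern above.
    class HostExtractor {
        // The regex http://(\w+(?:\.\w+)+) written as a Java string literal.
        private static final Pattern HOST = Pattern.compile("http://(\\w+(?:\\.\\w+)+)");

        // Returns the text captured by group 1 for every match in the input.
        static List<String> extractHosts(String text) {
            List<String> hosts = new ArrayList<>();
            Matcher m = HOST.matcher(text);
            while (m.find()) {
                hosts.add(m.group(1));
            }
            return hosts;
        }

        public static void main(String[] args) {
            // Prints ww1.amazon.com and npr.org on separate lines.
            for (String host : extractHosts("http://ww1.amazon.com and http://npr.org")) {
                System.out.println(host);
            }
        }
    }
    ```

    Note that \w does not match a hyphen, so a host such as my-site.com would need a wider character class.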
    

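    Hope this helps.

  3. # Answer 3

    I hope this clears it up for you: you are matching too many characters. Your pattern should be as restrictive as possible, because regex quantifiers are greedy and will try to match as much as they can.

    Here is my take on your code: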

    import java.util.regex.*;
    
    public class Main {
    
    
    static String query = "This is a URL http://facebook.com"
                    + " and this is another, http://twitter.com "
                    + "this is the last URL http://instagram.com"
                    + " all these URLs should be printed after the code execution";
    public static void main(String args[]) {
            String pattern = "(http:[/][/][Ww.]*[a-zA-Z]+[.]com)";
            Pattern p = Pattern.compile(pattern);
            Matcher m = p.matcher(query);
    
            while(m.find())
            {
                System.out.println(m.group(1));
            }
    }
    
    }
    

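    The code above only matches your examples; if you want to match more, you will need to adjust it to your needs.

    A good way to test patterns is http://www.regexpal.com/, where you can refine your pattern until it matches exactly what you want. Just remember that in a Java string literal every \ of the regex has to be written as a double \\.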
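
    To make the escaping point concrete, here is a small sketch (the class EscapeDemo and its methods are invented for illustration). It also shows why the dot deserves the same care: an unescaped . matches any character, not only a period.

    ```java
    import java.util.regex.Pattern;

    // Hypothetical demo class, for illustration only.
    class EscapeDemo {
        // Regex  http://\w+.com   -> the dot is unescaped and matches ANY character.
        static final Pattern LOOSE = Pattern.compile("http://\\w+.com");
        // Regex  http://\w+\.com  -> the dot is escaped and matches only a period.
        static final Pattern STRICT = Pattern.compile("http://\\w+\\.com");

        static boolean matchesLoose(String s) {
            return LOOSE.matcher(s).matches();
        }

        static boolean matchesStrict(String s) {
            return STRICT.matcher(s).matches();
        }

        public static void main(String[] args) {
            System.out.println(matchesLoose("http://twitter.com"));   // true
            System.out.println(matchesLoose("http://twitterxcom"));   // true, probably not intended
            System.out.println(matchesStrict("http://twitter.com"));  // true
            System.out.println(matchesStrict("http://twitterxcom"));  // false
        }
    }
    ```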

  4. # Answer 4

    I'm not sure how robust this pattern is, but when I run your example it prints all the URLs:

    (http://[A-Za-z0-9]+\\.[a-zA-Z]{2,3})
    

    You will have to modify it if you run into a URL like this:

    http://www.instagram.com
    

    because it only captures URLs without "www" correctly.
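
    One possible adjustment, offered as a sketch rather than a robust URL matcher, is to put an optional non-capturing (?:www\\.)? group in front of the host name. The class name WwwDemo and the method findUrls are hypothetical, and the optional group is an assumed tweak, not something the answer itself proposes.

    ```java
    import java.util.ArrayList;
    import java.util.List;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    // Hypothetical demo class, for illustration only.
    class WwwDemo {
        // The pattern from this answer plus an optional "www." prefix (assumed tweak).
        private static final Pattern URL =
                Pattern.compile("(http://(?:www\\.)?[A-Za-z0-9]+\\.[a-zA-Z]{2,3})");

        // Collects group 1 of every match found in the text.
        static List<String> findUrls(String text) {
            List<String> urls = new ArrayList<>();
            Matcher m = URL.matcher(text);
            while (m.find()) {
                urls.add(m.group(1));
            }
            return urls;
        }

        public static void main(String[] args) {
            // Prints [http://www.instagram.com, http://twitter.com]
            System.out.println(findUrls("see http://www.instagram.com and http://twitter.com"));
        }
    }
    ```

    The {2,3} limit is unchanged, so longer top-level domains such as .info would still be cut off after three letters.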

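  5. # Answer 5

    Your problem is that regex quantifiers (the * and + characters) are greedy, which means they match as much as possible. You need to use reluctant quantifiers. See the corrected pattern below: it needs just two extra characters, a ? after the * and another after the +, which make them match as little as possible.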

    String pattern = "([\\w \\W]*?)((http://)([\\w \\W]+?)(.com))";
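
    To see the difference side by side, the sketch below runs both patterns against the text from the question. The class GreedyDemo and the method findAll are hypothetical names; the two patterns and the input are copied from the question and from this answer.

    ```java
    import java.util.ArrayList;
    import java.util.List;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    // Hypothetical demo class, for illustration only.
    class GreedyDemo {
        static final String QUERY = "This is a URL http://facebook.com"
                + " and this is another, http://twitter.com "
                + "this is the last URL http://instagram.com"
                + " all these URLs should be printed after the code execution";

        // Original pattern: the greedy * swallows everything up to the last URL.
        static final String GREEDY = "([\\w \\W]*)((http://)([\\w \\W]+)(.com))";
        // Corrected pattern: *? and +? stop at the earliest possible position.
        static final String RELUCTANT = "([\\w \\W]*?)((http://)([\\w \\W]+?)(.com))";

        // Collects group 2 (the URL) of every match of regex in text.
        static List<String> findAll(String regex, String text) {
            List<String> urls = new ArrayList<>();
            Matcher m = Pattern.compile(regex).matcher(text);
            while (m.find()) {
                urls.add(m.group(2));
            }
            return urls;
        }

        public static void main(String[] args) {
            // Prints [http://instagram.com]
            System.out.println(findAll(GREEDY, QUERY));
            // Prints [http://facebook.com, http://twitter.com, http://instagram.com]
            System.out.println(findAll(RELUCTANT, QUERY));
        }
    }
    ```

    With the greedy version the first group consumes the whole text and backtracks only as far as the last http://, so the loop finds a single match. With the reluctant version each call to find() stops at the next URL.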