Python re.findall finds strange, wrong patterns

import re

singles = r'[()\.\/$%=0-9,?!=; \t\n\r\f\v\":\[\]><]'
digits_str = singles + r'[()\-\.\/$%=0-9 \t\n\r\f\v\'\":\[\]]*'
#small_word = '[a-zA-Z0-9]{1,3}'
#junk_then_small_word = singles + small_word + '(' + singles + small_word + ')*'
email = singles + '\S+@\S*'
http_str = r'[^\.]+\.+[^\.]+\.+([^\.]+\.+)+?'
http = '(http|https|www)' + http_str
web_address = '([a-zA-Z0-9]+\.+)+[a-zA-Z0-9]{1,3}'
pat = email + '|' + digits_str
d_pat = re.compile(web_address)

text = '''"Lucy Gonzalez" test-defis-wtf <stagecoachmama@hotmail.com> on 11/28/2000 01:02:22 PM http://www.living.com/shopping/item/item.jhtml?.productId=LC-JJHY-2.00-10.4S.I will send checks directly to the vendor for any bills pre 4/20. I will fax you copies. I will also try and get the payphone transferred. www.capitolconnection.org <http://www.capitolconnection.org>. and/or =3D=3D=3D=3D=3D=3D=3D= O\'rourke'''

print('findall:')
for x in re.findall(d_pat,text):
    print(x)

print('split:')
for x in re.split(d_pat,text):
    print(x)

2 answers

User

#1 · edited 2024-09-30 14:31:01

From the documentation for re.findall:

If one or more groups are present in the pattern, return a list of groups; this will be a list of tuples if the pattern has more than one group.

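Your regular expression contains a group, i.e. the part in parentheses, so findall returns the group's text rather than the whole match. If you want to see the whole match, wrap the regex in one big group (put parentheses around the entire expression) and then use print(x[0]) instead of print(x).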
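A minimal sketch of that behaviour, using the question's web_address pattern on a shortened, made-up sample string (the names sample, wrapped and non_capturing are only for illustration). It compares the original pattern, the wrapped version suggested above, and a non-capturing (?:...) variant, which is another way to get whole matches and is not part of the answer itself:

```python
import re

# Shortened, hypothetical sample; the pattern is the question's web_address.
sample = "see www.capitolconnection.org and living.com today"

capturing = r'([a-zA-Z0-9]+\.+)+[a-zA-Z0-9]{1,3}'
wrapped = '(' + capturing + ')'                          # outer group added
non_capturing = r'(?:[a-zA-Z0-9]+\.+)+[a-zA-Z0-9]{1,3}'  # no capture groups

# One group: findall returns only that group, and a repeated group
# keeps just its last repetition.
group_only = re.findall(capturing, sample)
print(group_only)        # ['capitolconnection.', 'living.']

# Two groups: each result is a tuple; element 0 is the outer group,
# which spans the whole match.
whole_via_outer = [t[0] for t in re.findall(wrapped, sample)]
print(whole_via_outer)   # ['www.capitolconnection.org', 'living.com']

# No groups: findall returns whole matches directly.
whole_direct = re.findall(non_capturing, sample)
print(whole_direct)      # ['www.capitolconnection.org', 'living.com']

# re.split also inserts captured group text into its result, so the
# non-capturing form gives a cleaner split as well.
print(re.split(capturing, sample))
print(re.split(non_capturing, sample))   # ['see ', ' and ', ' today']
```

The non-capturing form is probably the smaller change if the groups are only there for repetition, since the loop body can keep using print(x).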

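User

#2 · edited 2024-09-30 14:31:01

My guess is that the expression itself needs to be modified, and that this may be the source of the problem. For example, to match the desired pattern we could start with an expression like the following: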

([a-zA-Z0-9]+)\.

If we want 1 to 3 characters after the ., we can extend it to:

([a-zA-Z0-9]+)\.([a-zA-Z0-9]{1,3})?
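As a rough illustration of what this two-group expression gives back, the sketch below runs it over a short, made-up input (sample is an assumption, not taken from the question). With two groups, re.findall returns one tuple per match, and an optional group that did not take part shows up as an empty string; re.finditer with group(0) recovers the full matched text instead:

```python
import re

regex = r"([a-zA-Z0-9]+)\.([a-zA-Z0-9]{1,3})?"
sample = "hotmail.com and item."   # hypothetical input

# One (group 1, group 2) tuple per match; the optional second group
# is reported as '' when nothing follows the dot.
pairs = re.findall(regex, sample)
print(pairs)   # [('hotmail', 'com'), ('item', '')]

# group(0) is always the whole match, regardless of how many groups exist.
whole = [m.group(0) for m in re.finditer(regex, sample)]
print(whole)   # ['hotmail.com', 'item.']
```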

Demo 1

Demo 2

Test

# coding=utf8
# the above tag defines encoding for this document and is for Python 2.x compatibility

import re

regex = r"([a-zA-Z0-9]+)\.([a-zA-Z0-9]{1,3})?"

test_str = ("hotmail.\n"
    "living.\n"
    "item.\n"
    "2.\n"
    "4S.\n"
    "hotmail.com\n"
    "living.org\n"
    "item.co\n"
    "2.321\n"
    "4S.123")

matches = re.finditer(regex, test_str, re.MULTILINE)

for match_num, match in enumerate(matches, start=1):
    # Group 0 is the whole match.
    print("Match {} was found at {}-{}: {}".format(
        match_num, match.start(), match.end(), match.group()))

    # Groups are numbered from 1; an optional group that did not take part
    # in the match is reported as None at position -1.
    for group_num in range(1, len(match.groups()) + 1):
        print("Group {} found at {}-{}: {}".format(
            group_num, match.start(group_num), match.end(group_num),
            match.group(group_num)))

# Note: for Python 2.7 compatibility, use ur"" to prefix the regex and u"" to prefix the test string and substitution.
