Unable to parse certain content using regex

Posted 2024-09-28 01:25:15


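I am trying to write a script with Python's re module that parses the address, phone number, and email out of a long string with line breaks in it. The string contains two blocks (containers) of data. When I run the script, it returns results for the first block only, and those results still include parts I don't want (the labels, and only the first line of the address). I don't know whether the approach below is a sensible one. Any help would be appreciated.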

I tried:

import re

rstr = """
    Address The Westshore Grand,
    A Tribute Portfolio Hotel, Tampa

    Telephone 52 70 90 00
    E-mail info.suchona@gmail.com


    Address hotels near 1255 north palm ave 
    sarasota florida

    Telephone 62 40 80 00
    E-mail info.niit@hotmail.com
"""
address = re.findall(r'(Address.+)',rstr)[0].strip()
phone = re.findall(r'(Telephone.+)',rstr)[0].strip()
email = re.findall(r'(E-mail.+)',rstr)[0].strip()
print(f'{address}\n{phone}\n{email}')

The result is:

Address The Westshore Grand,
Telephone 52 70 90 00
E-mail info.suchona@gmail.com

What I want is:

The Westshore Grand, A Tribute Portfolio Hotel, Tampa
52 70 90 00
info.suchona@gmail.com

hotels near 1255 north palm ave sarasota florida
62 40 80 00
info.niit@hotmail.com

Although I know this can be done with plain string manipulation, I would prefer to stick with a regex approach. Thanks.


3 answers

Try this regex to get your addresses.

address = re.findall(r'(?<=Address).*?(?=Telephone)',rstr, flags=re.DOTALL)

Demo:

address = re.findall(r'(?<=Address).*?(?=Telephone)',rstr, flags=re.DOTALL)
phone = re.findall(r'(Telephone.+)',rstr)
email = re.findall(r'(E-mail.+)',rstr)
for i in zip(address, phone, email):
    print('{address}\n{phone}\n{email}'.format(address=i[0].strip(), phone=i[1].strip(), email=i[2].strip()))
    print( "  -" )

Output:

The Westshore Grand,
    A Tribute Portfolio Hotel, Tampa
Telephone 52 70 90 00
E-mail info.suchona@gmail.com
  -
hotels near 1255 north palm ave 
    sarasota florida
Telephone 62 40 80 00
E-mail info.niit@hotmail.com
  -
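
A note on the flag used above: by default `.` does not match a newline, so without re.DOTALL the lookahead for Telephone on a later line can never be reached. The following is a minimal sketch on a shortened, made-up string; the names text, without_dotall, and with_dotall are illustrative and not part of the answer.

```python
import re

# Shortened, made-up sample: the address spans two lines.
text = "Address line one\n    line two\n\nTelephone 12 34"

pattern = r'(?<=Address).*?(?=Telephone)'

# Without re.DOTALL, "." stops at the first newline, so "Telephone"
# on a later line is never reached and nothing matches.
without_dotall = re.findall(pattern, text)

# With re.DOTALL, "." also matches "\n", so the match can span lines.
with_dotall = re.findall(pattern, text, flags=re.DOTALL)

print(without_dotall)
print(with_dotall)
```

The matched text still carries the line breaks and indentation, which is why the demo output above shows the address on two lines.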

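You need the regex capture group to wrap only the part you actually want. Also, re.findall() returns every occurrence of the pattern, so you can simply loop over the results like this (assuming all three pieces of information are always present):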

address = re.findall(r'Address(.+?)\n\n', rstr, flags=re.S)
phone = re.findall(r'Telephone(.+)', rstr)
email = re.findall(r'E-mail(.+)', rstr)

for i in range(len(address)):
    print('\n'.join([
        # raw string, so that \s is not treated as an invalid string escape
        re.sub(r'\s{2,}', ' ', address[i].strip()),
        phone[i].strip(),
        email[i].strip()
    ]), end='\n\n')  # blank line between records

Output:

The Westshore Grand, A Tribute Portfolio Hotel, Tampa
52 70 90 00
info.suchona@gmail.com

hotels near 1255 north palm ave sarasota florida
62 40 80 00
info.niit@hotmail.com
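
The assumption that all three fields are always present matters: if one record lacks a phone number, the three separate lists no longer line up by index. One possible way around that, sketched here under assumptions, is to split the text into one chunk per Address label and search each chunk on its own. The helper name parse_records and the modified sample (the second record has no Telephone line) are hypothetical, and splitting on a zero-width lookahead assumes Python 3.7 or later.

```python
import re

# Modified sample: the second record deliberately has no Telephone line.
sample = """
    Address The Westshore Grand,
    A Tribute Portfolio Hotel, Tampa

    Telephone 52 70 90 00
    E-mail info.suchona@gmail.com


    Address hotels near 1255 north palm ave
    sarasota florida

    E-mail info.niit@hotmail.com
"""

def parse_records(text):
    """Hypothetical helper: build one dict per 'Address' block."""
    records = []
    # Split just before each "Address" label so each chunk holds one record.
    for block in re.split(r'(?=Address)', text):
        # The address runs up to the first blank line (or the end of the chunk).
        addr = re.search(r'Address(.+?)(?:\n\s*\n|\Z)', block, flags=re.S)
        if addr is None:
            continue  # text before the first "Address" label
        phone = re.search(r'Telephone\s*(.+)', block)
        email = re.search(r'E-mail\s*(.+)', block)
        records.append({
            'address': re.sub(r'\s+', ' ', addr.group(1)).strip(),
            'phone': phone.group(1).strip() if phone else None,
            'email': email.group(1).strip() if email else None,
        })
    return records

records = parse_records(sample)
for rec in records:
    print(rec)
```

A missing field shows up as None in its own record instead of shifting the fields of later records.
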
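  • To match across line breaks, use re.DOTALL.

  • You want to grab everything between Address and Telephone, but non-greedily: .+?

  • You also want to store it as a group, so wrap it in ().

  • To collapse each run of whitespace into a single space, use re.sub.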

The result:

addresses = [re.sub(r'\s+', ' ', addr).strip()
             for addr in re.findall(r'Address (.+?)Telephone', rstr, re.DOTALL)]

Output:

['The Westshore Grand, A Tribute Portfolio Hotel, Tampa',
 'hotels near 1255 north palm ave sarasota florida']
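
The non-greedy quantifier in the bullets above matters because, with re.DOTALL, a greedy .+ runs to the last Telephone in the whole string and merges both records into one match. A minimal sketch on a made-up one-line string (the names text, greedy, and lazy are illustrative only):

```python
import re

# Made-up single-line sample with two records.
text = "Address A Telephone 1 Address B Telephone 2"

# Greedy: ".+" extends as far as possible, up to the LAST "Telephone".
greedy = re.findall(r'Address (.+)Telephone', text, re.DOTALL)

# Non-greedy: ".+?" stops at the FIRST "Telephone" after each "Address".
lazy = re.findall(r'Address (.+?)Telephone', text, re.DOTALL)

print(greedy)
print(lazy)
```

The greedy version yields a single match that swallows the first phone number and the second address, while the non-greedy version yields one match per record.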

Do the same for the phone numbers and emails:

phones = re.findall(r'Telephone\s*(.+)\s*', rstr)
emails = re.findall(r'E-mail\s*(.+)\s*', rstr)

Then you can loop over them:

for addr, phone, email in zip(addresses, phones, emails):
    print(addr, phone, email, sep='\n', end='\n\n')

Output:

The Westshore Grand, A Tribute Portfolio Hotel, Tampa
52 70 90 00
info.suchona@gmail.com

hotels near 1255 north palm ave sarasota florida
62 40 80 00
info.niit@hotmail.com
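
As an alternative to three separate findall calls, the three fields can also be captured in one pass with a single pattern and named groups. This is only a sketch, and it assumes every record contains Address, Telephone, and E-mail in exactly that order; the names record and parsed are illustrative and not from the answers.

```python
import re

rstr = """
    Address The Westshore Grand,
    A Tribute Portfolio Hotel, Tampa

    Telephone 52 70 90 00
    E-mail info.suchona@gmail.com


    Address hotels near 1255 north palm ave
    sarasota florida

    Telephone 62 40 80 00
    E-mail info.niit@hotmail.com
"""

# One pattern per record; assumes the three labels always appear in this order.
record = re.compile(
    r'Address\s+(?P<address>.+?)\s+'      # may span lines (re.DOTALL)
    r'Telephone\s+(?P<phone>[^\n]+?)\s+'  # stays on a single line
    r'E-mail\s+(?P<email>\S+)',           # one run of non-whitespace
    flags=re.DOTALL,
)

parsed = [
    # Collapse the line break and indentation inside the address.
    (re.sub(r'\s+', ' ', m.group('address')), m.group('phone'), m.group('email'))
    for m in record.finditer(rstr)
]

for address, phone, email in parsed:
    print(address, phone, email, sep='\n', end='\n\n')
```

Because each match covers a whole record, the fields cannot drift out of alignment, but a record with a missing label would make the pattern run into the next record, so this suits well-formed input only.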
