Regex: Finding names in a string using Python

2 answers

User

Answer 1 · Edited 2024-10-03 02:41:14

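If you want to use a regex (with all the usual disclaimers on that topic), the following regex handles your strings. Note, however, that you need to retrieve the matches from capture group 1. In the online demo, be sure to look at the Group 1 captures in the lower-right pane. :)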

<[^<]*</[^>]*>|<.*?>|((?<=,\s)\w[\w ]*\w|\w[\w ]*\w(?=,))

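Basically, the alternatives on the left (separated by |) match everything we don't want, and the last set of parentheses on the right captures what we do want. The unwanted matches are still returned, but with group 1 empty, so you simply ignore them.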
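To see the group-1 technique in Python, here is a minimal sketch that runs the regex above with re.finditer. It assumes the two sample strings from the second answer as input, and find_names is a hypothetical helper name. Matches produced by the "unwanted" alternatives come back with group 1 set to None, so they are filtered out.

```python
import re

# The regex from this answer: the first two alternatives match text to throw
# away (a tag together with its contents, or a lone tag such as <br>); only
# the last alternative has a capturing group, so group 1 holds a name.
PATTERN = re.compile(
    r"<[^<]*</[^>]*>|<.*?>|((?<=,\s)\w[\w ]*\w|\w[\w ]*\w(?=,))"
)


def find_names(s):
    # finditer yields every match, including the unwanted ones; for those,
    # group(1) is None, so keep only matches where group 1 participated.
    return [m.group(1) for m in PATTERN.finditer(s) if m.group(1) is not None]


# Sample strings borrowed from the second answer (an assumption about the input).
strings = [
    """<b>Carson Daly</b>: <a href="http://rads.stackoverflow.com/amzn/click/B009DA74O8">Ben Schwartz</a>, Soko, Jacob Escobedo (R 2/28/14)<br>'""",
    """<b>Carson Daly</b>: Wil Wheaton, the Birds of Satan, Courtney Kemp Agboh<br>""",
]

for s in strings:
    print(find_names(s))
```

With these inputs it prints ['Soko', 'Jacob Escobedo'] and then ['Wil Wheaton', 'the Birds of Satan', 'Courtney Kemp Agboh']. Because the first alternative consumes a whole tag with its contents, the linked name Ben Schwartz is discarded along with the markup, and "(R 2/28/14)" is skipped because it is neither preceded nor followed by a comma.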

This is an application of the question about matching a pattern except in certain situations (for implementation details, including links to Python code, see that question).
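The general pattern behind that technique can be wrapped in a small helper. This is a hypothetical sketch (findall_except is not a library function): it joins the unwanted patterns and the wanted pattern into a single alternation, assumes the unwanted patterns contain no capturing groups of their own so that the wanted pattern ends up in group 1, and keeps only the matches where group 1 took part.

```python
import re


def findall_except(unwanted, wanted, text):
    """Return every match of `wanted` that does not fall inside `unwanted`.

    Hypothetical helper. `unwanted` is a list of regex strings without
    capturing groups; the parentheses added around `wanted` make it group 1.
    """
    combined = "|".join(unwanted + ["(" + wanted + ")"])
    # Unwanted alternatives still match (and consume their text), but they
    # leave group 1 as None, so they are dropped here.
    return [m.group(1) for m in re.finditer(combined, text)
            if m.group(1) is not None]


# Words, except anything inside (...) or [...].
text = "apple (banana) cherry [date] elderberry"
print(findall_except([r"\([^)]*\)", r"\[[^\]]*\]"], r"\b\w+\b", text))
```

This prints ['apple', 'cherry', 'elderberry']: the bracketed words are consumed by the unwanted alternatives before the wanted pattern gets a chance to match them.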

User

Answer 2 · Edited 2024-10-03 02:41:14

Another approach is to parse the string with an HTML parser, such as lxml.

For example, you can use XPath to find everything between the b tag and the br tag by checking the preceding and following siblings:

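from lxml.html import fromstring

strings = [
    """<b>Carson Daly</b>: <a href="http://rads.stackoverflow.com/amzn/click/B009DA74O8">Ben Schwartz</a>, Soko, Jacob Escobedo (R 2/28/14)<br>'""",
    """<b>Carson Daly</b>: Wil Wheaton, the Birds of Satan, Courtney Kemp Agboh<br>"""
]

for html in strings:
    tree = fromstring(html)
    results = ''
    # Select every node (element or text) that follows <b>Carson Daly</b>
    # and precedes a <br> at the same level.
    for element in tree.xpath('//node()[preceding-sibling::b="Carson Daly" and following-sibling::br]'):
        if not isinstance(element, str):
            # An element such as <a>: use the text inside it.
            results += (element.text or '').strip()
        else:
            # A text node (lxml returns these as str subclasses):
            # drop the ':' separator and surrounding whitespace.
            text = element.strip(':')
            if text:
                results += text.strip()

    print(results.split(', '))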

It prints:

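['Ben Schwartz', 'Soko', 'Jacob Escobedo (R 2/28/14)']
['Wil Wheaton', 'the Birds of Satan', 'Courtney Kemp Agboh']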
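If lxml isn't available, the same idea (collect the text that sits between the bold name and the br tag) can be sketched with the standard library's html.parser instead of XPath. This is a swapped-in technique, not the answer's code: the class NamesAfterBold and the function extract_names are hypothetical, and unlike the XPath version it does not check that the bold text is "Carson Daly". It assumes every string has the shape <b>host</b>: names<br>.

```python
from html.parser import HTMLParser


class NamesAfterBold(HTMLParser):
    """Collect the text between the first </b> and the next <br> tag."""

    def __init__(self):
        super().__init__()
        self.after_bold = False  # True once </b> has been seen
        self.done = False        # True once <br> has been seen after </b>
        self.parts = []          # text fragments between </b> and <br>

    def handle_endtag(self, tag):
        if tag == "b":
            self.after_bold = True

    def handle_starttag(self, tag, attrs):
        if tag == "br" and self.after_bold:
            self.done = True

    def handle_data(self, data):
        # Keeps plain text as well as the text inside <a> links.
        if self.after_bold and not self.done:
            self.parts.append(data)


def extract_names(fragment):
    parser = NamesAfterBold()
    parser.feed(fragment)
    parser.close()
    # Join the fragments, drop the leading ':' separator, then split on ', '.
    text = "".join(parser.parts).strip().lstrip(":").strip()
    return text.split(", ")


strings = [
    """<b>Carson Daly</b>: <a href="http://rads.stackoverflow.com/amzn/click/B009DA74O8">Ben Schwartz</a>, Soko, Jacob Escobedo (R 2/28/14)<br>'""",
    """<b>Carson Daly</b>: Wil Wheaton, the Birds of Satan, Courtney Kemp Agboh<br>""",
]

for s in strings:
    print(extract_names(s))
```

Because link text is kept, Ben Schwartz is included here, matching the lxml approach rather than the regex one.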
