Lookahead and lookbehind in regular expressions

string = "About the company -Our client is one of the world's fastest-growing AI-based contract management solution providers.Exp -7+ Years Location -MumbaiJob Role -Min 7years hands-on experience in Natural Language Processing, Machine Learning, Artificial Intelligence, and IBM Watson"

3 Answers

User

Answer 1 · edited 2024-09-29 22:34:50

Splitting the text into words on one or more whitespace characters is probably the best approach:

import re

s = "About the company -Our client is one of the world's fastest-growing AI-based contract management solution providers.Exp -7+ Years Location -MumbaiJob Role -Min 7years hands-on experience in Natural Language Processing, Machine Learning, Artificial Intelligence, and IBM Watson"

words = re.split(r'\s+', s)
try:
    index = words.index('experience')
except ValueError:  # list.index raises ValueError when the word is absent
    pass
else:
    start = max(index - 5, 0)
    end = min(index + 6, len(words))
    print(' '.join(words[start:end]))

Prints:

-MumbaiJob Role -Min 7years hands-on experience in Natural Language Processing, Machine

But if you insist on using a regular expression, the following prints up to 5 words before "experience" and up to 5 words after it:

import re

s = "About the company -Our client is one of the world's fastest-growing AI-based contract management solution providers.Exp -7+ Years Location -MumbaiJob Role -Min 7years hands-on experience in Natural Language Processing, Machine Learning, Artificial Intelligence, and IBM Watson"

m = re.search(r'([\w,;!.+-]+\s+){0,5}experience(\s+[\w,;!.+-]+){0,5}', s)
if m:
    print(m[0])

Prints:

-MumbaiJob Role -Min 7years hands-on experience in Natural Language Processing, Machine
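To make the pattern above easier to read, here is a minimal sketch that spells it out with re.VERBOSE. The names WORD, pattern, and sample are only illustrative, and the groups are written as non-capturing, which does not change what m[0] returns.

```python
import re

# One "word": letters, digits, underscore, plus the punctuation the answer allows.
WORD = r"[\w,;!.+-]+"

pattern = re.compile(
    rf"""
    (?:{WORD}\s+){{0,5}}   # up to 5 words, each followed by whitespace
    experience             # the literal keyword
    (?:\s+{WORD}){{0,5}}   # up to 5 words, each preceded by whitespace
    """,
    re.VERBOSE,
)

# Shortened sample with only 3 words before the keyword.
sample = "Min 7years hands-on experience in Natural Language Processing, Machine Learning"
match = pattern.search(sample)
result = match[0] if match else None
print(result)
```

Because both quantifiers are {0,5}, the match still succeeds when fewer than 5 words are available on either side.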

Update to handle "experience" or "Experience"

I have also simplified the regular expression:

import re

s = "About the company -Our client is one of the world's fastest-growing AI-based contract management solution providers.Exp -7+ Years Location -MumbaiJob Role -Min 7years hands-on Experience in Natural Language Processing, Machine Learning, Artificial Intelligence, and IBM Watson"

# By splitting on one or more whitespace characters:
words = re.split(r'\s+', s)
try:
    index = words.index('experience')
except ValueError:
    try:
        index = words.index('Experience')
    except ValueError:
        index = None
if index is not None:  # index 0 is a valid position, so compare against None
    start = max(index - 5, 0)
    end = min(index + 6, len(words))
    print(' '.join(words[start:end]))


# Using a regular expression:
m = re.search(r'(\S+\s+){0,5}[eE]xperience(\s+\S+){0,5}', s)
if m:
    print(m[0])

Prints:

-MumbaiJob Role -Min 7years hands-on Experience in Natural Language Processing, Machine
-MumbaiJob Role -Min 7years hands-on Experience in Natural Language Processing, Machine
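If you need this for other keywords or window sizes, the regex version can be wrapped in a small helper. This is only a sketch: context_window and its parameters are names I made up, and it uses re.IGNORECASE instead of [eE] so any capitalization is accepted.

```python
import re

def context_window(text, keyword, n=5):
    """Return keyword with up to n words on each side, or None if it is absent."""
    # re.escape keeps any special characters in the keyword literal.
    pattern = rf"(?:\S+\s+){{0,{n}}}{re.escape(keyword)}(?:\s+\S+){{0,{n}}}"
    m = re.search(pattern, text, flags=re.IGNORECASE)
    return m[0] if m else None

print(context_window("hands-on EXPERIENCE in NLP", "experience", n=1))
print(context_window("experience matters", "experience"))
```

Like the pattern above, this matches the keyword as a substring, so "experienced" would also be found; add \b around the keyword if you want whole words only.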

User

Answer 2 · edited 2024-09-29 22:34:50

Try the regular expression below:

((?:\S+\s){10})(experience)((?:\s\S+){10})

Here \1 holds the 10 words before "experience" and \3 holds the 10 words after it.
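A quick way to try this pattern in Python is sketched below. The test strings are made up for illustration. Note that {10} means exactly 10, so the pattern finds nothing when fewer than 10 words surround the keyword; changing {10} to {0,10} would relax that.

```python
import re

pattern = re.compile(r'((?:\S+\s){10})(experience)((?:\s\S+){10})')

# Synthetic text with exactly 10 words on each side of the keyword.
words_before = " ".join(f"b{i}" for i in range(1, 11))  # b1 ... b10
words_after = " ".join(f"a{i}" for i in range(1, 11))   # a1 ... a10
text = f"{words_before} experience {words_after}"

m = pattern.search(text)
before, keyword, after = m.group(1), m.group(2), m.group(3)
print(before.split())
print(after.split())

# Fewer than 10 words on either side: no match at all.
short = pattern.search("only three words experience then four more words")
print(short)
```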


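User

Answer 3 · edited 2024-09-29 22:34:50

You can first split the string on spaces, then skip the first 10 words and keep the rest of the list, and finally join that list back into a string: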

ts = string.split(' ')[10:]
print(" ".join(ts))
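To see what that slice actually does, here is a tiny sketch on a made-up sentence. It also shows one thing to watch for: split(' ') produces empty strings when there are consecutive spaces, whereas split() with no argument collapses runs of whitespace.

```python
# Twelve placeholder words, w0 through w11.
text = "w0 w1 w2 w3 w4 w5 w6 w7 w8 w9 w10 w11"

# [10:] drops the first 10 words and keeps everything from index 10 onward.
rest = " ".join(text.split(' ')[10:])
print(rest)

# Consecutive spaces behave differently for the two forms of split.
double_spaced = "a  b"
by_single_space = double_spaced.split(' ')  # keeps an empty string between the spaces
by_whitespace = double_spaced.split()       # collapses the run of spaces
print(by_single_space, by_whitespace)
```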
