Handling RegEx in Python

punct = ['.', ',', ':', ';', '!', '[', ']', '(', ')', '{', '}']

def split_punctuation(sentence) -> list:
    sentwords = sentence.split(" ")
    for i, word in enumerate(sentwords):
        if word_ends_with_punct(word) and len(word) > 1:
            sentwords.pop(i)
            sentwords.insert(i, word[:-1])
            sentwords.insert(i+1, word[-1])
            word = word[:-1]
        if word_starts_with_punct(word) and len(word) > 1:
            sentwords.pop(i)
            sentwords.insert(i, word[0:1])
            sentwords.insert(i+1, word[1:])
            word = word[1:]
    return sentwords

def word_starts_with_punct(w) -> bool:
    for p in punct:
        if w.startswith(p):
            return True
    return False

def word_ends_with_punct(w) -> bool:
    for p in punct:
        if w.endswith(p):
            return True
    return False

import re

def sep_punct_by_regex(sent) -> list:
    words = sent.split(" ")
    new_words = []
    for w in words:
        # Pad a leading run of punctuation with spaces, then a trailing run
        tmp1 = re.sub(r'^[]!"$/%&\'()*+,.:;=#@?[\\^_`{|}~-]+', r' \g<0> ', w).strip()
        tmp2 = re.sub(r'[]!"$/%&\'()*+,.:;=#@?[\\^_`{|}~-]+$', r' \g<0> ', tmp1).strip()
        t = tmp2.split(" ")
        for x in t:
            new_words.append(x)
    return new_words

1 answer

User

Reply #1 · posted 2024-09-30 02:26:35

You can use

re.findall(r'\b(?:25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])(?:\.(?:25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])){3}\b|[^\W_]+|(?:[^\w\s]|_)+', s)

See the regex demo.

To strip punctuation and whitespace from both ends of the string, use

re.sub(r'^[\W_]+|[\W_]+$', '', s).strip()
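A small sketch of what that substitution does on its own; the helper name `strip_outer_punct` and the sample strings are made up for illustration. Note that `\W` also matches whitespace, so the pattern already trims spaces at the edges and the final `.strip()` is only a safety net.

```python
import re

def strip_outer_punct(s: str) -> str:
    # Drop any run of non-word characters or underscores at the start
    # or at the end of the string; punctuation in the middle is untouched.
    return re.sub(r'^[\W_]+|[\W_]+$', '', s).strip()

print(strip_outer_punct('  "Hello, world!"  '))  # Hello, world
print(strip_outer_punct('__a.b__'))              # a.b
```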

So the whole thing looks like this:

s = re.sub(r'^[\W_]+|[\W_]+$', '', s).strip()
oct = r'(?:25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])'
return re.findall(r'\b{0}(?:\.{0}){{3}}\b|[^\W_]+|(?:[^\w\s]|_)+'.format(oct), s)
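The snippet above contains a bare `return`, so it is meant to sit inside a function. A runnable sketch, assuming a wrapper called `tokenize` (the names `tokenize`, `OCTET`, and `TOKEN_RE` are illustrative and not from the answer; `OCTET` also avoids shadowing the built-in `oct`):

```python
import re

# One IPv4 octet, 0-255
OCTET = r'(?:25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])'

# IPv4 address | run of letters/digits | run of punctuation or underscores
TOKEN_RE = re.compile(
    r'\b{0}(?:\.{0}){{3}}\b|[^\W_]+|(?:[^\w\s]|_)+'.format(OCTET)
)

def tokenize(s):
    # Strip punctuation and whitespace from both ends first
    s = re.sub(r'^[\W_]+|[\W_]+$', '', s).strip()
    # Every group in the pattern is non-capturing, so findall
    # returns the full text of each match
    return TOKEN_RE.findall(s)

print(tokenize("Ping 192.168.0.1, then (wait)."))
# ['Ping', '192.168.0.1', ',', 'then', '(', 'wait']
```

The trailing `).` disappears because of the stripping step; punctuation inside the string is kept as separate tokens.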

Details

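\b(?:25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])(?:\.(?:25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])){3}\b - an IPv4 regex pattern
| - or
[^\W_]+ - one or more letters or digits
| - or
(?:[^\w\s]|_)+ - one or more characters, each of which is either an underscore or a character that is neither a word character nor whitespace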
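To see why the IPv4 alternative comes first, the sketch below builds the pattern from its three parts and compares the result with and without that branch. The names `IPV4`, `WORD`, `PUNCT` and the sample text are assumptions made for this example.

```python
import re

OCTET = r'(?:25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])'
IPV4 = r'\b{0}(?:\.{0}){{3}}\b'.format(OCTET)  # dotted quad, each part 0-255
WORD = r'[^\W_]+'                              # letters or digits, no underscore
PUNCT = r'(?:[^\w\s]|_)+'                      # punctuation or underscores

text = "host_1 at 10.0.0.255!!"

with_ip = re.findall('|'.join([IPV4, WORD, PUNCT]), text)
without_ip = re.findall('|'.join([WORD, PUNCT]), text)

print(with_ip)
# ['host', '_', '1', 'at', '10.0.0.255', '!!']
print(without_ip)
# ['host', '_', '1', 'at', '10', '.', '0', '.', '0', '.', '255', '!!']
```

Alternatives are tried left to right, so the address is matched as a whole before the word and punctuation branches get a chance to split it at the dots.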
