Using a regular expression to split a prefixed string on commas

Posted 2024-09-24 02:19:35


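This is a very basic question, but for some reason I'm struggling to build the regular expression. I have a set of strings that start with X followed by a space, then one or more items (each item can be several words) separated by commas, and finally a period.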

Examples:

X abc, abd.
X abc, abd, abcd.
X abc abd, abc.
X asdas, asdasd, adsasda, asdasda.
X asdas asdasda, asdasdas asdasda, asdasdasas, asdasddas.

I'm trying to use the re module to get a list of all the strings between the commas, so that I get:

['abc', 'abd']
['abc', 'abd', 'abcd']
['abc abd', 'abc']
['asdas', 'asdasd', 'adsasda', 'asdasda']
['asdas asdasda', 'asdasdas asdasda', 'asdasdasas', 'asdasddas']

I tried:

match = re.search('X\s+((.*)\,)+(.*)\.', content.text)

But it doesn't seem to work.


Which regular expression can I use here?

Note that the strings can contain digits and special characters (such as :;() and others).
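This short demonstration shows why the attempt above falls short. Two separate problems are involved. First, the greedy .* runs past commas, so one group swallows several items at once. Second, a capturing group inside a repetition such as (...)+ only keeps the text from its last repetition. The second pattern swaps [^,]* in for .* purely for illustration. Even with that change, re.search still reports only one item per group, so a single search call cannot return the whole list.

```python
import re

line = "X abc, abd, abcd."

# Pattern from the question: the greedy .* backtracks only as far as the LAST comma,
# so group 2 ends up holding two items at once.
greedy = re.search(r'X\s+((.*)\,)+(.*)\.', line)
print(greedy.groups())    # ('abc, abd,', 'abc, abd', ' abcd')

# Illustrative variant with [^,]* instead of .*: the group is repeated twice,
# but only the final repetition is kept, so "abc" is matched yet no longer retrievable.
repeated = re.search(r'X\s+(([^,]*),)+([^.]*)\.', line)
print(repeated.groups())  # (' abd,', ' abd', ' abcd')
```

This is why the answers either use re.findall, which returns every match, or capture the whole body once and split it afterwards.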


3 answers

Assuming we can phrase the problem as wanting to find every sequence of one or more space-separated words, we can try re.findall:

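import re

inp = ["X abc, abd.", "X abc, abd, abcd.", "X abc abd, abc.", "X asdas, asdasd, adsasda, asdasda.", "X asdas asdasda, asdasdas asdasda, asdasdasas, asdasddas."]
for i in inp:
    # (?<=.) requires a preceding character, so the leading "X" at position 0 is never matched;
    # \w+(?: \w+)* then grabs runs of words separated by single spaces
    matches = re.findall(r'(?<=.)\w+(?: \w+)*', i)
    print(matches)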

This prints:

['abc', 'abd']
['abc', 'abd', 'abcd']
['abc abd', 'abc']
['asdas', 'asdasd', 'adsasda', 'asdasda']
['asdas asdasda', 'asdasdas asdasda', 'asdasdasas', 'asdasddas']
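One caveat with this answer: \w matches only letters, digits and the underscore. Items containing characters such as :, ; or (, which the question says can occur, therefore get cut into pieces. The sketch below is a variant that treats everything up to the next comma, or up to the final period, as part of the item. The pattern and the sample line are illustrations, not part of the answer. They assume every line has the shape described in the question.

```python
import re

# Hypothetical line containing the kind of special characters mentioned in the question.
line = "X f(x): 1; 2, 3.5 kg, done."

# The \w-based pattern from the answer breaks items apart at the punctuation.
print(re.findall(r'(?<=.)\w+(?: \w+)*', line))  # ['f', 'x', '1', '2', '3', '5 kg', 'done']

# Variant: an item starts after the leading "X " or after a comma, and runs
# (lazily) up to the next comma or up to the period that ends the line.
ITEM = re.compile(r'(?:^X\s+|,\s*)([^,]*?)(?=,|\.$)')
print(ITEM.findall(line))  # ['f(x): 1; 2', '3.5 kg', 'done']
```

A period inside an item, as in 3.5, is kept, because the lookahead only accepts a period at the very end of the line.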

Here is a way to get what you need using only regular expressions:

import re

lst = ['X abc, abd.',
       'X abc, abd, abcd.',
       'X abc abd, abc.',
       'X asdas, asdasd, adsasda, asdasda.',
       'X asdas asdasda, asdasdas asdasda, asdasdasas, asdasddas.']

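# X\s(.*)\. captures everything between "X " and the last period; re.split then breaks it on ", "
# (raw strings avoid Python's invalid-escape warnings for \s and \.)
result = [re.split(", ", re.search(r"X\s(.*)\.", i).group(1)) for i in lst]
print(result)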

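In this answer, re.search returns None for a line that does not match, and calling .group(1) on None raises AttributeError. The split also relies on exactly one space after each comma. The sketch below is a more defensive variant. It assumes that lines which do not have the "X ..., ... ." shape should simply be reported as None, and the helper name parse_line is hypothetical.

```python
import re

# Whole-line pattern: "X", whitespace, the comma-separated body, a final period.
LINE = re.compile(r'X\s+(.*)\.')

def parse_line(line):
    """Hypothetical helper: return the list of items, or None if the line does not fit."""
    m = LINE.fullmatch(line.strip())
    if m is None:
        return None
    # Split on commas, tolerating any amount of whitespace around them.
    return re.split(r'\s*,\s*', m.group(1))

lst = ['X abc, abd.',
       'X abc abd,abc.',
       'not a matching line']

for line in lst:
    print(parse_line(line))
# ['abc', 'abd']
# ['abc abd', 'abc']
# None
```

Using fullmatch instead of search also stops a line that merely contains "X ... ." somewhere in the middle from being accepted.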

This approach uses regular expressions only in part:

import re

lst = ['X abc, abd.',
       'X abc, abd, abcd.',
       'X abc abd, abc.',
       'X asdas, asdasd, adsasda, asdasda.',
       'X asdas asdasda, asdasdas asdasda, asdasdasas, asdasddas.']

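# strip("X.") removes any leading and trailing "X" and "." characters; each comma-separated piece is then trimmed
result = [[j.strip() for j in re.split(",", i.strip("X."))] for i in lst]
print(result)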

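A caveat with i.strip("X."): str.strip takes a set of characters, not a prefix or a suffix, so it removes every leading and trailing X or period. As a result, a last item that itself ends in an uppercase X loses that letter too. The sketch below keeps the same partial-regex idea but removes only the exact "X " prefix and the single final period, using anchored patterns. The sample line is made up for illustration.

```python
import re

line = "X abc, iPhone X."   # hypothetical line whose last item ends in "X"

# str.strip treats "X." as a set of characters, so the item's trailing "X" is eaten too.
print([j.strip() for j in re.split(",", line.strip("X."))])   # ['abc', 'iPhone']

# Remove only "X" plus whitespace at the start, and one "." at the very end.
body = re.sub(r'^X\s+|\.$', '', line)
print([j.strip() for j in body.split(",")])                   # ['abc', 'iPhone X']
```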

The simplest way is to skip regular expressions altogether and use a plain Python script:

strings = ["X abc, abd.", "X abc, abd, abcd.", "X abc abd, abc.", "X asdas, asdasd, adsasda, asdasda.", "X asdas asdasda, asdasdas asdasda, asdasdasas, asdasddas."]

def split_words(list_of_strings):
    words_per_string = []

    for s in list_of_strings:
        # remove the leading "X " and the trailing period
        # (without this, the last item would keep its "."; str.removesuffix needs Python 3.9+)
        s = s[2:].removesuffix(".")
        # split on commas and trim the whitespace around each item
        words_per_string.append([words.strip() for words in s.split(",")])

    return words_per_string

print(split_words(strings))
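The question reads its input from content.text. That text might hold several such lines at once, one per line, although this is an assumption because the question does not say. In that case, all matching lines can be found and split in one pass with re.MULTILINE, where ^ and $ match at the start and end of every line. The sample text below is a made-up stand-in assembled from the question's examples.

```python
import re

# Stand-in for content.text: several "X ..." lines in one string (assumed layout).
text = """X abc, abd.
X abc abd, abc.
some other line
X asdas asdasda, asdasdas asdasda, asdasdasas, asdasddas."""

# With re.MULTILINE, ^ and $ anchor to each line; "." does not cross newlines,
# and [ \t]+ (rather than \s+) keeps the prefix from spilling onto the next line.
LINE = re.compile(r'^X[ \t]+(.*)\.$', re.MULTILINE)

result = [[item.strip() for item in m.group(1).split(",")]
          for m in LINE.finditer(text)]
print(result)
# [['abc', 'abd'], ['abc abd', 'abc'], ['asdas asdasda', 'asdasdas asdasda', 'asdasdasas', 'asdasddas']]
```

Lines that do not start with "X " and end with a period, like "some other line" here, are skipped rather than raising an error.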
