Split on multiple separators but keep the contiguous strings (Python)

Published 2024-05-19 15:40:32


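In the dataset I'm working with, makeup phrases (i.e. foundation, lips/lipstick, concealer, bronzer, etc.) are bundled together with the preceding phrase (see the examples below). How can I split the bundled phrases across the whole dataset while keeping both parts?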

Example phrases

‘vamplipstick @’

‘208bronzer :’

‘jadefoundation :’

‘nc45blush @’

‘eyeseyeliner @’

‘kikomilanolips :’

‘235concealer @’

Desired output

‘vamp lipstick @’

‘208 bronzer:’

‘jade foundation:’

‘nc45 blush @’

‘eyes eyeliner @’

‘kikomilano lips:’

‘235 concealer @’

Code so far

makeup = r"\w+\s+[@:]"
separators = ["foundation", "bronzer", "lips", "lipstick", "concealer", "blush", "eyeliner"]
[makeup.partition(<?list_multiple_separators?>) for makeup in df]

3 Answers

Try a regular-expression substitution:

import re

data = """
‘vamplipstick @’

‘208bronzer :’

‘jadefoundation :’

‘nc45blush @’

‘eyeseyeliner @’

‘kikomilanolips :’

‘235concealer @’
"""
separators = [
    "foundation", "bronzer", "lips",
    "lipstick", "concealer", "blush", "eyeliner"
]
# Insert a space in front of every occurrence of any separator term.
output = re.sub(r"({seps})".format(seps='|'.join(separators)), r' \1', data)
print(output)

Output:

‘vamp lipstick @’

‘208 bronzer :’

‘jade foundation :’

‘nc45 blush @’

‘eyes eyeliner @’

‘kikomilano lips :’

‘235 concealer @’

This replaces each term with itself, preceded by a single space character.
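
The same substitution can be sketched for a list of strings rather than one multi-line string. The helper name `split_bundled` is hypothetical, and two precautions are added here as assumptions that are not part of the answer above: each term goes through `re.escape`, and the terms are sorted longest first so that `lipstick` is tried before its prefix `lips`.

```python
import re

separators = ["foundation", "bronzer", "lips", "lipstick",
              "concealer", "blush", "eyeliner"]

# Longest terms first, so "lipstick" is tried before its prefix "lips".
ordered = sorted(separators, key=len, reverse=True)
pattern = re.compile("(" + "|".join(re.escape(s) for s in ordered) + ")")


def split_bundled(phrase):
    """Return phrase with a space inserted before each makeup term (hypothetical helper)."""
    return pattern.sub(r" \1", phrase)


phrases = ["vamplipstick @", "208bronzer :", "kikomilanolips :"]
print([split_bundled(p) for p in phrases])
```

Note that a phrase that starts with one of the terms would gain a leading space; stripping the result would be one way to handle that case.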

import re

l = ["vamplipstick @", "208bronzer :", "jadefoundation :", "nc45blush @",
     "eyeseyeliner @", "kikomilanolips :", "235concealer @"]

# Group 1: leading phrase, group 2: makeup term, group 3: trailing " @" or ":".
for i in l:
    print(re.sub(
        r"^(\w+)\s*(foundation|bronzer|lipstick|lips|concealer|blush|eyeliner)\s*( @|:)$",
        r"\1 \2\3", i))

vamp lipstick @

208 bronzer:

jade foundation:

nc45 blush @

eyes eyeliner @

kikomilano lips:

235 concealer @



You can use re.sub to replace each match of the following regular expression with a space:

r'(?=(?:foundation|lips|lipstick|concealer|bronzer) )'


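Python's regex engine uses a positive lookahead to match one of the specified strings followed by a space. The space is included to avoid matching, for example, "lips" or "lipstick" in "007lipsticked :". (A word boundary, \b, could be used instead.)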
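
A small sketch comparing the trailing-space lookahead with the word-boundary variant. The variable names are illustrative, and the alternation is extended with "blush" and "eyeliner" from the question's list, which is an assumption rather than part of the regex as posted.

```python
import re

# Assumption: the posted alternation plus "blush" and "eyeliner".
terms = "foundation|lips|lipstick|concealer|bronzer|blush|eyeliner"

# Zero-width lookahead: a term must follow, then a literal space.
with_space = re.compile(r"(?=(?:" + terms + r") )")
# Alternative: a term must follow, then a word boundary.
with_boundary = re.compile(r"(?=(?:" + terms + r")\b)")

print(with_space.sub(" ", "vamplipstick @"))      # space inserted before "lipstick"
print(with_space.sub(" ", "007lipsticked :"))     # unchanged: no term is followed by a space
print(with_boundary.sub(" ", "kikomilanolips:"))  # \b also works when ":" follows directly
```

The word-boundary form is slightly more permissive: it still splits when the term is followed directly by punctuation, whereas the trailing-space form requires a space.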

Note that the match is an empty string (i.e. a zero-width match). In "jadefoundation :", the match can be thought of as the empty string between "e" and "f".
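
To see the zero-width match directly, one can inspect the match object. This is only an illustrative check using the regex as posted; the variable names are not from the answer.

```python
import re

pattern = re.compile(r"(?=(?:foundation|lips|lipstick|concealer|bronzer) )")
text = "jadefoundation :"

m = pattern.search(text)
print(m.span())         # (4, 4): start and end coincide, so nothing is consumed
print(repr(m.group()))  # '': the matched text is the empty string
print(text[:m.start()], "|", text[m.start():])  # jade | foundation :
```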

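The "desired output" shows the space before ":" removed. Since the space before "@" was not removed, I assume that removing the space before ":" was unintentional, but please correct me if I'm wrong.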
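
If dropping the space before ":" is in fact wanted, a second substitution could take care of it. The following is a sketch under that assumption; the function name `tidy` is hypothetical, and the alternation again includes "blush" and "eyeliner" from the question's list.

```python
import re

# Zero-width position just before a makeup term that ends at a word boundary.
SPLIT = re.compile(r"(?=(?:foundation|lips|lipstick|concealer|bronzer|blush|eyeliner)\b)")


def tidy(phrase):
    """Split off the makeup term, then remove whitespace before ':' (hypothetical helper)."""
    spaced = SPLIT.sub(" ", phrase)
    return re.sub(r"\s+:", ":", spaced)


for p in ["208bronzer :", "nc45blush @", "kikomilanolips :"]:
    print(tidy(p))
```

The space before "@" is left alone, matching the desired output shown in the question.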
