Replace a column's string values if they are contained in parentheses

import pandas as pd
test = pd.DataFrame({'type':['fruit-of the-loom (sometimes-never)', 'yes', 'ok (not-possible) I will try', 'vegetable', 'poultry', 'poultry'], 'item':['apple', 'orange', 'spinach', 'potato', 'chicken', 'turkey']})

3 Answers

User

Answer 1 · edited 2024-09-26 17:43:55

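I should have spent more time thinking about this problem.

Here is the solution I came up with:

Count the parentheses, and replace hyphens only while the count shows we are inside an open pair.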

def inside_parens(string):
    parens_count = 0  # number of currently open parentheses
    return_string = ""
    for a in string:
        if a == "(":
            parens_count += 1
        elif a == ")":
            parens_count -= 1
        if parens_count > 0:
            # inside parentheses: swap hyphens for spaces
            return_string += a.replace('-', ' ')
        else:
            return_string += a
    return return_string

Once that is done, apply it to the intended column:

df['col_1'] = df['col_1'].apply(inside_parens)

If you want to generalize this function, you can pass in the thing you want to replace, which makes it more reusable.
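As a rough sketch of that generalization (the name `replace_inside_parens` and the parameters `old` and `new` are made up for illustration, not part of the original answer), the character to replace and its replacement can be passed as arguments. Because the loop works one character at a time, this version assumes `old` is a single character.

```python
def replace_inside_parens(text, old="-", new=" "):
    """Replace `old` with `new` only for characters inside parentheses.

    Hypothetical generalization of inside_parens; `old` is assumed to be
    a single character because the string is scanned character by character.
    """
    depth = 0  # current nesting depth of open parentheses
    out = []
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth = max(depth - 1, 0)  # ignore unmatched closing parentheses
        # Only characters seen while depth > 0 are rewritten.
        out.append(ch.replace(old, new) if depth > 0 else ch)
    return "".join(out)


result = replace_inside_parens("ok (not-possible) I-will try")
print(result)  # ok (not possible) I-will try
```

With pandas, extra keyword arguments given to `Series.apply` are forwarded to the function, so something like `df['col_1'].apply(replace_inside_parens, old='-', new=' ')` should work the same way as the original call.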

User

Answer 2 · edited 2024-09-26 17:43:55

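One way is to use str.replace with a pattern that looks for what is between parentheses; the replacement argument can be a lambda that calls replace on the match object: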

# regex=True is required for a callable repl (the default is False since pandas 2.0)
print(test['type'].str.replace(pat=r'\((.*?)\)',
                               repl=lambda x: x.group(0).replace('-', ' '),
                               regex=True))
0    fruit-of the-loom (sometimes never)
1                                    yes
2           ok (not possible) I will try
3                              vegetable
4                                poultry
5                                poultry
Name: type, dtype: object

For an explanation of what goes in pat=, see here
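To make the pattern a little more concrete, here is a small sketch using only the standard `re` module (the string `sample` is invented for the demonstration). `\(` and `\)` match literal parentheses, and `(.*?)` lazily captures as little as possible, so each match stops at the first closing parenthesis.

```python
import re

# Literal "(", a lazy capture of any characters, then a literal ")".
pattern = r"\((.*?)\)"

sample = "a-b (c-d) e-f (g-h)"  # made-up input with two parenthesised groups

# group(0) is the whole match, parentheses included.
matches = [m.group(0) for m in re.finditer(pattern, sample)]
print(matches)  # ['(c-d)', '(g-h)']

# The callable receives each match object and returns its replacement text.
fixed = re.sub(pattern, lambda m: m.group(0).replace("-", " "), sample)
print(fixed)  # a-b (c d) e-f (g h)
```

Hyphens outside the parentheses are never part of a match, so they are left alone.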

User

Answer 3 · edited 2024-09-26 17:43:55

test.type = (test.type.str.extract(r'(.*?\(.*?)-(.*?\))(.*)')
             .sum(1)
             .combine_first(test.type))

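Explanation:

Extract regex groups for "beginning until parenthesis and then hyphen" and "after hyphen until parenthesis and then optional additional stuff"
Join them back together with sum
Where the result is NaN, use the values from the original (combine_first)

This way the hyphen is removed rather than replaced with a space. If you need the space, you can use apply instead of sum: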

test.type = (test.type.str.extract(r'(.*?\(.*?)-(.*?\))(.*)')
             .dropna()  # skip rows without a match so they don't become 'nan nan nan'
             .apply(lambda row: ' '.join(row.values.astype(str)), axis=1)
             .combine_first(test.type))

Either way, this does not work for multiple groups of parentheses.
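The limitation can be seen without pandas. The sketch below reuses the same pattern with the standard `re` module; `replace_first_hyphen` is a hypothetical helper, and unlike the `' '.join` version it puts the separator only where the hyphen was. Only the first hyphen in the first parenthesised group is touched.

```python
import re

# Same pattern as in the answer: text up to a hyphen inside "(", the rest of
# that parenthesised group, then everything after it.
PATTERN = re.compile(r"(.*?\(.*?)-(.*?\))(.*)")


def replace_first_hyphen(text, sep=" "):
    """Hypothetical helper: rebuild the string from the three groups."""
    m = PATTERN.search(text)
    if m is None:
        return text  # no match: keep the original, like combine_first
    before, inside, rest = m.groups()
    return before + sep + inside + rest


print(replace_first_hyphen("ok (not-possible) I will try"))  # ok (not possible) I will try
print(replace_first_hyphen("a (b-c) d (e-f)"))               # a (b c) d (e-f)
print(replace_first_hyphen("x (a-b-c)"))                     # x (a b-c)
```

In the last two calls the second group, or the second hyphen, is left unchanged, which is why a substitution with a callable over every parenthesised match is the more general route.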
