If a string contains repeated words, keep only the first occurrence

Posted 2024-09-30 18:20:27


customer_name                                               ANDY
number_of_product_variants                                      2
number_of_channels                                              1
number_of_discount_codes                                        1
order_count                                                     1
order_name                                            #1100,#1100
discount_code                        Christmas2020, Christmas2020
channel                                      Instagram, Instagram
product_variant                    Avengers Set A, Avengers Set B

I want to remove the repeated words, but only when a string actually contains duplicates.

Expected output:

customer_name                                                ANDY
number_of_product_variants                                      2
number_of_channels                                              1
number_of_discount_codes                                        1
order_count                                                     1
order_name                                                  #1100
discount_code                                       Christmas2020
channel                                                 Instagram
product_variant                    Avengers Set A, Avengers Set B

The code I tried:

def unique_string(l):
    ulist = []
    [ulist.append(x) for x in l if x not in ulist]
    return ulist

customer_df['channel_2']=customer_df['channel']
customer_df['channel_2'].apply(unique_string)

Running the code above on the channel column alone returns:

0                                   [S, e, a, r, c, h, ,]
1                    [P, a, i, d,  , A, s, :, S, o, c, l]
2                 [P, a, i, d,  , A, s, :, S, o, c, l, ,]
3                                      [U, n, k, o, w, ,]

2 Answers

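If the order of the values does not matter, split each string on ', ' and pass the pieces through a set.

If the order matters, use dict.fromkeys instead, which keeps the first occurrence of each value in its original position: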

import pandas as pd

customer_df = pd.DataFrame({"channel_2": ['Instagram, Instagram',
                                          'Instagram, Instagram1, Instagram, Instagram2']})

# set removes duplicates but does not guarantee any particular order
f1 = lambda x: ', '.join(set(x.split(', ')))
# dict.fromkeys removes duplicates and keeps first-seen order
f2 = lambda x: ', '.join(dict.fromkeys(x.split(', ')))

customer_df['channel_2_1'] = customer_df['channel_2'].apply(f1)
customer_df['channel_2_2'] = customer_df['channel_2'].apply(f2)
print(customer_df)
                                      channel_2  \
0                          Instagram, Instagram   
1  Instagram, Instagram1, Instagram, Instagram2   

                         channel_2_1                        channel_2_2  
0                          Instagram                          Instagram  
1  Instagram2, Instagram1, Instagram  Instagram, Instagram1, Instagram2  
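Note that the sample data in the question is not delimited consistently: order_name holds '#1100,#1100' with no space after the comma, so splitting on ', ' would leave that value unchanged. A minimal sketch that splits on a bare comma and strips whitespace, keeping the first occurrence of each item; the helper name dedupe_keep_first is hypothetical and not part of the answer above:

```python
def dedupe_keep_first(value):
    """Remove repeated comma-separated items, keeping first occurrences in order."""
    # Split on a bare comma and strip whitespace so 'a,a' and 'a, a' behave alike.
    items = [item.strip() for item in value.split(',')]
    # dict.fromkeys keeps insertion order (Python 3.7+) and drops later duplicates.
    return ', '.join(dict.fromkeys(items))


print(dedupe_keep_first('#1100,#1100'))                     # #1100
print(dedupe_keep_first('Instagram, Instagram'))            # Instagram
print(dedupe_keep_first('Avengers Set A, Avengers Set B'))  # Avengers Set A, Avengers Set B
```

With pandas, this could be applied to a column as customer_df['channel'].apply(dedupe_keep_first), assuming every value in that column is a string.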

Your DataFrame appears to contain strings that represent lists, rather than actual lists.

For example:

'[ "Instagram", "Instagram" ]' and not ["Instagram", "Instagram"]

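Note the outer single quotes.

You can tell because the for loop in your function iterates over the characters of the string rather than over the elements of a list.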
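The difference can be reproduced without pandas. The sketch below restates the question's function as a plain loop and calls it once with a string and once with a list; the input 'Search,' is only a guess at the original cell value, chosen because it reproduces the first row of the output shown in the question:

```python
def unique_string(l):
    """Return the distinct elements of l in first-seen order."""
    ulist = []
    for x in l:
        if x not in ulist:
            ulist.append(x)
    return ulist


# Iterating over a str yields single characters.
print(unique_string('Search,'))                   # ['S', 'e', 'a', 'r', 'c', 'h', ',']

# Iterating over a list yields its elements.
print(unique_string(['Instagram', 'Instagram']))  # ['Instagram']
```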

To convert the string representation of a list into an actual list, first use:

import ast
customer_df["channel"] = customer_df["channel"].apply(ast.literal_eval) 

If you want to learn more about ast.literal_eval, see this question.

After that, you can apply your unique_string function.
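As a rough sketch of the two steps together: ast.literal_eval only accepts valid Python literals, so a plain value such as 'Instagram, Instagram' (the form shown in the question) raises an error instead of parsing. The helper names to_list and unique_items below are hypothetical, and the comma-splitting fallback is an assumption about how such values should be handled, not part of this answer:

```python
import ast


def to_list(value):
    """Parse a list-like string; fall back to comma splitting for plain text."""
    try:
        parsed = ast.literal_eval(value)
    except (ValueError, SyntaxError):
        # Not a Python literal, e.g. 'Instagram, Instagram'.
        return [item.strip() for item in value.split(',')]
    # Wrap a lone literal so the result is always a list.
    return list(parsed) if isinstance(parsed, (list, tuple)) else [parsed]


def unique_items(items):
    """Drop later duplicates, keeping first-seen order (items must be hashable)."""
    return list(dict.fromkeys(items))


print(unique_items(to_list('["Instagram", "Instagram"]')))  # ['Instagram']
print(unique_items(to_list('Instagram, Instagram')))        # ['Instagram']
```

For hashable items, unique_items gives the same result as the question's unique_string once the input is a real list.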
