Merge multiple rows into one, using a list in a cell only when the elements differ

Posted 2024-10-03 13:30:13


I have the following table:

source_system   geo_id  product_subfamily   product_deny_list   product_allow_list  transaction_deny_list   operation_allow_list    operation_filter
0   CONFIRMING_SCHF FRK CASH_MGMT   ' ' 'CNF'   ' ' ' ' NaN
1   EQUATION_SCHF   FRK CASH_MGMT   'CD','TEST','CB'    'CA'    '408','805','385','856','320','420','825','355...   ' ' NaN

I want to convert it into a new table with a single row:

source_system   geo_id  product_subfamily   product_deny_list   product_allow_list  transaction_deny_list   operation_allow_list    operation_filter
0   [CONFIRMING_SCHF, EQUATION_SCHF]    FRK CASH_MGMT   ['CD','TEST','CB']  ['CNF', 'CA']   ' ' ' ' NaN

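During the conversion, a cell should become a list only when its values differ between the rows; if they are the same, only a single value should be kept. If one row has a blank string and another row has a non-blank value, the blank string should be dropped and the other value kept.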
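The merge rule above can be sketched as a per-column function. This is a minimal sketch, not code from either answer; `is_blank` and `merge_values` are hypothetical names, and treating NaN, whitespace-only strings and the quoted blank `' '` shown in the table as "blank" is an assumption about the data.

```python
import math


def is_blank(value):
    """Hypothetical helper: True for NaN, '', whitespace-only strings,
    and the quoted blank "' '" as it appears in the question's table."""
    if isinstance(value, float) and math.isnan(value):
        return True
    if isinstance(value, str):
        # Strip surrounding whitespace and single quotes before checking.
        return value.strip().strip("'").strip() == ""
    return False


def merge_values(values):
    """Merge the values of one column across rows (hypothetical helper).

    - blank values are dropped when at least one non-blank value exists
    - duplicates are removed, keeping first-seen order
    - a single remaining value is returned as-is, several as a list
    """
    values = list(values)
    # dict.fromkeys de-duplicates while preserving insertion order.
    distinct = list(dict.fromkeys(v for v in values if not is_blank(v)))
    if not distinct:
        # Every value was blank: keep the first one unchanged (e.g. "' '" or NaN).
        return values[0] if values else None
    if len(distinct) == 1:
        return distinct[0]
    return distinct


print(merge_values(["CONFIRMING_SCHF", "EQUATION_SCHF"]))  # ['CONFIRMING_SCHF', 'EQUATION_SCHF']
print(merge_values(["FRK", "FRK"]))                        # FRK
print(merge_values(["' '", "'CD','TEST','CB'"]))           # 'CD','TEST','CB'
print(merge_values(["' '", "' '"]))                        # ' '
```

If the quoted codes inside a cell (for example `'CD','TEST','CB'`) should be merged code by code rather than as whole strings, each cell would first need to be split on commas; the expected output in the question does not make clear which is intended.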

How can I do this?

Thanks in advance.


2 answers
import pandas as pd
import numpy as np
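def apply_func(x):
    # Drop empty strings ('' is falsy) and duplicates; dict.fromkeys keeps the
    # first-seen order, unlike set(), whose order can vary between runs
    x = list(filter(None, dict.fromkeys(x)))
    if len(x) <= 1:
        return ''.join(x)  # zero or one distinct string: return it as a plain string
    return x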

df = pd.DataFrame((['CONFIRMING_SCHF','FRK','CASH_MGMT', '', 'CNF' ,'','', np.nan],
                   ['EQUATION_SCHF','FRK', 'CASH_MGMT',   "'CD','TEST','CB'",'CA' ,  '408355','',np.nan]),
                  columns = ['source_system','geo_id','product_subfamily','product_deny_list','product_allow_list','transaction_deny_list','operation_allow_list','operation_filter'])
df['UNIQUE'] = 1  # constant key, so every row falls into the same group
df_list = df.groupby('UNIQUE').agg(apply_func)  # optionally chain .reset_index(drop=True)
df_list

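This may not be the ideal solution for you, because I added an extra column named UNIQUE, but it produces the output you expect, and you can put almost any condition you need into the apply_func function.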
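If the extra `UNIQUE` column is unwanted, one alternative (a sketch, not the answerer's code) is to merge each column directly and build a one-row DataFrame from the results. `collapse_rows` and `merge_column` are hypothetical names; this `merge_column` only drops empty strings and keeps first-seen order, so any further conditions from the question would go into it.

```python
import numpy as np
import pandas as pd


def merge_column(series):
    # Drop empty strings, then de-duplicate in order of first appearance;
    # pd.unique treats NaN values as equal.
    distinct = list(pd.unique(series[series != ""]))
    if not distinct:
        return ""  # every value was an empty string
    if len(distinct) == 1:
        return distinct[0]
    return distinct


def collapse_rows(df, merge):
    # One merged value per column; a list holding one dict becomes a one-row
    # DataFrame, and list values are stored as-is inside the cells.
    return pd.DataFrame([{col: merge(df[col]) for col in df.columns}])


df = pd.DataFrame(
    {
        "source_system": ["CONFIRMING_SCHF", "EQUATION_SCHF"],
        "geo_id": ["FRK", "FRK"],
        "product_allow_list": ["CNF", "CA"],
        "operation_filter": [np.nan, np.nan],
    }
)
print(collapse_rows(df, merge_column))
```

Building the row from a dict avoids both the helper column and `DataFrame.apply`, which can expand list results into a DataFrame when they all have the same length.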

Output

source_system   geo_id  product_subfamily   product_deny_list   product_allow_list  transaction_deny_list   operation_allow_list    operation_filter
UNIQUE                              
1   [CONFIRMING_SCHF, EQUATION_SCHF]    FRK CASH_MGMT   'CD','TEST','CB'    [CNF, CA]   408355      [nan, nan]
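The `[nan, nan]` in the `operation_filter` column comes from NaN never comparing equal to itself: each NaN read from the column is a separate float object, so `set()` keeps both. A quick check, assuming a float column like the one in this answer's DataFrame, shows that `pd.unique`, which treats NaNs as equal and keeps first-appearance order, reduces them to a single value:

```python
import numpy as np
import pandas as pd

values = pd.Series([np.nan, np.nan])  # like the answer's operation_filter column

# Each NaN yielded by the Series is a distinct float object and NaN != NaN,
# so set() cannot tell the two apart.
print(len(set(values)))    # 2

# pd.unique treats all NaNs as the same value.
print(pd.unique(values))   # [nan]
```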

A small example with data plus a solution:

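import pandas as pd

d = {'source_system': ['CONFIRMING_SCHF', 'EQUATION_SCHF'], 'geo_id': ['FRK', 'FRK']}
df = pd.DataFrame(data=d)
# Distinct values of each column, in first-seen order. Building the Series from a
# dict avoids df.apply(), which can expand equal-length lists into a DataFrame.
df_list = pd.Series({col: list(dict.fromkeys(df[col])) for col in df.columns})
df = pd.DataFrame(data=df_list).T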
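This small example keeps every value in a list, so a column with one distinct value comes out as `['FRK']` rather than the plain `FRK` the question asks for. A possible post-processing step converts one-element lists back to scalars; `unwrap` is a hypothetical helper name, not part of the answer.

```python
import pandas as pd

d = {"source_system": ["CONFIRMING_SCHF", "EQUATION_SCHF"], "geo_id": ["FRK", "FRK"]}
df = pd.DataFrame(data=d)

# Distinct values per column, in first-seen order.
merged = pd.Series({col: list(dict.fromkeys(df[col])) for col in df.columns})


def unwrap(values):
    # Hypothetical helper: a one-element list becomes its only element.
    return values[0] if len(values) == 1 else values


# Unwrap each cell, then turn the Series into a one-row DataFrame.
result = merged.map(unwrap).to_frame().T
print(result)
```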

