Merge rows based on a condition

Posted 2024-06-26 01:46:11


I have this DataFrame:

   Date    Description  Debit     Credit     Balance originalIdx
0   01-03-19    AAAA        NaN        NaN        49Cr       0
1   01-03-19    ASSS        NaN      6,000.00     55Cr       1
2   NaN         XYZ         ABC        saa        NaN        1
3   01-03-19    ABZ        289.00      NaN        55Cr       3

And I want this:

      Date     Description       Debit        Credit     Balance  originalIdx
0   01-03-19    AAAA              NaN          NaN        49Cr        0
1   01-03-19    ASSSXYZABCsaa     NaN        6,000.00     55Cr        1
3   01-03-19    ABZ              289.00         NaN       55Cr        3

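When rows have the same originalIdx, I want to merge them into one row, concatenating their text (including the ABC and saa from the Debit and Credit columns) into the Description column.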

Here is my actual data: [image]


2 answers

Assuming Date is NaN for every row that needs to be merged into the row above it, the code is as follows.

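First, create a temporary column merged that concatenates the values of Description, Debit and Credit. Only non-numeric values are included: NaN and numeric (float) values are skipped.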
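To illustrate the merged step on its own, here is a minimal sketch applied to two hand-built rows shaped like the question's data (the values are examples only, and merge_text is a hypothetical helper wrapping the answer's lambda). It shows that strings are kept while both NaN and a numeric amount such as 6000.0 are skipped, because NaN is itself a float:

```python
import numpy as np
import pandas as pd

# Illustrative rows shaped like the question's data (values are examples only).
row = pd.Series({"Description": "ASSS", "Debit": np.nan, "Credit": 6000.0}, dtype=object)
row2 = pd.Series({"Description": "XYZ", "Debit": "ABC", "Credit": "saa"}, dtype=object)

def merge_text(values):
    # Hypothetical helper: keep non-null values that are not floats.
    # NaN is a float too, so it is skipped along with numeric amounts.
    return "".join(str(a) for a in values if pd.notnull(a) and not isinstance(a, float))

merged = merge_text(row)
merged2 = merge_text(row2)
print(merged, merged2)  # ASSS XYZABCsaa
```

Note that if the amounts were stored as strings (for example "6,000.00"), this filter would keep them, so it assumes numeric Debit and Credit values.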

Then use groupby with transform(lambda ...) on merged to replace Description with the joined text of all rows that share the same originalIdx.
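Because transform returns a result aligned with the original index, every row in an originalIdx group receives the same joined string; that is why the continuation rows still have to be dropped afterwards. A small sketch with made-up merged values:

```python
import pandas as pd

# Hypothetical intermediate frame: merged text per row, plus the grouping key.
df = pd.DataFrame({
    "merged": ["AAAA", "ASSS", "XYZABCsaa", "ABZ"],
    "originalIdx": [0, 1, 1, 3],
})

# transform keeps the original shape: one output value per input row,
# with the group's joined string broadcast to every member of the group.
joined = df.groupby("originalIdx")["merged"].transform(lambda x: "".join(x))
print(joined.tolist())
# ['AAAA', 'ASSSXYZABCsaa', 'ASSSXYZABCsaa', 'ABZ']
```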

Finally, drop the rows whose Date is NaN, and drop the temporary merged column.

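# Concatenate the non-numeric values (NaN and numeric amounts are floats, so they are skipped)
df['merged'] = df[['Description', 'Debit', 'Credit']].apply(
    lambda x: ''.join(str(a) for a in x if pd.notnull(a) and not isinstance(a, float)),
    axis=1)
# Join the merged text of every row sharing the same originalIdx
df['Description'] = df.groupby('originalIdx')['merged'].transform(lambda x: ''.join(x))
# Continuation rows have no Date: drop them, then drop the helper column
df.dropna(subset=['Date'], inplace=True)
df.drop(columns='merged', inplace=True)
print(df)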

This gives you:

       Date    Description  Debit  Credit Balance  originalIdx
0  01-03-19           AAAA    NaN     NaN    49Cr            0
1  01-03-19  ASSSXYZABCsaa    NaN  6000.0    55Cr            1
3  01-03-19            ABZ  289.0     NaN    55Cr            3
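The dropna(subset=['Date']) step only removes continuation rows if they have no Date. If that does not hold for your data, one alternative (not part of the original answer) is to keep the first row of each originalIdx group, assuming the dated row comes first within its group. A minimal sketch on a hypothetical, already-merged frame where the continuation row does carry a Date:

```python
import pandas as pd

# Hypothetical already-merged frame: the continuation row (index 2) has a Date.
df = pd.DataFrame({
    "Date": ["01-03-19", "01-03-19", "01-03-19", "01-03-19"],
    "Description": ["AAAA", "ASSSXYZABCsaa", "ASSSXYZABCsaa", "ABZ"],
    "originalIdx": [0, 1, 1, 3],
})

# Answer's approach: nothing is dropped, because no Date is missing.
by_date = df.dropna(subset=["Date"])

# Alternative: keep only the first row of each originalIdx group.
by_key = df.drop_duplicates(subset="originalIdx", keep="first")

print(by_date.index.tolist())  # [0, 1, 2, 3]
print(by_key.index.tolist())   # [0, 1, 3]
```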

Here is the complete code, including the sample data and the output:

import pandas as pd
import numpy as np

pd.set_option('display.max_columns', 200)
pd.set_option('display.max_colwidth', 250)

c = ['Date', 'Description', 'Debit', 'Credit', 'Balance', 'originalIdx']
# np.nan (not np.NaN, which was removed in NumPy 2.0)
d = [['01-03-19', 'FORTAP-MUMBAI/', np.nan, np.nan, '49656.25Cr', 0],
     ['01-03-19', 'FORTAP-MUMBAI/******', np.nan, 6000.00, '55656.25Cr', 1],
     [np.nan, 'UP/*ABC*/*DEF*', 'UPI/*PQR*/*XYZ*', 'paytm/NA', np.nan, 1],  # continuation row
     ['01-03-19', 'MBK/*ABCDEF*/*ZZZ*', 289.00, np.nan, '55357.25Cr', 3]]
df = pd.DataFrame(d, columns=c)
print(df)

# Concatenate the non-numeric values of each row into a helper column
df['merged'] = df[['Description', 'Debit', 'Credit']].apply(
    lambda x: ''.join(str(a) for a in x if pd.notnull(a) and not isinstance(a, float)),
    axis=1)
# Join the merged text per originalIdx, then drop the continuation rows and the helper
df['Description'] = df.groupby('originalIdx')['merged'].transform(lambda x: ''.join(x))
df.dropna(subset=['Date'], inplace=True)
df.drop(columns='merged', inplace=True)
print(df)

Output before and after:

Before:

       Date           Description            Debit    Credit     Balance  originalIdx  
0  01-03-19        FORTAP-MUMBAI/              NaN       NaN  49656.25Cr             0  
1  01-03-19  FORTAP-MUMBAI/******              NaN    6000.0  55656.25Cr             1  
2       NaN        UP/*ABC*/*DEF*  UPI/*PQR*/*XYZ*  paytm/NA         NaN             1  
3  01-03-19    MBK/*ABCDEF*/*ZZZ*            289.0       NaN  55357.25Cr             3  

After:

       Date                                                Description  Debit  Credit     Balance  originalIdx  
0  01-03-19                                             FORTAP-MUMBAI/    NaN     NaN   49656.25Cr           0  
1  01-03-19  FORTAP-MUMBAI/******UP/*ABC*/*DEF*UPI/*PQR*/*XYZ*paytm/NA    NaN  6000.0   55656.25Cr           1  
3  01-03-19                                         MBK/*ABCDEF*/*ZZZ*  289.0     NaN   55357.25Cr           3  

I see you are using pandas.

df.fillna("").groupby("originalIdx").agg(lambda x: "".join(map(str, x)))

You can do it easily with this one line.

It groups the rows by originalIdx and uses agg with a lambda that concatenates the values of each column within each group.
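Running the one-liner on the question's sample shows what it produces: rows with the same originalIdx collapse into one, but each column is concatenated on its own, so ABC and saa end up in Debit and Credit rather than in Description. A sketch, assuming the amounts are stored as strings as they appear in the question; map(str, x) converts any numeric cells so that join does not fail on them:

```python
import numpy as np
import pandas as pd

# The question's sample, with amounts kept as strings (as shown in the post).
df = pd.DataFrame({
    "Date": ["01-03-19", "01-03-19", np.nan, "01-03-19"],
    "Description": ["AAAA", "ASSS", "XYZ", "ABZ"],
    "Debit": [np.nan, np.nan, "ABC", "289.00"],
    "Credit": [np.nan, "6,000.00", "saa", np.nan],
    "Balance": ["49Cr", "55Cr", np.nan, "55Cr"],
    "originalIdx": [0, 1, 1, 3],
})

# Replace missing values with "" and concatenate every column within each group.
out = df.fillna("").groupby("originalIdx").agg(lambda x: "".join(map(str, x)))
print(out)
```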

Edit:

To handle missing values (None/NaN), .fillna("") replaces them with empty strings before the values are joined.
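The reason fillna("") is needed: str.join only accepts strings, and a missing value in pandas is a float NaN, so joining a column that still contains NaN raises a TypeError. A minimal sketch with an illustrative Series:

```python
import numpy as np
import pandas as pd

s = pd.Series(["XYZ", np.nan, "saa"])

# Without fillna, str.join hits the float NaN and raises TypeError.
try:
    "".join(s)
    failed = False
except TypeError:
    failed = True

# After replacing missing values with "", the join succeeds.
joined = "".join(s.fillna(""))
print(failed, joined)  # True XYZsaa
```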
