How to merge text lines that were split across multiple records

Posted 2024-06-30 13:57:00


Hi, I want to merge records in a Python (pandas) DataFrame where the Description text has wrapped onto extra rows.

Current DataFrame:

Date        Value Date  Description       Amount
01/07/2019  01/07/2019  CHEQUE WITHDRAW   1000.00
01/07/2019  01/07/2019  SUNDRY CREDIT     100.00
                        CAPITAL FUND
                        FEES
02/07/2019  02/07/2019  CHEQUE WITHDRAW   10.00
02/07/2019  02/07/2019  SUNDRY CREDIT     10.00
                        FROM HEAD OFFICE
02/07/2019  02/07/2019  CHEQUE WITHDRAW   50.00

Expected DataFrame:

Date        Value Date  Description                      Amount
01/07/2019  01/07/2019  CHEQUE WITHDRAW                  1000.00
01/07/2019  01/07/2019  SUNDRY CREDIT CAPITAL FUND FEES  100.00
02/07/2019  02/07/2019  CHEQUE WITHDRAW                  10.00
02/07/2019  02/07/2019  SUNDRY CREDIT FROM HEAD OFFICE   10.00
02/07/2019  02/07/2019  CHEQUE WITHDRAW                  50.00

I get the error KeyError: 26.

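I tried iterating over the rows: whenever the Amount column is null, append that row's text to the previous row's Description and then drop the row: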

for index, row in df.iterrows():
  if (pd.isnull(row[3]) == True):
    df.loc[index-1][2] = str(df.loc[index-1][2]) + ' ' + str(df.loc[index][0]) 
    df.drop([index],inplace=True)
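The KeyError most likely comes from consecutive continuation lines (like CAPITAL FUND and FEES). The first one is dropped inside the loop, so when the loop reaches the second one, df.loc[index-1] looks up a label that no longer exists (presumably label 26 in the real data). Two further problems: df.loc[index-1][2] = ... is chained assignment, which generally writes into a temporary copy, so df itself is not updated; and df.loc[index][0] reads the first column (Date), which is empty on continuation lines if the layout matches the answer's test data. A minimal corrected sketch, assuming the wrapped text sits in the Description column and Amount is empty on continuation rows, remembers the label of the last complete row instead of using index-1 and drops rows only after the loop:

```python
import io

import pandas as pd

# Test data in the layout assumed above (same as the answer's test CSV)
csv = """Date;Value Date;Description;Amount
01/07/2019;01/07/2019;CHEQUE WITHDRAW;1000.00
01/07/2019;01/07/2019;SUNDRY CREDIT;100.00
;;CAPITAL FUND;
;;FEES;
02/07/2019;02/07/2019;CHEQUE WITHDRAW;10.00
02/07/2019;02/07/2019;SUNDRY CREDIT;10.00
;;FROM HEAD OFFICE;
02/07/2019;02/07/2019;CHEQUE WITHDRAW;50.00
"""
df = pd.read_csv(io.StringIO(csv), sep=';')

last = None  # label of the most recent row that has an Amount
for index, row in df.iterrows():
    if pd.isnull(row['Amount']):
        # continuation line: append to the last complete row (not index-1,
        # which may itself be a continuation line); .at writes into df directly
        df.at[last, 'Description'] += ' ' + row['Description']
    else:
        last = index

# drop the continuation rows once, after the loop, instead of inside it
df = df.dropna(subset=['Amount']).reset_index(drop=True)
print(df)
```

Using column names (row['Amount']) instead of positions (row[3]) also avoids relying on positional lookup on a label-indexed row.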

1 answer
Anonymous user
#1 · Posted 2024-06-30 13:57:00

You could try the following approach (my test data is at the end of this answer):

# create a new aux column "Description new" that will be filled with the
# merged description
df['Description new'] = df['Description']

# create an auxiliary copy of the data frame that will be shifted to line
# up the wrapped lines, and add another aux column that contains only the
# wrapped segments that have not been appended yet
df_shifted = df.copy()
df_shifted['Continued Description'] = df_shifted['Description'].where(df_shifted['Date'].isna(), None)

# it seems a record wraps onto at most 2 extra lines, so 2 passes would be
# enough; the third pass is just a safety margin and changes nothing here
for i in range(3):
    # shift the aux df up by one row so the next wrapped segment lands in
    # the same row as the line above it
    df_shifted = df_shifted.shift(-1)
    # append it (or an empty string if there is none) to the description
    df['Description new'] = df['Description new'].str.cat(df_shifted['Continued Description'].fillna(''), sep=' ').str.strip(' ')
    # clear the segments that were just appended to a dated row, so the next
    # shift does not add them to the previous transaction's description
    df_shifted.loc[~df['Date'].isna(), 'Continued Description'] = None

df.loc[~df['Date'].isna(), 'Description new']

This returns:

0                    CHEQUE WITHDRAW
1    SUNDRY CREDIT CAPITAL FUND FEES
4                    CHEQUE WITHDRAW
5     SUNDRY CREDIT FROM HEAD OFFICE
7                    CHEQUE WITHDRAW
Name: Description new, dtype: object
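The answer stops at selecting the merged descriptions. To reach the expected DataFrame, one more step is needed: keep only the dated rows, move Description new into Description and drop the helper column. The sketch below repeats the answer's passes in condensed form so that it runs on its own with the test data given at the end of this answer; the final assign/drop step is an addition, not part of the original answer, and it assumes continuation rows are exactly the rows with an empty Date.

```python
import io

import pandas as pd

# The answer's test data: continuation rows have empty Date, Value Date and Amount
csv = """Date;Value Date;Description;Amount
01/07/2019;01/07/2019;CHEQUE WITHDRAW;1000.00
01/07/2019;01/07/2019;SUNDRY CREDIT;100.00
;;CAPITAL FUND;
;;FEES;
02/07/2019;02/07/2019;CHEQUE WITHDRAW;10.00
02/07/2019;02/07/2019;SUNDRY CREDIT;10.00
;;FROM HEAD OFFICE;
02/07/2019;02/07/2019;CHEQUE WITHDRAW;50.00
"""
df = pd.read_csv(io.StringIO(csv), sep=';')

# Condensed version of the answer's shift-and-concatenate passes
df['Description new'] = df['Description']
df_shifted = df.copy()
df_shifted['Continued Description'] = df_shifted['Description'].where(df_shifted['Date'].isna())
for _ in range(3):
    df_shifted = df_shifted.shift(-1)
    df['Description new'] = (
        df['Description new']
        .str.cat(df_shifted['Continued Description'].fillna(''), sep=' ')
        .str.strip(' ')
    )
    df_shifted.loc[df['Date'].notna(), 'Continued Description'] = None

# Added step: keep the dated rows, replace Description with the merged text,
# and drop the helper column
result = (
    df[df['Date'].notna()]
    .assign(Description=lambda d: d['Description new'])
    .drop(columns='Description new')
    .reset_index(drop=True)
)
print(result)
```

DataFrame.assign overwrites an existing column in place, so the column order stays Date, Value Date, Description, Amount.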

You can test it with the data generated by the following code:

import io
import pandas as pd
csv="""
Date;Value Date;Description;Amount
01/07/2019;01/07/2019;CHEQUE WITHDRAW;1000.00
01/07/2019;01/07/2019;SUNDRY CREDIT;100.00
;;CAPITAL FUND;
;;FEES;
02/07/2019;02/07/2019;CHEQUE WITHDRAW;10.00
02/07/2019;02/07/2019;SUNDRY CREDIT;10.00
;;FROM HEAD OFFICE;
02/07/2019;02/07/2019;CHEQUE WITHDRAW;50.00
"""

df=pd.read_csv(io.StringIO(csv), sep=';')
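A different technique (swapped in here, not the answer's method) avoids choosing a fixed number of passes: number the records with a cumulative sum over "row has a Date", so every continuation row gets the number of the dated row above it, then group by that number and join the Description fragments. This sketch assumes, as in the test data, that continuation rows have an empty Date and that every Description cell is a string; record is just a helper name.

```python
import io

import pandas as pd

csv = """Date;Value Date;Description;Amount
01/07/2019;01/07/2019;CHEQUE WITHDRAW;1000.00
01/07/2019;01/07/2019;SUNDRY CREDIT;100.00
;;CAPITAL FUND;
;;FEES;
02/07/2019;02/07/2019;CHEQUE WITHDRAW;10.00
02/07/2019;02/07/2019;SUNDRY CREDIT;10.00
;;FROM HEAD OFFICE;
02/07/2019;02/07/2019;CHEQUE WITHDRAW;50.00
"""
df = pd.read_csv(io.StringIO(csv), sep=';')

# Each dated row starts a new record; continuation rows (empty Date) keep
# the record number of the dated row above them
record = df['Date'].notna().cumsum().rename('record')

merged = (
    df.groupby(record)
    .agg({
        'Date': 'first',          # 'first' skips NaN, so this is the dated row's value
        'Value Date': 'first',
        'Description': ' '.join,  # join all text fragments of the record in order
        'Amount': 'first',
    })
    .reset_index(drop=True)
)
print(merged)
```

Because the grouping key does not depend on how many continuation lines a record has, this handles descriptions that wrap onto any number of rows.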
