Pandas: extract sub-levels from column values and fill the sub-level values into other columns

2 answers

User

Answer 1 · edited 2024-09-22 20:30:09

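You can loop over the DataFrame rows and use the condition sub_account_x == '00' to decide each row's level and fill in the required columns.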

# df is the DataFrame from the question, with 'account', 'name' and 'balance' columns

# names of the helper sub_account columns and of the clasif columns
sub_account_cols = ['sub_account_1', 'sub_account_2', 'sub_account_3', 'sub_account_4']
clasif_cols = ['clasif_1', 'clasif_2', 'clasif_3', 'clasif_4']

# split each account ('1.02.01.00') into its four parts, one per sub_account column
df[sub_account_cols] = df['account'].str.split('.', expand=True)
# start the clasif columns empty (None keeps them object dtype, so names can be stored)
for col in clasif_cols:
    df[col] = None
df['level'] = 0


# iterate over the dataframe rows
for idx, row in df.iterrows():
    sub_accounts = row['account'].split('.')
    # the position of the first '00' part is the level (e.g. '1.02.00.00' -> 2);
    # an account without any '00' part, such as '2.03.01.01', is at the deepest level
    level = sub_accounts.index('00') if '00' in sub_accounts else len(sub_accounts)
    df.loc[idx, 'level'] = level
    df.loc[idx, 'clasif_' + str(level)] = row['name']

# fill clasif for child rows: forward-fill the parent's name within each account prefix
# (e.g. clasif_2 is filled within groups of (sub_account_1, sub_account_2))
for i, clasif_col in enumerate(clasif_cols):
    df[clasif_col] = df.groupby(sub_account_cols[:i + 1])[clasif_col].ffill()
df[clasif_cols] = df[clasif_cols].fillna('')

# last_level = True if the level of the current row is greater than or equal to the level of the next row;
# fill_value=0 makes the last row, which has no next row, always a last level
df['last_level'] = df['level'] >= df['level'].shift(-1, fill_value=0)

# drop the helper columns
df = df.drop(columns=sub_account_cols)
print(df)

Result:

      account                    name  balance      clasif_1                 clasif_2          clasif_3 clasif_4  level  last_level  
0  1.00.00.00            Total Assets     1000  Total Assets                                                          1       False  
1  1.01.00.00                    Cash      200  Total Assets                     Cash                                 2        True  
2  1.02.00.00               Inventory      800  Total Assets                Inventory                                 2       False  
3  1.02.01.00        Goods in process      300  Total Assets                Inventory  Goods in process               3        True  
4  1.02.02.00          Goods for sale      500  Total Assets                Inventory    Goods for sale               3        True  
5  2.00.00.00             Liabilities      750   Liabilities                                                          1       False  
6  2.01.00.00  Commercial Liabilities      700   Liabilities   Commercial Liabilities                                 2        True  
7  2.02.00.00       Other liabilities       50   Liabilities        Other liabilities                                 2        True  
8  3.00.00.00              Net equity      250    Net equity                                                          1        True
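The level column can also be computed without iterrows. This is a minimal sketch, assuming every account has exactly four dot-separated parts as in the question: the level is the number of leading parts that are not '00', which is the same as the position of the first '00' part used in the loop above, and an account with no '00' part (such as 2.03.01.01) gets level 4. account_levels is a hypothetical helper name, not part of pandas.

```python
import pandas as pd


def account_levels(accounts: pd.Series) -> pd.Series:
    """Hypothetical helper: the level is the count of leading parts that are not '00'."""
    # '1.02.01.00' -> columns '1', '02', '01', '00'
    parts = accounts.str.split('.', expand=True)
    # 1 for a non-'00' part, 0 otherwise; cumprod zeroes everything after the first '00'
    non_zero = parts.ne('00').astype(int)
    return non_zero.cumprod(axis=1).sum(axis=1)


df = pd.DataFrame({'account': ['1.00.00.00', '1.02.00.00', '1.02.01.00', '2.03.01.01']})
df['level'] = account_levels(df['account'])
print(df)
```

The returned Series keeps the original index, so it can be assigned straight to df['level'] instead of updating one row at a time with df.loc.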

User

Answer 2 · edited 2024-09-22 20:30:09

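I was finally able to do what I needed with the code below. It relies mostly on plain Python for loops, and I still feel there must be a more "pandas" way to do this. I tried several approaches based on other Stack Overflow questions, but had a hard time making any of them work. I'll leave the code here; any feedback is much appreciated, and further ideas are welcome.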

import pandas as pd

vals = [['1.00.00.00', 'Total Assets', 1800], ['1.01.00.00', 'Cash', 1000], ['1.02.00.00', 'Inventory', 900], ['1.02.01.00', 'Goods in process', 300], ['1.02.02.00', 'Goods for sale', 500], ['1.02.03.00', 'Goods other', 100], ['2.00.00.00', 'Liabilities', 900], ['2.01.00.00', 'Commercial Liabilities', 850], ['2.02.00.00', 'Other liabilities', 50], ['2.03.00.00', 'More liabilities', 20], ['2.03.01.00', 'One liability', 20], ['2.03.01.01', 'One-One liability', 20], ['2.03.01.02', 'One-Two liability', 20], ['2.03.02.00', 'Two liability', 20], ['2.04.00.00', 'One More liability', 20], ['3.00.00.00', 'Net equity', 250]]

cols = ['account', 'name', 'balance']

df = pd.DataFrame(vals, columns=cols)

level_cols = ['level_1', 'level_2', 'level_3', 'level_4',]
df[level_cols] = df['account'].str.split('.',expand=True)

df['account_level'] = ""
df['imputable'] = ""

clasif_cols = ['clasif_1', 'clasif_2', 'clasif_3', 'clasif_4']

for col in clasif_cols:
    df[col] = ""

for i in range(len(df)):

    row = df.loc[i]
    account_level = 0

    if i == 0:
        account_level += 1
        df.loc[i, 'clasif_1'] = row['name']

    else:
        prev_row = df.loc[i-1]
        # stays True only while every part checked so far matches the previous row's account
        same_prefix = True

        for level in range(4):

            level_col = "level_"+str(level+1)
            clasif_col = "clasif_"+str(level+1)

            if row[level_col] == '00':
                break

            account_level += 1
            same_prefix = same_prefix and row[level_col] == prev_row[level_col]

            # If the account matches the previous row up to this level, repeat the previous classification.
            # Otherwise this row starts a new classification at this level, so use its own name
            if same_prefix:
                df.loc[i, clasif_col] = prev_row[clasif_col]
            else:
                df.loc[i, clasif_col] = row['name']

    df.loc[i, 'account_level'] = account_level

for i in range(len(df)):
    row = df.loc[i]
    # The last row is always an imputable account; handling it separately also avoids a KeyError from df.loc[i+1]

    if i == (len(df)-1):
        df.loc[i, 'imputable'] = "YES"
    # Otherwise, the row is imputable if its level is greater than or equal to the next row's level
    else:
        next_row = df.loc[i+1]
        if row['account_level'] >= next_row['account_level']:
            df.loc[i, 'imputable'] = "YES"

df.drop(level_cols, axis=1, inplace=True)

print(df)
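On the question of a more pandas-like approach: if every parent account (for example 1.02.00.00 for 1.02.01.00) appears as its own row, each clasif_k column is simply the name of the row's level-k ancestor. That ancestor's account number can be built by keeping the first k parts and padding the rest with '00', then looked up with Series.map. The sketch below takes this approach instead of comparing each row with the previous one. It assumes four dot-separated parts per account, classify_accounts is a hypothetical name, and a row whose ancestor account is missing would get NaN rather than a copied name.

```python
import numpy as np
import pandas as pd


def classify_accounts(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical loop-free version: clasif_1..clasif_4, account_level and imputable."""
    out = df.copy()
    # '1.02.01.00' -> columns '1', '02', '01', '00'
    parts = out['account'].str.split('.', expand=True)
    depth = parts.shape[1]
    # level = number of leading parts that are not '00'
    level = parts.ne('00').astype(int).cumprod(axis=1).sum(axis=1)
    # assumes account numbers are unique
    name_by_account = out.set_index('account')['name']

    for k in range(1, depth + 1):
        # level-k ancestor: the first k parts, remaining parts replaced by '00'
        ancestor = parts.iloc[:, :k].apply('.'.join, axis=1) + '.00' * (depth - k)
        # only rows at level k or deeper have a level-k classification
        out[f'clasif_{k}'] = ancestor.map(name_by_account).where(level >= k, '')

    out['account_level'] = level
    # imputable when the next row is not deeper; the last row compares against 0
    out['imputable'] = np.where(level >= level.shift(-1, fill_value=0), 'YES', '')
    return out


vals = [['1.00.00.00', 'Total Assets', 1800], ['1.02.00.00', 'Inventory', 900],
        ['1.02.01.00', 'Goods in process', 300], ['2.00.00.00', 'Liabilities', 900],
        ['2.03.00.00', 'More liabilities', 20], ['2.03.01.00', 'One liability', 20],
        ['2.03.01.01', 'One-One liability', 20]]
df = pd.DataFrame(vals, columns=['account', 'name', 'balance'])
print(classify_accounts(df))
```

With this lookup the clasif columns no longer depend on row order; the imputable flag still compares each row with the next one, as in the original code.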

