Answers to: Pandas - extract sub-levels from column values and fill the sub-level names into other columns


Assume this DataFrame:

<pre><code>import pandas as pd

vals = [['1.00.00.00', 'Total Assets', 1000],
        ['1.01.00.00', 'Cash', 200],
        ['1.02.00.00', 'Inventory', 800],
        ['1.02.01.00', 'Goods in process', 300],
        ['1.02.02.00', 'Goods for sale', 500],
        ['2.00.00.00', 'Liabilities', 750],
        ['2.01.00.00', 'Commercial Liabilities', 700],
        ['2.02.00.00', 'Other liabilities', 50],
        ['3.00.00.00', 'Net equity', 250]]
cols = ['account', 'name', 'balance']
df = pd.DataFrame(vals, columns=cols)
df
</code></pre>

<a href="https://i.stack.imgur.com/M6oFy.png" rel="nofollow noreferrer"><img src="https://i.stack.imgur.com/M6oFy.png" alt="enter image description here"/></a>

So each account at a higher level is the sum of the accounts at the level below it.

I want to create columns holding the name of the account group at each level, plus a column with the level of the account and one that says whether the account is at the final level of its group. For clarity, this is the output I am after:

<a href="https://i.stack.imgur.com/RE6YD.png" rel="nofollow noreferrer"><img src="https://i.stack.imgur.com/RE6YD.png" alt="enter image description here"/></a>

I tried the following, but I got stuck at the point where I need to somehow read values from previous rows, and one of the steps raises a FutureWarning (<code>FutureWarning: Columnar iteration over characters will be deprecated in future releases.</code>):

<pre><code>import numpy as np

# This tuple-unpacking of the .str accessor is the step that triggers the FutureWarning
df['sub_account_1'], df['sub_account_2'], df['sub_account_3'], df['sub_account_4'] \
    = df['account'].str.split('.').str

df['clasif_1'], df['clasif_2'], df['clasif_3'], df['clasif_4'] = ("", "", "", "")

def account_level(df):
    if df['sub_account_2'] == '00':
        return '2'
    elif df['sub_account_3'] == '00':
        return '3'
    elif df['sub_account_4'] == '00':
        return '4'
    else:
        return np.nan

df['nivel_cuenta'] = df.apply(account_level, axis=1)
</code></pre>

That is where I stopped. I believe there must be a smarter way to do this, perhaps a well-known approach, since this situation (accounts and sub-account levels) is not unusual.

The real-world case has up to 8 sub-levels, but the account codes follow the same structure. I will use the new columns to check the subtotals given in the original dataset and then run some analysis.

Can anyone point me in the right direction? (I would rather not go back to Excel to solve this...)
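A note on the warning and the level column, as a minimal sketch rather than a definitive fix: passing `expand=True` to `str.split` returns the segments as a DataFrame, which avoids unpacking the `.str` accessor. The level is counted here as the number of segments that are not `'00'`, which assumes `'00'` only ever appears as trailing padding; that definition and the `sub_cols` name are assumptions for illustration.

```python
import pandas as pd

vals = [['1.00.00.00', 'Total Assets', 1000],
        ['1.01.00.00', 'Cash', 200],
        ['1.02.00.00', 'Inventory', 800],
        ['1.02.01.00', 'Goods in process', 300],
        ['1.02.02.00', 'Goods for sale', 500],
        ['2.00.00.00', 'Liabilities', 750],
        ['2.01.00.00', 'Commercial Liabilities', 700],
        ['2.02.00.00', 'Other liabilities', 50],
        ['3.00.00.00', 'Net equity', 250]]
df = pd.DataFrame(vals, columns=['account', 'name', 'balance'])

# expand=True yields one column per segment, so no iteration over .str is needed
sub_cols = ['sub_account_1', 'sub_account_2', 'sub_account_3', 'sub_account_4']
df[sub_cols] = df['account'].str.split('.', expand=True)

# Assumed definition: level = number of segments different from '00'
df['account_level'] = (df[sub_cols] != '00').sum(axis=1)
```

With more sub-levels, only the length of `sub_cols` would need to change.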


1 Answer

Anonymous, 1 day ago

I was finally able to do what I needed with the code below. It mostly relies on plain Python loops, and I still feel there must be a more "pandas" way to do this. I tried several approaches based on other Stack Overflow questions, but had a hard time getting any of them to work. I will leave the code here; feedback is much appreciated and further ideas are welcome.

<pre><code>import pandas as pd

vals = [['1.00.00.00', 'Total Assets', 1800],
        ['1.01.00.00', 'Cash', 1000],
        ['1.02.00.00', 'Inventory', 900],
        ['1.02.01.00', 'Goods in process', 300],
        ['1.02.02.00', 'Goods for sale', 500],
        ['1.02.03.00', 'Goods other', 100],
        ['2.00.00.00', 'Liabilities', 900],
        ['2.01.00.00', 'Commercial Liabilities', 850],
        ['2.02.00.00', 'Other liabilities', 50],
        ['2.03.00.00', 'More liabilities', 20],
        ['2.03.01.00', 'One liability', 20],
        ['2.03.01.01', 'One-One liability', 20],
        ['2.03.01.02', 'One-Two liability', 20],
        ['2.03.02.00', 'Two liability', 20],
        ['2.04.00.00', 'One More liability', 20],
        ['3.00.00.00', 'Net equity', 250]]
cols = ['account', 'name', 'balance']
df = pd.DataFrame(vals, columns=cols)

# Split the account code into one column per level
level_cols = ['level_1', 'level_2', 'level_3', 'level_4']
df[level_cols] = df['account'].str.split('.', expand=True)

df['account_level'] = 0  # integer column, filled in below
df['imputable'] = ""
clasif_cols = ['clasif_1', 'clasif_2', 'clasif_3', 'clasif_4']
for col in clasif_cols:
    df[col] = ""

for i in range(len(df)):
    row = df.loc[i]
    account_level = 0
    if i == 0:
        account_level += 1
        df.loc[i, 'clasif_1'] = row['name']
    else:
        for level in range(4):
            level_col = "level_" + str(level + 1)
            clasif_col = "clasif_" + str(level + 1)
            if row[level_col] == '00':
                break
            else:
                account_level += 1
                prev_row = df.loc[i - 1]
                # If the row's value at this level equals the previous row's, repeat the
                # previous classification. Else, assign this row's name to the classification.
                if row[level_col] == prev_row[level_col]:
                    df.loc[i, clasif_col] = prev_row[clasif_col]
                else:
                    df.loc[i, clasif_col] = row['name']
    df.loc[i, 'account_level'] = account_level

for i in range(len(df)):
    row = df.loc[i]
    # The last row is an imputable account for sure. This branch also avoids an
    # out-of-range lookup on the last row.
    if i == (len(df) - 1):
        df.loc[i, 'imputable'] = "YES"
    # Otherwise, if this row's level is equal to or higher than the next row's level,
    # this is an imputable account.
    else:
        next_row = df.loc[i + 1]
        if row['account_level'] >= next_row['account_level']:
            df.loc[i, 'imputable'] = "YES"

df.drop(level_cols, axis=1, inplace=True)
df
</code></pre>

Output:

<a href="https://i.stack.imgur.com/G8WTl.png" rel="nofollow noreferrer"><img src="https://i.stack.imgur.com/G8WTl.png" alt="enter image description here"/></a>
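One possible loop-free variant, offered as a sketch under stated assumptions rather than as the accepted way to do it: for each depth `k`, build the code of the ancestor account (the first `k` segments padded with `.00`) and look its name up by account code. This assumes account codes are unique, every parent account has its own row, and `'00'` appears only as trailing padding. The names `segments`, `name_by_code` and `ancestor` are illustrative, not taken from the answer.

```python
import pandas as pd

vals = [['1.00.00.00', 'Total Assets', 1800],
        ['1.01.00.00', 'Cash', 1000],
        ['1.02.00.00', 'Inventory', 900],
        ['1.02.01.00', 'Goods in process', 300],
        ['1.02.02.00', 'Goods for sale', 500],
        ['1.02.03.00', 'Goods other', 100],
        ['2.00.00.00', 'Liabilities', 900],
        ['2.01.00.00', 'Commercial Liabilities', 850],
        ['2.02.00.00', 'Other liabilities', 50],
        ['2.03.00.00', 'More liabilities', 20],
        ['2.03.01.00', 'One liability', 20],
        ['2.03.01.01', 'One-One liability', 20],
        ['2.03.01.02', 'One-Two liability', 20],
        ['2.03.02.00', 'Two liability', 20],
        ['2.04.00.00', 'One More liability', 20],
        ['3.00.00.00', 'Net equity', 250]]
df = pd.DataFrame(vals, columns=['account', 'name', 'balance'])

segments = df['account'].str.split('.')            # list of segments per row
parts = df['account'].str.split('.', expand=True)  # one column per segment
n_levels = parts.shape[1]

# Assumed definition: level = number of segments different from '00'
level = (parts != '00').sum(axis=1)
df['account_level'] = level

# Lookup table from account code to account name (codes assumed unique)
name_by_code = df.set_index('account')['name']

for k in range(1, n_levels + 1):
    # Ancestor code at depth k: first k segments, remaining ones set to '00'
    ancestor = segments.str[:k].str.join('.') + '.00' * (n_levels - k)
    # Keep the ancestor's name only for rows at least k levels deep
    df[f'clasif_{k}'] = ancestor.map(name_by_code).where(level >= k, '')

# Same rule as the loop version: an account is imputable when the next row
# is not deeper than it; fill_value=0 makes the last row imputable
is_leaf = level >= level.shift(-1, fill_value=0)
df['imputable'] = is_leaf.map({True: 'YES', False: ''})
```

Because the names are looked up by code instead of copied from the previous row, the `clasif_` columns do not depend on row order; the `imputable` column still assumes the rows are sorted by account code. For 8 sub-levels nothing needs to change, since `n_levels` is read from the split result.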
