How do I merge duplicate rows and fill NaN cells with values from the other row?

Posted 2024-10-03 23:29:43


I have a DataFrame df:

import numpy as np
import pandas as pd

df = pd.DataFrame(
{
    "type": ["E/2", "E/2", "E/2", "E/2"],
    "subtype1": ["N2", "N2", "N2", "N2"],
    "subtype2": ["a", "a", "b", np.nan],
    "subtype3": [np.nan, np.nan, np.nan, "xxx"],
    "flex_best": [20, np.nan, 20, np.nan],
    "flex_worst": [np.nan, 30, np.nan, 30],
    "lead_best": [23, np.nan, 23, np.nan],
    "is_best": [1, np.nan, 1, np.nan],
    "lead_worst": [np.nan, 33, np.nan, 33],
    "is_worst": [np.nan, 1, np.nan, 1],
}
)
df.head()

| type | subtype1 | subtype2 | subtype3 | flex_best | flex_worst | lead_best | is_best | lead_worst | is_worst|
|------|----------|----------|----------|-----------|------------|-----------|---------|------------|---------|
| E/2  | N2       | a        | NaN      | 20        | NaN        | 23        | 1       | NaN        | NaN     |
| E/2  | N2       | a        | NaN      | NaN       | 30         | NaN       | NaN     | 33         | 1       |
| E/2  | N2       | b        | NaN      | 20        | NaN        | 23        | 1       | NaN        | NaN     |
| E/2  | N2       | NaN      | xxx      | NaN       | 30         | NaN       | NaN     | 33         | 1       |

I want to remove the duplicate rows by merging them on ["type", "subtype1", "subtype2", "subtype3"], filling in NaN where appropriate.

That is, the following column pairs should end up on the same row:

  • "flex_best" + "flex_worst"
  • "lead_best" + "lead_worst"
  • "is_best" + "is_worst"

This should produce the following DataFrame:

| type | subtype1 | subtype2 | subtype3 | flex_best | flex_worst | lead_best | is_best | lead_worst | is_worst|
|------|----------|----------|----------|-----------|------------|-----------|---------|------------|---------|
| E/2  | N2       | a        | NaN      | 20        | 30         | 23        | 1       | 33         | 1       |
| E/2  | N2       | b        | NaN      | 20        | NaN        | 23        | 1       | NaN        | NaN     |
| E/2  | N2       | NaN      | xxx      | NaN       | 30         | NaN       | NaN     | 33         | 1       |

How can I do this with pandas?


1 answer
Answer 1 · Posted 2024-10-03 23:29:43

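For this we can use the justify function, provided by @cs95 (credit goes to @Divakar), inside a groupby. Because some of your grouping keys contain NaN, we need to pass dropna=False to the groupby call.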

Afterwards, drop the rows that are entirely NaN across the non-grouping columns.
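The code in this answer calls justify, which is not part of NumPy or pandas and is not defined in this post. What follows is a minimal sketch of such a helper, assuming it follows the usual mask-and-sort idea: build a boolean mask of valid cells, sort the mask along the chosen axis, and copy the valid values into the sorted positions. It is a reconstruction for illustration, not necessarily the exact code the answer links to, and the demo array is hypothetical.

```python
import numpy as np
import pandas as pd


def justify(a, invalid_val=np.nan, axis=0, side="up"):
    """Push the valid entries of a 2-D array to one side along `axis`.

    Sketch only: a cell is invalid if it is null (when invalid_val is NaN)
    or equal to invalid_val (otherwise).
    """
    a = np.asarray(a, dtype=object)
    if pd.isna(invalid_val):
        mask = pd.notna(a)
    else:
        mask = a != invalid_val
    mask = np.asarray(mask, dtype=bool)

    # Sorting booleans puts False first and True last along the axis;
    # flipping moves the True block to the top (or left) instead.
    justified_mask = np.sort(mask, axis=axis)
    if side in ("up", "left"):
        justified_mask = np.flip(justified_mask, axis=axis)

    out = np.full(a.shape, invalid_val, dtype=object)
    if axis == 1:
        out[justified_mask] = a[mask]
    else:
        # Transposed views make the copy run column by column.
        out.T[justified_mask.T] = a.T[mask.T]
    return out


# Hypothetical 2x2 example: each column holds one value and one NaN.
demo = np.array([[1.0, np.nan],
                 [np.nan, 2.0]], dtype=object)
packed = justify(demo, invalid_val=np.nan, axis=0, side="up")
print(packed)
```

With axis=0 and side='up', every column is compacted independently, which is why a row left entirely NaN in the non-grouping columns can be dropped afterwards.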

import numpy as np
import pandas as pd

gp_cols = ['type', 'subtype1', 'subtype2', 'subtype3']
oth_cols = df.columns.difference(gp_cols)

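# justify() is not part of NumPy or pandas; it must be defined beforehand.
# Within each group, push the non-NaN values of every column to the top.
arr = np.vstack(df.groupby(gp_cols, sort=False, dropna=False)
                  .apply(lambda gp: justify(gp.to_numpy(), invalid_val=np.nan,
                                            axis=0, side='up')))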

# Reconstruct DataFrame
# Remove entirely NaN rows based on the non-grouping columns
res = (pd.DataFrame(arr, columns=df.columns)
         .dropna(how='all', subset=oth_cols, axis=0))

print(res)

  type subtype1 subtype2 subtype3 flex_best flex_worst lead_best is_best lead_worst is_worst
0  E/2       N2        a      NaN      20.0       30.0      23.0     1.0       33.0      1.0
2  E/2       N2        b      NaN      20.0        NaN      23.0     1.0        NaN      NaN
3  E/2       N2      NaN      xxx       NaN       30.0       NaN     NaN       33.0      1.0
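As a cross-check that does not need justify, a different technique can be sketched: GroupBy.first() returns the first non-null value of each column within a group. This is not the method used in the answer above, and it assumes each group holds at most one non-null value per non-grouping column; any further values would be silently discarded. The name res_first is made up for this example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "type": ["E/2", "E/2", "E/2", "E/2"],
        "subtype1": ["N2", "N2", "N2", "N2"],
        "subtype2": ["a", "a", "b", np.nan],
        "subtype3": [np.nan, np.nan, np.nan, "xxx"],
        "flex_best": [20, np.nan, 20, np.nan],
        "flex_worst": [np.nan, 30, np.nan, 30],
        "lead_best": [23, np.nan, 23, np.nan],
        "is_best": [1, np.nan, 1, np.nan],
        "lead_worst": [np.nan, 33, np.nan, 33],
        "is_worst": [np.nan, 1, np.nan, 1],
    }
)

gp_cols = ["type", "subtype1", "subtype2", "subtype3"]

# dropna=False keeps groups whose keys contain NaN;
# first() takes the first non-null value per column in each group.
res_first = df.groupby(gp_cols, sort=False, dropna=False, as_index=False).first()
print(res_first)
```

The row order and the resulting index may differ from the justify-based result, so compare the two by content rather than by position.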
