How do I merge duplicate rows and fill the NaN cells with values from the other row?

import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "type": ["E/2", "E/2", "E/2", "E/2"],
        "subtype1": ["N2", "N2", "N2", "N2"],
        "subtype2": ["a", "a", "b", np.nan],
        "subtype3": [np.nan, np.nan, np.nan, "xxx"],
        "flex_best": [20, np.nan, 20, np.nan],
        "flex_worst": [np.nan, 30, np.nan, 30],
        "lead_best": [23, np.nan, 23, np.nan],
        "is_best": [1, np.nan, 1, np.nan],
        "lead_worst": [np.nan, 33, np.nan, 33],
        "is_worst": [np.nan, 1, np.nan, 1],
    }
)
df.head()

| type | subtype1 | subtype2 | subtype3 | flex_best | flex_worst | lead_best | is_best | lead_worst | is_worst |
|------|----------|----------|----------|-----------|------------|-----------|---------|------------|----------|
| E/2  | N2       | a        | NaN      | 20        | NaN        | 23        | 1       | NaN        | NaN      |
| E/2  | N2       | a        | NaN      | NaN       | 30         | NaN       | NaN     | 33         | 1        |
| E/2  | N2       | b        | NaN      | 20        | NaN        | 23        | 1       | NaN        | NaN      |
| E/2  | N2       | NaN      | xxx      | NaN       | 30         | NaN       | NaN     | 33         | 1        |

Desired output:

| type | subtype1 | subtype2 | subtype3 | flex_best | flex_worst | lead_best | is_best | lead_worst | is_worst |
|------|----------|----------|----------|-----------|------------|-----------|---------|------------|----------|
| E/2  | N2       | a        | NaN      | 20        | 30         | 23        | 1       | 33         | 1        |
| E/2  | N2       | b        | NaN      | 20        | NaN        | 23        | 1       | NaN        | NaN      |
| E/2  | N2       | NaN      | xxx      | NaN       | 30         | NaN       | NaN     | 33         | 1        |

1 Answer

Anonymous user

#1 · Posted on 2024-10-03 23:29:43

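For this we will use the justify function provided by @cs95 (credit for it goes to @Divakar) inside a groupby. Because some of your grouping keys contain NaN, we need to add dropna=False to the groupby call; otherwise the rows whose keys contain NaN are silently excluded.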
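To see why dropna=False matters, here is a minimal sketch on a made-up toy frame (the `demo` frame and its column names are hypothetical, not the question's data). It assumes pandas 1.1 or later, where the `dropna` parameter of `groupby` is available.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame: one of the three rows has a NaN grouping key
demo = pd.DataFrame({"key": ["a", "a", np.nan], "val": [1, 2, 3]})

# Default behaviour (dropna=True): the row with the NaN key is left out
default_sizes = demo.groupby("key").size()

# dropna=False: NaN is kept and forms a group of its own
keep_nan_sizes = demo.groupby("key", dropna=False).size()

print(len(default_sizes), len(keep_nan_sizes))
```

With the default setting, the question's last row (where subtype2 is NaN) would be lost in the same way, and so would every other row, since each of them has NaN in subtype2 or subtype3.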

Afterwards, drop the rows that are entirely NaN across the non-grouping columns.
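The snippet that follows calls justify, but this page never defines it, and it is not part of NumPy or pandas. Below is a minimal sketch of such a helper, reconstructed from how it is called here (`invalid_val`, `axis`, `side`); the signature and defaults are assumptions and may differ from the original function the answer credits. It packs the valid entries of a 2D array toward one side along an axis and fills the rest with `invalid_val`.

```python
import numpy as np
import pandas as pd


def justify(a, invalid_val=np.nan, axis=0, side="up"):
    """Sketch: push valid entries of a 2D array to one side along `axis`.

    Assumed signature, inferred from the call in the answer.
    """
    a = np.asarray(a, dtype=object)
    # When the sentinel is a missing value, use pandas' null check
    # (NaN never compares equal to itself); otherwise compare directly.
    if pd.isna(invalid_val):
        mask = pd.notna(a)
    else:
        mask = a != invalid_val
    # Sorting a boolean mask puts False first and True last along the axis
    justified_mask = np.sort(mask, axis=axis)
    if side in ("up", "left"):
        # Flip so that the True (valid) slots come first
        justified_mask = np.flip(justified_mask, axis=axis)
    out = np.full(a.shape, invalid_val, dtype=object)
    if axis == 1:
        out[justified_mask] = a[mask]
    else:
        # Work on transposed views so values are moved column by column
        out.T[justified_mask.T] = a.T[mask.T]
    return out


# Tiny check: each column's non-NaN value moves to the top row
demo = np.array([[1, np.nan], [np.nan, 2]], dtype=object)
packed = justify(demo)
print(packed)
```

The output array uses object dtype so that string columns such as type and subtype1 can pass through together with the numeric ones.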

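import numpy as np
import pandas as pd

# justify() is not part of NumPy or pandas; it must be defined before this runs.

gp_cols = ['type', 'subtype1', 'subtype2', 'subtype3']
oth_cols = df.columns.difference(gp_cols)

# Within each group, push the non-NaN values of every column to the top.
# Iterating over the groups (rather than calling .apply) keeps the grouping
# columns in each sub-frame, so the stacked array still matches df.columns.
arr = np.vstack([justify(gp.to_numpy(), invalid_val=np.nan, axis=0, side='up')
                 for _, gp in df.groupby(gp_cols, sort=False, dropna=False)])

# Reconstruct the DataFrame, then remove rows that are entirely NaN
# in the non-grouping columns
res = (pd.DataFrame(arr, columns=df.columns)
         .dropna(how='all', subset=oth_cols, axis=0))

print(res)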

  type subtype1 subtype2 subtype3 flex_best flex_worst lead_best is_best lead_worst is_worst
0  E/2       N2        a      NaN      20.0       30.0      23.0     1.0       33.0      1.0
2  E/2       N2        b      NaN      20.0        NaN      23.0     1.0        NaN      NaN
3  E/2       N2      NaN      xxx       NaN       30.0       NaN     NaN       33.0      1.0
