I have a dataframe df:
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "type": ["E/2", "E/2", "E/2", "E/2"],
        "subtype1": ["N2", "N2", "N2", "N2"],
        "subtype2": ["a", "a", "b", np.nan],
        "subtype3": [np.nan, np.nan, np.nan, "xxx"],
        "flex_best": [20, np.nan, 20, np.nan],
        "flex_worst": [np.nan, 30, np.nan, 30],
        "lead_best": [23, np.nan, 23, np.nan],
        "is_best": [1, np.nan, 1, np.nan],
        "lead_worst": [np.nan, 33, np.nan, 33],
        "is_worst": [np.nan, 1, np.nan, 1],
    }
)
df.head()
| type | subtype1 | subtype2 | subtype3 | flex_best | flex_worst | lead_best | is_best | lead_worst | is_worst|
|------|----------|----------|----------|-----------|------------|-----------|---------|------------|---------|
| E/2 | N2 | a | NaN | 20 | NaN | 23 | 1 | NaN | NaN |
| E/2 | N2 | a | NaN | NaN | 30 | NaN | NaN | 33 | 1 |
| E/2 | N2 | b | NaN | 20 | NaN | 23 | 1 | NaN | NaN |
| E/2 | N2 | NaN | xxx | NaN | 30 | NaN | NaN | 33 | 1 |
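I want to remove the duplicate rows by merging them on ["type", "subtype1", "subtype2", "subtype3"], filling in the NaN values where appropriate.
This should result in the following dataframe: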
| type | subtype1 | subtype2 | subtype3 | flex_best | flex_worst | lead_best | is_best | lead_worst | is_worst|
|------|----------|----------|----------|-----------|------------|-----------|---------|------------|---------|
| E/2 | N2 | a | NaN | 20 | 30 | 23 | 1 | 33 | 1 |
| E/2 | N2 | b | NaN | 20 | NaN | 23 | 1 | NaN | NaN |
| E/2 | N2 | NaN | xxx | NaN | 30 | NaN | NaN | 33 | 1 |
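Comparing the two tables, only the first two rows share the same values in all four key columns. A quick way to confirm which rows would be merged is `DataFrame.duplicated` with `keep=False`, which flags every member of a repeated key and treats NaN as equal to NaN, so the NaN in subtype3 does not prevent the match. A minimal sketch that rebuilds the frame from the question; `KEYS` is a name introduced here for illustration:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "type": ["E/2", "E/2", "E/2", "E/2"],
        "subtype1": ["N2", "N2", "N2", "N2"],
        "subtype2": ["a", "a", "b", np.nan],
        "subtype3": [np.nan, np.nan, np.nan, "xxx"],
        "flex_best": [20, np.nan, 20, np.nan],
        "flex_worst": [np.nan, 30, np.nan, 30],
        "lead_best": [23, np.nan, 23, np.nan],
        "is_best": [1, np.nan, 1, np.nan],
        "lead_worst": [np.nan, 33, np.nan, 33],
        "is_worst": [np.nan, 1, np.nan, 1],
    }
)

# Hypothetical name for the four grouping columns from the question
KEYS = ["type", "subtype1", "subtype2", "subtype3"]

# keep=False marks every row whose key tuple occurs more than once;
# a NaN in a key column compares equal to another NaN here
mask = df.duplicated(subset=KEYS, keep=False)
print(mask.tolist())  # [True, True, False, False]
```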
How can I do this with pandas?
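One simpler option, an alternative to the approach in the answer, is `GroupBy.first`, which takes the first non-null value of each column within a group. Two details matter for this data: every row has NaN in either subtype2 or subtype3, so with the default `dropna=True` every row would be excluded from the groupby; and `sort=False` keeps the groups in order of first appearance. A sketch assuming a recent pandas (2.x); `KEYS` is again a name introduced for illustration:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "type": ["E/2", "E/2", "E/2", "E/2"],
        "subtype1": ["N2", "N2", "N2", "N2"],
        "subtype2": ["a", "a", "b", np.nan],
        "subtype3": [np.nan, np.nan, np.nan, "xxx"],
        "flex_best": [20, np.nan, 20, np.nan],
        "flex_worst": [np.nan, 30, np.nan, 30],
        "lead_best": [23, np.nan, 23, np.nan],
        "is_best": [1, np.nan, 1, np.nan],
        "lead_worst": [np.nan, 33, np.nan, 33],
        "is_worst": [np.nan, 1, np.nan, 1],
    }
)

KEYS = ["type", "subtype1", "subtype2", "subtype3"]  # hypothetical name

# With the default dropna=True, any row with NaN in a key is dropped,
# which here is every row, so no groups are formed
print(df.groupby(KEYS).ngroups)  # 0

out = (
    df.groupby(KEYS, dropna=False, sort=False)
    .first()  # first non-null value of each column within each group
    .reset_index()  # turn the key levels back into columns
)
print(out)
```

Because `first` keeps only the first non-null value, it would silently discard a later, conflicting value in the same column of a group. That is harmless here, where each column has at most one non-null value per group.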
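For this, we'll use the ^{} function provided by @cs95 inside a groupby (credit here goes to @Divakar). Since some of your grouping keys contain NaN, we need to add dropna=False to the groupby call. Afterwards, drop the rows that are all NaN (on the subset of non-grouping columns).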
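The steps described in the answer can be sketched as follows. The helper `push_notna_up` is hypothetical: a simplified, per-column stand-in written for this illustration, not the function the answer refers to. Within each group it moves the non-NaN values of every value column to the top; the rows that end up NaN across all value columns are then dropped. It assumes pandas ≥ 1.1, where `groupby` accepts `dropna`.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "type": ["E/2", "E/2", "E/2", "E/2"],
        "subtype1": ["N2", "N2", "N2", "N2"],
        "subtype2": ["a", "a", "b", np.nan],
        "subtype3": [np.nan, np.nan, np.nan, "xxx"],
        "flex_best": [20, np.nan, 20, np.nan],
        "flex_worst": [np.nan, 30, np.nan, 30],
        "lead_best": [23, np.nan, 23, np.nan],
        "is_best": [1, np.nan, 1, np.nan],
        "lead_worst": [np.nan, 33, np.nan, 33],
        "is_worst": [np.nan, 1, np.nan, 1],
    }
)

KEYS = ["type", "subtype1", "subtype2", "subtype3"]  # hypothetical name
VALUE_COLS = [c for c in df.columns if c not in KEYS]


def push_notna_up(block: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: per column, move non-NaN values to the top
    (preserving their order) and pad the remaining rows with NaN."""
    pushed = {}
    for col in block.columns:
        vals = block[col].dropna().to_numpy(dtype=float)
        padded = np.full(len(block), np.nan)
        padded[: len(vals)] = vals
        pushed[col] = padded
    return pd.DataFrame(pushed, index=block.index)


pieces = []
# dropna=False keeps groups whose keys contain NaN; sort=False keeps input order
for _, block in df.groupby(KEYS, dropna=False, sort=False):
    pieces.append(pd.concat([block[KEYS], push_notna_up(block[VALUE_COLS])], axis=1))

result = (
    pd.concat(pieces)
    # drop rows that are NaN in every non-grouping column
    .dropna(subset=VALUE_COLS, how="all")
    .reset_index(drop=True)
)
print(result)
```

Unlike taking the first non-null value, this keeps every non-null value: if a group had two different non-null flex_best values, they would land on separate output rows instead of one being discarded. The per-column Python loop favors readability; a vectorized NumPy version would be faster on large frames.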