Identifying consecutive NaNs with pandas, part 2

Posted 2024-10-06 12:19:22

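I have a question related to an earlier one: Identifying consecutive NaN's with pandas

I am new to Stack Overflow and so cannot add a comment, but I would like to know how to partially keep the original index of the DataFrame when counting the number of consecutive NaNs.

So, instead of: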

df = pd.DataFrame({'a':[1,2,np.nan, np.nan, np.nan, 6,7,8,9,10,np.nan,np.nan,13,14]})
df
Out[38]:
     a
0    1
1    2
2  NaN
3  NaN
4  NaN
5    6
6    7
7    8
8    9
9   10
10 NaN
11 NaN
12  13
13  14

I would like to obtain the following:


Out[41]:
     a
0    0
1    0
2    3
5    0
6    0
7    0
8    0
9    0
10   2
12   0
13   0

1 answer
User
#1 · Posted 2024-10-06 12:19:22

I found a solution. It is ugly, but it does the job. I hope you do not have a huge amount of data, because it may not perform very well:

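import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, 2, np.nan, np.nan, np.nan, 6, 7, 8, 9, 10, np.nan, np.nan, 13, 14]})
# Group rows by the running count of non-NaN values, then count the NaNs in each group
df1 = df.a.isnull().astype(int).groupby(df.a.notnull().astype(int).cumsum()).sum()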

# Determine the different groups of NaNs. We only want to keep the 1st. The 0's are non-NaN values, the 1's are the first in a group of NaNs. 
b = df.isna()
df2 = b.cumsum() - b.cumsum().where(~b).ffill().fillna(0).astype(int)
df2 = df2.loc[df2['a'] <= 1].copy()  # copy so that update() below works on an independent frame

# Set index from the non-zero 'NaN-count' to the index of the first NaN
df3 = df1.loc[df1 != 0]
df3.index = df2.loc[df2['a'] == 1].index

# Update the values from df3 (which has the right values, and the right index), to df2 
df2.update(df3)

The NaN-grouping trick was inspired by another answer.
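
For comparison, here is a more compact sketch of the same idea. It is not the approach from the answer above, only one possible alternative: label each run of consecutive equal NaN/non-NaN states, compute the NaN count per run, and keep the non-NaN rows plus the first row of each NaN run. It assumes the column is named 'a' as in the question; the names is_nan, run_id, run_len and first_of_run are made up for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, 2, np.nan, np.nan, np.nan, 6, 7, 8, 9, 10, np.nan, np.nan, 13, 14]})

is_nan = df['a'].isna()

# A new run starts wherever the NaN/non-NaN state differs from the previous row.
run_id = is_nan.ne(is_nan.shift(fill_value=False)).cumsum()

# Number of NaNs in the run each row belongs to, aligned with the original index.
run_len = is_nan.astype(int).groupby(run_id).transform('sum')

# True only on the first row of each NaN run.
first_of_run = is_nan & ~is_nan.shift(fill_value=False)

# Keep non-NaN rows (count 0) and the first row of each NaN run (count = run length).
result = run_len[~is_nan | first_of_run].to_frame('a')
print(result)
```

Because nothing is re-indexed by hand, the original index labels of the kept rows carry over directly. Whether this is faster on large data has not been measured here.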
