Dataset (MWE)
location date people_vaccinated people_fully_vaccinated people_vaccinated_per_hundred
AL 12-01-2021 70861 7270 1.45
AL 13-01-2021 74792 9245 1.53
AL 14-01-2021 80480 11366 1.64
AL 15-01-2021 86956 13488 1.77
AL 16-01-2021 93797 14202 1.91
AL 17-01-2021 100638 14917 2.05
AS 22-01-2021 5627 940 10.1
AS 23-01-2021 5881 948 10.56
AS 24-01-2021 7096 948 12.74
AS 25-01-2021 7096 949 12.98
AS 26-01-2021 7230 950 13.23
AS 27-01-2021 8133 950 14.6
I am trying to replace duplicate values with NaN while using groupby() on location:
import numpy as np

def remove(df, a):
    # Helper column holding the value from the previous row
    df['duplicate'] = df[a].shift(1)
    # Replace a value with NaN when it equals the previous row's value
    df[a] = df.apply(lambda x: np.nan if x[a] == x['duplicate']
                     else x[a], axis=1)
    df = df.drop('duplicate', axis=1)
    return df

dfn = remove(dfn, 'people_vaccinated')
dfn = remove(dfn, 'people_fully_vaccinated')
dfn = remove(dfn, 'people_vaccinated_per_hundred')
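The logic above fails when there are consecutive null values (more than 2). I need to replace the duplicates with NaN while keeping the first instance. What is the best way to do this? As the data above shows, the people_fully_vaccinated column has duplicate values.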
Sample output
location date people_vaccinated people_fully_vaccinated people_vaccinated_per_hundred
AL 12-01-2021 70861 7270 1.45
AL 13-01-2021 74792 9245 1.53
AL 14-01-2021 80480 11366 1.64
AL 15-01-2021 86956 13488 1.77
AL 16-01-2021 93797 14202 1.91
AL 17-01-2021 100638 14917 2.05
AS 22-01-2021 5627 940 10.1
AS 23-01-2021 5881 948 10.56
AS 24-01-2021 7096 NaN 12.74
AS 25-01-2021 NaN 949 12.98
AS 26-01-2021 7230 950 13.23
AS 27-01-2021 8133 NaN 14.6
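Here is an attempt that creates a boolean mask. We can define a list of column names, then loop over the columns and, for each one, mask the values that are duplicated within each unique location.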
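The function names in the answer did not survive extraction, so what follows is only a minimal sketch of the idea it describes, not necessarily the answerer's exact code. It assumes `DataFrame.duplicated` (with `location` in the subset, so duplicates are judged per location) to build the boolean mask and `Series.mask` to write the NaN values. The small frame is a subset of the question's data, rebuilt here so the example runs on its own.

```python
import io
import pandas as pd

# Subset of the question's data: two AL rows and all six AS rows
raw = """location date people_vaccinated people_fully_vaccinated people_vaccinated_per_hundred
AL 16-01-2021 93797 14202 1.91
AL 17-01-2021 100638 14917 2.05
AS 22-01-2021 5627 940 10.1
AS 23-01-2021 5881 948 10.56
AS 24-01-2021 7096 948 12.74
AS 25-01-2021 7096 949 12.98
AS 26-01-2021 7230 950 13.23
AS 27-01-2021 8133 950 14.6
"""
dfn = pd.read_csv(io.StringIO(raw), sep=" ")

# Columns in which repeated values should become NaN
cols = ["people_vaccinated", "people_fully_vaccinated",
        "people_vaccinated_per_hundred"]

for col in cols:
    # True for every repeat of a (location, value) pair after its first occurrence
    is_repeat = dfn.duplicated(subset=["location", col], keep="first")
    # mask() writes NaN where the flag is True and keeps the other values
    dfn[col] = dfn[col].mask(is_repeat)

print(dfn)
```

With this approach the first occurrence of each value within a location is kept and every later repeat becomes NaN, whether or not the repeats are consecutive. Columns that receive a NaN are converted to a float dtype.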