How to identify and flag anomalies in a sequence of every n elements in a column/array, or "odd" anomalies within each group of 5 elements

2 answers

User

Answer 1 · edited 2024-06-01 13:06:53

Done in one line

After your edit, this should work:

import numpy as np
df['anomaly'] = df['ref'].replace({'Y': np.nan, 'N': 'x'})  # real NaN instead of the string 'NaN'

If you stay with the original question, where the data looks like this:

print(df)

   group ref, pred
0    1,   Y,    Y
1    2,   Y,    Y
2    3,   Y,    Y
3    4,   Y,    N
4    5,   Y,    Y
5    1,   Y,    Y
6    2,   Y,    Y
7    3,   Y,    N
8    4,   Y,    Y
9    5,   N,    N

Solution

df['anomaly']=pd.Series(np.where(df.iloc[:,-2:].replace(['Y,','Y','N,','N'],[True, True, False, False]).nunique(1).eq(2),'x',np.nan))

    group ref, pred anomaly
0    1,   Y,    Y     nan
1    2,   Y,    Y     nan
2    3,   Y,    Y     nan
3    4,   Y,    N       x
4    5,   Y,    Y     nan
5    1,   Y,    Y     nan
6    2,   Y,    Y     nan
7    3,   Y,    N       x
8    4,   Y,    Y     nan
9    5,   N,    N     nan
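If you would rather clean the data once instead of listing every dirty variant ('Y,', 'N,') inside replace(), here is a minimal sketch. It assumes the stray commas seen in the printout are trailing commas in the "ref," column name and in the group/ref cells; the frame `dirty` is a hypothetical reconstruction of that printout, not the asker's real data. It also uses Series.where so the non-anomalous rows hold a real NaN.

```python
import pandas as pd

# Hypothetical reconstruction of the "dirty" frame printed above: the column
# name "ref," and the group/ref cells carry stray trailing commas.
dirty = pd.DataFrame({
    "group": ["1,", "2,", "3,", "4,", "5,"] * 2,
    "ref,": ["Y,"] * 9 + ["N,"],
    "pred": list("YYYNYYYNYN"),
})

# Strip trailing commas from the column names and from every (string) cell.
clean = dirty.rename(columns=lambda c: c.rstrip(","))
clean = clean.apply(lambda col: col.str.rstrip(","))
clean["group"] = clean["group"].astype(int)

# 'x' where ref and pred disagree, a real NaN everywhere else.
mismatch = clean["ref"] != clean["pred"]
clean["anomaly"] = pd.Series("x", index=clean.index).where(mismatch)
print(clean)
```

Rows 3 and 7 get 'x', the same rows the one-liner flags; the remaining eight rows are NaN.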

How it works

# df.replace(): the dataset is dirty. I get rid of the commas in df.ref while simultaneously converting Y into True and N into False

g=df.iloc[:,-2:].replace(['Y,','Y','N,','N'],[True, True, False, False])

# g.nunique(1) counts the unique values in each row (1 = ref and pred agree, 2 = they differ)

g.nunique(1)

# np.where(condition, value if condition is true, value if condition is false) populates 'x' and NaN

array=np.where(g.nunique(1).eq(2),'x',np.nan)

# pd.Series(array) converts the array into a column of df

df['anomaly']=pd.Series(array)
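The same steps can be run end to end on an already-clean copy of the data (the frame below is hypothetical, built from the printout). Two substitutions are mine, not the answerer's: `.eq("Y")` stands in for the replace() mapping to True/False, and `Series.where` stands in for `np.where(..., 'x', np.nan)`. The latter matters because NumPy has to put 'x' and NaN into one array, which appears to be why the output above shows a lowercase string `nan` rather than pandas' missing value `NaN`.

```python
import pandas as pd

# Hypothetical, already-clean version of the question's data.
df = pd.DataFrame({
    "group": [1, 2, 3, 4, 5] * 2,
    "ref": list("YYYYYYYYYN"),
    "pred": list("YYYNYYYNYN"),
})

# Step 1: turn the last two columns (ref, pred) into booleans (Y -> True).
g = df.iloc[:, -2:].eq("Y")

# Step 2: count distinct values per row; 2 means ref and pred disagree.
n_unique = g.nunique(axis=1)

# Step 3: 'x' where they disagree, a real NaN (not the string 'nan') elsewhere.
df["anomaly"] = pd.Series("x", index=df.index).where(n_unique.eq(2))

print(n_unique.tolist())  # [1, 1, 1, 2, 1, 1, 1, 2, 1, 1]
print(df)
```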

User
Answer 2 · edited 2024-06-01 13:06:53

Just compare the ref and pred columns to get the desired result:
import pandas as pd

a = pd.DataFrame([[1,'Y','Y'],[2,'Y','Y'],[3,'Y','Y'],[4,'Y','N'],[5,'Y','Y'],
                  [1,'Y','Y'],[2,'Y','Y'],[3,'Y','N'],[4,'Y','Y'],[5,'N','N']],
                 columns=['group', 'ref', 'pred'])
# a row is anomalous when the prediction disagrees with the reference
a['anomaly'] = a['ref'] != a['pred']

  group ref pred anomaly
0     1   Y    Y   False
1     2   Y    Y   False
2     3   Y    Y   False
3     4   Y    N    True
4     5   Y    Y   False
5     1   Y    Y   False
6     2   Y    Y   False
7     3   Y    N    True
8     4   Y    Y   False
9     5   N    N   False
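Neither answer uses the "every n elements" part of the title. Here is a sketch of one way to do it, assuming each run of n = 5 consecutive rows is one group (as the repeating 1..5 group column suggests). The helper columns `block`, `mismatch` and `block_mismatches` are hypothetical names chosen for illustration.

```python
import numpy as np
import pandas as pd

a = pd.DataFrame([[1,'Y','Y'],[2,'Y','Y'],[3,'Y','Y'],[4,'Y','N'],[5,'Y','Y'],
                  [1,'Y','Y'],[2,'Y','Y'],[3,'Y','N'],[4,'Y','Y'],[5,'N','N']],
                 columns=['group', 'ref', 'pred'])

n = 5  # assumed block size: every n consecutive rows form one group
a["block"] = np.arange(len(a)) // n          # 0,0,0,0,0,1,1,1,1,1
a["mismatch"] = a["ref"] != a["pred"]

# Number of mismatches per block, broadcast back to every row of that block.
a["block_mismatches"] = a.groupby("block")["mismatch"].transform("sum")

# Position within the block (the group value) of each "odd" row, per block.
odd_positions = a.loc[a["mismatch"]].groupby("block")["group"].apply(list)
print(odd_positions)
```

Because `transform` keeps the result aligned with the original rows, whole blocks can be filtered with `a[a["block_mismatches"] > 0]`. If the blocks are not always exactly n rows long but the group column always restarts at 1, `a["group"].eq(1).cumsum() - 1` gives the same block ids without relying on n.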
