DataFrame: drop object-type columns with a specific number of unique values

Posted 2024-09-26 18:11:26


How can I use a function to drop the columns that have more than 50 distinct values?

The columns to drop here are: date_dispatch, con_birth_dt, dat_cust_open, cust_mgr_team, mng_issu_date, created_date

app_train.select_dtypes('object').apply(pd.Series.nunique, axis = 0)

label                           1
date_dispatch                2883
con_birth_dt                12617
con_sex_mf                      2
dat_cust_open                 264
cust_mgr_team                2250
mng_issu_date                1796
um_num                         38
created_date                 2900
hqck_flag                       2
dqck_flag                       2
tzck_flag                       2
yhlcck_flag                     2
bzjck_flag                      2
gzck_flag                       2
jjsz_flag                       2
e_yhlcck_flag                   2
zq_flag                         2
xtsz_flag                       1
whsz_flag                       1
hjsz_flag                       2
yb_flag                         2
qslc_flag                       2

2 answers


You can use nunique followed by loc with boolean indexing:

n = 5  # maximum number of unique values permitted
counts = app_train.select_dtypes(['object']).apply(pd.Series.nunique)
df = app_train.loc[:, ~app_train.columns.isin(counts[counts > n].index)]

# data from jezrael
print(df)

   B  C  D  E  F
0  4  7  1  5  a
1  5  8  3  3  a
2  4  9  5  6  a
3  5  4  7  9  b
4  5  2  1  2  b
5  4  3  0  4  b
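
Since the question asks for a function, the approach above can be wrapped in one. This is only a minimal sketch: the name drop_high_cardinality and its max_unique parameter are hypothetical and not part of pandas, and the small frame is made up to mirror the sample data. The text columns are given an explicit object dtype so that select_dtypes picks them up.

```python
import pandas as pd


def drop_high_cardinality(df, max_unique=50):
    """Return df without object columns that have more than max_unique distinct values.

    Hypothetical helper: the name and parameter are illustrative only.
    """
    # Count distinct values for the object-dtype columns only.
    counts = df.select_dtypes(include=["object"]).nunique()
    # Labels of the object columns that exceed the limit.
    too_many = counts[counts > max_unique].index
    # Keep every column whose label is not in that set.
    return df.loc[:, ~df.columns.isin(too_many)]


sample = pd.DataFrame({
    "A": pd.Series(list("abcdef"), dtype=object),  # 6 distinct strings
    "B": [4, 5, 4, 5, 5, 4],                       # numeric, never counted
    "F": pd.Series(list("aaabbb"), dtype=object),  # 2 distinct strings
})

result = drop_high_cardinality(sample, max_unique=5)
print(list(result.columns))
```

Numeric columns are never dropped by this helper, however many distinct values they hold, because only object columns are counted.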

Use drop together with the filtered index values:

a = app_train.select_dtypes('object').apply(pd.Series.nunique, axis = 0)
df = app_train.drop(a.index[a > 50], axis=1)
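
The same drop-based idea can be written a little more compactly. The sketch below assumes a pandas version in which DataFrame.nunique and the columns= keyword of drop are available; the column names, the data and the threshold of 3 are all made up for illustration.

```python
import pandas as pd

# Illustrative data; object dtype is set explicitly for the text columns.
app_train = pd.DataFrame({
    "cust_id": pd.Series(["c1", "c2", "c3", "c4"], dtype=object),  # 4 distinct
    "sex": pd.Series(["m", "f", "m", "f"], dtype=object),          # 2 distinct
    "amount": [10, 20, 30, 40],                                    # numeric
})

threshold = 3  # made-up limit for this tiny example

# Distinct-value count per object column.
a = app_train.select_dtypes("object").nunique()

# a.index[a > threshold] holds the labels of the columns to drop.
df = app_train.drop(columns=a.index[a > threshold])
print(df.columns.tolist())
```

drop returns a new frame here, so app_train itself is left unchanged.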

Another solution is to add reindex for the missing columns, then filter by the inverted condition <=:

a = (app_train.select_dtypes('object')
              .apply(pd.Series.nunique, axis = 0)
              .reindex(app_train.columns, fill_value=0))

df = app_train.loc[:, a <= 50]

Sample:

app_train = pd.DataFrame({
        'A':list('abcdef'),
         'B':[4,5,4,5,5,4],
         'C':[7,8,9,4,2,3],
         'D':[1,3,5,7,1,0],
         'E':[5,3,6,9,2,4],
         'F':list('aaabbb')
})

print (app_train)
   A  B  C  D  E  F
0  a  4  7  1  5  a
1  b  5  8  3  3  a
2  c  4  9  5  6  a
3  d  5  4  7  9  b
4  e  5  2  1  2  b
5  f  4  3  0  4  b

a = (app_train.select_dtypes('object')
              .apply(pd.Series.nunique, axis = 0)
              .reindex(app_train.columns, fill_value=0))

df = app_train.loc[:, a <= 5]
print (df)
   B  C  D  E  F
0  4  7  1  5  a
1  5  8  3  3  a
2  4  9  5  6  a
3  5  4  7  9  b
4  5  2  1  2  b
5  4  3  0  4  b
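
Why the reindex step matters can be seen by trying the mask without it. The following is a sketch on a reduced version of the sample frame, with object dtype set explicitly for the text columns. It assumes that pandas refuses a boolean mask that has no entry for some column, which is the behaviour I would expect from loc, so the failure is caught generically rather than by a specific exception class.

```python
import pandas as pd

app_train = pd.DataFrame({
    "A": pd.Series(list("abcdef"), dtype=object),
    "B": [4, 5, 4, 5, 5, 4],
    "F": pd.Series(list("aaabbb"), dtype=object),
})

# Counts exist only for the object columns A and F.
counts = app_train.select_dtypes("object").nunique()

# A mask with no entry for B cannot be aligned with the columns.
try:
    app_train.loc[:, counts <= 5]
    mask_failed = False
except Exception:  # pandas reports an indexing error here
    mask_failed = True

# Filling the missing columns with 0 makes them pass the <= test.
full = counts.reindex(app_train.columns, fill_value=0)
df = app_train.loc[:, full <= 5]
print(mask_failed, df.columns.tolist())
```

Because the non-object columns get a count of 0, the comparison has to be the inverted one (<=) so that they are kept.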
