Here is some information about the data:
sex age race
Male 0.204082 Hispanic
Male 0.122449 African-American
Female 0.163265 African-American
Male 0.081633 African-American
Male 0.530612 African-American
African-American 2968
Caucasian 1969
Hispanic 502
Other 294
Asian 26
Native American 13
Name: race, dtype: int64
I basically want to drop the Native American and Asian rows from the dataset, and this is what I tried:
df_train_val_scaled = df_train_val_scaled[df_train_val_scaled["race"] != "Native American" & df_train_val_scaled["race"] != "Asian"]
This raised the following error:
TypeError: Cannot perform 'rand_' with a dtyped [object] array and scalar of type [bool]
So I tried the following instead:
df_train_val_scaled = df_train_val_scaled[df_train_val_scaled["race"] not in ["Native American", "Asian"]]
But that also raises an error:
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
Thanks for your help.
The trick is to use
~df['race'].isin(['a', 'b', 'c'])
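to check whether each element is (or is not) in the given list. The isin() method can filter a DataFrame on the values of any column: it returns a boolean Series, and indexing the DataFrame with that Series returns a new DataFrame containing only the rows where the Series is True.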
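As a minimal sketch of the approach above, the snippet below builds a small made-up frame with the same column names as in the question (the rows and the names df, drop_races, mask, filtered, and filtered_alt are illustrative assumptions, not the asker's data) and drops the two groups with a negated isin() mask. It also shows the comparison-based form from the question with parentheses added, since & binds more tightly than != in Python, which is the likely cause of the first TypeError.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({
    "sex": ["Male", "Male", "Female", "Male", "Female", "Male"],
    "age": [0.20, 0.12, 0.16, 0.08, 0.53, 0.31],
    "race": ["Hispanic", "African-American", "Asian",
             "Native American", "Caucasian", "Other"],
})

drop_races = ["Native American", "Asian"]

# isin() returns a boolean Series; ~ inverts it element-wise,
# so mask is True for every row we want to keep.
mask = ~df["race"].isin(drop_races)
filtered = df[mask]

# Equivalent form using explicit comparisons. Each comparison is wrapped
# in parentheses because & has higher precedence than !=.
filtered_alt = df[(df["race"] != "Native American") & (df["race"] != "Asian")]

print(filtered["race"].tolist())
# ['Hispanic', 'African-American', 'Caucasian', 'Other']
```

The second attempt in the question fails for a different reason: Python's `not in` asks for a single True/False answer about the whole Series, which pandas refuses to guess, whereas isin() tests each element separately.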