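I have imported some data. Empty fields show up as nan, and the columns are a mix of float, string, object and other dtypes. I want to replace "na" with "N/A", and the replacement should be case-insensitive. To do this I used the following code from python: better way to handle case sensitivities with df.replace: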
# replace NA with N/A (case-insensitive, whole word only)
dfMSR = dfMSR.apply(lambda x: x.astype(str).str.replace(r'\bna\b', 'N/A', regex=True, case=False))
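The dtype change comes from `x.astype(str)`, which turns every cell (NaN and numbers included) into text before `.str.replace` runs. Below is a minimal sketch of a dtype-preserving alternative. It assumes that only object columns can contain the "na" text, rewrites string cells one at a time, and returns every other value unchanged. The sample frame and the helper names `replace_na_text` and `replace_na_in_object_columns` are hypothetical.

```python
import re

import numpy as np
import pandas as pd

# Whole-word "na" in any letter case: na, NA, Na, nA.
NA_PATTERN = re.compile(r"\bna\b", flags=re.IGNORECASE)


def replace_na_text(value):
    """Rewrite "na" inside strings; return NaN, floats and ints untouched."""
    if isinstance(value, str):
        return NA_PATTERN.sub("N/A", value)
    return value


def replace_na_in_object_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: apply replace_na_text to object columns only."""
    out = df.copy()
    # Numeric columns are never touched, so their dtype cannot change.
    for col in out.select_dtypes(include="object").columns:
        out[col] = out[col].map(replace_na_text)
    return out


# Hypothetical stand-in for dfMSR: a float column and a mixed object column.
df = pd.DataFrame({
    "Score": [1.5, np.nan, 3.0, 4.0],
    "Status": pd.Series(["na", "Na", np.nan, 7], dtype=object),
})

result = replace_na_in_object_columns(df)
print(result.dtypes)
print(result)
```

Because the check is `isinstance(value, str)`, the integer 7 in the mixed column and the real NaN both pass through as they were.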
When I run the code above, the dtype of every column changes to "object". This causes a number of problems, including the following:
import numpy as np

a = dfMSR.copy()
a = a[['AppBaselineType', 'RvwBaselineTypeAction', 'RvwBaselineType']]
# use RvwBaselineType when the action is missing, '-' or 'N/A'
a['AppBaselineType'] = np.where(((a['RvwBaselineTypeAction'].isnull()) |
                                 (a['RvwBaselineTypeAction'] == '-') |
                                 (a['RvwBaselineTypeAction'] == 'N/A')),
                                a['RvwBaselineType'], a['RvwBaselineTypeAction'])
The nan values are not replaced with the values from RvwBaselineType, because they have been turned into the literal text "nan".
a.describe()  # returns:
       AppBaselineType RvwBaselineTypeAction RvwBaselineType
count              292                   292             292
unique               4                     4               4
top                nan                   nan        Existing
freq               251                   251             154
print(dfMSR['RvwBaselineTypeAction'].isnull().sum())  # prints:
0
# replacing isnull() with == 'nan' gives the desired output
a['AppBaselineType'] = np.where(((a['RvwBaselineTypeAction'] == 'nan') |
                                 (a['RvwBaselineTypeAction'] == '-') |
                                 (a['RvwBaselineTypeAction'] == 'N/A')),
                                a['RvwBaselineType'], a['RvwBaselineTypeAction'])
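If a frame has already been through `astype(str)`, the following is one possible recovery instead of comparing against the string 'nan'. It assumes that no genuine cell ever contains the text "nan" and that you know which columns are numeric. The approach maps the literal "nan" back to a real NaN and re-parses the numeric columns with `pd.to_numeric`. The frame and the `numeric_cols` list are hypothetical. Columns that were originally integers would come back as float whenever they contain NaN.

```python
import numpy as np
import pandas as pd

# Hypothetical frame after astype(str): every cell is text, NaN became "nan".
damaged = pd.DataFrame({
    "Score": ["1.5", "nan", "3.0"],
    "Status": ["N/A", "nan", "ABC"],
}, dtype=object)

# Turn the literal "nan" text back into real missing values.
restored = damaged.replace("nan", np.nan)

# Re-parse the columns that are known to be numeric (assumed list).
numeric_cols = ["Score"]
for col in numeric_cols:
    restored[col] = pd.to_numeric(restored[col])

print(restored.dtypes)
print(restored["Status"].isnull().sum())  # 1: the placeholder is a real NaN again
```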
Ideally, I would like to run the replace without changing (and thereby losing) the original data types. Any suggestions?
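One possible approach is sketched below. It assumes the text columns are plain object dtype. `DataFrame.replace` with `regex=True` accepts a compiled pattern, so the case-insensitivity can be built into the pattern itself rather than passed as `case=False`. In this mode pandas applies the substitution only to string cells, leaving float columns and real NaN values alone. The sample frame is made up for illustration.

```python
import re

import numpy as np
import pandas as pd

# Made-up sample: a float column and an object column holding real NaN.
df = pd.DataFrame({
    "Score": [1.5, np.nan, 3.0, 2.0],
    "Status": pd.Series(["na", "NA", np.nan, "banana"], dtype=object),
})

# The IGNORECASE flag lives in the compiled pattern; \b keeps "banana" intact.
pattern = re.compile(r"\bna\b", flags=re.IGNORECASE)

# With regex=True only string cells are rewritten: "Score" stays float64
# and the NaN in "Status" stays a real missing value, not the text "nan".
out = df.replace(to_replace=pattern, value="N/A", regex=True)

print(out.dtypes)
print(out)
```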
# raw data:
RvwBaselineType  RvwBaselineTypeAction  AppBaselineType
Existing         nan                    nan
Existing         -                      nan
nan              nan                    nan
Existing         N/A                    nan
Existing         ABC                    nan
# desired output:
RvwBaselineType  RvwBaselineTypeAction  AppBaselineType
Existing         nan                    Existing
Existing         -                      Existing
nan              nan                    nan
ESDffogr         N/A                    ESDffogr
Existing         ABC                    ABC
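For comparison, here is a sketch of the original `isnull()` logic run on data whose missing values are still real NaN, meaning the replace step did not go through `astype(str)`. The rows are typed in from the raw-data table, so the fourth row yields "Existing" rather than "ESDffogr". The sketch uses `isin` as shorthand for the two `==` comparisons.

```python
import numpy as np
import pandas as pd

# Rows typed in from the raw-data table; missing values are real NaN.
a = pd.DataFrame({
    "RvwBaselineType": ["Existing", "Existing", np.nan, "Existing", "Existing"],
    "RvwBaselineTypeAction": [np.nan, "-", np.nan, "N/A", "ABC"],
}, dtype=object)

# Fall back to RvwBaselineType when the action is missing, "-" or "N/A".
use_type = (
    a["RvwBaselineTypeAction"].isnull()
    | a["RvwBaselineTypeAction"].isin(["-", "N/A"])
)
a["AppBaselineType"] = np.where(
    use_type, a["RvwBaselineType"], a["RvwBaselineTypeAction"]
)
print(a)
```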
I can share a sample file if someone can tell me how to do that on SO.
Thanks