Posted 2024-09-24 02:24:07
User
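This is a question from the KPMG virtual internship.
My question is: how can I fill the NaN values in the job_industry column with the job_industry value from rows that have the same job title?
For example:

    job_title         job_industry
    Quality Engineer  Financial Services
    Quality Engineer  NaN

I want the NaN in job_industry to be filled with Financial Services.
Likewise, if a NaN value appears in job_industry and the job title is General manager, it should be filled with Manufacturing.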
First, I would create a mapping (a Python dict) from job_title to job_industry, then use that mapping to fill the rows where job_industry is NaN.
The code is as follows:
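    import pandas as pd

    df = pd.DataFrame(
        columns=["job_title", "job_industry"],
        data=[["Quality Engineer", "Financial Services"], ["Quality Engineer", None]],
    )

    # There may be a faster way.
    # Build a {job_title: job_industry} dict from the rows whose industry is known,
    # keeping the first industry seen for each title.
    title_industry_mapping = (
        df.dropna(subset=["job_industry"])
        .drop_duplicates(subset="job_title")
        .set_index("job_title")["job_industry"]
        .to_dict()
    )

    # Fill only the rows where job_industry is missing.
    isna = df["job_industry"].isna()
    df.loc[isna, "job_industry"] = df.loc[isna, "job_title"].map(title_industry_mapping)

Result: both Quality Engineer rows now have job_industry set to Financial Services.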
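The mapping approach can also be wrapped in a small reusable function. The sketch below is only an illustration: the helper name fill_from_title and its key/target parameters are hypothetical, and it assumes that the first known industry for a title is the one to use. Titles with no known industry are left as NaN.

```python
import pandas as pd


def fill_from_title(df, key="job_title", target="job_industry"):
    """Fill missing `target` values with the first known value for the same `key`."""
    # One row per key, taken from the rows where the target is known.
    known = df.dropna(subset=[target]).drop_duplicates(subset=key)
    mapping = known.set_index(key)[target]
    out = df.copy()  # leave the caller's frame untouched
    # map() gives NaN for keys without a known value; fillna() keeps existing values.
    out[target] = out[target].fillna(out[key].map(mapping))
    return out


df = pd.DataFrame(
    {
        "job_title": ["Quality Engineer", "Quality Engineer", "General manager"],
        "job_industry": ["Financial Services", None, None],
    }
)
filled = fill_from_title(df)
print(filled)
```

Here the General manager row stays NaN because no other row supplies an industry for that title, so a default such as Manufacturing would still have to be assigned separately.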
    import numpy as np
    import pandas as pd

    df = pd.DataFrame(
        [
            ["Quality Engineer", "Financial Services"],
            ["Progammer", np.nan],
            ["Quality Engineer", np.nan],
            ["Progammer", "IT"],
            ["General manager", np.nan],
        ],
        columns=["job_title", "job_industry"],
    )

    # Sort so that rows with a known industry come before the NaN rows.
    df = df.sort_values("job_industry", ascending=False, na_position="last")

    # Special case: a General manager with no industry is assigned Manufacturing.
    mask = (df["job_title"] == "General manager") & df["job_industry"].isnull()
    df.loc[mask, "job_industry"] = "Manufacturing"

    # Within each job title, copy the known industry down into the NaN rows.
    df["job_industry"] = df.groupby("job_title")["job_industry"].ffill()
df['job_industry'].isnull() checks, row by row, whether the job_industry value is null:

    df['job_industry'].isnull()
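As a small illustration of that check, the sketch below uses the boolean mask to count the missing entries and to list the job titles they belong to. The sample data and the variable names (missing, n_missing, titles_missing) are made up for this example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "job_title": ["Quality Engineer", "Quality Engineer", "General manager"],
        "job_industry": ["Financial Services", np.nan, np.nan],
    }
)

missing = df["job_industry"].isnull()  # boolean Series, True where the value is NaN
n_missing = int(missing.sum())         # number of missing entries
titles_missing = df.loc[missing, "job_title"].tolist()
print(n_missing, titles_missing)
```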
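The code below sorts by job_industry in descending order with null values last. This matters because forward fill only copies values downward: a NaN that comes before the first known value for its job title would not be replaced.

    df = df.sort_values("job_industry", ascending=False, na_position="last")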
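To see why the ordering matters, here is a minimal sketch that forward-fills the same two rows with and without sorting first. The sample data and the variable names are assumptions made for this illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "job_title": ["Quality Engineer", "Quality Engineer"],
        "job_industry": [np.nan, "Financial Services"],  # the NaN comes first
    }
)

# Without sorting: the leading NaN has no earlier value to copy from.
unsorted_fill = df.groupby("job_title")["job_industry"].ffill()

# With sorting: the known value moves ahead of the NaN, so ffill can reach it.
ordered = df.sort_values("job_industry", ascending=False, na_position="last")
sorted_fill = ordered.groupby("job_title")["job_industry"].ffill().sort_index()

print(unsorted_fill.isna().sum())  # 1: the first row is still NaN
print(sorted_fill.isna().sum())    # 0: both rows are filled
```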
If you would rather have the output in the original row order, you can call df.sort_index():

    df.sort_index()

Output:
    +---+------------------+--------------------+
    |   | job_title        | job_industry       |
    +---+------------------+--------------------+
    | 0 | Quality Engineer | Financial Services |
    | 1 | Progammer        | IT                 |
    | 2 | Quality Engineer | Financial Services |
    | 3 | Progammer        | IT                 |
    | 4 | General manager  | Manufacturing      |
    +---+------------------+--------------------+
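A variant that avoids the sort is sketched below. It is a different technique from the one in the answer: groupby(...).transform("first") broadcasts the first non-null industry of each job title to every row of that title, and fillna then uses it only where the value is missing. The sample data and the name first_known are assumptions for this example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "job_title": [
            "Quality Engineer",
            "Analyst",
            "Quality Engineer",
            "Analyst",
            "General manager",
        ],
        "job_industry": ["Financial Services", np.nan, np.nan, "IT", np.nan],
    }
)

# First non-null industry per job title, aligned with the original rows.
first_known = df.groupby("job_title")["job_industry"].transform("first")

# Only the missing entries are replaced; existing values are kept.
df["job_industry"] = df["job_industry"].fillna(first_known)
print(df)
```

The General manager row remains NaN in this sketch because no row provides an industry for that title, so it would still need an explicit default such as Manufacturing.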