Pandas: group by one column and join another column's values into a delimited list

Posted 2024-10-01 15:48:42

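I want to group all qualifications (as a delimiter-separated list) by job title.

In the dataset below, jobs of the same type (.Net Developer) require different sets of qualifications, while another job requires no qualification at all.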

JobID    Job Title      Qualification ID Qualification Name
34455226 .Net Developer ICT50715         Diploma of Software Development
34455226 .Net Developer ICT40515         Certificate IV in Programming
34466933 .Net Developer ICT50715         Diploma of Software Development
34466111 .Net Developer ICT50655         Diploma of Software Testing
34479964 Snr Finance Systems Analyst 
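
To experiment with this data, the table above can be rebuilt as a DataFrame. This is only a sketch: the variable name `df` and the use of `None` for the missing qualification are assumptions, since the question does not show how the data was loaded.

```python
import pandas as pd

# Sample data copied from the table above.
# Assumption: the job without a qualification holds None in both columns.
df = pd.DataFrame({
    "JobID": [34455226, 34455226, 34466933, 34466111, 34479964],
    "Job Title": [".Net Developer"] * 4 + ["Snr Finance Systems Analyst"],
    "Qualification ID": ["ICT50715", "ICT40515", "ICT50715", "ICT50655", None],
    "Qualification Name": [
        "Diploma of Software Development",
        "Certificate IV in Programming",
        "Diploma of Software Development",
        "Diploma of Software Testing",
        None,
    ],
})

print(df)
```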

I would like a consolidated view of all the unique qualifications that a given type of job may require, like this:

^{pr2}$

This is what I have tried so far.

def f(x):
    # Joins every value in the group, duplicates included
    return pd.Series(dict(Qualifications = ",".join(map(str, x["Qualification Name"]))))

df_jobs_qualifications\
    .groupby("Job Title")[['Qualification Name']]\
    .apply(f)

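But it gives me duplicate qualification names (see below: "Diploma of Software Development" is repeated), whereas I want each qualification name to appear only once.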

Job Title                     Qualifications
.Net Developer                Diploma of Software Development,Certificate IV in Programming,Diploma of Software Development,Diploma of Software Testing
Snr Finance Systems Analyst   N/A

Update

My question is different from this question, because even after following the steps given there I still cannot get unique values.


1 answer
User
#1 · Posted 2024-10-01 15:48:42

If you need unique strings:

You can add set or unique; if the column may contain Nones or NaNs, also add dropna:

df1 = (df.groupby('Job Title')['Qualification Name']
       .apply(lambda x: ','.join(set(x.dropna())))
       .reset_index())

print (df1)
                     Job Title  \
0               .Net Developer   
1  Snr Finance Systems Analyst   

                                  Qualification Name  
0  Diploma of Software Development,Diploma of Sof...  
1     
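
Because `set` does not guarantee any particular order, the joined string can differ between runs. A minimal sketch of the same idea with sorted output follows. The sorting step and the helper name `join_unique_sorted` are additions for illustration, not part of the answer above, and the small DataFrame is assumed sample data based on the question.

```python
import pandas as pd

# Assumed sample data, reduced to the two columns that matter here.
df = pd.DataFrame({
    "Job Title": [".Net Developer"] * 4 + ["Snr Finance Systems Analyst"],
    "Qualification Name": [
        "Diploma of Software Development",
        "Certificate IV in Programming",
        "Diploma of Software Development",
        "Diploma of Software Testing",
        None,
    ],
})

def join_unique_sorted(values: pd.Series) -> str:
    # dropna() removes None/NaN, set() removes duplicates,
    # sorted() makes the order reproducible.
    return ",".join(sorted(set(values.dropna())))

df1 = (df.groupby("Job Title")["Qualification Name"]
       .apply(join_unique_sorted)
       .reset_index())

print(df1.to_string())
```

A group with no qualifications ends up as an empty string, the same as in the output above.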

If order matters:

^{pr2}$
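
One possible way to keep the order of first appearance is `Series.unique()`, which returns values in the order they occur. The sketch below is an assumption about what fits here, not a reconstruction of the answerer's snippet; the sample DataFrame is likewise assumed from the question.

```python
import pandas as pd

# Assumed sample data based on the question.
df = pd.DataFrame({
    "Job Title": [".Net Developer"] * 4 + ["Snr Finance Systems Analyst"],
    "Qualification Name": [
        "Diploma of Software Development",
        "Certificate IV in Programming",
        "Diploma of Software Development",
        "Diploma of Software Testing",
        None,
    ],
})

# dropna() removes missing values; unique() drops duplicates
# while keeping the order in which values first appear.
df1 = (df.groupby("Job Title")["Qualification Name"]
       .apply(lambda x: ",".join(x.dropna().unique()))
       .reset_index())

print(df1.to_string())
```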

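If you want NaN to represent a group with no values:

import numpy as np

def f(x):
    # Unique non-missing values in the group
    val = set(x.dropna())
    if len(val) > 0:
        val = ','.join(val)
    else:
        # No qualifications for this job title: return NaN instead of ''
        val = np.nan
    return val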

df2 = df.groupby('Job Title')['Qualification Name'].apply(f).reset_index()
print (df2)
                     Job Title  \
0               .Net Developer   
1  Snr Finance Systems Analyst   

                                  Qualification Name  
0  Diploma of Software Development,Diploma of Sof...  
1                                                NaN  
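
Having NaN rather than an empty string makes it easy to pick out job titles with no qualifications using `isna()`. The following is a sketch of an alternative route, assuming the sample data from the question: join first, then turn empty strings into NaN with `Series.mask`. The variable names `joined` and `no_quals` are illustrative only.

```python
import pandas as pd

# Assumed sample data based on the question.
df = pd.DataFrame({
    "Job Title": [".Net Developer"] * 4 + ["Snr Finance Systems Analyst"],
    "Qualification Name": [
        "Diploma of Software Development",
        "Certificate IV in Programming",
        "Diploma of Software Development",
        "Diploma of Software Testing",
        None,
    ],
})

# Join the unique, non-missing names per job title (order of first appearance).
joined = (df.groupby("Job Title")["Qualification Name"]
          .apply(lambda x: ",".join(x.dropna().unique())))

# mask() replaces entries where the condition is True with a missing value.
df2 = joined.mask(joined == "").reset_index()

# Job titles that require no qualification at all.
no_quals = df2.loc[df2["Qualification Name"].isna(), "Job Title"].tolist()
print(no_quals)
```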

If you need unique lists:

df2 = (df.groupby('Job Title')['Qualification Name']
       .apply(lambda x: list(set(x)))
       .reset_index())

print (df2)
                     Job Title  \
0               .Net Developer   
1  Snr Finance Systems Analyst   

                                  Qualification Name  
0  [Diploma of Software Development, Diploma of S...  
1                                             [None]  

df2 = (df.groupby('Job Title')['Qualification Name']
       .apply(lambda x: list(x.unique()))
       .reset_index())

print (df2)
                     Job Title  \
0               .Net Developer   
1  Snr Finance Systems Analyst   

                                  Qualification Name  
0  [Diploma of Software Development, Certificate ...  
1                                             [None]  
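
In both outputs above, the job without qualifications gets `[None]` because the missing value is kept. If an empty list is preferred, one option is to call `dropna()` before collecting the values. This is a sketch under the assumption that the data looks like the sample in the question.

```python
import pandas as pd

# Assumed sample data based on the question.
df = pd.DataFrame({
    "Job Title": [".Net Developer"] * 4 + ["Snr Finance Systems Analyst"],
    "Qualification Name": [
        "Diploma of Software Development",
        "Certificate IV in Programming",
        "Diploma of Software Development",
        "Diploma of Software Testing",
        None,
    ],
})

# dropna() removes the missing entry, so a job with no
# qualifications yields an empty list instead of [None].
df2 = (df.groupby("Job Title")["Qualification Name"]
       .apply(lambda x: list(x.dropna().unique()))
       .reset_index())

print(df2.to_string())
```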
