Adding values to a DataFrame based on existing unique values

Posted 2024-06-01 06:04:36


I have a DataFrame DF with two columns:

CLASS   STUDENT
'Sci'   'Francy'
'Sci'   Vacant
'math'  'Alex'
'math'  'Arthur'
'math'  'Katy'
'eng'   'Jack'
'eng'   Vacant
'eng'   'Francy'
'Hist'  'Francy'
'Hist'  'Francy'

I need every class to have a Vacant student row. Some of them already have one.

Desired result:

CLASS   STUDENT
'Sci'   'Francy'
'Sci'   Vacant
'math'  'Alex'
'math'  'Arthur'
'math'  'Katy'
'math'  Vacant
'eng'   'Jack'
'eng'   Vacant
'eng'   'Francy'
'Hist'  'Francy'
'Hist'  'Francy'
'Hist'  Vacant

I tried:

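import pandas as pd

# One entry per distinct class; reset the index so both Series align row by row
unique_class = DF['CLASS'].drop_duplicates().reset_index(drop=True)
vacant_column = pd.Series(['vacant'] * unique_class.shape[0], name='STUDENT')
temp_df = pd.concat([unique_class, vacant_column], axis=1)
# DataFrame.append was removed in pandas 2.0; pd.concat is the equivalent
DF = pd.concat([DF, temp_df], ignore_index=True)
DF.drop_duplicates(inplace=True)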

It works, but it seems like too much. Is there a better way?


3 answers

Use pd.merge:

df_new = pd.DataFrame({'CLASS': df['CLASS'].unique(), 'STUDENT':'vacant'})

df_new.merge(df, how='outer', on=['CLASS','STUDENT'])

# Chain .sort_values(by='CLASS') if a sorted DataFrame is needed

Output:

    CLASS   STUDENT
0   Sci     vacant
1   math    vacant
2   eng     vacant
3   Hist    vacant
4   Sci     Francy
5   math    Alex
6   math    Arthur
7   math    Katy
8   eng     Jack
9   eng     Francy
10  Hist    Francy
11  Hist    Francy
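
To try the merge end to end, here is a self-contained sketch using the data from the question. It assumes the placeholder is the lowercase string 'vacant', as in this answer; the names `merged` and `vacant_per_class` are only illustrative. Because the outer merge is done on both columns, the existing vacant rows match the generated ones instead of being duplicated, and repeated rows such as Hist/Francy are kept.

```python
import pandas as pd

# Sample data from the question (placeholder spelled 'vacant', an assumption)
df = pd.DataFrame({
    "CLASS": ["Sci", "Sci", "math", "math", "math",
              "eng", "eng", "eng", "Hist", "Hist"],
    "STUDENT": ["Francy", "vacant", "Alex", "Arthur", "Katy",
                "Jack", "vacant", "Francy", "Francy", "Francy"],
})

# One 'vacant' row per distinct class
df_new = pd.DataFrame({"CLASS": df["CLASS"].unique(), "STUDENT": "vacant"})

# Outer merge on both columns: rows present in either frame are kept,
# and rows present in both frames are matched rather than repeated
merged = df_new.merge(df, how="outer", on=["CLASS", "STUDENT"])

# Sanity check: count the 'vacant' rows in each class
vacant_per_class = merged[merged["STUDENT"] == "vacant"].groupby("CLASS").size()
print(merged)
print(vacant_per_class)
```

The row order of an outer merge may differ between pandas versions, so it is safer to sort explicitly when the order matters.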

Another approach:

import pandas as pd

# Copy of your data
df = pd.DataFrame({
    "class": ["Sci", "Sci", "math", "math", "math", "eng", "eng", "eng", "Hist", "Hist"],
    "student": ["Francy", "vacant", "Alex", "Arthur", "Katy", "Jack", "vacant", "Francy", "Francy", "Francy"]
    })

# An identical DF with all students equal to "vacant"
vacant_df = pd.DataFrame({"class": df["class"], "student": "vacant"})

# Remove existing 'vacant' from original DF and concatenate with de-duplicated vacant dataframe (to avoid duplicate 'vacant' entries)
final_df = pd.concat([df.loc[df.student != "vacant"], vacant_df.drop_duplicates("class")])

Original DataFrame:

  class student
8  Hist  Francy
9  Hist  Francy
0   Sci  Francy
1   Sci  vacant
5   eng    Jack
6   eng  vacant
7   eng  Francy
2  math    Alex
3  math  Arthur
4  math    Katy

Final DF:

  class student
8  Hist  Francy
9  Hist  Francy
8  Hist  vacant
0   Sci  Francy
0   Sci  vacant
5   eng    Jack
7   eng  Francy
5   eng  vacant
2  math    Alex
3  math  Arthur
4  math    Katy
2  math  vacant
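
The concatenated frame above carries repeated index labels (for example 8 and 0 each appear twice), because the vacant rows keep the labels of the rows they were derived from. A minimal sketch of one way to get a clean index, assuming the same lowercase column names and 'vacant' placeholder as in this answer:

```python
import pandas as pd

df = pd.DataFrame({
    "class": ["Sci", "Sci", "math", "math", "math",
              "eng", "eng", "eng", "Hist", "Hist"],
    "student": ["Francy", "vacant", "Alex", "Arthur", "Katy",
                "Jack", "vacant", "Francy", "Francy", "Francy"],
})

# Same shape as df, with every student set to "vacant"
vacant_df = pd.DataFrame({"class": df["class"], "student": "vacant"})

final_df = (
    pd.concat(
        [df.loc[df["student"] != "vacant"], vacant_df.drop_duplicates("class")],
        ignore_index=True,  # discard the old labels instead of repeating them
    )
    .sort_values("class", kind="mergesort")  # stable sort keeps within-class order
    .reset_index(drop=True)
)
print(final_df)
```

With the stable sort, each class ends with its vacant row, since the vacant rows were concatenated last.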

For the record, there is nothing wrong with your solution. You can get the same result in "one line" using almost the same approach:

# DataFrame.append was removed in pandas 2.0; pd.concat is the equivalent
df = pd.concat([df, df[['CLASS']].drop_duplicates().assign(STUDENT='Vacant')]).drop_duplicates()

[out]

  CLASS STUDENT
0   Sci  Francy
1   Sci  Vacant
2  math    Alex
3  math  Arthur
4  math    Katy
5   eng    Jack
6   eng  Vacant
7   eng  Francy
8  Hist  Francy
2  math  Vacant
8  Hist  Vacant

If needed, you can chain on sort_values and reset_index to make the table cleaner:

# DataFrame.append was removed in pandas 2.0; pd.concat is the equivalent
df = (pd.concat([df, df[['CLASS']].drop_duplicates().assign(STUDENT='Vacant')])
      .drop_duplicates()
      .sort_values('CLASS')
      .reset_index(drop=True))

[out]

   CLASS STUDENT
0   Hist  Francy
1   Hist  Vacant
2    Sci  Francy
3    Sci  Vacant
4    eng    Jack
5    eng  Vacant
6    eng  Francy
7   math    Alex
8   math  Arthur
9   math    Katy
10  math  Vacant
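
Note that drop_duplicates also collapses the repeated Hist/Francy row, whereas the desired result in the question keeps both copies. If that matters, one possible variant is to add a Vacant row only to the classes that lack one and leave the existing rows untouched. This is a sketch under the assumption that the placeholder is the string 'Vacant'; the names `has_vacant`, `missing`, `extra` and `result` are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    "CLASS": ["Sci", "Sci", "math", "math", "math",
              "eng", "eng", "eng", "Hist", "Hist"],
    "STUDENT": ["Francy", "Vacant", "Alex", "Arthur", "Katy",
                "Jack", "Vacant", "Francy", "Francy", "Francy"],
})

# Classes that already contain a 'Vacant' row
has_vacant = df.loc[df["STUDENT"] == "Vacant", "CLASS"].unique()

# Distinct classes, in order of first appearance, that still need one
missing = pd.Series(df["CLASS"].unique(), name="CLASS")
missing = missing[~missing.isin(has_vacant)]

# Build the extra rows and append them; no de-duplication is needed
extra = missing.to_frame().assign(STUDENT="Vacant")
result = pd.concat([df, extra], ignore_index=True)
print(result)
```

If the new rows should sit next to their class, a stable sort such as `result.sort_values("CLASS", kind="mergesort")` groups them alphabetically by class while keeping the order within each class.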
