Reordering selected items in two DataFrame columns

Published 2024-05-20 00:01:14


I have a pandas DataFrame with three columns, as follows:

import pandas as pd

data = {'T1': {0: 'Belarus', 1: 'Netherlands', 2: 'France', 3: 'Faroe Islands', 
        4: 'Hungary'}, 'T2': {0: 'Sweden', 1: 'Bulgaria', 2: 'Luxembourg', 
        3: 'Andorra', 4: 'Portugal'}, 'score': {0: -4, 1: 2, 2: 0, 3: 1, 4: -1}}
df = pd.DataFrame(data)
#           T1             T2  score
#0        Belarus      Sweden     -4
#1    Netherlands    Bulgaria      2
#2         France  Luxembourg      0
#3  Faroe Islands     Andorra      1
#4        Hungary    Portugal     -1

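For any row where the items in T1 and T2 are not in alphabetical order (for example "Netherlands" and "Bulgaria"), I want to swap the two items and flip the sign of score.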

I came up with this monster:

df.apply(lambda x: 
          pd.Series([x["T2"], x["T1"], -x["score"]]) 
          if (x["T1"] > x["T2"]) 
          else pd.Series([x["T1"], x["T2"], x["score"]]), 
         axis=1)
#          0              1  2
#0   Belarus         Sweden -4
#1  Bulgaria    Netherlands -2
#2    France     Luxembourg  0
#3   Andorra  Faroe Islands -1
#4   Hungary       Portugal -1

Is there a better way to get the same result? (Performance is not a concern.)


3 Answers

Not as neat as cᴏʟᴅsᴘᴇᴇᴅ's answer, but it gets the job done:

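import numpy as np

df1 = df[['T1', 'T2']].copy()                     # copy, so df is left untouched
df1[['T1', 'T2']] = np.sort(df1.values, axis=1)   # sort each row's pair alphabetically
# negate score on rows whose pair changed order
df1['new'] = np.where((df1 != df[['T1', 'T2']]).any(axis=1), -df.score, df.score)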

df1
Out[102]: 
         T1             T2  new
0   Belarus         Sweden   -4
1  Bulgaria    Netherlands   -2
2    France     Luxembourg    0
3   Andorra  Faroe Islands   -1
4   Hungary       Portugal   -1

Here is a fun and creative way to do it with NumPy tools:

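import numpy as np

t = df[['T1', 'T2']].values
a = t.argsort(1)  # per-row order: [0, 1] if already sorted, [1, 0] if swapped

df[['T1', 'T2']] = t[np.arange(len(t))[:, None], a]
# The @ operator needs Python 3.5+ (thanks @cᴏʟᴅsᴘᴇᴇᴅ); on older versions use
# df['score'] *= a.dot([-1, 1])
df['score'] *= a @ [-1, 1]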

df

         T1             T2  score
0   Belarus         Sweden     -4
1  Bulgaria    Netherlands     -2
2    France     Luxembourg      0
3   Andorra  Faroe Islands     -1
4   Hungary       Portugal     -1
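
To see why multiplying by `a @ [-1, 1]` flips the right signs, it may help to look at what `argsort` returns for a two-column array. The sketch below uses a small made-up array (`pairs`, `order`, `signs` and `sorted_pairs` are illustrative names, not from the answer). A row already in order yields `[0, 1]`, whose dot product with `[-1, 1]` is `1`, while a row that needs swapping yields `[1, 0]`, giving `-1`. `np.take_along_axis` is used here as an assumed equivalent of the fancy-indexing line above.

```python
import numpy as np

# Hypothetical two-column array of strings, for illustration only
pairs = np.array([["Belarus", "Sweden"],
                  ["Netherlands", "Bulgaria"]], dtype=object)

order = pairs.argsort(axis=1)        # per-row column order that sorts each row
signs = order @ np.array([-1, 1])    # [0, 1] -> 1 (keep), [1, 0] -> -1 (flip)
sorted_pairs = np.take_along_axis(pairs, order, axis=1)

print(order.tolist())         # [[0, 1], [1, 0]]
print(signs.tolist())         # [1, -1]
print(sorted_pairs.tolist())  # [['Belarus', 'Sweden'], ['Bulgaria', 'Netherlands']]
```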

Option 1
Boolean indexing

m = df.T1 > df.T2
m 

0    False
1     True
2    False
3     True
4    False
dtype: bool

df.loc[m, 'score'] = df.loc[m, 'score'].mul(-1)
df.loc[m, ['T1', 'T2']] = df.loc[m, ['T2', 'T1']].values
df

         T1             T2  score
0   Belarus         Sweden     -4
1  Bulgaria    Netherlands     -2
2    France     Luxembourg      0
3   Andorra  Faroe Islands     -1
4   Hungary       Portugal     -1
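
Since the lines above modify df in place, here is a minimal sketch of the same boolean-indexing idea wrapped in a function that returns a new frame instead. The function name `order_pairs` and its parameters are hypothetical, chosen only for this illustration.

```python
import pandas as pd

def order_pairs(frame, left="T1", right="T2", value="score"):
    """Return a copy with left <= right in every row; negate value on swapped rows."""
    out = frame.copy()
    swap = out[left] > out[right]    # rows that are out of alphabetical order
    out.loc[swap, value] = -out.loc[swap, value]
    # .values drops the column labels so the assignment is positional
    out.loc[swap, [left, right]] = out.loc[swap, [right, left]].values
    return out

sample = pd.DataFrame({"T1": ["Belarus", "Netherlands"],
                       "T2": ["Sweden", "Bulgaria"],
                       "score": [-4, 2]})
result = order_pairs(sample)
print(result)
```

The `.values` call matters: without it, pandas would align the right-hand side on column labels and the swap would have no effect.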

Option 2
df.eval

m = df.eval('T1 > T2')
df.loc[m, 'score'] = df.loc[m, 'score'].mul(-1)
df.loc[m, ['T1', 'T2']] = df.loc[m, ['T2', 'T1']].values
df

         T1             T2  score
0   Belarus         Sweden     -4
1  Bulgaria    Netherlands     -2
2    France     Luxembourg      0
3   Andorra  Faroe Islands     -1
4   Hungary       Portugal     -1

Option 3
df.query

idx = df.query('T1 > T2').index
idx
Int64Index([1, 3], dtype='int64')

df.loc[idx, 'score'] = df.loc[idx, 'score'].mul(-1)
df.loc[idx, ['T1', 'T2']] = df.loc[idx, ['T2', 'T1']].values
df

         T1             T2  score
0   Belarus         Sweden     -4
1  Bulgaria    Netherlands     -2
2    France     Luxembourg      0
3   Andorra  Faroe Islands     -1
4   Hungary       Portugal     -1
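
The three options differ only in how the rows to swap are selected. As a rough check, the sketch below builds all three selectors on a small made-up frame (`sample` and the variable names are illustrative, not from the answer) and compares them.

```python
import pandas as pd

sample = pd.DataFrame({"T1": ["Belarus", "Netherlands", "France"],
                       "T2": ["Sweden", "Bulgaria", "Luxembourg"]})

mask_plain = sample["T1"] > sample["T2"]     # Option 1: plain comparison
mask_eval = sample.eval("T1 > T2")           # Option 2: expression string
idx_query = sample.query("T1 > T2").index    # Option 3: labels of matching rows

same_mask = mask_plain.tolist() == mask_eval.tolist()
same_rows = list(sample.index[mask_plain]) == list(idx_query)
print(same_mask, same_rows)
```

Options 1 and 2 give a boolean mask, while Option 3 gives index labels; both forms are accepted by df.loc.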
