如果数据帧中的两列在循环中具有相同的结尾,则减去它们

2024-06-26 09:56:50 发布

您现在位置:Python中文网/ 问答频道 /正文

如果我的数据是这样的

Index    Country    ted_Val1  sam_Val1 ... ted_Val10 sam_Val10
1        Australia  1     3    ... 20 5
2        Bambua     12    33   ... 15 56
3        Tambua     14    34   ... 10 58

df = pd.DataFrame([["Australia", 1, 3, 20, 5],
                   ["Bambua", 12, 33, 15, 56],
                   ["Tambua", 14, 34, 10, 58]
                  ], columns=["Country", "ted_Val1", "sam_Val1", "ted_Val10", "sam_Val10"]
                 )

我想使用列表从所有“ted”值中减去所有“val”列,创建一个以“dif”开头的新列,这样:

Index    Country    ted_Val1  sam_Val1 diff_Val1 ... ted_Val10 sam_Val10 diff_val10
1        Australia  1     3    -2 ... 20 5 -15
2        Bambua     12    33   12 ... 15 56 -41
3        Tambua     14    34   14... 10 58 -48

到目前为止,我已经:

calc_vars = ['ted_Val1',
'sam_Val1',
'ted_Val10',
'sam_Val10']
 
 for i in calc_vars:
     df_diff['dif_' + str(i)] = df.['ted_' + str(i)] - df.['sam_' + str(i)]

但我有一些错误,不知道该怎么办。作为警告,这是伪数据,名称中可能有多个下划线


Tags: 数据dfindexsamdiffcalccountrystr
2条回答

试试这个

calc_vars = ['ted_Val1', 'sam_Val1', 'ted_Val10', 'sam_Val10']

# extract even & odd values from calc_vars
# ['ted_Val1', 'ted_Val10'], ['sam_Val1', 'sam_Val10']

for ted, sam in zip(calc_vars[::2], calc_vars[1::2]):
    df['diff_' + ted.split("_")[-1]] = df[ted] - df[sam]

编辑:如果列未排序

ted_cols = sorted(df.filter(regex="ted_Val\d+"), key=lambda x : x.split("_")[-1])
sam_cols = sorted(df.filter(regex="sam_Val\d+"), key=lambda x : x.split("_")[-1])

for ted, sam in zip(ted_cols, sam_cols):
    df['diff_' + ted.split("_")[-1]] = df[ted] - df[sam]

     Country  ted_Val1  sam_Val1  ted_Val10  sam_Val10  diff_Val1  diff_Val10
0  Australia         1         3         20          5         -2          15
1     Bambua        12        33         15         56        -21         -41
2     Tambua        14        34         10         58        -20         -48

IIUC您可以使用filter选择要进行减法运算的列(假设您的列与示例一样正确排序):

print (pd.concat([df, pd.DataFrame(df.filter(like="ted").to_numpy()-df.filter(like="sam").to_numpy(),
                                   columns=["diff"+i.split("_")[-1] for i in df.columns if "ted_Val" in i])],1))

     Country  ted_Val1  sam_Val1  ted_Val10  sam_Val10  diff1  diff10
0  Australia         1         3         20          5     -2      15
1     Bambua        12        33         15         56    -21     -41
2     Tambua        14        34         10         58    -20     -48

相关问题 更多 >