Multiply certain columns in a DataFrame without dropping the other existing columns

Posted 2024-09-28 23:18:14


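I'm trying to create a new DataFrame in Python (pandas) that keeps all existing columns of the original DataFrame while multiplying only certain columns by 100000.

The code I'm using is: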

df[['t1','t2','t3','t4','t5','t6']].multiply(100000)

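This code keeps the t1-t6 columns but drops everything else (about 10 other columns). Is there a way to keep my other existing columns when I run this code?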


2 Answers

You can consider using map on each of the columns to be processed, listed in target_columns:

import pandas as pd

d={'t1':[2,3,1,4,5],
   't2':[5,4,3,2,1],
   'other1':[1,9,9,8,7],
   'other2':[3,3,4,5,6]}
df=pd.DataFrame(d)
print(df)


target_columns=['t1','t2']
for one_column in target_columns:
  df[one_column]=df[one_column].map(lambda x:x*100000)
print(df)

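For context, the expression in the question does not delete anything from df. A minimal sketch, using a small made-up frame rather than the asker's data, shows that multiply returns a new DataFrame holding only the selected columns and leaves the original untouched, which is why the result has to be assigned back:

```python
import pandas as pd

# Small illustrative frame (hypothetical data, not from the question)
df = pd.DataFrame({'t1': [2, 3], 't2': [5, 4], 'other1': [1, 9]})

# Selecting a subset and multiplying builds a NEW DataFrame
scaled = df[['t1', 't2']].multiply(100000)

print(list(scaled.columns))  # ['t1', 't2']
print(list(df.columns))      # ['t1', 't2', 'other1']
print(df['t1'].tolist())     # [2, 3]
```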

Many options:

  1. Assign back to the selected columns:
df[['t1', 't2', 't3', 't4', 't5', 't6']] = \
    df[['t1', 't2', 't3', 't4', 't5', 't6']].multiply(100000)
  2. loc + multiply
df.loc[:, ['t1', 't2', 't3', 't4', 't5', 't6']] = \
    df[['t1', 't2', 't3', 't4', 't5', 't6']].multiply(100000)
  3. *=
df[['t1', 't2', 't3', 't4', 't5', 't6']] *= 100000
  4. loc + *=
df.loc[:, ['t1', 't2', 't3', 't4', 't5', 't6']] *= 100000
  5. update
df.update(df[['t1', 't2', 't3', 't4', 't5', 't6']].multiply(100000))

Test DataFrame:

df = pd.DataFrame({
    't1': {0: 1}, 't2': {0: 2}, 't3': {0: 3}, 't4': {0: 4}, 't5': {0: 5},
    't6': {0: 6}, 't7': {0: 7}, 't8': {0: 8}, 't9': {0: 9}, 't10': {0: 10},
    't11': {0: 11}
})

Output:

       t1      t2      t3      t4      t5      t6  t7  t8  t9  t10  t11
0  100000  200000  300000  400000  500000  600000   7   8   9   10   11
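All of the options above modify df in place. Since the question asks for a new DataFrame, here is a minimal sketch of two ways to leave the original untouched; the frame and the names cols, scaled_copy and scaled_assign are made up for illustration:

```python
import pandas as pd

# Hypothetical frame with two columns to scale and one to keep as-is
df = pd.DataFrame({'t1': [1], 't2': [2], 't7': [7]})
cols = ['t1', 't2']

# Option A: copy first, then scale the copy in place
scaled_copy = df.copy()
scaled_copy[cols] *= 100000

# Option B: assign returns a new DataFrame and keeps the column order
scaled_assign = df.assign(**{c: df[c] * 100000 for c in cols})

print(scaled_copy)
print(scaled_assign)
print(df)  # the original values are unchanged
```

Either result keeps every column of df; which one reads better is largely a matter of taste.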

Timing information via perfplot:

[Figure: perfplot benchmark plot]

Code used to generate the timings:

import numpy as np
import pandas as pd
import perfplot


def gen_data(n):
    return pd.DataFrame({f't{i}': np.arange(0, n) for i in range(1, 12)})


def assign_back_multiply(df):
    df[['t1', 't2', 't3', 't4', 't5', 't6']] = \
        df[['t1', 't2', 't3', 't4', 't5', 't6']].multiply(100000)
    return df


def assign_back_with_loc_multiply(df):
    df.loc[:, ['t1', 't2', 't3', 't4', 't5', 't6']] = \
        df[['t1', 't2', 't3', 't4', 't5', 't6']].multiply(100000)
    return df


def times_equals(df):
    df[['t1', 't2', 't3', 't4', 't5', 't6']] *= 100000
    return df


def times_equals_with_loc(df):
    df.loc[:, ['t1', 't2', 't3', 't4', 't5', 't6']] *= 100000
    return df


def df_update(df):
    df.update(df[['t1', 't2', 't3', 't4', 't5', 't6']].multiply(100000))
    return df


# @YoungMin Park's Answer
def for_loop_map(df):
    target_columns = ['t1', 't2', 't3', 't4', 't5', 't6']
    for one_column in target_columns:
        df[one_column] = df[one_column].map(lambda x: x * 100000)
    return df


if __name__ == '__main__':
    out = perfplot.bench(
        setup=gen_data,
        kernels=[
            assign_back_multiply,
            assign_back_with_loc_multiply,
            times_equals,
            times_equals_with_loc,
            df_update,
            for_loop_map
        ],
        labels=[
            'assign_back_multiply',
            'assign_back_with_loc_multiply',
            'times_equals',
            'times_equals_with_loc',
            'df_update',
            'for_loop_map @YoungMin Park'
        ],
        n_range=[2 ** k for k in range(25)],
        equality_check=None
    )
    out.save('perfplot_results.png', transparent=False)
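If perfplot is not installed, a rough comparison can be made with the standard library's timeit. This is only a sketch under assumptions: make_frame and best_time are hypothetical helpers, only two of the approaches are timed, and the frame is smaller than the one in the benchmark above. Each run gets a fresh frame so that repeated in-place multiplication does not overflow.

```python
import timeit

import numpy as np
import pandas as pd

COLS = ['t1', 't2']


def make_frame(n):
    # Four integer columns t1..t4 with n rows each
    return pd.DataFrame(
        {f't{i}': np.arange(n, dtype='int64') for i in range(1, 5)}
    )


def times_equals(df):
    df[COLS] *= 100000
    return df


def for_loop_map(df):
    for col in COLS:
        df[col] = df[col].map(lambda x: x * 100000)
    return df


def best_time(func, n, repeats=3):
    # Time a single call per fresh frame and keep the fastest run
    timings = []
    for _ in range(repeats):
        frame = make_frame(n)
        timings.append(timeit.timeit(lambda: func(frame), number=1))
    return min(timings)


for func in (times_equals, for_loop_map):
    print(func.__name__, best_time(func, 10_000))
```

The absolute numbers depend on the machine and the pandas version, so treat them as a rough guide only.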
