Replace values in a pandas column using another pandas df that holds the corresponding replacements

Posted 2024-09-28 22:30:35


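I have a pandas df named inventory with a column of part numbers (alphanumeric). Some of these part numbers have been superseded, and I have another df named replace_with with two columns, 'old part numbers' and 'new part numbers'. For example: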

inventory has values like:

* 123AAA
* 123BBB
* 123CCC
......

replace_with has values like:

| **oldPartnumbers** | **newPartnumbers** |
|---|---|
| 123AAA | 123ABC |
| 123CCC | 123DEF |

So I need to replace the matching values in inventory with the new numbers. After the replacement, inventory should look like this:

* 123ABC
* 123BBB
* 123DEF

Is there a simple way to do this in Python? Thanks!


3 answers

Setup

Consider the dataframes inventory and replace_with:

import pandas as pd

inventory = pd.DataFrame(dict(Partnumbers=['123AAA', '123BBB', '123CCC']))

replace_with = pd.DataFrame(dict(
        oldPartnumbers=['123AAA', '123BBB', '123CCC'],
        newPartnumbers=['123ABC', '123DEF', '123GHI']
    ))

Option 1: map

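d = replace_with.set_index('oldPartnumbers').newPartnumbers
# part numbers with no entry in d would become NaN, so fill them back from the original column
inventory['Partnumbers'] = inventory['Partnumbers'].map(d).fillna(inventory['Partnumbers'])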

inventory

  Partnumbers
0      123ABC
1      123DEF
2      123GHI
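With `map`, any part number that has no entry in the mapping becomes NaN. The setup above maps every part number, so this doesn't show up in the output, but in the question's data 123BBB has no replacement. A short sketch using the question's sample values that compares plain `map` with `map` followed by `fillna` from the original column:

```python
import pandas as pd

# Sample data from the question: 123BBB has no replacement
inventory = pd.DataFrame({'Partnumbers': ['123AAA', '123BBB', '123CCC']})
replace_with = pd.DataFrame({
    'oldPartnumbers': ['123AAA', '123CCC'],
    'newPartnumbers': ['123ABC', '123DEF'],
})

d = replace_with.set_index('oldPartnumbers')['newPartnumbers']

# Plain map: part numbers missing from d become NaN
mapped = inventory['Partnumbers'].map(d)
print(mapped.tolist())  # ['123ABC', nan, '123DEF']

# Fill the gaps from the original column so unmatched values are kept
inventory['Partnumbers'] = mapped.fillna(inventory['Partnumbers'])
print(inventory['Partnumbers'].tolist())  # ['123ABC', '123BBB', '123DEF']
```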

Option 2: replace

d = replace_with.set_index('oldPartnumbers').newPartnumbers
inventory['Partnumbers'] = inventory['Partnumbers'].replace(d)

inventory

  Partnumbers
0      123ABC
1      123DEF
2      123GHI
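Unlike `map`, `replace` leaves values that have no entry in the mapping unchanged, so 123BBB from the question survives without an extra step. A sketch with the question's sample data that passes the mapping as a plain dict and assigns the result back, rather than calling `inplace=True` on a selected column (recent pandas versions warn about that pattern, and under Copy-on-Write it no longer updates the DataFrame):

```python
import pandas as pd

inventory = pd.DataFrame({'Partnumbers': ['123AAA', '123BBB', '123CCC']})
replace_with = pd.DataFrame({
    'oldPartnumbers': ['123AAA', '123CCC'],
    'newPartnumbers': ['123ABC', '123DEF'],
})

# Build an old -> new dict from the two columns
mapping = dict(zip(replace_with['oldPartnumbers'], replace_with['newPartnumbers']))

# replace() only touches values found in the mapping; 123BBB is kept as-is
inventory['Partnumbers'] = inventory['Partnumbers'].replace(mapping)
print(inventory)
```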

Suppose you have two dfs like these:

import pandas as pd
df1 = pd.DataFrame([[1,3],[5,4],[6,7]], columns = ['PN','name'])
df2 = pd.DataFrame([[2,22],[3,33],[4,44],[5,55]], columns = ['oldname','newname'])

df1:

    PN  name
0   1   3
1   5   4
2   6   7

df2:

    oldname  newname
0   2        22
1   3        33
2   4        44
3   5        55

Run a left join between them:

temp = df1.merge(df2,'left',left_on='name',right_on='oldname')

temp:

    PN      name     oldname    newname
0   1        3         3.0      33.0
1   5        4         4.0      44.0
2   6        7         NaN      NaN

Then compute the new name column and assign it back:

df1['name'] = temp.apply(lambda row: row['newname'] if pd.notnull(row['newname']) else row['name'], axis=1)

df1:

    PN  name
0   1   33.0
1   5   44.0
2   6   7.0

Or, as a one-liner:

df1['name'] = df1.merge(df2,'left',left_on='name',right_on='oldname').apply(lambda row: row['newname'] if pd.notnull(row['newname']) else row['name'], axis=1)
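The row-wise `apply` works, but the same rule ("use newname where there is one, otherwise keep name") can be written as a single column operation with `fillna`, which avoids calling a Python lambda for every row. A sketch reusing this answer's df1 and df2; it assumes the left merge keeps df1's row order and that each `oldname` appears only once in df2, so the merge result has exactly one row per df1 row:

```python
import pandas as pd

df1 = pd.DataFrame([[1, 3], [5, 4], [6, 7]], columns=['PN', 'name'])
df2 = pd.DataFrame([[2, 22], [3, 33], [4, 44], [5, 55]], columns=['oldname', 'newname'])

# Left join keeps every df1 row; rows without a match get NaN in newname
temp = df1.merge(df2, how='left', left_on='name', right_on='oldname')

# Take newname where present, otherwise fall back to the original name;
# to_numpy() assigns by position, matching the merge's row order
df1['name'] = temp['newname'].fillna(temp['name']).to_numpy()
print(df1)
```

The result matches the `apply` version, including the float dtype that comes from the NaNs introduced by the left join.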

This solution is relatively fast: it uses pandas index alignment and NumPy's `copyto` function.

import pandas as pd
import numpy as np

df1 = pd.DataFrame({'partNumbers': ['123AAA', '123BBB', '123CCC', '123DDD']})
df2 = pd.DataFrame({'oldPartnumbers': ['123AAA', '123BBB', '123CCC'],
                    'newPartnumbers': ['123ABC', '123DEF', '123GHI']})

# assign index in each dataframe to original part number columns
# (faster than set_index method, but use set_index if original index must be preserved)
df1.index = df1['partNumbers']
df2.index = df2['oldPartnumbers']
# use pandas index data alignment: part numbers missing from df2 become NaN
df1['updatedPartNumbers'] = df2['newPartnumbers']
# use numpy to copy in the old part number where no new one was found;
# work on a writable copy, because .values can be a read-only view under Copy-on-Write
updated = df1['updatedPartNumbers'].to_numpy(copy=True)
np.copyto(updated,
          df1['partNumbers'].to_numpy(),
          where=df1['updatedPartNumbers'].isna().to_numpy())
df1['updatedPartNumbers'] = updated
# reset index
df1.reset_index(drop=True, inplace=True)

df1:

  partNumbers updatedPartNumbers
0      123AAA             123ABC
1      123BBB             123DEF
2      123CCC             123GHI
3      123DDD             123DDD
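All three answers assume each old part number appears only once in replace_with. If the same old number is listed twice, the `set_index`-based lookups can't form a one-to-one mapping (pandas raises an error about non-unique labels), and the merge approach duplicates inventory rows. A minimal sketch that deduplicates the replacement table first, assuming the last listed replacement should win (that rule, and the extra 123XYZ row, are hypothetical and not from the question):

```python
import pandas as pd

inventory = pd.DataFrame({'Partnumbers': ['123AAA', '123BBB', '123CCC']})
# Hypothetical data: 123AAA appears twice in the replacement table
replace_with = pd.DataFrame({
    'oldPartnumbers': ['123AAA', '123CCC', '123AAA'],
    'newPartnumbers': ['123ABC', '123DEF', '123XYZ'],
})

# Keep one row per old part number (assumption: the last entry wins)
dedup = replace_with.drop_duplicates(subset='oldPartnumbers', keep='last')
d = dedup.set_index('oldPartnumbers')['newPartnumbers']

# map + fillna: replaced where a mapping exists, unchanged otherwise
inventory['Partnumbers'] = inventory['Partnumbers'].map(d).fillna(inventory['Partnumbers'])
print(inventory['Partnumbers'].tolist())  # ['123XYZ', '123BBB', '123DEF']
```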
