pandas DataFrame: replace string values in a column, taking the text to remove from another column

Posted 2024-05-04 05:10:54


I'm trying to process three CSV files and produce a single file that merges the useful data.

Right now I'm stuck on this problem:

I have two columns (SUFFIX and COD_METEL) with 1.5 million rows, and I need to process them and create another column containing the result.

        SUFFIX    COD_METEL
0          CBR   CBR8901027
1          CBR   CBR8901028
2          CBR   CBR8904001
3          CBR   CBR8904002
4          CBR   CBR8904008
5          CBR   CBR8904027
6          CBR   CBR8904039
7          THO  THO96666290
8          THO  THO96666294
9          THO  THO96666298
10         THO  THO96666302
11         THO  THO96666322
12         THO  THO96666326
13          ZV   ZV111900NI
14          ZV   ZV111910NI
15          ZX    ZX2021.AC
16          ZX    ZX2021.AC
17          ZX    ZX6066.AC
18          ZX    ZX6111.AC
19          ZX    ZX6111.AC
20          ZX    ZX6380.AC
21          ZX       ZX9030
22          ZX       ZX9030
23          ZX       ZX9030
24          ZZ   ZZ00012565

Here I need to "subtract" the SUFFIX value from COD_METEL, like this:

df["RESULT"] = df["COD_METEL"] - df["SUFFIX"]

        SUFFIX    COD_METEL     RESULT
0          CBR   CBR8901027    8901027
1          CBR   CBR8901028    8901028
2          CBR   CBR8904001    8904001

I know this isn't possible with the "-" operator, so I'm asking for a technique that solves this and replaces all the values quickly.

I've already tried this:

replaceList = list(set(df["SUFFIX"]))
for to_replace in replaceList:
    df["RESULT"] = df["COD_METEL"].str.replace(to_replace,"")

2 Answers

If there are no missing values, you can try a list comprehension:

df['new'] = [j.replace(i, '') for i, j in zip(df['SUFFIX'], df['COD_METEL'])]
print (df)
   SUFFIX    COD_METEL       new
0     CBR   CBR8901027   8901027
1     CBR   CBR8901028   8901028
2     CBR   CBR8904001   8904001
3     CBR   CBR8904002   8904002
4     CBR   CBR8904008   8904008
5     CBR   CBR8904027   8904027
6     CBR   CBR8904039   8904039
7     THO  THO96666290  96666290
8     THO  THO96666294  96666294
9     THO  THO96666298  96666298
10    THO  THO96666302  96666302
11    THO  THO96666322  96666322
12    THO  THO96666326  96666326
13     ZV   ZV111900NI  111900NI
14     ZV   ZV111910NI  111910NI
15     ZX    ZX2021.AC   2021.AC
16     ZX    ZX2021.AC   2021.AC
17     ZX    ZX6066.AC   6066.AC
18     ZX    ZX6111.AC   6111.AC
19     ZX    ZX6111.AC   6111.AC
20     ZX    ZX6380.AC   6380.AC
21     ZX       ZX9030      9030
22     ZX       ZX9030      9030
23     ZX       ZX9030      9030
24     ZZ   ZZ00012565  00012565
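
One caveat: `str.replace` removes every occurrence of the suffix text, not only the leading one. The data in the question does not appear to hit this, but a code such as `ZX10ZX` (an invented value, not from the question) would lose both `ZX` parts. A minimal sketch that strips only a leading match, assuming both columns hold strings; the helper name `strip_prefix` is hypothetical:

```python
import pandas as pd

def strip_prefix(prefix, value):
    # Drop `prefix` only when `value` starts with it; otherwise return value unchanged.
    return value[len(prefix):] if value.startswith(prefix) else value

df = pd.DataFrame({
    "SUFFIX": ["CBR", "ZX", "ZX"],
    # The last row is an invented edge case where the suffix text appears twice.
    "COD_METEL": ["CBR8901027", "ZX2021.AC", "ZX10ZX"],
})

df["new"] = [strip_prefix(i, j) for i, j in zip(df["SUFFIX"], df["COD_METEL"])]
print(df)
```

On Python 3.9 or later, `value.removeprefix(prefix)` should behave the same way as the helper.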

Performance:

#[250000 rows x 2 columns]
df = pd.concat([df] * 10000, ignore_index=True)
#print (df)

In [289]: %timeit df['RESULT'] = df.apply(lambda x: x['COD_METEL'].replace(x['SUFFIX'], ''), axis=1)
5.05 s ± 347 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [290]: %timeit df['new'] = [j.replace(i, '') for i, j in zip(df['SUFFIX'], df['COD_METEL'])]
98.7 ms ± 8.8 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
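
The loop in the question reassigns `RESULT` from the original `COD_METEL` on every pass, so only the last suffix's replacement survives. One possible repair, sketched here on a few rows from the question, applies each suffix only to its own rows through a boolean mask. It was not part of the timings above, so its speed relative to the list comprehension is untested here:

```python
import pandas as pd

df = pd.DataFrame({
    "SUFFIX": ["CBR", "CBR", "THO", "ZX"],
    "COD_METEL": ["CBR8901027", "CBR8901028", "THO96666290", "ZX2021.AC"],
})

# Start from a copy of the codes so each row is changed only by its own suffix.
df["RESULT"] = df["COD_METEL"]
for suffix in df["SUFFIX"].unique():
    mask = df["SUFFIX"] == suffix
    # regex=False treats the suffix as literal text rather than a pattern.
    df.loc[mask, "RESULT"] = df.loc[mask, "COD_METEL"].str.replace(suffix, "", regex=False)

print(df)
```

This runs one vectorized pass per distinct suffix, so it is likely to suit data with few distinct suffixes better than data with many.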

Another approach:

df['RESULT'] = df.apply(lambda x: x['COD_METEL'].replace(x['SUFFIX'], ''), axis=1)
df

   SUFFIX    COD_METEL    RESULT
0     CBR   CBR8901027   8901027
1     CBR   CBR8901028   8901028
2     CBR   CBR8904001   8904001
3     CBR   CBR8904002   8904002
4     CBR   CBR8904008   8904008
5     CBR   CBR8904027   8904027
6     CBR   CBR8904039   8904039
7     THO  THO96666290  96666290
8     THO  THO96666294  96666294
9     THO  THO96666298  96666298
10    THO  THO96666302  96666302
11    THO  THO96666322  96666322
12    THO  THO96666326  96666326
13     ZV   ZV111900NI  111900NI
14     ZV   ZV111910NI  111910NI
15     ZX    ZX2021.AC   2021.AC
16     ZX    ZX2021.AC   2021.AC
17     ZX    ZX6066.AC   6066.AC
18     ZX    ZX6111.AC   6111.AC
19     ZX    ZX6111.AC   6111.AC
20     ZX    ZX6380.AC   6380.AC
21     ZX       ZX9030      9030
22     ZX       ZX9030      9030
23     ZX       ZX9030      9030
24     ZZ   ZZ00012565  00012565
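
Both the `apply` version and the list comprehension call `str.replace` on each value, so a missing value in either column would raise an error. A minimal sketch that leaves such rows untouched, assuming that is the desired behaviour; the helper name `remove_suffix_text` and the rows with missing values are invented for illustration:

```python
import pandas as pd

df = pd.DataFrame({
    "SUFFIX": ["CBR", "THO", None, "ZX"],
    "COD_METEL": ["CBR8901027", None, "ZV111900NI", "ZX9030"],
})

def remove_suffix_text(suffix, code):
    # Leave the code as it is when either the suffix or the code is missing.
    if pd.isna(suffix) or pd.isna(code):
        return code
    return code.replace(suffix, "")

df["RESULT"] = [remove_suffix_text(i, j) for i, j in zip(df["SUFFIX"], df["COD_METEL"])]
print(df)
```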
