Python Pandas: merge two DataFrames on a column and concatenate substrings

Posted 2024-10-04 11:34:38


I have two DataFrames in Python, as shown below:

df1 
CUSTOMER_KEY    LAST_NAME  FIRST_NAME   
30          f2b6769129  97bb97bebc  
46          ca0464878d  e276539bc2  
51          62f2905a7a  8dfabd6d61  
57          21032ca3bc  1f7e5e0c6e  
62          f7e7fdd8ce  eb6cf4af99  
64          f536998bbb  7fc39eacd1  
80          6069198f63  d873a71620  
99          0ba61a6f66  a6cf7af3eb
102         e8b579b776  c8048fd459

df2
CUSTOMER_KEY    LAST_NAME   FIRST_NAME
30          Arthur      Anderson      
46          Teresa      Johns     
51          Louise      Hurwitz     
57          Timothy         Addy     
62          Jeffery     Wilson      
64          Andres      Tuller      
80          Daniel      Green      
99          Frank       Nader      
102         Faith       Young

I want to join (merge) the two DataFrames on CUSTOMER_KEY.

Basically, the new LAST_NAME should be substring(last_name, 1, 4) from df2 concatenated with substring(last_name, 1, 6) from df1, and likewise for FIRST_NAME.
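For reference, a SQL-style substring(col, 1, n) takes the first n characters, which in pandas corresponds to `.str[:n]`. A minimal sketch of that mapping, assuming 1-based SQL semantics and using two values from the sample data (the Series names `names` and `hashes` are made up for illustration):

```python
import pandas as pd

# Two sample values taken from the data above
names = pd.Series(["Arthur", "Teresa"])            # LAST_NAME in df2
hashes = pd.Series(["f2b6769129", "ca0464878d"])   # LAST_NAME in df1

# substring(x, 1, 4) -> first 4 characters -> .str[:4]
# substring(x, 1, 6) -> first 6 characters -> .str[:6]
combined = names.str[:4] + hashes.str[:6]
print(combined.tolist())
```

Both Series here share the default index 0, 1, so the element-wise `+` pairs the intended rows.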

How can I achieve this?

Thanks and regards,

Bala


2 answers

Use the .str accessor:

df2['LAST_NAME']=df2['LAST_NAME'].str[:3]+df1['LAST_NAME'].str[:6]
df2['FIRST_NAME']=df2['FIRST_NAME'].str[:3]+df1['FIRST_NAME'].str[:6]

df2
Out[768]: 
   CUSTOMER_KEY  LAST_NAME FIRST_NAME
0            30  Artf2b676  And97bb97
1            46  Terca0464  Johe27653
2            51  Lou62f290  Hur8dfabd
3            57  Tim21032c  Add1f7e5e
4            62  Jeff7e7fd  Wileb6cf4
5            64  Andf53699  Tul7fc39e
6            80  Dan606919  Gred873a7
7            99  Fra0ba61a  Nada6cf7a
8           102  Faie8b579  Youc8048f
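One caveat worth keeping in mind: `+` between two Series aligns rows by index label, not by CUSTOMER_KEY, so the snippet above pairs the right rows only when both frames share the same index. A minimal sketch, assuming the rows may arrive in a different order, that indexes both frames by CUSTOMER_KEY before concatenating (the two-row frames `left` and `right` are cut-down, hypothetical stand-ins for df1 and df2):

```python
import pandas as pd

left = pd.DataFrame({
    "CUSTOMER_KEY": [30, 46],
    "LAST_NAME": ["f2b6769129", "ca0464878d"],
})
# Same keys as `left`, but in reverse row order
right = pd.DataFrame({
    "CUSTOMER_KEY": [46, 30],
    "LAST_NAME": ["Teresa", "Arthur"],
})

# Index both frames by the key so that + pairs rows by CUSTOMER_KEY
left_k = left.set_index("CUSTOMER_KEY")
right_k = right.set_index("CUSTOMER_KEY")

aligned = (right_k["LAST_NAME"].str[:3] + left_k["LAST_NAME"].str[:6]).rename("LAST_NAME")
print(aligned.reset_index())
```

Without the `set_index` step, the default positional index would pair "Teresa" with the hash that belongs to customer 30.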

If you need a merge, join the two DataFrames on CUSTOMER_KEY first and then concatenate the sliced columns.

Use merge + str:

import pandas as pd
df = pd.DataFrame([
    ['30','f2b6769129','97bb97bebc'],
    ['46','ca0464878d','e276539bc2'],
    ['51','62f2905a7a','8dfabd6d61'],
    ['57','21032ca3bc','1f7e5e0c6e'],
    ['62','f7e7fdd8ce','eb6cf4af99'],
    ['64','f536998bbb','7fc39eacd1'],
    ['80','6069198f63','d873a71620'],
    ['99','0ba61a6f66','a6cf7af3eb'],
    ['102','e8b579b776','c8048fd459']]
)

df2 = pd.DataFrame([
    ['30','Arthur','Anderson'],
    ['46','Teresa','Johns'],
    ['51','Louise','Hurwitz'],
    ['57','Timothy','Addy'],
    ['62','Jeffery','Wilson'],
    ['64','Andres','Tuller'],
    ['80','Daniel','Green'],
    ['99','Frank','Nader'],
    ['102','Faith','Young']]
)

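keys = ['CUSTOMER_KEY', 'LAST_NAME', 'FIRST_NAME']
df.columns = keys
df2.columns = keys

# Join on CUSTOMER_KEY; name columns from df get the suffix _1, those from df2 get _2
df_join = pd.merge(df, df2, on='CUSTOMER_KEY', suffixes=('_1', '_2'))

# First 3 characters of the real name (df2) + first 5 characters of the hash (df).
# Adjust the slice bounds if you need other lengths (the question asks for 4 and 6).
df_join['LAST_NAME'] = df_join['LAST_NAME_2'].str.slice(0, 3) + df_join['LAST_NAME_1'].str.slice(0, 5)
df_join['FIRST_NAME'] = df_join['FIRST_NAME_2'].str.slice(0, 3) + df_join['FIRST_NAME_1'].str.slice(0, 5)
result_df = df_join[keys]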


result_df.head()
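The two answers use different slice lengths (3 + 6 and 3 + 5), while the question asks for 4 + 6. A minimal sketch of the merge approach with the lengths as parameters; `combine_names` and its arguments are hypothetical names introduced here, and `validate="one_to_one"` is an added guard that assumes each CUSTOMER_KEY appears at most once per frame:

```python
import pandas as pd

def combine_names(hashed, plain, n_plain=4, n_hash=6, key="CUSTOMER_KEY"):
    """Hypothetical helper: prefix of the plain name + prefix of the hashed name."""
    # validate="one_to_one" makes merge raise if a key is duplicated on either side
    merged = pd.merge(hashed, plain, on=key,
                      suffixes=("_hash", "_plain"), validate="one_to_one")
    out = merged[[key]].copy()
    for col in ("LAST_NAME", "FIRST_NAME"):
        out[col] = (merged[f"{col}_plain"].str[:n_plain]
                    + merged[f"{col}_hash"].str[:n_hash])
    return out

# Two rows of the sample data, keyed by strings as in the answer above
hashed = pd.DataFrame({
    "CUSTOMER_KEY": ["30", "46"],
    "LAST_NAME": ["f2b6769129", "ca0464878d"],
    "FIRST_NAME": ["97bb97bebc", "e276539bc2"],
})
plain = pd.DataFrame({
    "CUSTOMER_KEY": ["30", "46"],
    "LAST_NAME": ["Arthur", "Teresa"],
    "FIRST_NAME": ["Anderson", "Johns"],
})

result = combine_names(hashed, plain)
print(result)
```

Calling `combine_names(hashed, plain, n_plain=3, n_hash=5)` would give the same lengths as the answer above.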
