Python: fill NaNs in a DataFrame column (fillna) using a column from a different DataFrame

Posted 2024-09-21 01:19:59


I want to fill the NaN values in the "gvkey" column with the values of a column from a different data frame.

df
     wrds_id    isin_code   gvkey   gvkey_new
 0  1004    US0003611052    1004.0  1004.0
 1  1005    US1948302047    Nan     1000.0
 2  1006    US1948302047    Nan     1004.0
 3  1007    US0309541011    Nan     1004.0
 4  1007    US0003611052    1004.0  1004.0
 5  1008    IL0006046119    Nan     1004.0
 6  1008    US0003611052    1004.0  1004.0
 7  1009    US4448591028    Nan     1004.0
 8  1004    US4448591028    Nan     1004.0
 9  1004    US4448591028    Nan     1004.0
 10 1013    US0008861017    1013.0  1013.0
 11 1013    BE0003755692    Nan     1013.0
 12 1013    BE0003755692    Nan     1013.0

Using this second frame, I want to replace the NaN values in the first data frame with the gvkey_ciq_new value, matched on isin_code:

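    isin_code   gvkey_ciq_new
0   US0309541011    1485.0
1   IL0006046119    2018.0
3   US1948302047    3176.0
4   US2376881064    3760.0
5   BE0003755692    5150.0
7   US4448591028    5776.0
9   GB0004544929    5898.0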

The final data frame I want to produce:

finaldf
     wrds_id    isin_code   gvkey   gvkey_new
 0  1004    US0003611052    1004.0  1004.0
 1  1005    US1948302047    3176.0  1004.0
 2  1006    US1948302047    3176.0  1004.0
 3  1007    US0309541011    1485.0  1004.0
 4  1007    US0003611052    1004.0  1004.0
 5  1008    IL0006046119    2018.0  1004.0
 6  1008    US0003611052    1004.0  1004.0
 7  1009    US4448591028    5776.0  1004.0
 8  1004    US4448591028    5776.0  1004.0
 9  1004    US4448591028    5776.0  1004.0
 10 1013    US0008861017    1013.0  1013.0
 11 1013    BE0003755692    5150.0  1013.0
 12 1013    BE0003755692    5150.0  1013.0

How can I produce this data frame using the map function?
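A minimal sketch of the map-based approach the question asks about, using a few rows copied from the two frames above (the names df and df2 are assumptions). It turns the second frame into an isin_code -> gvkey_ciq_new lookup Series, maps isin_code through it, and lets fillna touch only the cells that are NaN. It assumes isin_code is unique in the lookup frame, since map needs a unique index.

```python
import numpy as np
import pandas as pd

# A few rows taken from the question's first frame (df) and second frame (df2).
df = pd.DataFrame({
    "wrds_id": [1004, 1005, 1007, 1013],
    "isin_code": ["US0003611052", "US1948302047", "US0309541011", "BE0003755692"],
    "gvkey": [1004.0, np.nan, np.nan, np.nan],
})
df2 = pd.DataFrame({
    "isin_code": ["US0309541011", "IL0006046119", "US1948302047", "BE0003755692"],
    "gvkey_ciq_new": [1485.0, 2018.0, 3176.0, 5150.0],
})

# Lookup Series indexed by isin_code (the index must be unique for map).
lookup = df2.set_index("isin_code")["gvkey_ciq_new"]

# map() yields NaN for codes missing from the lookup; fillna only fills existing NaNs,
# so rows that already have a gvkey keep it.
df["gvkey"] = df["gvkey"].fillna(df["isin_code"].map(lookup))
print(df)
```

Because fillna aligns on the index, the mapped Series lines up row by row with df without any explicit loop.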


2 answers

Don't use any loops. Merge the data frames and use numpy.where and {}

Set things up:

from io import StringIO

import numpy
import pandas

d1 = StringIO("""\
     wrds_id    isin_code   gvkey   gvkey_new
 0  1004    US0003611052    1004.0  1004.0
 1  1005    US1948302047    Nan     1000.0
 2  1006    US1948302047    Nan     1004.0
 3  1007    US0309541011    Nan     1004.0
 4  1007    US0003611052    1004.0  1004.0
 5  1008    IL0006046119    Nan     1004.0
 6  1008    US0003611052    1004.0  1004.0
 7  1009    US4448591028    Nan     1004.0
 8  1004    US4448591028    Nan     1004.0
 9  1004    US4448591028    Nan     1004.0
 10 1013    US0008861017    1013.0  1013.0
 11 1013    BE0003755692    Nan     1013.0
 12 1013    BE0003755692    Nan     1013.0
 """)

d2 = StringIO("""\
    isin_code   gvkey_ciq_new
0   US0309541011    1485.0
1   IL0006046119    2018.0
3   US1948302047    3176.0
4   US2376881064    3760.0
5   BE0003755692    5150.0
7   US4448591028    5776.0
9   GB0004544929    5898.0
""")
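# raw string avoids the invalid "\s" escape warning; 'Nan' is not a default NA marker, so list it
df1 = pandas.read_table(d1, sep=r'\s+', na_values=['Nan'])
df2 = pandas.read_table(d2, sep=r'\s+', na_values=['Nan'])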

Merge and compute the final column:

^{pr2}$

This gives me:

    wrds_id     isin_code   gvkey  gvkey_new
0      1004  US0003611052  1004.0     1004.0
1      1005  US1948302047  3176.0     1000.0
2      1006  US1948302047  3176.0     1004.0
3      1007  US0309541011  1485.0     1004.0
4      1007  US0003611052  1004.0     1004.0
5      1008  IL0006046119  2018.0     1004.0
6      1008  US0003611052  1004.0     1004.0
7      1009  US4448591028  5776.0     1004.0
8      1004  US4448591028  5776.0     1004.0
9      1004  US4448591028  5776.0     1004.0
10     1013  US0008861017  1013.0     1013.0
11     1013  BE0003755692  5150.0     1013.0
12     1013  BE0003755692  5150.0     1013.0
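A minimal sketch of a merge plus numpy.where step that produces this kind of result, on a subset of the question's rows. This is an illustration, not necessarily the answerer's exact code. It assumes isin_code is unique in df2; duplicate keys there would multiply rows in the left merge.

```python
import numpy as np
import pandas as pd

# Subset of the question's frames.
df1 = pd.DataFrame({
    "wrds_id": [1004, 1005, 1007, 1013, 1013],
    "isin_code": ["US0003611052", "US1948302047", "US0309541011",
                  "BE0003755692", "BE0003755692"],
    "gvkey": [1004.0, np.nan, np.nan, np.nan, np.nan],
    "gvkey_new": [1004.0, 1000.0, 1004.0, 1013.0, 1013.0],
})
df2 = pd.DataFrame({
    "isin_code": ["US0309541011", "US1948302047", "BE0003755692", "GB0004544929"],
    "gvkey_ciq_new": [1485.0, 3176.0, 5150.0, 5898.0],
})

# A left merge keeps every row of df1 (in order) and attaches gvkey_ciq_new on matching isin_code.
merged = df1.merge(df2, on="isin_code", how="left")

# Keep gvkey where present, otherwise fall back to the looked-up value.
merged["gvkey"] = np.where(merged["gvkey"].isna(), merged["gvkey_ciq_new"], merged["gvkey"])

# Drop the helper column so the result has df1's original columns.
result = merged.drop(columns="gvkey_ciq_new")
print(result)
```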

First, make a temporary DataFrame without NaNs. Then you can use boolean indexing:

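import pandas as pd
# notna() drops the NaN rows; "!= np.nan" would keep every row because NaN != NaN
df_tmp = df[df.gvkey.notna()]
for code, gv in zip(df_tmp["isin_code"], df_tmp["gvkey"]):
    # .loc (not .at, which is scalar-only) accepts a boolean mask
    df1.loc[df1.isin_code == code, "gvkey"] = gv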

Maybe not the most elegant solution, but it should work.
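Why a filter written as `df.gvkey != np.nan` cannot remove missing values: NaN compares unequal to everything, itself included, so the mask is True on every row. A small check on a toy Series:

```python
import numpy as np
import pandas as pd

s = pd.Series([1004.0, np.nan, 1013.0])

# NaN != NaN is True, so this mask keeps every row, NaN included.
naive_mask = s != np.nan
# notna() is the reliable way to select non-missing values.
proper_mask = s.notna()

print(naive_mask.tolist())   # [True, True, True]
print(proper_mask.tolist())  # [True, False, True]
```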

Edit: alternatively, add something like this inside the loop:

^{pr2}$

Then you don't need df_tmp.
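One possible loop that needs no temporary frame, offered as a sketch rather than a reconstruction of the snippet missing from the edit. It iterates over the lookup pairs from the second frame and puts the NaN check into the mask itself, so existing gvkey values are never overwritten. The frame names df1/df2 and the sample rows (taken from the question) are assumptions.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({
    "isin_code": ["US0003611052", "US1948302047", "US4448591028", "US4448591028"],
    "gvkey": [1004.0, np.nan, np.nan, np.nan],
})
df2 = pd.DataFrame({
    "isin_code": ["US1948302047", "US4448591028", "GB0004544929"],
    "gvkey_ciq_new": [3176.0, 5776.0, 5898.0],
})

# For each lookup pair, fill only rows that match the code AND are still NaN.
for code, gv in zip(df2["isin_code"], df2["gvkey_ciq_new"]):
    mask = (df1["isin_code"] == code) & df1["gvkey"].isna()
    df1.loc[mask, "gvkey"] = gv  # an all-False mask (e.g. GB0004544929) is a no-op

print(df1)
```

This still runs one assignment per lookup row, so the map or merge versions scale better on large frames.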
