Select columns from a second DataFrame where the value of another column exists in the main DataFrame

Posted 2024-09-30 23:42:12


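I'm struggling with a fairly specific problem. I have two DataFrames of different lengths and with different indexes. For each item in df1, I want to look in df2 and fetch two columns (not present in df1) from the rows where one df2 column equals the value in df1. Example: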

import pandas as pd

data_1 = {'TARGET_NAME':['fishinghook', 'doorlock', 'penguin', 'ashtray', 'cat', 'elephant', 'cupcake', 'exercisebench'],
          'FOOBAR':['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'bar'],
          'ix':[320, 321, 322, 323, 324, 325, 326, 328]}

data_2 = {'IMAGE_NAME':['cat', 'penguin', 'jewelrybox', 'exercisebench', 'doorlock', 'jar', ],
          'VALUES_1':['h', 'h', 'c', 'm', 'h', 'f'],
          'VALUES_2':['hm', 'hl', 'cm', 'ml', 'hh', 'fl'],
          'ix':[616, 617, 618, 619, 620, 621]}

desired = {'TARGET_NAME':['fishinghook', 'doorlock', 'penguin', 'ashtray', 'cat', 'elephant', 'cupcake', 'exercisebench'],
          'FOOBAR':['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'bar'],
          'PRODUCED_VALUES_1':['DROPPED', 'h', 'h', 'DROPPED', 'h', 'DROPPED', 'DROPPED', 'm'],
          'ix':[320, 321, 322, 323, 324, 325, 326, 328]}

df1 = pd.DataFrame(data_1, index=data_1['ix'])
df2 = pd.DataFrame(data_2, index=data_2['ix'])
desired_df = pd.DataFrame(desired, index=desired['ix'])

df1
Out[2]: 
    FOOBAR    TARGET_NAME   ix
320    foo    fishinghook  320
321    bar       doorlock  321
322    foo        penguin  322
323    bar        ashtray  323
324    foo            cat  324
325    bar       elephant  325
326    foo        cupcake  326
328    bar  exercisebench  328

df2
Out[3]: 
        IMAGE_NAME VALUES_1 VALUES_2   ix
616            cat        h       hm  616
617        penguin        h       hl  617
618     jewelrybox        c       cm  618
619  exercisebench        m       ml  619
620       doorlock        h       hh  620
621            jar        f       fl  621

desired_df
Out[4]: 
    FOOBAR PRODUCED_VALUES_1    TARGET_NAME   ix
320    foo           DROPPED    fishinghook  320
321    bar                 h       doorlock  321
322    foo                 h        penguin  322
323    bar           DROPPED        ashtray  323
324    foo                 h            cat  324
325    bar           DROPPED       elephant  325
326    foo           DROPPED        cupcake  326
328    bar                 m  exercisebench  328

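I want to look at each value in df1['TARGET_NAME'] and, if it equals a value in df2['IMAGE_NAME'], take the VALUES_1 and VALUES_2 columns from df2 and add them to df1 (or a copy of df1). If it matches nothing in df2 (the row positions differ as well), I'd like it to write something else instead (e.g. 'DROPPED'). Ideally, the df1 index stays unchanged.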

Thanks for your help!


2 Answers

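You can merge the two frames after renaming IMAGE_NAME to TARGET_NAME, rename the resulting columns to the names you want, fill the NaNs in PRODUCED_VALUES_1 with 'Dropped', and drop the rows that still contain NaN. Finally, set the index back to df1's index.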

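# Outer-merge on the name column; df2-only rows come back with NaN in df1's columns.
ndf = df1.merge(df2.rename(columns={'IMAGE_NAME': 'TARGET_NAME'}), how='outer', on='TARGET_NAME')
# Pass the labels via columns=; the positional axis argument was removed in pandas 2.0.
ndf = ndf.drop(columns=['ix_y', 'VALUES_2']).rename(columns={'ix_x': 'ix', 'VALUES_1': 'PRODUCED_VALUES_1'})

ndf['PRODUCED_VALUES_1'] = ndf['PRODUCED_VALUES_1'].fillna('Dropped')
# Drop the df2-only rows, then restore df1's index. This assumes the merge kept df1's
# row order; recent pandas versions may sort outer-join keys, and how='left' avoids that.
ndf = ndf.dropna().set_index(df1.index)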
    FOOBAR    TARGET_NAME     ix PRODUCED_VALUES_1
320    foo    fishinghook  320.0           Dropped
321    bar       doorlock  321.0                 h
322    foo        penguin  322.0                 h
323    bar        ashtray  323.0           Dropped
324    foo            cat  324.0                 h
325    bar       elephant  325.0           Dropped
326    foo        cupcake  326.0           Dropped
328    bar  exercisebench  328.0                 m
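
A variant of the merge above, sketched under the assumption that only df1's rows are wanted: with how='left' no df2-only rows are created, so the dropna step is not needed and ix stays an integer column. The names lookup and result are introduced here for illustration and do not appear in the original answer.

```python
import pandas as pd

df1 = pd.DataFrame(
    {'TARGET_NAME': ['fishinghook', 'doorlock', 'penguin', 'ashtray',
                     'cat', 'elephant', 'cupcake', 'exercisebench'],
     'FOOBAR': ['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'bar'],
     'ix': [320, 321, 322, 323, 324, 325, 326, 328]},
    index=[320, 321, 322, 323, 324, 325, 326, 328])
df2 = pd.DataFrame(
    {'IMAGE_NAME': ['cat', 'penguin', 'jewelrybox', 'exercisebench', 'doorlock', 'jar'],
     'VALUES_1': ['h', 'h', 'c', 'm', 'h', 'f'],
     'VALUES_2': ['hm', 'hl', 'cm', 'ml', 'hh', 'fl'],
     'ix': [616, 617, 618, 619, 620, 621]},
    index=[616, 617, 618, 619, 620, 621])

# Keep only the lookup key and the wanted column, renamed to match df1.
lookup = df2[['IMAGE_NAME', 'VALUES_1']].rename(
    columns={'IMAGE_NAME': 'TARGET_NAME', 'VALUES_1': 'PRODUCED_VALUES_1'})

# A left merge keeps every df1 row in df1's order; unmatched rows get NaN.
result = df1.merge(lookup, how='left', on='TARGET_NAME')
result['PRODUCED_VALUES_1'] = result['PRODUCED_VALUES_1'].fillna('DROPPED')

# merge() returns a fresh 0..n-1 index, so restore df1's index. This is only
# valid because IMAGE_NAME is unique here, which keeps the row count unchanged.
result.index = df1.index
print(result)
```

If IMAGE_NAME could contain duplicates, the left merge would repeat df1 rows and the final index assignment would fail on the length mismatch, so deduplicating the lookup first would be needed.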

In [34]: df1['PRODUCED_VALUES_1'] = \
             df1['TARGET_NAME'].map(df2.set_index('IMAGE_NAME')['VALUES_1']) \
                               .fillna('DROPPED')

In [35]: df1
Out[35]:
    FOOBAR    TARGET_NAME   ix PRODUCED_VALUES_1
320    foo    fishinghook  320           DROPPED
321    bar       doorlock  321                 h
322    foo        penguin  322                 h
323    bar        ashtray  323           DROPPED
324    foo            cat  324                 h
325    bar       elephant  325           DROPPED
326    foo        cupcake  326           DROPPED
328    bar  exercisebench  328                 m
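
The question asks for both VALUES_1 and VALUES_2, so the map approach can be repeated per column. The following is a minimal sketch, assuming it is acceptable to keep the first row when IMAGE_NAME repeats (map needs a uniquely indexed lookup); the names lookup and out are illustrative, and the work is done on a copy so df1 itself is left untouched.

```python
import pandas as pd

df1 = pd.DataFrame(
    {'TARGET_NAME': ['fishinghook', 'doorlock', 'penguin', 'ashtray',
                     'cat', 'elephant', 'cupcake', 'exercisebench'],
     'FOOBAR': ['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'bar']},
    index=[320, 321, 322, 323, 324, 325, 326, 328])
df2 = pd.DataFrame(
    {'IMAGE_NAME': ['cat', 'penguin', 'jewelrybox', 'exercisebench', 'doorlock', 'jar'],
     'VALUES_1': ['h', 'h', 'c', 'm', 'h', 'f'],
     'VALUES_2': ['hm', 'hl', 'cm', 'ml', 'hh', 'fl']},
    index=[616, 617, 618, 619, 620, 621])

# Build a lookup table keyed by image name. drop_duplicates is an assumption:
# it keeps the first occurrence so that the index used by map() is unique.
lookup = df2.drop_duplicates('IMAGE_NAME').set_index('IMAGE_NAME')

out = df1.copy()  # leave the original df1 unmodified
for col in ['VALUES_1', 'VALUES_2']:
    # map() looks each TARGET_NAME up in the lookup Series; misses become NaN.
    out['PRODUCED_' + col] = out['TARGET_NAME'].map(lookup[col]).fillna('DROPPED')

print(out)
```

Because map works row by row on df1's own Series, the index of out is df1's index without any extra step.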

Or a one-liner similar to @Bharath shetty's solution:

In [26]: df1.merge(df2[['IMAGE_NAME','VALUES_1']].rename(columns={'IMAGE_NAME':'TARGET_NAME'}),
    ...:           how='left') \
    ...:    .fillna('DROPPED') \
    ...:    .rename(columns=lambda c: 'PRODUCED_' + c if c=='VALUES_1' else c) \
    ...:    .set_index(df1.index)
    ...:
Out[26]:
    FOOBAR    TARGET_NAME   ix PRODUCED_VALUES_1
320    foo    fishinghook  320           DROPPED
321    bar       doorlock  321                 h
322    foo        penguin  322                 h
323    bar        ashtray  323           DROPPED
324    foo            cat  324                 h
325    bar       elephant  325           DROPPED
326    foo        cupcake  326           DROPPED
328    bar  exercisebench  328                 m
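
Another option, sketched here as an assumption rather than something either answer proposes, is DataFrame.join with on=: it matches df1['TARGET_NAME'] against the index of the other frame and keeps df1's index, so no set_index call is needed afterwards. The names extra and result are illustrative.

```python
import pandas as pd

df1 = pd.DataFrame(
    {'TARGET_NAME': ['fishinghook', 'doorlock', 'penguin', 'ashtray',
                     'cat', 'elephant', 'cupcake', 'exercisebench'],
     'FOOBAR': ['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'bar']},
    index=[320, 321, 322, 323, 324, 325, 326, 328])
df2 = pd.DataFrame(
    {'IMAGE_NAME': ['cat', 'penguin', 'jewelrybox', 'exercisebench', 'doorlock', 'jar'],
     'VALUES_1': ['h', 'h', 'c', 'm', 'h', 'f'],
     'VALUES_2': ['hm', 'hl', 'cm', 'ml', 'hh', 'fl']},
    index=[616, 617, 618, 619, 620, 621])

# Index df2 by image name and prefix the value columns that will be added.
extra = df2.set_index('IMAGE_NAME')[['VALUES_1', 'VALUES_2']].add_prefix('PRODUCED_')

# join() defaults to how='left': every df1 row is kept, with df1's index.
result = df1.join(extra, on='TARGET_NAME')

# Fill only the newly added columns, so NaNs elsewhere in df1 stay as they are.
new_cols = list(extra.columns)
result[new_cols] = result[new_cols].fillna('DROPPED')
print(result)
```

Filling only the added columns differs from the one-liner above, which calls fillna on the whole frame and would also overwrite any missing values that df1 already contained.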
