How to check a DataFrame of string lists against a lookup DataFrame and perform calculations?

import pandas as pd

df1 = pd.DataFrame(data = {'tokens' : [['auditioned', 'lead', 'role', 'play', 'play'], ['kittens', 'adopted', 'family'], ['peanut', 'butter', 'jelly', 'sandwiches', 'favorite'], ['committee', 'decorated', 'gym'], ['surprise', 'party', 'best', 'friends']]})

# Desired result: word_count = number of tokens found in the lookup DataFrame df2,
# total_score = sum of the scores of those tokens
df3 = pd.DataFrame(data = {'tokens' : [['auditioned', 'lead', 'role', 'play', 'play'], ['kittens', 'adopted', 'family'], ['peanut', 'butter', 'jelly', 'sandwiches', 'favorite'], ['committee', 'decorated', 'gym'], ['surprise', 'party', 'best', 'friends']], 'word_count' : [3, 1, 2, 1, 0], 'total_score' : [12, 1, 9, 4, None]})
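The question never shows the lookup DataFrame df2; all three answers assume it has a word column and a score column. To run the answers, you could use a hypothetical df2 like the one below. Its words and scores are invented, chosen only so that the counts and totals reproduce the numbers printed in the answers; they are not the asker's real data. The block also computes the desired columns with a plain list comprehension as a baseline.

```python
import pandas as pd

df1 = pd.DataFrame({'tokens': [
    ['auditioned', 'lead', 'role', 'play', 'play'],
    ['kittens', 'adopted', 'family'],
    ['peanut', 'butter', 'jelly', 'sandwiches', 'favorite'],
    ['committee', 'decorated', 'gym'],
    ['surprise', 'party', 'best', 'friends'],
]})

# HYPOTHETICAL lookup table: the question does not show df2.
# The column names match what the answers use; the scores are made up.
df2 = pd.DataFrame({'word': ['lead', 'play', 'family', 'jelly', 'favorite', 'gym'],
                    'score': [4, 4, 1, 6, 3, 4]})

# word -> score dictionary
lookup = dict(zip(df2['word'], df2['score']))

# For each row, collect the scores of the tokens present in the lookup.
# A token that appears twice in a row (e.g. 'play') is scored twice.
matched = [[lookup[t] for t in tokens if t in lookup] for tokens in df1['tokens']]

result = df1.assign(word_count=[len(m) for m in matched],
                    total_score=[sum(m) for m in matched])
print(result)
```

As in all three answers, a row with no matches (row 4) gets a total of 0 rather than the None shown in df3.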

3 answers

User

#1 · edited 2024-10-02 00:32:47

Use:

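import pandas as pd

# Series mapping each word in df2 to its score
d = df2.set_index('word')['score']

def f(x):
    # Scores of the tokens found in the lookup (a repeated token counts again)
    y = [d.get(a) for a in x if a in d]
    return pd.Series([len(y), sum(y)], index=['word_count', 'total_score'])

df3[['word_count', 'total_score']] = df3['tokens'].apply(f)
print(df3)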
                                          tokens  word_count  total_score
0           [auditioned, lead, role, play, play]           3           12
1                     [kittens, adopted, family]           1            1
2  [peanut, butter, jelly, sandwiches, favorite]           2            9
3                    [committee, decorated, gym]           1            4
4               [surprise, party, best, friends]           0            0

User

#2 · edited 2024-10-02 00:32:47

Method 1

Create a base dictionary to use for the mapping inside apply:

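# word -> score lookup (assumes df2 has exactly two columns: word, score)
m0 = dict(df2.values)
# Score of a token, or 0 if it is not in the lookup
m1 = lambda x: m0.get(x, 0)
# 1 if the token is in the lookup, else 0
m2 = lambda x: int(x in m0)
df1.assign(
    word_count=df1.tokens.apply(lambda x: sum(map(m2, x))),
    Total=df1.tokens.apply(lambda x: sum(map(m1, x)))
)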

                                          tokens  word_count  Total
0           [auditioned, lead, role, play, play]           3     12
1                     [kittens, adopted, family]           1      1
2  [peanut, butter, jelly, sandwiches, favorite]           2      9
3                    [committee, decorated, gym]           1      4
4               [surprise, party, best, friends]           0      0

Method 2

Create a new Series that unpacks the words in df1 while keeping the original index values, so that we can aggregate with count and sum.

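The code for Method 2 is missing from the page. One way to implement the idea it describes is sketched below, using Series.explode (pandas 0.25 or later) and a groupby on the index level; this is not necessarily what the answerer wrote, and df2 here is a hypothetical lookup table with invented scores.

```python
import pandas as pd

df1 = pd.DataFrame({'tokens': [
    ['auditioned', 'lead', 'role', 'play', 'play'],
    ['kittens', 'adopted', 'family'],
    ['peanut', 'butter', 'jelly', 'sandwiches', 'favorite'],
    ['committee', 'decorated', 'gym'],
    ['surprise', 'party', 'best', 'friends'],
]})

# HYPOTHETICAL lookup table (the question does not show df2);
# the scores are invented for illustration only.
df2 = pd.DataFrame({'word': ['lead', 'play', 'family', 'jelly', 'favorite', 'gym'],
                    'score': [4, 4, 1, 6, 3, 4]})

# One row per token; each token keeps the index label of its df1 row.
words = df1['tokens'].explode()

# Score of each token; tokens not found in df2 become NaN.
scores = words.map(dict(zip(df2['word'], df2['score'])))

# Aggregate back to one row per df1 index label:
# 'count' ignores NaN, so it counts only the matched tokens,
# and 'sum' skips NaN (an all-NaN group sums to 0).
agg = scores.groupby(level=0).agg(['count', 'sum'])

# The sum is float because of the NaN values, so cast it back to int.
result = df1.assign(word_count=agg['count'],
                    total_score=agg['sum'].astype(int))
print(result)
```

Because the exploded Series repeats the original index labels, assign aligns the aggregated columns back to the right rows of df1 without any merge.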

User

#3 · edited 2024-10-02 00:32:47

You can do it like this:

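# word -> score lookup
d = dict(zip(df2.word, df2.score))

# One column per token position; tokens missing from df2 become missing values
helpdf = df1.tokens.apply(lambda x: pd.Series([d.get(y) for y in x]))
df1['Total'] = helpdf.sum(axis=1)
df1['count'] = helpdf.notnull().sum(axis=1)
print(df1)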
                                          tokens  Total  count
0           [auditioned, lead, role, play, play]   12.0      3
1                     [kittens, adopted, family]    1.0      1
2  [peanut, butter, jelly, sandwiches, favorite]    9.0      2
3                    [committee, decorated, gym]    4.0      1
4               [surprise, party, best, friends]    0.0      0
