Efficient way to replace every cell value in a pandas DataFrame

print(tw)
    topic_id                                     word_prob_pair
0          0  [(customer, 0.061703717964), (team, 0.01724444...
1          1  [(team, 0.0260560163563), (customer, 0.0247838...
2          2  [(customer, 0.0171786268847), (footfall, 0.012...
3          3  [(team, 0.0290787264225), (product, 0.01570401...
4          4  [(team, 0.0197917953222), (data, 0.01343226630...
5          5  [(customer, 0.0263740639141), (team, 0.0251677...
6          6  [(customer, 0.0289764173735), (team, 0.0249938...
7          7  [(client, 0.0265082412402), (want, 0.016477447...
8          8  [(customer, 0.0524006965405), (team, 0.0322975...
9          9  [(generic, 0.0373422774996), (product, 0.01834...
10        10  [(customer, 0.0305256248248), (team, 0.0241559...
11        11  [(customer, 0.0198707090364), (ad, 0.018516805...
12        12  [(team, 0.0159852971954), (customer, 0.0124540...
13        13  [(team, 0.033444510469), (store, 0.01961003290...
14        14  [(team, 0.0344793243818), (customer, 0.0210975...
15        15  [(team, 0.026416114692), (customer, 0.02041691...
16        16  [(campaign, 0.0486186973667), (team, 0.0236024...
17        17  [(customer, 0.0208270072145), (branch, 0.01757...
18        18  [(team, 0.0280889397541), (customer, 0.0127932...
19        19  [(team, 0.0297011415217), (customer, 0.0216007...

2 answers

Anonymous user

Answer 1 · edited 2024-05-19 01:05:31

Using `numpy`:

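import numpy as np
import pandas as pd

# df is the question's frame (called tw there): columns topic_id and word_prob_pair
tid1 = df.topic_id.values
# number of (word, prob) pairs in each row
lens = [len(i) for i in df.word_prob_pair.values]
# row position of every pair (positions, so topic_id need not be 0..n-1)
tid2 = np.arange(len(df)).repeat(lens)
# flatten all pairs into a (total_pairs, 2) array; the mixed types become strings
cat, prob = np.concatenate(df.word_prob_pair.values).T
# unique words become the columns; inv gives each pair's column index
ucat, inv = np.unique(cat, return_inverse=True)
data = np.zeros((len(tid1), len(ucat)), dtype=float)
data[tid2, inv] = prob.astype(float)
pd.DataFrame(data, tid1, ucat)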
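The core of this answer is the `np.unique(..., return_inverse=True)` call followed by a single fancy-indexed assignment. A tiny illustration of just those two steps, on made-up words and probabilities (the arrays below are hypothetical, chosen only to show the mechanics):

```python
import numpy as np

# Hypothetical flattened pairs from two rows: row 0 has team, customer; row 1 has team, footfall
words = np.array(["team", "customer", "team", "footfall"])
probs = np.array([0.03, 0.025, 0.02, 0.012])
rows = np.array([0, 0, 1, 1])  # plays the role of tid2: the row each pair came from

# Sorted unique words become the columns; inv is each word's column index
ucat, inv = np.unique(words, return_inverse=True)
print(ucat)  # ['customer' 'footfall' 'team']
print(inv)   # [2 0 2 1]

# Scatter every probability into its (row, column) cell in one vectorized step
grid = np.zeros((rows.max() + 1, len(ucat)))
grid[rows, inv] = probs
print(grid)
```

Cells with no matching pair simply keep the initial zero, which is why no separate `fillna` step is needed in the `numpy` version.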


Anonymous user

Answer 2 · edited 2024-05-19 01:05:31

You can use a list comprehension with the `DataFrame` constructor, then replace the resulting `NaN` values with `0` using `fillna`:

import pandas as pd

df = pd.DataFrame({'word_prob_pair':[

[('customer', 0.061703717964), ('team', 0.01724444)],
[('team', 0.0260560163563), ('customer', 0.0247838)],
[('customer', 0.0171786268847), ('footfall', 0.012)],
[('team', 0.0290787264225), ('product', 0.01570401)],
[('team', 0.0197917953222), ('data', 0.01343226630)],
[('customer', 0.0263740639141), ('team', 0.0251677)],
[('customer', 0.0289764173735), ('team', 0.0249938)],
[('client', 0.0265082412402), ('want', 0.016477447)]
] })
print (df)
                                     word_prob_pair
0  [(customer, 0.061703717964), (team, 0.01724444)]
1  [(team, 0.0260560163563), (customer, 0.0247838)]
2  [(customer, 0.0171786268847), (footfall, 0.012)]
3  [(team, 0.0290787264225), (product, 0.01570401)]
4   [(team, 0.0197917953222), (data, 0.0134322663)]
5  [(customer, 0.0263740639141), (team, 0.0251677)]
6  [(customer, 0.0289764173735), (team, 0.0249938)]
7  [(client, 0.0265082412402), (want, 0.016477447)]

df1 = pd.DataFrame([dict(x) for x in df.word_prob_pair])
df1 = df1.fillna(0)
print (df1)
     client  customer      data  footfall   product      team      want
0  0.000000  0.061704  0.000000     0.000  0.000000  0.017244  0.000000
1  0.000000  0.024784  0.000000     0.000  0.000000  0.026056  0.000000
2  0.000000  0.017179  0.000000     0.012  0.000000  0.000000  0.000000
3  0.000000  0.000000  0.000000     0.000  0.015704  0.029079  0.000000
4  0.000000  0.000000  0.013432     0.000  0.000000  0.019792  0.000000
5  0.000000  0.026374  0.000000     0.000  0.000000  0.025168  0.000000
6  0.000000  0.028976  0.000000     0.000  0.000000  0.024994  0.000000
7  0.026508  0.000000  0.000000     0.000  0.000000  0.000000  0.016477

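The example frame above has no `topic_id` column, so the result is labeled 0..n-1. A minimal variation, assuming a frame shaped like the question's `tw` (the sample values below are made up), passes `topic_id` as the index instead. Recent pandas versions keep dict keys in first-seen column order rather than sorting them, so `sort_index(axis=1)` is added here to get alphabetical columns like the output above.

```python
import pandas as pd

# Hypothetical frame mirroring the question's columns (topic_id, word_prob_pair)
tw = pd.DataFrame({
    "topic_id": [10, 11, 12],
    "word_prob_pair": [
        [("customer", 0.06), ("team", 0.02)],
        [("team", 0.03), ("product", 0.015)],
        [("client", 0.027), ("want", 0.016)],
    ],
})

# One dict per row -> one column per word; words missing from a row become NaN
wide = pd.DataFrame([dict(pairs) for pairs in tw.word_prob_pair],
                    index=tw.topic_id)
# Replace NaN with 0 and order the word columns alphabetically
wide = wide.fillna(0).sort_index(axis=1)
print(wide)
```

Because each row goes through `dict`, a word that appears twice in the same row keeps only its last probability.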