In Python, how do I split strings holding multiple key-value pairs in a single DataFrame column into a new DataFrame?

Posted 2024-06-15 05:46:26


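I'm pulling data from a SQL database into a DataFrame. The DataFrame has a single column whose strings each hold a varying number of key-value pairs. I'd like to create a new DataFrame with two columns, one holding the keys and the other holding the values.

The DataFrame looks like this: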

In[1]:
print(df.tail())

Out[1]:
WK_VAL_PAIRS
166  {('sloth', 0.073), ('animal', 0.034), ('gift', 0.7843)}                              
167  {('dabbing', 0.0863), ('gift', 0.7843)}      
168  {('grandpa', 0.0156), ('funny', 1.3714), ('grandfather', 0.0015)}                                     
169  {('nerd', 0.0216)}
170  {('funny', 1.3714), ('pineapple', 0.0107)}

Ideally, the new DataFrame would look like this:

0  |  sloth    |  0.073
1  |  animal   |  0.034
2  |  gift     |  0.7843
3  |  dabbing  |  0.0863
4  |  gift     |  0.7843
...
etc.

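I've managed to separate the key-value pairs from a single row into a DataFrame, as shown below. From there, splitting the pairs into their own columns is straightforward.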

In[2]:
def prep_text(row):
    string = row.replace('{', '')
    string = string.replace('}', '')
    string = string.replace('\',', '\':')
    string = string.replace(' ', '')
    string = string.replace(')', '')
    string = string.replace('(', '')
    string = string.replace('\'', '')
    return string

df['pairs'] = df['WK_VAL_PAIRS'].apply(prep_text)
dd = df['pairs'].iloc[166]
af = pd.DataFrame([dd.split(',') for x in dd.split('\n')])
af.transpose()

Out[2]:

0   sloth:0.073
1   animal:0.034
2   gift:0.7843
3   spirit:0.0065
4   fans:0.0093
5   funny:1.3714
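
For the "split the pairs into their own columns" step mentioned above, here is a minimal sketch, assuming the `key:value` strings sit in a Series. The names `af`, `split`, `key` and `value` are illustrative and not taken from the original post.

```python
import pandas as pd

# Stand-in for the "key:value" strings shown in Out[2]
af = pd.Series(["sloth:0.073", "animal:0.034", "gift:0.7843"])

# expand=True returns one column per piece instead of a list per row
split = af.str.split(":", expand=True)
split.columns = ["key", "value"]  # hypothetical column names

# The pieces are still strings, so convert the values to floats
split["value"] = split["value"].astype(float)
print(split)
```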

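However, I'm missing the leap to applying this transformation to the whole DataFrame. Is there a way to do this with an .apply()-style function rather than a for-each loop? What is the most Pythonic way to handle this?

Any help would be greatly appreciated.

Solution

Following a strong hint from Chris below, I found a solution that fits my needs: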

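import re
import pandas as pd

def prep_text(row):
    # Drop the single quotes so each pair reads like (sloth, 0.073)
    string = row.replace('\'', '')
    string = '"'+ string + '"'
    return string


# .sum() concatenates every row into one string; findall then returns
# a list of (key, value) tuples, one per pair
kvp_df = pd.DataFrame(
                        re.findall(
                            r'(\w+), (\d+\.\d+)',  # raw string, literal dot
                            df['WK_VAL_PAIRS'].apply(prep_text).sum()
                        )
                    )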
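
As a possible variation on the solution above, the sketch below runs `re.findall` once per row instead of joining everything with `.sum()`, matches the quoted keys directly so no `prep_text` step is needed, and converts the values to floats. It assumes every pair looks like `('word', 0.123)`; the column names `key` and `value` and the name `PAIR` are illustrative, and the sample rows are copied from the question.

```python
import re
import pandas as pd

# Two sample rows in the same format as WK_VAL_PAIRS in the question
df = pd.DataFrame({"WK_VAL_PAIRS": [
    "{('sloth', 0.073), ('animal', 0.034), ('gift', 0.7843)}",
    "{('dabbing', 0.0863), ('gift', 0.7843)}",
]})

# Matches 'key', value with the quotes still in place
PAIR = re.compile(r"'(\w+)', (\d+\.\d+)")

# One findall per row, flattened into a single list of (key, value) tuples
rows = [pair for text in df["WK_VAL_PAIRS"] for pair in PAIR.findall(text)]

kvp_df = pd.DataFrame(rows, columns=["key", "value"])
kvp_df["value"] = kvp_df["value"].astype(float)  # findall returns strings
print(kvp_df)
```

Keys containing spaces or punctuation, or values without a decimal point, would not match this pattern and would need a looser regex.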


1 Answer

#1 · Posted 2024-06-15 05:46:26

Try re.findall with pandas.DataFrame:

import pandas as pd
import re

s = pd.Series(["{(stepper, 0.0001), (bob, 0.0017), (habitual, 0.0), (line, 0.0097)}",
"{(pete, 0.01), (joe, 0.0019), (sleep, 0.0), (cline, 0.0099)}"])

# Raw string with an escaped dot, so only a literal decimal point matches
pd.DataFrame(re.findall(r'(\w+), (\d+\.\d+)', s.sum()))

Output:

          0       1
0   stepper  0.0001
1       bob  0.0017
2  habitual     0.0
3      line  0.0097
4      pete    0.01
5       joe  0.0019
6     sleep     0.0
7     cline  0.0099
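
Since the question asked for something closer to a pandas-style operation, here is a hedged alternative using `Series.str.extractall`, which applies the regex to every row without joining the strings first. It reuses the sample Series `s` from this answer; the group names `key` and `value` and the variable `pairs` are illustrative choices.

```python
import pandas as pd

s = pd.Series([
    "{(stepper, 0.0001), (bob, 0.0017), (habitual, 0.0), (line, 0.0097)}",
    "{(pete, 0.01), (joe, 0.0019), (sleep, 0.0), (cline, 0.0099)}",
])

# Named groups become the column names of the result
pairs = s.str.extractall(r"\((?P<key>\w+), (?P<value>\d+\.\d+)\)")

# extractall indexes by (original row, match number); flatten to 0..n-1
pairs = pairs.reset_index(drop=True)

# Captured groups are strings, so convert the values to floats
pairs["value"] = pairs["value"].astype(float)
print(pairs)
```

Skipping the `reset_index` call keeps the original row label in the index, which can be useful when each pair needs to be traced back to its source row.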
