Unpacking an unknown object from a DataFrame

Posted 2024-09-30 08:18:38

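I'm a beginner. I have a DataFrame containing an object I can't identify, and I need to unpack it and convert it into a new, separate DataFrame to form a new normalized structure.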

A simplified version of the df is:

   trasaction_id   customer_details
0   1       <customer {id:'A123', name: 'Tina'} as x >
0   2       <customer {id:'B456', name: 'Tony'} as x >
0   3       <customer {id:'C789', name: 'Tim'} as x >

Name: customer_details, dtype: object

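I can't seem to get at the dictionary inside the object within the angle brackets. I've tried all sorts of things; if I try print(df['customer_details'].__dict__), I get the following: {'_is_copy': None, '_data': SingleBlockManager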

I even tried a hack like the string manipulation below, but I'm sure that, as a beginner, I'm missing something basic. '{' + df['customer_details'].apply(lambda st: st[st.find("{")+1:st.find("}")]) + '}'

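Ultimately, what I'm trying to do is unpack these customer details into a separate df linked by the transaction id, and store it in a simple normalized structure in an RDB. I believe that in order to use standard tools such as json.dumps(), I want it to look like this (with every element wrapped in double quotes):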

    transaction_id   customer_details
0    1        {id:'A123', name: 'Tina'}
0    2        {id:'B456', name: 'Tony'}
0    3        {id:'C789', name: 'Tim'}

This is driving me crazy. Thanks for your help.


1 answer
User
#1 · Posted 2024-09-30 08:18:38

It looks like you have objects (instances of some class) with the attributes id and name, so you can try to read them:

{'id': st.id, 'name': st.name}

That is:

df['customer_details'] = df['customer_details'].apply(lambda x: {'id': x.id, 'name': x.name})

Or put the values directly into separate columns:

df['id']   = df['customer_details'].apply(lambda x: x.id)
df['name'] = df['customer_details'].apply(lambda x: x.name)
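Before picking between the attribute approach and the string approach, it may help to check what a single cell actually holds. Calling __dict__ on the Series shows the internals of the Series, not of its elements. The sketch below looks at the first element instead. The Customer class is a hypothetical stand-in for the unknown class in the question, and vars() only works when the object has a __dict__ (classes using __slots__ do not).

```python
import pandas as pd

# Hypothetical stand-in for the unknown class from the question.
class Customer:
    def __init__(self, id_, name):
        self.id = id_
        self.name = name

df = pd.DataFrame({
    'trasaction_id': [1, 2],
    'customer_details': [Customer('A123', 'Tina'), Customer('B456', 'Tony')],
})

# Inspect one element of the column rather than the Series itself.
first = df['customer_details'].iloc[0]
print(type(first))

# A str means the regex route applies; anything else is a real object.
kind = 'string' if isinstance(first, str) else 'object'

# vars() returns the instance attributes when the object has a __dict__.
attrs = vars(first) if hasattr(first, '__dict__') else {}
print(kind, attrs)
```

If kind comes back as 'string', the cells are plain text that only looks like an object, and the regex variant further down is the one to use.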

Example code:

import pandas as pd

class customer:
    def __init__(self, id_, name):
        self.id = id_
        self.name = name
    def __str__(self):
        return '<customer {{id: {}, name: {}}} as x>'.format(self.id, self.name)

data = {
    'trasaction_id': [1,2,3],
    'customer_details': [
        customer('A123', 'Tina'),
        customer('B456', 'Tony'),
        customer('C789', 'Tim')
    ],
}

df = pd.DataFrame(data)
print(df)

# Pull each attribute into its own column
df['id'] = df['customer_details'].apply(lambda x: x.id)
df['name'] = df['customer_details'].apply(lambda x: x.name)
print(df)

# Replace each object with a plain dict
df['customer_details'] = df['customer_details'].apply(lambda x: {'id': x.id, 'name': x.name})
print(df)

# Optional: expand the dicts into a separate DataFrame
#new_df = pd.DataFrame( df['customer_details'].to_list() )

Result:

   trasaction_id                        customer_details
0              1  <customer {id: A123, name: Tina} as x>
1              2  <customer {id: B456, name: Tony} as x>
2              3   <customer {id: C789, name: Tim} as x>

   trasaction_id                        customer_details    id  name
0              1  <customer {id: A123, name: Tina} as x>  A123  Tina
1              2  <customer {id: B456, name: Tony} as x>  B456  Tony
2              3   <customer {id: C789, name: Tim} as x>  C789   Tim

   trasaction_id                customer_details    id  name
0              1  {'id': 'A123', 'name': 'Tina'}  A123  Tina
1              2  {'id': 'B456', 'name': 'Tony'}  B456  Tony
2              3   {'id': 'C789', 'name': 'Tim'}  C789   Tim
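The question asks for a separate, normalized frame linked by the transaction id, plus double-quoted output for json.dumps(). One possible way to get there from the dict column is sketched below. The names customers, customer_id and customer_json are made up for illustration, and the frame is rebuilt here only so the snippet runs on its own.

```python
import json
import pandas as pd

# Starting point: customer_details already converted to plain dicts.
df = pd.DataFrame({
    'trasaction_id': [1, 2, 3],
    'customer_details': [
        {'id': 'A123', 'name': 'Tina'},
        {'id': 'B456', 'name': 'Tony'},
        {'id': 'C789', 'name': 'Tim'},
    ],
})

# One column per dict key; reuse df's index so the rows stay aligned.
customers = pd.DataFrame(df['customer_details'].to_list(), index=df.index)

# Hypothetical rename, so the key does not clash with other id columns.
customers = customers.rename(columns={'id': 'customer_id'})

# Carry the transaction id over as the link back to the original frame.
customers.insert(0, 'trasaction_id', df['trasaction_id'])
print(customers)

# json.dumps turns each dict into JSON text with double-quoted keys and values.
df['customer_json'] = df['customer_details'].apply(json.dumps)
print(df['customer_json'].iloc[0])
```

The customers frame can then be written to a database table with whatever tool is already in use; that step is left out here because it depends on the target RDB.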

Edit: if the column holds strings rather than objects, you can use a regex to pull the values out of each string.

import pandas as pd
import re

data = {
    'trasaction_id': [1,2,3],
    'customer_details': [
        "<customer {id:'A123', name: 'Tina'} as x >",
        "<customer {id:'B456', name: 'Tony'} as x >",
        "<customer {id:'C789', name: 'Tim'} as x >",
    ]
}

df = pd.DataFrame(data)
print(df)

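# Pull each value out with a regex (non-greedy, so a match stops at the first closing quote)
df['id'] = df['customer_details'].apply(lambda x: re.search(r"id:'(.*?)',", x)[1])
df['name'] = df['customer_details'].apply(lambda x: re.search(r"name: '(.*?)'}", x)[1])
print(df)

def myfunc(x):
    # Capture both values in a single search
    r = re.search(r"id:'(.*?)', name: '(.*?)'}", x)
    return {'id': r[1], 'name': r[2]}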

df['customer_details'] = df['customer_details'].apply(myfunc)
print(df)

#new_df = pd.DataFrame( df['customer_details'].to_list() )
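As an alternative to apply with re.search, pandas can do the extraction itself: Series.str.extract returns one column per capture group, and named groups become the column names. The sketch below assumes every string follows the format shown in the question; rows that do not match come back as NaN instead of raising an error. The names pattern, parts and customers are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'trasaction_id': [1, 2, 3],
    'customer_details': [
        "<customer {id:'A123', name: 'Tina'} as x >",
        "<customer {id:'B456', name: 'Tony'} as x >",
        "<customer {id:'C789', name: 'Tim'} as x >",
    ],
})

# Named groups become column names; [^']* stops at the closing quote.
pattern = r"id:\s*'(?P<id>[^']*)',\s*name:\s*'(?P<name>[^']*)'"
parts = df['customer_details'].str.extract(pattern)

# Put the transaction id next to the extracted columns.
customers = pd.concat([df[['trasaction_id']], parts], axis=1)
print(customers)
```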
