Each data point in a column holds a list of dictionaries. How do I turn those entries into columns?

Posted 2024-10-02 06:30:42


Suppose I have a DataFrame like this:

Name    Classes

Bill    [{'class': CS152, 'time': 2:00 PM}, {'class': PHYS162, 'time': 3:30 PM}]
Adam    [{'class': EE193, 'time': 1:00 PM}, {'class': PHYS162, 'time': 2:30 PM}]
Sara    [{'class': CS152, 'time': 4:00 PM}, {'class': BIO182, 'time': 6:30 PM}]

How can I reshape the DataFrame so that it looks like this:

Name    CS152     PHYS162    EE193      BIO182

Bill    2:00 PM   3:30 PM    NaN        NaN
Adam    NaN       2:30 PM    1:00 PM    NaN
Sara    4:00 PM   NaN        NaN        6:30 PM

2 answers

This could probably be done more elegantly, but here is one possibility:

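import pandas as pd


def to_frame(key, classes):
    """Expand one group's lists of dicts into a DataFrame indexed by the group key."""
    # `classes` is a Series of lists; flatten it into a single list of dicts
    data = [d for row in classes for d in row]
    return pd.DataFrame(data, index=[key] * len(data))


# `data` is assumed to be the question's DataFrame with its columns
# named 'name' and 'classes' (lowercase)
res = (
    # expand nested data structures
    pd.concat([
        to_frame(key, classes) for key, classes in data.groupby('name')['classes']
    ])
    .reset_index()
    .rename(columns={'index': 'name'})
    # pivot: one row per name, one column per class
    .pivot_table(index='name', columns='class', values='time', aggfunc='first')
    .reset_index()
)
res.columns.name = None  # drop the leftover 'class' label on the columns axis
print(res)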

       name   BIO182    CS152    EE193  PHYS162
0      Adam      NaN      NaN  1:00 PM  2:30 PM
1      Bill      NaN  2:00 PM      NaN  3:30 PM
2      Sara  6:30 PM  4:00 PM      NaN      NaN
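The same expand-then-pivot idea can be written more compactly. This is only a sketch, assuming a pandas version that has `DataFrame.explode` (0.25 or later) and the lowercase column names used in this answer; the sample `data` frame and the names `long_df` and `wide` are made up for illustration.

```python
import pandas as pd

# Sample data mirroring the question (lowercase column names, as in this answer)
data = pd.DataFrame({
    "name": ["Bill", "Adam", "Sara"],
    "classes": [
        [{"class": "CS152", "time": "2:00 PM"}, {"class": "PHYS162", "time": "3:30 PM"}],
        [{"class": "EE193", "time": "1:00 PM"}, {"class": "PHYS162", "time": "2:30 PM"}],
        [{"class": "CS152", "time": "4:00 PM"}, {"class": "BIO182", "time": "6:30 PM"}],
    ],
})

# One row per (name, dict) pair; reset the index so rows are numbered 0..n-1
long_df = data.explode("classes").reset_index(drop=True)

# Turn each dict into 'class' and 'time' columns, aligned row by row with 'name'
expanded = pd.DataFrame(long_df["classes"].tolist())
long_df = pd.concat([long_df[["name"]], expanded], axis=1)

# One row per name, one column per class
wide = (
    long_df
    .pivot_table(index="name", columns="class", values="time", aggfunc="first")
    .reset_index()
)
wide.columns.name = None
print(wide)
```

As in the original version, `aggfunc="first"` keeps only the first time if a person has the same class listed twice.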

Here is one way to do it, though it could be optimized:

import pandas as pd

so = pd.DataFrame([['Bill', [{'class': 'CS152', 'time': '2:00 PM'}, {'class': 'PHYS162', 'time': '3:30 PM'}]],
                   ['Adam', [{'class': 'EE193', 'time': '1:00 PM'}, {'class': 'PHYS162', 'time': '2:30 PM'}]],
                   ['Sara', [{'class': 'CS152', 'time': '4:00 PM'}, {'class': 'BIO182', 'time': '6:30 PM'}]]
                  ], columns=['Name', 'Classes'])

newdict = {}  # maps each name to that person's Series of class -> time
for idx in so.index:
    name = so.loc[idx, 'Name']
    classes = so.loc[idx, 'Classes']
    # create series data for individual person
    seriesdata = pd.Series(dtype=object)

    for rowclass in classes:
        classname = rowclass['class']
        classtime = rowclass['time']
        seriesdata[classname] = classtime
    print(seriesdata)
    # add this person's series to the name -> series dictionary
    newdict[name] = seriesdata


# names become columns here, so transpose to get one row per name
df = pd.DataFrame(newdict)
print(df.T)
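One possible way to tighten the loop above is to build plain dictionaries and let pandas align the keys. This is a sketch rather than the answerer's own code; it assumes every name appears only once (a repeated name would overwrite the earlier entry), and `records` is a name made up for illustration.

```python
import pandas as pd

so = pd.DataFrame(
    [
        ["Bill", [{"class": "CS152", "time": "2:00 PM"}, {"class": "PHYS162", "time": "3:30 PM"}]],
        ["Adam", [{"class": "EE193", "time": "1:00 PM"}, {"class": "PHYS162", "time": "2:30 PM"}]],
        ["Sara", [{"class": "CS152", "time": "4:00 PM"}, {"class": "BIO182", "time": "6:30 PM"}]],
    ],
    columns=["Name", "Classes"],
)

# Map each name to a {class: time} dict
records = {
    name: {entry["class"]: entry["time"] for entry in classes}
    for name, classes in zip(so["Name"], so["Classes"])
}

# orient="index" makes each outer key a row; missing classes become NaN
df = pd.DataFrame.from_dict(records, orient="index")
print(df)
```

Because the names end up as the row index, no transpose is needed; call `df.rename_axis("Name").reset_index()` if the names should be an ordinary column.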
